Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir
2010-07-15
With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
Optimal allocation of testing resources for statistical simulations
NASA Astrophysics Data System (ADS)
Quintana, Carolina; Millwater, Harry R.; Singh, Gulshan; Golden, Patrick
2015-07-01
Statistical estimates from simulation involve uncertainty caused by the variability in the input random variables due to limited data. Allocating resources to obtain more experimental data of the input variables to better characterize their probability distributions can reduce the variance of statistical estimates. The methodology proposed determines the optimal number of additional experiments required to minimize the variance of the output moments given single or multiple constraints. The method uses multivariate t-distribution and Wishart distribution to generate realizations of the population mean and covariance of the input variables, respectively, given an amount of available data. This method handles independent and correlated random variables. A particle swarm method is used for the optimization. The optimal number of additional experiments per variable depends on the number and variance of the initial data, the influence of the variable in the output function and the cost of each additional experiment. The methodology is demonstrated using a fretting fatigue example.
On an additive partial correlation operator and nonparametric estimation of graphical models.
Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu
2016-09-01
We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.
On an additive partial correlation operator and nonparametric estimation of graphical models
Li, Bing; Zhao, Hongyu
2016-01-01
Abstract We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance. PMID:29422689
Probability of Detection (POD) as a statistical model for the validation of qualitative methods.
Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T
2011-01-01
A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Abstract Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F
2010-07-19
A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic.
2010-01-01
Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. Conclusions The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic. PMID:20642827
Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C
2018-03-07
Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
Pathway analysis with next-generation sequencing data.
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao
2015-04-01
Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Peer-Assisted Learning in Research Methods and Statistics
ERIC Educational Resources Information Center
Stone, Anna; Meade, Claire; Watling, Rosamond
2012-01-01
Feedback from students on a Level 1 Research Methods and Statistics module, studied as a core part of a BSc Psychology programme, highlighted demand for additional tutorials to help them to understand basic concepts. Students in their final year of study commonly request work experience to enhance their employability. All students on the Level 1…
On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.
Li, Bing; Chun, Hyonho; Zhao, Hongyu
2014-09-01
We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.
Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876
Cluster detection methods applied to the Upper Cape Cod cancer data.
Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann
2005-09-15
A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. The comparative analyses of real data sets by different statistical methods provides insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.
NASA Astrophysics Data System (ADS)
Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir
2016-03-01
The U-statistic method is one of the most important structural methods to separate the anomaly from the background. It considers the location of samples and carries out the statistical analysis of the data without judging from a geochemical point of view and tries to separate subpopulations and determine anomalous areas. In the present study, to use U-statistic method in three-dimensional (3D) condition, U-statistic is applied on the grade of two ideal test examples, by considering sample Z values (elevation). So far, this is the first time that this method has been applied on a 3D condition. To evaluate the performance of 3D U-statistic method and in order to compare U-statistic with one non-structural method, the method of threshold assessment based on median and standard deviation (MSD method) is applied on the two example tests. Results show that the samples indicated by U-statistic method as anomalous are more regular and involve less dispersion than those indicated by the MSD method. So that, according to the location of anomalous samples, denser areas of them can be determined as promising zones. Moreover, results show that at a threshold of U = 0, the total error of misclassification for U-statistic method is much smaller than the total error of criteria of bar {x}+n× s. Finally, 3D model of two test examples for separating anomaly from background using 3D U-statistic method is provided. The source code for a software program, which was developed in the MATLAB programming language in order to perform the calculations of the 3D U-spatial statistic method, is additionally provided. This software is compatible with all the geochemical varieties and can be used in similar exploration projects.
Chi-squared and C statistic minimization for low count per bin data
NASA Astrophysics Data System (ADS)
Nousek, John A.; Shue, David R.
1989-07-01
Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy
NASA Technical Reports Server (NTRS)
Nousek, John A.; Shue, David R.
1989-01-01
Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
Learning process mapping heuristics under stochastic sampling overheads
NASA Technical Reports Server (NTRS)
Ieumwananonthachai, Arthur; Wah, Benjamin W.
1991-01-01
A statistical method was developed previously for improving process mapping heuristics. The method systematically explores the space of possible heuristics under a specified time constraint. Its goal is to get the best possible heuristics while trading between the solution quality of the process mapping heuristics and their execution time. The statistical selection method is extended to take into consideration the variations in the amount of time used to evaluate heuristics on a problem instance. The improvement in performance is presented using the more realistic assumption along with some methods that alleviate the additional complexity.
Bayesian demography 250 years after Bayes
Bijak, Jakub; Bryant, John
2016-01-01
Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889
Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin
2014-01-01
Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions in our interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We only accepted addition or removal when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874
The Content of Statistical Requirements for Authors in Biomedical Research Journals
Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang
2016-01-01
Background: Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues’ serious considerations not only at the stage of data analysis but also at the stage of methodological design. Methods: Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Results: Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including “address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation,” and “statistical methods and the reasons.” Conclusions: Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible. PMID:27748343
Teaching an Introductory Statistics Course with CyberStats, an Electronic Textbook
ERIC Educational Resources Information Center
Symanzik, Jurgen; Vukasinovic, Natascha
2006-01-01
In the Fall 2001 semester, we taught a "Web-enhanced" version of the undergraduate course "Statistical Methods" ("STAT 2000") at Utah State University. The course used the electronic textbook CyberStats in addition to "face-to-face" teaching. This paper gives insight in our experiences in teaching this…
Absolute detector calibration using twin beams.
Peřina, Jan; Haderka, Ondřej; Michálek, Václav; Hamar, Martin
2012-07-01
A method for the determination of absolute quantum detection efficiency is suggested based on the measurement of photocount statistics of twin beams. The measured histograms of joint signal-idler photocount statistics allow us to eliminate an additional noise superimposed on an ideal calibration field composed of only photon pairs. This makes the method superior above other approaches presently used. Twin beams are described using a paired variant of quantum superposition of signal and noise.
Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David
2015-01-01
New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.
2010-01-01
Background Animals, including humans, exhibit a variety of biological rhythms. This article describes a method for the detection and simultaneous comparison of multiple nycthemeral rhythms. Methods A statistical method for detecting periodic patterns in time-related data via harmonic regression is described. The method is particularly capable of detecting nycthemeral rhythms in medical data. Additionally a method for simultaneously comparing two or more periodic patterns is described, which derives from the analysis of variance (ANOVA). This method statistically confirms or rejects equality of periodic patterns. Mathematical descriptions of the detecting method and the comparing method are displayed. Results Nycthemeral rhythms of incidents of bodily harm in Middle Franconia are analyzed in order to demonstrate both methods. Every day of the week showed a significant nycthemeral rhythm of bodily harm. These seven patterns of the week were compared to each other revealing only two different nycthemeral rhythms, one for Friday and Saturday and one for the other weekdays. PMID:21059197
Lotfy, Hayam Mahmoud; Salem, Hesham; Abdelkawy, Mohammad; Samir, Ahmed
2015-04-05
Five spectrophotometric methods were successfully developed and validated for the determination of betamethasone valerate and fusidic acid in their binary mixture. Those methods are isoabsorptive point method combined with the first derivative (ISO Point--D1) and the recently developed and well established methods namely ratio difference (RD) and constant center coupled with spectrum subtraction (CC) methods, in addition to derivative ratio (1DD) and mean centering of ratio spectra (MCR). New enrichment technique called spectrum addition technique was used instead of traditional spiking technique. The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of official methods. The statistical comparison showed that there is no significant difference between the proposed methods and the official ones regarding both accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.
Vomweg, T W; Buscema, M; Kauczor, H U; Teifke, A; Intraligi, M; Terzi, S; Heussel, C P; Achenbach, T; Rieker, O; Mayer, D; Thelen, M
2003-09-01
The aim of this study was to evaluate the capability of improved artificial neural networks (ANN) and additional novel training methods in distinguishing between benign and malignant breast lesions in contrast-enhanced magnetic resonance-mammography (MRM). A total of 604 histologically proven cases of contrast-enhanced lesions of the female breast at MRI were analyzed. Morphological, dynamic and clinical parameters were collected and stored in a database. The data set was divided into several groups using random or experimental methods [Training & Testing (T&T) algorithm] to train and test different ANNs. An additional novel computer program for input variable selection was applied. Sensitivity and specificity were calculated and compared with a statistical method and an expert radiologist. After optimization of the distribution of cases among the training and testing sets by the T & T algorithm and the reduction of input variables by the Input Selection procedure a highly sophisticated ANN achieved a sensitivity of 93.6% and a specificity of 91.9% in predicting malignancy of lesions within an independent prediction sample set. The best statistical method reached a sensitivity of 90.5% and a specificity of 68.9%. An expert radiologist performed better than the statistical method but worse than the ANN (sensitivity 92.1%, specificity 85.6%). Features extracted out of dynamic contrast-enhanced MRM and additional clinical data can be successfully analyzed by advanced ANNs. The quality of the resulting network strongly depends on the training methods, which are improved by the use of novel training tools. The best results of an improved ANN outperform expert radiologists.
2005-04-01
the radiography gauging. In addition to the Statistical Energy Analysis (SEA) measurement a small exciter table (BK4810) and impedance head (BK 8000... Statistical Energy Analysis ; 7th Conf. on Vehicle System Dynamics, Identification and Anomalies (VSDIA2000), 6-8 Nov. 2000 Budapest, Proc. pp. 491-493... Energy Analysis (SEA) and Ultrasound Test. (UT) were concurrently applied. These methods collect accessory information on the objects under inspection
McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad
2011-07-01
The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates. Ophthalmic & Physiological Optics © 2011 The College of Optometrists.
A new statistical approach to climate change detection and attribution
NASA Astrophysics Data System (ADS)
Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe
2017-01-01
We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the " models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01± 0.02 K).
Modified gastroduodenostomy in laparoscopy-assisted distal gastrectomy: a 'tornado' anastomosis.
Kubota, Keisuke; Kuroda, Junko; Yoshida, Masashi; Okada, Akihiro; Nitori, Nobuhiro; Kitajima, Masaki
2013-01-01
This study was to examine the utility of a modified double-stapling end-to-end gastroduodenostomy method ('Tornado' anastomosis) compared to a method with an additional gastrotomy ('Anterior Incision' method) in laparoscopy-assisted distal gastrectomy. Forty-two patients with gastric cancer who underwent laparoscopy-assisted distal gastrectomy were analyzed retrospectively. Billroth-I using an additional gastrotomy was performed in 24 patients (AI group) and Billroth-I without an additional gastrotomy was performed in 18 (TOR group). Clinicopathological features, operative outcomes (lymph node dissection, operative time, operative blood loss) and postoperative outcomes (complications, postoperative hospital stay, and body weight loss at one year after surgery) were evaluated and compared between groups. Operative time was significantly shorter in the TOR group (251 min) than in the AI group (282 min) (p < 0.01). There were no statistically significant differences in operative blood loss, postoperative complications, and hospital stay between the 2 study groups. Body weight loss at one year after surgery was -5.8 kg in the TOR group and -6.5 kg in the AI group, without a statistically significant difference. Completion time for Billroth-I anastomosis was significantly shorter with Tornado anastomosis than with the Anterior Incision method, with safety equal between the two methods.
Improved score statistics for meta-analysis in single-variant and gene-level association studies.
Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo
2018-06-01
Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently as the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem by the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration . In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.
Goyal, Ravi; De Gruttola, Victor
2018-01-30
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel
2017-07-01
Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generate a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km2) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
New U.S. Geological Survey Method for the Assessment of Reserve Growth
Klett, Timothy R.; Attanasi, E.D.; Charpentier, Ronald R.; Cook, Troy A.; Freeman, P.A.; Gautier, Donald L.; Le, Phuong A.; Ryder, Robert T.; Schenk, Christopher J.; Tennyson, Marilyn E.; Verma, Mahendra K.
2011-01-01
Reserve growth is defined as the estimated increases in quantities of crude oil, natural gas, and natural gas liquids that have the potential to be added to remaining reserves in discovered accumulations through extension, revision, improved recovery efficiency, and additions of new pools or reservoirs. A new U.S. Geological Survey method was developed to assess the reserve-growth potential of technically recoverable crude oil and natural gas to be added to reserves under proven technology currently in practice within the trend or play, or which reasonably can be extrapolated from geologically similar trends or plays. This method currently is in use to assess potential additions to reserves in discovered fields of the United States. The new approach involves (1) individual analysis of selected large accumulations that contribute most to reserve growth, and (2) conventional statistical modeling of reserve growth in remaining accumulations. This report will focus on the individual accumulation analysis. In the past, the U.S. Geological Survey estimated reserve growth by statistical methods using historical recoverable-quantity data. Those statistical methods were based on growth rates averaged by the number of years since accumulation discovery. Accumulations in mature petroleum provinces with volumetrically significant reserve growth, however, bias statistical models of the data; therefore, accumulations with significant reserve growth are best analyzed separately from those with less significant reserve growth. Large (greater than 500 million barrels) and older (with respect to year of discovery) oil accumulations increase in size at greater rates late in their development history in contrast to more recently discovered accumulations that achieve most growth early in their development history. Such differences greatly affect the statistical methods commonly used to forecast reserve growth. The individual accumulation-analysis method involves estimating the in-place petroleum quantity and its uncertainty, as well as the estimated (forecasted) recoverability and its respective uncertainty. These variables are assigned probabilistic distributions and are combined statistically to provide probabilistic estimates of ultimate recoverable quantities. Cumulative production and remaining reserves are then subtracted from the estimated ultimate recoverable quantities to provide potential reserve growth. In practice, results of the two methods are aggregated to various scales, the highest of which includes an entire country or the world total. The aggregated results are reported along with the statistically appropriate uncertainties.
Sabourin, Jeremy; Nobel, Andrew B.; Valdar, William
2014-01-01
Genomewide association studies sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple SNPs simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow-up studies. Current multi-SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA-dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights; it estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing a SNP prioritization that best identifies underlying true signals, we show that: our method easily outperforms a single marker analysis; when additive-only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive-only effects; and, when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation. PMID:25417853
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of data, type of variables, and purpose of the analysis. Different measurement scales are studied in details and statistical comparison, modeling, and data mining methods are studied based upon using several medical examples. We have presented two ordinal–variables clustering examples, as more challenging variable in analysis, using Wisconsin Breast Cancer Data (WBCD). Ordinal-to-Interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is granted. Moreover, descriptive and inferential statistics in addition to modeling approach must be selected based on the scale of the variables. PMID:24672565
The Content of Statistical Requirements for Authors in Biomedical Research Journals.
Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang
2016-10-20
Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues' serious considerations not only at the stage of data analysis but also at the stage of methodological design. Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible.
Online Statistical Modeling (Regression Analysis) for Independent Responses
NASA Astrophysics Data System (ADS)
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
New Developments in the Embedded Statistical Coupling Method: Atomistic/Continuum Crack Propagation
NASA Technical Reports Server (NTRS)
Saether, E.; Yamakov, V.; Glaessgen, E.
2008-01-01
A concurrent multiscale modeling methodology that embeds a molecular dynamics (MD) region within a finite element (FEM) domain has been enhanced. The concurrent MD-FEM coupling methodology uses statistical averaging of the deformation of the atomistic MD domain to provide interface displacement boundary conditions to the surrounding continuum FEM region, which, in turn, generates interface reaction forces that are applied as piecewise constant traction boundary conditions to the MD domain. The enhancement is based on the addition of molecular dynamics-based cohesive zone model (CZM) elements near the MD-FEM interface. The CZM elements are a continuum interpretation of the traction-displacement relationships taken from MD simulations using Cohesive Zone Volume Elements (CZVE). The addition of CZM elements to the concurrent MD-FEM analysis provides a consistent set of atomistically-based cohesive properties within the finite element region near the growing crack. Another set of CZVEs are then used to extract revised CZM relationships from the enhanced embedded statistical coupling method (ESCM) simulation of an edge crack under uniaxial loading.
Evaluation of airborne lidar data to predict vegetation Presence/Absence
Palaseanu-Lovejoy, M.; Nayegandhi, A.; Brock, J.; Woodman, R.; Wright, C.W.
2009-01-01
This study evaluates the capabilities of the Experimental Advanced Airborne Research Lidar (EAARL) in delineating vegetation assemblages in Jean Lafitte National Park, Louisiana. Five-meter-resolution grids of bare earth, canopy height, canopy-reflection ratio, and height of median energy were derived from EAARL data acquired in September 2006. Ground-truth data were collected along transects to assess species composition, canopy cover, and ground cover. To decide which model is more accurate, comparisons of general linear models and generalized additive models were conducted using conventional evaluation methods (i.e., sensitivity, specificity, Kappa statistics, and area under the curve) and two new indexes, net reclassification improvement and integrated discrimination improvement. Generalized additive models were superior to general linear models in modeling presence/absence in training vegetation categories, but no statistically significant differences between the two models were achieved in determining the classification accuracy at validation locations using conventional evaluation methods, although statistically significant improvements in net reclassifications were observed. ?? 2009 Coastal Education and Research Foundation.
NASA Astrophysics Data System (ADS)
Abdellatef, Hisham E.
2007-04-01
Picric acid, bromocresol green, bromothymol blue, cobalt thiocyanate and molybdenum(V) thiocyanate have been tested as spectrophotometric reagents for the determination of disopyramide and irbesartan. Reaction conditions have been optimized to obtain coloured comoplexes of higher sensitivity and longer stability. The absorbance of ion-pair complexes formed were found to increases linearity with increases in concentrations of disopyramide and irbesartan which were corroborated by correction coefficient values. The developed methods have been successfully applied for the determination of disopyramide and irbesartan in bulk drugs and pharmaceutical formulations. The common excipients and additives did not interfere in their determination. The results obtained by the proposed methods have been statistically compared by means of student t-test and by the variance ratio F-test. The validity was assessed by applying the standard addition technique. The results were compared statistically with the official or reference methods showing a good agreement with high precision and accuracy.
Robbins, L G
2000-01-01
Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi. it/ricerca/dip/bio_evol/sitomlikely/mlikely.h tml. PMID:10628965
On sufficient statistics of least-squares superposition of vector sets.
Konagurthu, Arun S; Kasarapu, Parthan; Allison, Lloyd; Collier, James H; Lesk, Arthur M
2015-06-01
The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from its constituent vector sets under addition or deletion operation, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
NASA Astrophysics Data System (ADS)
Rubel, Aleksey S.; Lukin, Vladimir V.; Egiazarian, Karen O.
2015-03-01
Results of denoising based on discrete cosine transform for a wide class of images corrupted by additive noise are obtained. Three types of noise are analyzed: additive white Gaussian noise and additive spatially correlated Gaussian noise with middle and high correlation levels. TID2013 image database and some additional images are taken as test images. Conventional DCT filter and BM3D are used as denoising techniques. Denoising efficiency is described by PSNR and PSNR-HVS-M metrics. Within hard-thresholding denoising mechanism, DCT-spectrum coefficient statistics are used to characterize images and, subsequently, denoising efficiency for them. Results of denoising efficiency are fitted for such statistics and efficient approximations are obtained. It is shown that the obtained approximations provide high accuracy of prediction of denoising efficiency.
Cancer survival: an overview of measures, uses, and interpretation.
Mariotto, Angela B; Noone, Anne-Michelle; Howlader, Nadia; Cho, Hyunsoon; Keel, Gretchen E; Garshell, Jessica; Woloshin, Steven; Schwartz, Lisa M
2014-11-01
Survival statistics are of great interest to patients, clinicians, researchers, and policy makers. Although seemingly simple, survival can be confusing: there are many different survival measures with a plethora of names and statistical methods developed to answer different questions. This paper aims to describe and disseminate different survival measures and their interpretation in less technical language. In addition, we introduce templates to summarize cancer survival statistic organized by their specific purpose: research and policy versus prognosis and clinical decision making. Published by Oxford University Press 2014.
Cancer Survival: An Overview of Measures, Uses, and Interpretation
Noone, Anne-Michelle; Howlader, Nadia; Cho, Hyunsoon; Keel, Gretchen E.; Garshell, Jessica; Woloshin, Steven; Schwartz, Lisa M.
2014-01-01
Survival statistics are of great interest to patients, clinicians, researchers, and policy makers. Although seemingly simple, survival can be confusing: there are many different survival measures with a plethora of names and statistical methods developed to answer different questions. This paper aims to describe and disseminate different survival measures and their interpretation in less technical language. In addition, we introduce templates to summarize cancer survival statistic organized by their specific purpose: research and policy versus prognosis and clinical decision making. PMID:25417231
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
Quantifying Safety Margin Using the Risk-Informed Safety Margin Characterization (RISMC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grabaskas, David; Bucknor, Matthew; Brunett, Acacia
2015-04-26
The Risk-Informed Safety Margin Characterization (RISMC), developed by Idaho National Laboratory as part of the Light-Water Reactor Sustainability Project, utilizes a probabilistic safety margin comparison between a load and capacity distribution, rather than a deterministic comparison between two values, as is usually done in best-estimate plus uncertainty analyses. The goal is to determine the failure probability, or in other words, the probability of the system load equaling or exceeding the system capacity. While this method has been used in pilot studies, there has been little work conducted investigating the statistical significance of the resulting failure probability. In particular, it ismore » difficult to determine how many simulations are necessary to properly characterize the failure probability. This work uses classical (frequentist) statistics and confidence intervals to examine the impact in statistical accuracy when the number of simulations is varied. Two methods are proposed to establish confidence intervals related to the failure probability established using a RISMC analysis. The confidence interval provides information about the statistical accuracy of the method utilized to explore the uncertainty space, and offers a quantitative method to gauge the increase in statistical accuracy due to performing additional simulations.« less
Pasaniuc, Bogdan; Zaitlen, Noah; Lettre, Guillaume; Chen, Gary K; Tandon, Arti; Kao, W H Linda; Ruczinski, Ingo; Fornage, Myriam; Siscovick, David S; Zhu, Xiaofeng; Larkin, Emma; Lange, Leslie A; Cupples, L Adrienne; Yang, Qiong; Akylbekova, Ermeg L; Musani, Solomon K; Divers, Jasmin; Mychaleckyj, Joe; Li, Mingyao; Papanicolaou, George J; Millikan, Robert C; Ambrosone, Christine B; John, Esther M; Bernstein, Leslie; Zheng, Wei; Hu, Jennifer J; Ziegler, Regina G; Nyante, Sarah J; Bandera, Elisa V; Ingles, Sue A; Press, Michael F; Chanock, Stephen J; Deming, Sandra L; Rodriguez-Gil, Jorge L; Palmer, Cameron D; Buxbaum, Sarah; Ekunwe, Lynette; Hirschhorn, Joel N; Henderson, Brian E; Myers, Simon; Haiman, Christopher A; Reich, David; Patterson, Nick; Wilson, James G; Price, Alkes L
2011-04-01
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.
Statistical methods to detect novel genetic variants using publicly available GWAS summary data.
Guo, Bin; Wu, Baolin
2018-03-01
We propose statistical methods to detect novel genetic variants using only genome-wide association studies (GWAS) summary data without access to raw genotype and phenotype data. With more and more summary data being posted for public access in the post GWAS era, the proposed methods are practically very useful to identify additional interesting genetic variants and shed lights on the underlying disease mechanism. We illustrate the utility of our proposed methods with application to GWAS meta-analysis results of fasting glucose from the international MAGIC consortium. We found several novel genome-wide significant loci that are worth further study. Copyright © 2018 Elsevier Ltd. All rights reserved.
Comparison of the predictive validity of diagnosis-based risk adjusters for clinical outcomes.
Petersen, Laura A; Pietz, Kenneth; Woodard, LeChauncy D; Byrne, Margaret
2005-01-01
Many possible methods of risk adjustment exist, but there is a dearth of comparative data on their performance. We compared the predictive validity of 2 widely used methods (Diagnostic Cost Groups [DCGs] and Adjusted Clinical Groups [ACGs]) for 2 clinical outcomes using a large national sample of patients. We studied all patients who used Veterans Health Administration (VA) medical services in fiscal year (FY) 2001 (n = 3,069,168) and assigned both a DCG and an ACG to each. We used logistic regression analyses to compare predictive ability for death or long-term care (LTC) hospitalization for age/gender models, DCG models, and ACG models. We also assessed the effect of adding age to the DCG and ACG models. Patients in the highest DCG categories, indicating higher severity of illness, were more likely to die or to require LTC hospitalization. Surprisingly, the age/gender model predicted death slightly more accurately than the ACG model (c-statistic of 0.710 versus 0.700, respectively). The addition of age to the ACG model improved the c-statistic to 0.768. The highest c-statistic for prediction of death was obtained with a DCG/age model (0.830). The lowest c-statistics were obtained for age/gender models for LTC hospitalization (c-statistic 0.593). The c-statistic for use of ACGs to predict LTC hospitalization was 0.783, and improved to 0.792 with the addition of age. The c-statistics for use of DCGs and DCG/age to predict LTC hospitalization were 0.885 and 0.890, respectively, indicating the best prediction. We found that risk adjusters based upon diagnoses predicted an increased likelihood of death or LTC hospitalization, exhibiting good predictive validity. In this comparative analysis using VA data, DCG models were generally superior to ACG models in predicting clinical outcomes, although ACG model performance was enhanced by the addition of age.
Exercise reduces depressive symptoms in adults with arthritis: Evidential value
Kelley, George A; Kelley, Kristi S
2016-01-01
AIM To determine whether evidential value exists that exercise reduces depression in adults with arthritis and other rheumatic conditions. METHODS Utilizing data derived from a prior meta-analysis of 29 randomized controlled trials comprising 2449 participants (1470 exercise, 979 control) with fibromyalgia, osteoarthritis, rheumatoid arthritis or systemic lupus erythematosus, a new method, P-curve, was utilized to assess for evidentiary worth as well as dismiss the possibility of discriminating reporting of statistically significant results regarding exercise and depression in adults with arthritis and other rheumatic conditions. Using the method of Stouffer, Z-scores were calculated to examine selective-reporting bias. An alpha (P) value < 0.05 was deemed statistically significant. In addition, average power of the tests included in P-curve, adjusted for publication bias, was calculated. RESULTS Fifteen of 29 studies (51.7%) with exercise and depression results were statistically significant (P < 0.05) while none of the results were statistically significant with respect to exercise increasing depression in adults with arthritis and other rheumatic conditions. Right-skew to dismiss selective reporting was identified (Z = −5.28, P < 0.0001). In addition, the included studies did not lack evidential value (Z = 2.39, P = 0.99), nor did they lack evidential value and were P-hacked (Z = 5.28, P > 0.99). The relative frequencies of P-values were 66.7% at 0.01, 6.7% each at 0.02 and 0.03, 13.3% at 0.04 and 6.7% at 0.05. The average power of the tests included in P-curve, corrected for publication bias, was 69%. Diagnostic plot results revealed that the observed power estimate was a better fit than the alternatives. CONCLUSION Evidential value results provide additional support that exercise reduces depression in adults with arthritis and other rheumatic conditions. PMID:27489782
Fixed-ratio ray designs have been used for detecting and characterizing interactions of large numbers of chemicals in combination. Single chemical dose-response data are used to predict an “additivity curve” along an environmentally relevant ray. A “mixture curve” is estimated fr...
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertize to explain gear fault signatures which is usually not easy to be achieved by ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
Bonn, Bernadine A.
2008-01-01
A long-term method detection level (LT-MDL) and laboratory reporting level (LRL) are used by the U.S. Geological Survey?s National Water Quality Laboratory (NWQL) when reporting results from most chemical analyses of water samples. Changing to this method provided data users with additional information about their data and often resulted in more reported values in the low concentration range. Before this method was implemented, many of these values would have been censored. The use of the LT-MDL and LRL presents some challenges for the data user. Interpreting data in the low concentration range increases the need for adequate quality assurance because even small contamination or recovery problems can be relatively large compared to concentrations near the LT-MDL and LRL. In addition, the definition of the LT-MDL, as well as the inclusion of low values, can result in complex data sets with multiple censoring levels and reported values that are less than a censoring level. Improper interpretation or statistical manipulation of low-range results in these data sets can result in bias and incorrect conclusions. This document is designed to help data users use and interpret data reported with the LTMDL/ LRL method. The calculation and application of the LT-MDL and LRL are described. This document shows how to extract statistical information from the LT-MDL and LRL and how to use that information in USGS investigations, such as assessing the quality of field data, interpreting field data, and planning data collection for new projects. A set of 19 detailed examples are included in this document to help data users think about their data and properly interpret lowrange data without introducing bias. Although this document is not meant to be a comprehensive resource of statistical methods, several useful methods of analyzing censored data are demonstrated, including Regression on Order Statistics and Kaplan-Meier Estimation. These two statistical methods handle complex censored data sets without resorting to substitution, thereby avoiding a common source of bias and inaccuracy.
NASA Technical Reports Server (NTRS)
Crabill, Norman L.
1989-01-01
Two hundred hours of Lockheed L 1011 digital flight data recorder data taken in 1973 were used to develop methods and procedures for obtaining statistical data useful for updating airliner airworthiness design criteria. Five thousand hours of additional data taken in 1978 to 1982 are reported in volumes 2, 3, 4 and 5.
Bayesian approach to inverse statistical mechanics.
Habeck, Michael
2014-05-01
Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.
Bayesian approach to inverse statistical mechanics
NASA Astrophysics Data System (ADS)
Habeck, Michael
2014-05-01
Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.
Integrated Modeling of Themes, Targeting Claims and Networks in Insurgent Rhetoric
2016-06-09
axis range. In addition , the correlation r and p- value p are shown. The first plot below is for the LIB issue which aligns with Network Dimension 1...set to one). Additional constraints on the value are made by the various methods. λ has dimensions N×N. If the method yields an issue-dependent LOA...existence of a statistically significant correlation between the components of the first eigenvector 1u and the node variable values ix would provide
Combined proportional and additive residual error models in population pharmacokinetic modelling.
Proost, Johannes H
2017-11-15
In pharmacokinetic modelling, a combined proportional and additive residual error model is often preferred over a proportional or additive residual error model. Different approaches have been proposed, but a comparison between approaches is still lacking. The theoretical background of the methods is described. Method VAR assumes that the variance of the residual error is the sum of the statistically independent proportional and additive components; this method can be coded in three ways. Method SD assumes that the standard deviation of the residual error is the sum of the proportional and additive components. Using datasets from literature and simulations based on these datasets, the methods are compared using NONMEM. The different coding of methods VAR yield identical results. Using method SD, the values of the parameters describing residual error are lower than for method VAR, but the values of the structural parameters and their inter-individual variability are hardly affected by the choice of the method. Both methods are valid approaches in combined proportional and additive residual error modelling, and selection may be based on OFV. When the result of an analysis is used for simulation purposes, it is essential that the simulation tool uses the same method as used during analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
Davalos, Angel D; Luben, Thomas J; Herring, Amy H; Sacks, Jason D
2017-02-01
Air pollution epidemiology traditionally focuses on the relationship between individual air pollutants and health outcomes (e.g., mortality). To account for potential copollutant confounding, individual pollutant associations are often estimated by adjusting or controlling for other pollutants in the mixture. Recently, the need to characterize the relationship between health outcomes and the larger multipollutant mixture has been emphasized in an attempt to better protect public health and inform more sustainable air quality management decisions. New and innovative statistical methods to examine multipollutant exposures were identified through a broad literature search, with a specific focus on those statistical approaches currently used in epidemiologic studies of short-term exposures to criteria air pollutants (i.e., particulate matter, carbon monoxide, sulfur dioxide, nitrogen dioxide, and ozone). Five broad classes of statistical approaches were identified for examining associations between short-term multipollutant exposures and health outcomes, specifically additive main effects, effect measure modification, unsupervised dimension reduction, supervised dimension reduction, and nonparametric methods. These approaches are characterized including advantages and limitations in different epidemiologic scenarios. By highlighting the characteristics of various studies in which multipollutant statistical methods have been used, this review provides epidemiologists and biostatisticians with a resource to aid in the selection of the most optimal statistical method to use when examining multipollutant exposures. Published by Elsevier Inc.
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigations, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools, and to match these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression, density estimation and smoothing, general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkable high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
Statistical plant set estimation using Schroeder-phased multisinusoidal input design
NASA Technical Reports Server (NTRS)
Bayard, D. S.
1992-01-01
A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
Dalton Highway : characterization of foundation soils
DOT National Transportation Integrated Search
1984-09-01
In this report we represent the results of our geotechnical characterization of natural foundation soils along the Dalton Highway from the Livengood to Prudhoe Bay. In addition, it analyzes this data by statistical methods and caracterizes foundation...
Wave propagation in a random medium
NASA Technical Reports Server (NTRS)
Lee, R. W.; Harp, J. C.
1969-01-01
A simple technique is used to derive statistical characterizations of the perturbations imposed upon a wave (plane, spherical or beamed) propagating through a random medium. The method is essentially physical rather than mathematical, and is probably equivalent to the Rytov method. The limitations of the method are discussed in some detail; in general they are restrictive only for optical paths longer than a few hundred meters, and for paths at the lower microwave frequencies. Situations treated include arbitrary path geometries, finite transmitting and receiving apertures, and anisotropic media. Results include, in addition to the usual statistical quantities, time-lagged functions, mixed functions involving amplitude and phase fluctuations, angle-of-arrival covariances, frequency covariances, and other higher-order quantities.
Evidence-Based Medicine as a Tool for Undergraduate Probability and Statistics Education
Masel, J.; Humphrey, P. T.; Blackburn, B.; Levine, J. A.
2015-01-01
Most students have difficulty reasoning about chance events, and misconceptions regarding probability can persist or even strengthen following traditional instruction. Many biostatistics classes sidestep this problem by prioritizing exploratory data analysis over probability. However, probability itself, in addition to statistics, is essential both to the biology curriculum and to informed decision making in daily life. One area in which probability is particularly important is medicine. Given the preponderance of pre health students, in addition to more general interest in medicine, we capitalized on students’ intrinsic motivation in this area to teach both probability and statistics. We use the randomized controlled trial as the centerpiece of the course, because it exemplifies the most salient features of the scientific method, and the application of critical thinking to medicine. The other two pillars of the course are biomedical applications of Bayes’ theorem and science and society content. Backward design from these three overarching aims was used to select appropriate probability and statistics content, with a focus on eliciting and countering previously documented misconceptions in their medical context. Pretest/posttest assessments using the Quantitative Reasoning Quotient and Attitudes Toward Statistics instruments are positive, bucking several negative trends previously reported in statistics education. PMID:26582236
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
Pitfalls in statistical landslide susceptibility modelling
NASA Astrophysics Data System (ADS)
Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut
2010-05-01
The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest to apply a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assesses how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independency in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial auto¬correlation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependency plots and bivariate interactions plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.
Angelovičová, Mária; Kunová, Simona; Čapla, Jozef; Zajac, Peter; Bučko, Ondřej; Angelovič, Marek
2016-01-01
The purpose of this study was an experimental investigation and a statistical evaluation of the influence of various additives in feed mixtures of broiler chickens on fatty acids content and their ratio in breast and thigh muscles. First feed additive consisted of narasin, nicarbasin and salinomycin sodium, and other five additives were of phytogenic origin. In vivo experiment was realized on the poultry experimental station with deep litter breeding system. A total of 300 one-day-old hybrid chickens Cobb 500 divided into six groups were used for the experiment. The experimental period was divided into four phases, i.e. Starter, Grower 1, Grower 2 and Final, according to the application of commercial feed mixture of soy cereal type. Additive substances used in feed mixtures were different for each group. Basic feed mixtures were equal for all groups. Fatty acid profile of breast and thigh muscles was measured by the method of FT IR Nicolet 6700. Investigated additive substances in the feed mixtures did not have statistically significant effect on fatty acid content and omega-6/omega-3 polyunsaturated fatty acid (PUFA) ratio in breast and thigh muscles. Strong statistically significant relation between omega-6 PUFAs and total PUFAs were proved by experiment. A relation between omega-3 PUFAs and total PUFAs was found only in the group with Biocitro additive.
Conceptual and statistical problems associated with the use of diversity indices in ecology.
Barrantes, Gilbert; Sandoval, Luis
2009-09-01
Diversity indices, particularly the Shannon-Wiener index, have extensively been used in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no a single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses could be used instead of diversity indices, such as cluster analyses or multiple regressions. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated to the presence and abundance of the species in a community. In addition, particular hypotheses associated to changes in species richness across localities, or change in abundance of one, or a group of species can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust to standardize all samples to a common size. Even the simplest method as reporting the number of species per taxonomic category possibly provides more information than a diversity index value.
Hart, Corey B.; Giszter, Simon F.
2013-01-01
We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data shows that the statistic is robust for most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG show data from these preparations is clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low level execution primitives comprising pulsed SS in both frog and rat, and both episodic and rhythmic behaviors. PMID:23675341
An instrument to assess the statistical intensity of medical research papers.
Nieminen, Pentti; Virtanen, Jorma I; Vähänikkilä, Hannu
2017-01-01
There is widespread evidence that statistical methods play an important role in original research articles, especially in medical research. The evaluation of statistical methods and reporting in journals suffers from a lack of standardized methods for assessing the use of statistics. The objective of this study was to develop and evaluate an instrument to assess the statistical intensity in research articles in a standardized way. A checklist-type measure scale was developed by selecting and refining items from previous reports about the statistical contents of medical journal articles and from published guidelines for statistical reporting. A total of 840 original medical research articles that were published between 2007-2015 in 16 journals were evaluated to test the scoring instrument. The total sum of all items was used to assess the intensity between sub-fields and journals. Inter-rater agreement was examined using a random sample of 40 articles. Four raters read and evaluated the selected articles using the developed instrument. The scale consisted of 66 items. The total summary score adequately discriminated between research articles according to their study design characteristics. The new instrument could also discriminate between journals according to their statistical intensity. The inter-observer agreement measured by the ICC was 0.88 between all four raters. Individual item analysis showed very high agreement between the rater pairs, the percentage agreement ranged from 91.7% to 95.2%. A reliable and applicable instrument for evaluating the statistical intensity in research papers was developed. It is a helpful tool for comparing the statistical intensity between sub-fields and journals. The novel instrument may be applied in manuscript peer review to identify papers in need of additional statistical review.
A Bayesian approach to the statistical analysis of device preference studies.
Fu, Haoda; Qu, Yongming; Zhu, Baojin; Huster, William
2012-01-01
Drug delivery devices are required to have excellent technical specifications to deliver drugs accurately, and in addition, the devices should provide a satisfactory experience to patients because this can have a direct effect on drug compliance. To compare patients' experience with two devices, cross-over studies with patient-reported outcomes (PRO) as response variables are often used. Because of the strength of cross-over designs, each subject can directly compare the two devices by using the PRO variables, and variables indicating preference (preferring A, preferring B, or no preference) can be easily derived. Traditionally, methods based on frequentist statistics can be used to analyze such preference data, but there are some limitations for the frequentist methods. Recently, Bayesian methods are considered an acceptable method by the US Food and Drug Administration to design and analyze device studies. In this paper, we propose a Bayesian statistical method to analyze the data from preference trials. We demonstrate that the new Bayesian estimator enjoys some optimal properties versus the frequentist estimator. Copyright © 2012 John Wiley & Sons, Ltd.
Study on Hybrid Image Search Technology Based on Texts and Contents
NASA Astrophysics Data System (ADS)
Wang, H. T.; Ma, F. L.; Yan, C.; Pan, H.
2018-05-01
Image search was studied first here based on texts and contents, respectively. The text-based image feature extraction was put forward by integrating the statistical and topic features in view of the limitation of extraction of keywords only by means of statistical features of words. On the other hand, a search-by-image method was put forward based on multi-feature fusion in view of the imprecision of the content-based image search by means of a single feature. The layered-searching method depended on primarily the text-based image search method and additionally the content-based image search was then put forward in view of differences between the text-based and content-based methods and their difficult direct fusion. The feasibility and effectiveness of the hybrid search algorithm were experimentally verified.
Gene flow analysis method, the D-statistic, is robust in a wide parameter space.
Zheng, Yichen; Janke, Axel
2018-01-08
We evaluated the sensitivity of the D-statistic, a parsimony-like method widely used to detect gene flow between closely related species. This method has been applied to a variety of taxa with a wide range of divergence times. However, its parameter space and thus its applicability to a wide taxonomic range has not been systematically studied. Divergence time, population size, time of gene flow, distance of outgroup and number of loci were examined in a sensitivity analysis. The sensitivity study shows that the primary determinant of the D-statistic is the relative population size, i.e. the population size scaled by the number of generations since divergence. This is consistent with the fact that the main confounding factor in gene flow detection is incomplete lineage sorting by diluting the signal. The sensitivity of the D-statistic is also affected by the direction of gene flow, size and number of loci. In addition, we examined the ability of the f-statistics, [Formula: see text] and [Formula: see text], to estimate the fraction of a genome affected by gene flow; while these statistics are difficult to implement to practical questions in biology due to lack of knowledge of when the gene flow happened, they can be used to compare datasets with identical or similar demographic background. The D-statistic, as a method to detect gene flow, is robust against a wide range of genetic distances (divergence times) but it is sensitive to population size. The D-statistic should only be applied with critical reservation to taxa where population sizes are large relative to branch lengths in generations.
Markovic, Gabriela; Schult, Marie-Louise; Bartfai, Aniko; Elg, Mattias
2017-01-31
Progress in early cognitive recovery after acquired brain injury is uneven and unpredictable, and thus the evaluation of rehabilitation is complex. The use of time-series measurements is susceptible to statistical change due to process variation. To evaluate the feasibility of using a time-series method, statistical process control, in early cognitive rehabilitation. Participants were 27 patients with acquired brain injury undergoing interdisciplinary rehabilitation of attention within 4 months post-injury. The outcome measure, the Paced Auditory Serial Addition Test, was analysed using statistical process control. Statistical process control identifies if and when change occurs in the process according to 3 patterns: rapid, steady or stationary performers. The statistical process control method was adjusted, in terms of constructing the baseline and the total number of measurement points, in order to measure a process in change. Statistical process control methodology is feasible for use in early cognitive rehabilitation, since it provides information about change in a process, thus enabling adjustment of the individual treatment response. Together with the results indicating discernible subgroups that respond differently to rehabilitation, statistical process control could be a valid tool in clinical decision-making. This study is a starting-point in understanding the rehabilitation process using a real-time-measurements approach.
Use of Statistical Analyses in the Ophthalmic Literature
Lisboa, Renato; Meira-Freitas, Daniel; Tatham, Andrew J.; Marvasti, Amir H.; Sharpsten, Lucie; Medeiros, Felipe A.
2014-01-01
Purpose To identify the most commonly used statistical analyses in the ophthalmic literature and to determine the likely gain in comprehension of the literature that readers could expect if they were to sequentially add knowledge of more advanced techniques to their statistical repertoire. Design Cross-sectional study Methods All articles published from January 2012 to December 2012 in Ophthalmology, American Journal of Ophthalmology and Archives of Ophthalmology were reviewed. A total of 780 peer-reviewed articles were included. Two reviewers examined each article and assigned categories to each one depending on the type of statistical analyses used. Discrepancies between reviewers were resolved by consensus. Main Outcome Measures Total number and percentage of articles containing each category of statistical analysis were obtained. Additionally we estimated the accumulated number and percentage of articles that a reader would be expected to be able to interpret depending on their statistical repertoire. Results Readers with little or no statistical knowledge would be expected to be able to interpret the statistical methods presented in only 20.8% of articles. In order to understand more than half (51.4%) of the articles published, readers were expected to be familiar with at least 15 different statistical methods. Knowledge of 21 categories of statistical methods was necessary to comprehend 70.9% of articles, while knowledge of more than 29 categories was necessary to comprehend more than 90% of articles. Articles in retina and glaucoma subspecialties showed a tendency for using more complex analysis when compared to cornea. Conclusions Readers of clinical journals in ophthalmology need to have substantial knowledge of statistical methodology to understand the results of published studies in the literature. The frequency of use of complex statistical analyses also indicates that those involved in the editorial peer-review process must have sound statistical knowledge in order to critically appraise articles submitted for publication. The results of this study could provide guidance to direct the statistical learning of clinical ophthalmologists, researchers and educators involved in the design of courses for residents and medical students. PMID:24612977
Optical Parametric Amplification of Single Photon: Statistical Properties and Quantum Interference
NASA Astrophysics Data System (ADS)
Xu, Xue-Xiang; Yuan, Hong-Chun
2014-05-01
By using phase space method, we theoretically investigate the quantum statistical properties and quantum interference of optical parametric amplification of single photon. The statistical properties, such as the Wigner function (WF), average photon number, photon number distribution and parity, are derived analytically for the fields of the two output ports. The results indicate that the fields in the output ports are multiphoton states rather than single photon state due to the amplification of the optical parametric amplifiers (OPA). In addition, the phase sensitivity is also examined by using the detection scheme of parity measurement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzke, Brett D.; Wilson, John E.; Hathaway, J.
2008-02-12
Statistically defensible methods are presented for developing geophysical detector sampling plans and analyzing data for munitions response sites where unexploded ordnance (UXO) may exist. Detection methods for identifying areas of elevated anomaly density from background density are shown. Additionally, methods are described which aid in the choice of transect pattern and spacing to assure with degree of confidence that a target area (TA) of specific size, shape, and anomaly density will be identified using the detection methods. Methods for evaluating the sensitivity of designs to variation in certain parameters are also discussed. Methods presented have been incorporated into the Visualmore » Sample Plan (VSP) software (free at http://dqo.pnl.gov/vsp) and demonstrated at multiple sites in the United States. Application examples from actual transect designs and surveys from the previous two years are demonstrated.« less
New output improvements for CLASSY
NASA Technical Reports Server (NTRS)
Rassbach, M. E. (Principal Investigator)
1981-01-01
Additional output data and formats for the CLASSY clustering algorithm were developed. Four such aids to the CLASSY user are described. These are: (1) statistical measures; (2) special map types; (3) formats for standard output; and (4) special cluster display method.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
NASA Astrophysics Data System (ADS)
Ishida, Shigeki; Mori, Atsuo; Shinji, Masato
The main method to reduce the blasting charge noise which occurs in a tunnel under construction is to install the sound insulation door in the tunnel. However, the numerical analysis technique to predict the accurate effect of the transmission loss in the sound insulation door is not established. In this study, we measured the blasting charge noise and the vibration of the sound insulation door in the tunnel with the blasting charge, and performed analysis and modified acoustic feature. In addition, we reproduced the noise reduction effect of the sound insulation door by statistical energy analysis method and confirmed that numerical simulation is possible by this procedure.
Method for factor analysis of GC/MS data
Van Benthem, Mark H; Kotula, Paul G; Keenan, Michael R
2012-09-11
The method of the present invention provides a fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectroscopy (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by refinement with MCR-ALS to yield highly interpretable results.
Applications of Principled Search Methods in Climate Influences and Mechanisms
NASA Technical Reports Server (NTRS)
Glymour, Clark
2005-01-01
Forest and grass fires cause economic losses in the billions of dollars in the U.S. alone. In addition, boreal forests constitute a large carbon store; it has been estimated that, were no burning to occur, an additional 7 gigatons of carbon would be sequestered in boreal soils each century. Effective wildfire suppression requires anticipation of locales and times for which wildfire is most probable, preferably with a two to four week forecast, so that limited resources can be efficiently deployed. The United States Forest Service (USFS), and other experts and agencies have developed several measures of fire risk combining physical principles and expert judgment, and have used them in automated procedures for forecasting fire risk. Forecasting accuracies for some fire risk indices in combination with climate and other variables have been estimated for specific locations, with the value of fire risk index variables assessed by their statistical significance in regressions. In other cases, the MAPSS forecasts [23, 241 for example, forecasting accuracy has been estimated only by simulated data. We describe alternative forecasting methods that predict fire probability by locale and time using statistical or machine learning procedures trained on historical data, and we give comparative assessments of their forecasting accuracy for one fire season year, April- October, 2003, for all U.S. Forest Service lands. Aside from providing an accuracy baseline for other forecasting methods, the results illustrate the interdependence between the statistical significance of prediction variables and the forecasting method used.
Uncertainty Analysis of Seebeck Coefficient and Electrical Resistivity Characterization
NASA Technical Reports Server (NTRS)
Mackey, Jon; Sehirlioglu, Alp; Dynys, Fred
2014-01-01
In order to provide a complete description of a materials thermoelectric power factor, in addition to the measured nominal value, an uncertainty interval is required. The uncertainty may contain sources of measurement error including systematic bias error and precision error of a statistical nature. The work focuses specifically on the popular ZEM-3 (Ulvac Technologies) measurement system, but the methods apply to any measurement system. The analysis accounts for sources of systematic error including sample preparation tolerance, measurement probe placement, thermocouple cold-finger effect, and measurement parameters; in addition to including uncertainty of a statistical nature. Complete uncertainty analysis of a measurement system allows for more reliable comparison of measurement data between laboratories.
The choice of statistical methods for comparisons of dosimetric data in radiotherapy.
Chaikh, Abdulhamid; Giraud, Jean-Yves; Perrin, Emmanuel; Bresciani, Jean-Pierre; Balosso, Jacques
2014-09-18
Novel irradiation techniques are continuously introduced in radiotherapy to optimize the accuracy, the security and the clinical outcome of treatments. These changes could raise the question of discontinuity in dosimetric presentation and the subsequent need for practice adjustments in case of significant modifications. This study proposes a comprehensive approach to compare different techniques and tests whether their respective dose calculation algorithms give rise to statistically significant differences in the treatment doses for the patient. Statistical investigation principles are presented in the framework of a clinical example based on 62 fields of radiotherapy for lung cancer. The delivered doses in monitor units were calculated using three different dose calculation methods: the reference method accounts the dose without tissues density corrections using Pencil Beam Convolution (PBC) algorithm, whereas new methods calculate the dose with tissues density correction for 1D and 3D using Modified Batho (MB) method and Equivalent Tissue air ratio (ETAR) method, respectively. The normality of the data and the homogeneity of variance between groups were tested using Shapiro-Wilks and Levene test, respectively, then non-parametric statistical tests were performed. Specifically, the dose means estimated by the different calculation methods were compared using Friedman's test and Wilcoxon signed-rank test. In addition, the correlation between the doses calculated by the three methods was assessed using Spearman's rank and Kendall's rank tests. The Friedman's test showed a significant effect on the calculation method for the delivered dose of lung cancer patients (p <0.001). The density correction methods yielded to lower doses as compared to PBC by on average (-5 ± 4.4 SD) for MB and (-4.7 ± 5 SD) for ETAR. Post-hoc Wilcoxon signed-rank test of paired comparisons indicated that the delivered dose was significantly reduced using density-corrected methods as compared to the reference method. Spearman's and Kendall's rank tests indicated a positive correlation between the doses calculated with the different methods. This paper illustrates and justifies the use of statistical tests and graphical representations for dosimetric comparisons in radiotherapy. The statistical analysis shows the significance of dose differences resulting from two or more techniques in radiotherapy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhen, X; Chen, H; Liao, Y
Purpose: To study the feasibility of employing deformable registration methods for accurate rectum dose volume parameters calculation and their potentials in revealing rectum dose-toxicity between complication and non-complication cervical cancer patients with brachytherapy treatment. Method and Materials: Data from 60 patients treated with BT including planning images, treatment plans, and follow-up clinical exam were retrospectively collected. Among them, 12 patients complained about hematochezia were further examined with colonoscopy and scored as Grade 1–3 complication (CP). Meanwhile, another 12 non-complication (NCP) patients were selected as a reference group. To seek for potential gains in rectum toxicity prediction when fractional anatomical deformationsmore » are account for, the rectum dose volume parameters D0.1/1/2cc of the selected patients were retrospectively computed by three different approaches: the simple “worstcase scenario” (WS) addition method, an intensity-based deformable image registration (DIR) algorithm-Demons, and a more accurate, recent developed local topology preserved non-rigid point matching algorithm (TOP). Statistical significance of the differences between rectum doses of the CP group and the NCP group were tested by a two-tailed t-test and results were considered to be statistically significant if p < 0.05. Results: For the D0.1cc, no statistical differences are found between the CP and NCP group in all three methods. For the D1cc, dose difference is not detected by the WS method, however, statistical differences between the two groups are observed by both Demons and TOP, and more evident in TOP. For the D2cc, the CP and NCP cases are statistically significance of the difference for all three methods but more pronounced with TOP. Conclusion: In this study, we calculated the rectum D0.1/1/2cc by simple WS addition and two DIR methods and seek for gains in rectum toxicity prediction. The results favor the claim that accurate dose deformation and summation tend to be more sensitive in unveiling the dose-toxicity relationship. This work is supported in part by grant from VARIAN MEDICAL SYSTEMS INC, the National Natural Science Foundation of China (no 81428019 and no 81301940), the Guangdong Natural Science Foundation (2015A030313302)and the 2015 Pearl River S&T Nova Program of Guangzhou (201506010096).« less
Safe semi-supervised learning based on weighted likelihood.
Kawakita, Masanori; Takeuchi, Jun'ichi
2014-05-01
We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')
Mackay, D; Kengatharan, M
1994-01-01
1. A new method has been used to measure pKI values of prazosin and idazoxan against neuronally-released transmitter in the epididymal portion of the rat isolated vas deferens. The most reproducible results were obtained with a prolonged antagonist equilibration time (1 h). 2. Under these conditions the pKI of prazosin was practically unaffected by addition of alpha, beta-methylene-adenosine-5'-triphosphate (10 microM) to desensitize purinoceptors. Addition of desmethylimipramine (DMI) (0.3 microM) produced a small, but statistically non-significant, reduction. 3. The same method has been used to measure the pKI of prazosin against exogenous noradrenaline. In the latter case addition of DMI (0.3 microM) and corticosterone (30 microM) together produced a statistically significant reduction in the apparent pKI of prazosin. 4. The new method for estimating pKI values shows that DMI itself acts either pseudo-irreversibly or non-competitively and may be reducing the apparent pKI of prazosin. 5. The pKI values obtained for prazosin and idazoxan against neuronally-released transmitter are in good agreement with those obtained by other workers for the actions of these drugs on alpha-adrenoceptors.
Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J
2017-12-01
qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
Lotfy, Hayam M; Mohamed, Dalia; Mowaka, Shereen
2015-01-01
Simple, specific, accurate and precise spectrophotometric methods were developed and validated for the simultaneous determination of the oral antidiabetic drugs; sitagliptin phosphate (STG) and metformin hydrochloride (MET) in combined pharmaceutical formulations. Three methods were manipulating ratio spectra namely; ratio difference (RD), ratio subtraction (RS) and a novel approach of induced amplitude modulation (IAM) methods. The first two methods were used for determination of STG, while MET was directly determined by measuring its absorbance at λmax 232 nm. However, (IAM) was used for the simultaneous determination of both drugs. Moreover, another three methods were developed based on derivative spectroscopy followed by mathematical manipulation steps namely; amplitude factor (P-factor), amplitude subtraction (AS) and modified amplitude subtraction (MAS). In addition, in this work the novel sample enrichment technique named spectrum addition was adopted. The proposed spectrophotometric methods did not require any preliminary separation step. The accuracy, precision and linearity ranges of the proposed methods were determined. The selectivity of the developed methods was investigated by analyzing laboratory prepared mixtures of the drugs and their combined pharmaceutical formulations. Standard deviation values were less than 1.5 in the assay of raw materials and tablets. The obtained results were statistically compared to that of a reported spectrophotometric method. The statistical comparison showed that there was no significant difference between the proposed methods and the reported one regarding both accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.
Morphological study of the palatal rugae in western Indian population.
Gondivkar, Shailesh M; Patel, Swetal; Gadbail, Amol R; Gaikwad, Rahul N; Chole, Revant; Parikh, Rima V
2011-10-01
The aim of this study was to identify and compare the different morphological rugae patterns in males and females of western Indian population, which may be an additional method of identification in cases of crimes or aircraft accidents. A total of 108 plaster casts, equally distributed between the sexes and belonging to similar age-group, were examined for different biometric characteristics of the palatal rugae including number, shape, length, direction and unification and their incidence recorded. Association between these rugae biometric characteristics and sex were tested using chi-square analysis and statistical descriptors were identified for each of these parameters using the SPSS 15.0. The study revealed a statistically significant difference in the total number of rugae between the two sexes (P = 0.000). The different types of rugae between the males and females were statistically compared. The female showed a highly significant difference in the sinuous (P = 0.002) and primary type (P = 0.000) while the male had a significant difference in the unification (P = 0.005). The predominant direction of the rugae was found to be forward relative to backward. It may be concluded that the rugae pattern can be an additional method of differentiation between the male and female in conjunction with the other methods such as visual, fingerprints, and dental characteristics in forensic sciences. Copyright © 2011 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Technow, Frank; Messina, Carlos D; Totir, L Radu; Cooper, Mark
2015-01-01
Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E), continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics.
Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation
Technow, Frank; Messina, Carlos D.; Totir, L. Radu; Cooper, Mark
2015-01-01
Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E), continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics. PMID:26121133
Marzulli, F; Maguire, H C
1982-02-01
Several guinea-pig predictive test methods were evaluated by comparison of results with those obtained with human predictive tests, using ten compounds that have been used in cosmetics. The method involves the statistical analysis of the frequency with which guinea-pig tests agree with the findings of tests in humans. In addition, the frequencies of false positive and false negative predictive findings are considered and statistically analysed. The results clearly demonstrate the superiority of adjuvant tests (complete Freund's adjuvant) in determining skin sensitizers and the overall superiority of the guinea-pig maximization test in providing results similar to those obtained by human testing. A procedure is suggested for utilizing adjuvant and non-adjuvant test methods for characterizing compounds as of weak, moderate or strong sensitizing potential.
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-01
A new fault diagnosis method for rotating machinery based on adaptive statistic test filter (ASTF) and Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise, ASTF is based on statistic hypothesis testing in the frequency domain to evaluate similarity between reference signal (noise signal) and original signal, and remove the component of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitive evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitiveness of symptom parameters (SPs) for condition diagnosis. By this way, the good SPs that have high sensitiveness for condition diagnosis can be selected. A three-layer DBN is developed to identify condition of rotation machinery based on the Bayesian Belief Network (BBN) theory. Condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-08
A new fault diagnosis method for rotating machinery based on adaptive statistic test filter (ASTF) and Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise, ASTF is based on statistic hypothesis testing in the frequency domain to evaluate similarity between reference signal (noise signal) and original signal, and remove the component of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitive evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitiveness of symptom parameters (SPs) for condition diagnosis. By this way, the good SPs that have high sensitiveness for condition diagnosis can be selected. A three-layer DBN is developed to identify condition of rotation machinery based on the Bayesian Belief Network (BBN) theory. Condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method.
Exercise reduces depressive symptoms in adults with arthritis: Evidential value.
Kelley, George A; Kelley, Kristi S
2016-07-12
To determine whether evidential value exists that exercise reduces depression in adults with arthritis and other rheumatic conditions. Utilizing data derived from a prior meta-analysis of 29 randomized controlled trials comprising 2449 participants (1470 exercise, 979 control) with fibromyalgia, osteoarthritis, rheumatoid arthritis or systemic lupus erythematosus, a new method, P -curve, was utilized to assess for evidentiary worth as well as dismiss the possibility of discriminating reporting of statistically significant results regarding exercise and depression in adults with arthritis and other rheumatic conditions. Using the method of Stouffer, Z -scores were calculated to examine selective-reporting bias. An alpha ( P ) value < 0.05 was deemed statistically significant. In addition, average power of the tests included in P -curve, adjusted for publication bias, was calculated. Fifteen of 29 studies (51.7%) with exercise and depression results were statistically significant ( P < 0.05) while none of the results were statistically significant with respect to exercise increasing depression in adults with arthritis and other rheumatic conditions. Right-skew to dismiss selective reporting was identified ( Z = -5.28, P < 0.0001). In addition, the included studies did not lack evidential value ( Z = 2.39, P = 0.99), nor did they lack evidential value and were P -hacked ( Z = 5.28, P > 0.99). The relative frequencies of P -values were 66.7% at 0.01, 6.7% each at 0.02 and 0.03, 13.3% at 0.04 and 6.7% at 0.05. The average power of the tests included in P -curve, corrected for publication bias, was 69%. Diagnostic plot results revealed that the observed power estimate was a better fit than the alternatives. Evidential value results provide additional support that exercise reduces depression in adults with arthritis and other rheumatic conditions.
Fractal analysis of the short time series in a visibility graph method
NASA Astrophysics Data System (ADS)
Li, Ruixue; Wang, Jiang; Yu, Haitao; Deng, Bin; Wei, Xile; Chen, Yingyuan
2016-05-01
The aim of this study is to evaluate the performance of the visibility graph (VG) method on short fractal time series. In this paper, the time series of Fractional Brownian motions (fBm), characterized by different Hurst exponent H, are simulated and then mapped into a scale-free visibility graph, of which the degree distributions show the power-law form. The maximum likelihood estimation (MLE) is applied to estimate power-law indexes of degree distribution, and in this progress, the Kolmogorov-Smirnov (KS) statistic is used to test the performance of estimation of power-law index, aiming to avoid the influence of droop head and heavy tail in degree distribution. As a result, we find that the MLE gives an optimal estimation of power-law index when KS statistic reaches its first local minimum. Based on the results from KS statistic, the relationship between the power-law index and the Hurst exponent is reexamined and then amended to meet short time series. Thus, a method combining VG, MLE and KS statistics is proposed to estimate Hurst exponents from short time series. Lastly, this paper also offers an exemplification to verify the effectiveness of the combined method. In addition, the corresponding results show that the VG can provide a reliable estimation of Hurst exponents.
Preparing and Presenting Effective Research Posters
Miller, Jane E
2007-01-01
Objectives Posters are a common way to present results of a statistical analysis, program evaluation, or other project at professional conferences. Often, researchers fail to recognize the unique nature of the format, which is a hybrid of a published paper and an oral presentation. This methods note demonstrates how to design research posters to convey study objectives, methods, findings, and implications effectively to varied professional audiences. Methods A review of existing literature on research communication and poster design is used to identify and demonstrate important considerations for poster content and layout. Guidelines on how to write about statistical methods, results, and statistical significance are illustrated with samples of ineffective writing annotated to point out weaknesses, accompanied by concrete examples and explanations of improved presentation. A comparison of the content and format of papers, speeches, and posters is also provided. Findings Each component of a research poster about a quantitative analysis should be adapted to the audience and format, with complex statistical results translated into simplified charts, tables, and bulleted text to convey findings as part of a clear, focused story line. Conclusions Effective research posters should be designed around two or three key findings with accompanying handouts and narrative description to supply additional technical detail and encourage dialog with poster viewers. PMID:17355594
Allawala, Altan; Marston, J B
2016-11-01
We investigate the Fokker-Planck description of the equal-time statistics of the three-dimensional Lorenz attractor with additive white noise. The invariant measure is found by computing the zero (or null) mode of the linear Fokker-Planck operator as a problem of sparse linear algebra. Two variants are studied: a self-adjoint construction of the linear operator and the replacement of diffusion with hyperdiffusion. We also access the low-order statistics of the system by a perturbative expansion in equal-time cumulants. A comparison is made to statistics obtained by the standard approach of accumulation via direct numerical simulation. Theoretical and computational aspects of the Fokker-Planck and cumulant expansion methods are discussed.
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Price, Charlotte; Stallard, Nigel; Creton, Stuart; Indans, Ian; Guest, Robert; Griffiths, David; Edwards, Philippa
2010-01-01
Acute inhalation toxicity of chemicals has conventionally been assessed by the median lethal concentration (LC50) test (organisation for economic co-operation and development (OECD) TG 403). Two new methods, the recently adopted acute toxic class method (ATC; OECD TG 436) and a proposed fixed concentration procedure (FCP), have recently been considered, but statistical evaluations of these methods did not investigate the influence of differential sensitivity between male and female rats on the outcomes. This paper presents an analysis of data from the assessment of acute inhalation toxicity for 56 substances. Statistically significant differences between the LC50 for males and females were found for 16 substances, with greater than 10-fold differences in the LC50 for two substances. The paper also reports a statistical evaluation of the three test methods in the presence of unanticipated gender differences. With TG 403, a gender difference leads to a slightly greater chance of under-classification. This is also the case for the ATC method, but more pronounced than for TG 403, with misclassification of nearly all substances from Globally Harmonised System (GHS) class 3 into class 4. As the FCP uses females only, if females are more sensitive, the classification is unchanged. If males are more sensitive, the procedure may lead to under-classification. Additional research on modification of the FCP is thus proposed. PMID:20488841
Estimation of regionalized compositions: A comparison of three methods
Pawlowsky, V.; Olea, R.A.; Davis, J.C.
1995-01-01
A regionalized composition is a random vector function whose components are positive and sum to a constant at every point of the sampling region. Consequently, the components of a regionalized composition are necessarily spatially correlated. This spatial dependence-induced by the constant sum constraint-is a spurious spatial correlation and may lead to misinterpretations of statistical analyses. Furthermore, the cross-covariance matrices of the regionalized composition are singular, as is the coefficient matrix of the cokriging system of equations. Three methods of performing estimation or prediction of a regionalized composition at unsampled points are discussed: (1) the direct approach of estimating each variable separately; (2) the basis method, which is applicable only when a random function is available that can he regarded as the size of the regionalized composition under study; (3) the logratio approach, using the additive-log-ratio transformation proposed by J. Aitchison, which allows statistical analysis of compositional data. We present a brief theoretical review of these three methods and compare them using compositional data from the Lyons West Oil Field in Kansas (USA). It is shown that, although there are no important numerical differences, the direct approach leads to invalid results, whereas the basis method and the additive-log-ratio approach are comparable. ?? 1995 International Association for Mathematical Geology.
Immunofluorescence detection of pea protein in meat products.
Petrášová, Michaela; Pospiech, Matej; Tremlová, Bohuslava; Javůrková, Zdeňka
2016-08-01
In this study we developed an immunofluorescence method to detect pea protein in meat products. Pea protein has a high nutritional value but in sensitive individuals it may be responsible for causing allergic reactions. We produced model meat products with various additions of pea protein and flour; the detection limit (LOD) of the method for pea flour was 0.5% addition, and for pea protein it was 0.001% addition. The repeatabilities and reproducibilities for samples both positive and negative for pea protein were all 100%. In a blind test with model products and commercial samples, there was no statistically significant difference (p > 0.05) between the declared concentrations of pea protein and flour and the immunofluorescence method results. Sensitivity was 1.06 and specificity was 1.00. These results show that the immunofluorescence method is suitable for the detection of pea protein in meat products.
Bourlier, Christophe
2005-07-10
The emissivity of two-dimensional anisotropic rough sea surfaces with non-Gaussian statistics is investigated. The emissivity derivation is of importance for retrieval of the sea-surface temperature or equivalent temperature of a rough sea surface by infrared thermal imaging. The well-known Cox-Munk slope probability-density function, considered non-Gaussian, is used for the emissivity derivation, in which the skewness and the kurtosis (related to the third- and fourth-order statistics, respectively) are included. The shadowing effect, which is significant for grazing angles, is also taken into account. The geometric optics approximation is assumed to be valid, which means that the rough surface is modeled as a collection of facets reflecting locally the light in the specular direction. In addition, multiple reflections are ignored. Numerical results of the emissivity are presented for Gaussian and non-Gaussian statistics, for moderate wind speeds, for near-infrared wavelengths, for emission angles ranging from 0 degrees (nadir) to 90 degrees (horizon), and according to the wind direction. In addition, the emissivity is compared with both measurements and a Monte Carlo ray-tracing method.
Uncertainty propagation for statistical impact prediction of space debris
NASA Astrophysics Data System (ADS)
Hoogendoorn, R.; Mooij, E.; Geul, J.
2018-01-01
Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.
AA9int: SNP Interaction Pattern Search Using Non-Hierarchical Additive Model Set.
Lin, Hui-Yi; Huang, Po-Yu; Chen, Dung-Tsa; Tung, Heng-Yuan; Sellers, Thomas A; Pow-Sang, Julio; Eeles, Rosalind; Easton, Doug; Kote-Jarai, Zsofia; Amin Al Olama, Ali; Benlloch, Sara; Muir, Kenneth; Giles, Graham G; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schleutker, Johanna; Nordestgaard, Børge G; Travis, Ruth C; Hamdy, Freddie; Neal, David E; Pashayan, Nora; Khaw, Kay-Tee; Stanford, Janet L; Blot, William J; Thibodeau, Stephen N; Maier, Christiane; Kibel, Adam S; Cybulski, Cezary; Cannon-Albright, Lisa; Brenner, Hermann; Kaneva, Radka; Batra, Jyotsna; Teixeira, Manuel R; Pandha, Hardev; Lu, Yong-Jie; Park, Jong Y
2018-06-07
The use of single nucleotide polymorphism (SNP) interactions to predict complex diseases is getting more attention during the past decade, but related statistical methods are still immature. We previously proposed the SNP Interaction Pattern Identifier (SIPI) approach to evaluate 45 SNP interaction patterns/patterns. SIPI is statistically powerful but suffers from a large computation burden. For large-scale studies, it is necessary to use a powerful and computation-efficient method. The objective of this study is to develop an evidence-based mini-version of SIPI as the screening tool or solitary use and to evaluate the impact of inheritance mode and model structure on detecting SNP-SNP interactions. We tested two candidate approaches: the 'Five-Full' and 'AA9int' method. The Five-Full approach is composed of the five full interaction models considering three inheritance modes (additive, dominant and recessive). The AA9int approach is composed of nine interaction models by considering non-hierarchical model structure and the additive mode. Our simulation results show that AA9int has similar statistical power compared to SIPI and is superior to the Five-Full approach, and the impact of the non-hierarchical model structure is greater than that of the inheritance mode in detecting SNP-SNP interactions. In summary, it is recommended that AA9int is a powerful tool to be used either alone or as the screening stage of a two-stage approach (AA9int+SIPI) for detecting SNP-SNP interactions in large-scale studies. The 'AA9int' and 'parAA9int' functions (standard and parallel computing version) are added in the SIPI R package, which is freely available at https://linhuiyi.github.io/LinHY_Software/. hlin1@lsuhsc.edu. Supplementary data are available at Bioinformatics online.
CT image segmentation methods for bone used in medical additive manufacturing.
van Eijnatten, Maureen; van Dijk, Roelof; Dobbe, Johannes; Streekstra, Geert; Koivisto, Juha; Wolff, Jan
2018-01-01
The accuracy of additive manufactured medical constructs is limited by errors introduced during image segmentation. The aim of this study was to review the existing literature on different image segmentation methods used in medical additive manufacturing. Thirty-two publications that reported on the accuracy of bone segmentation based on computed tomography images were identified using PubMed, ScienceDirect, Scopus, and Google Scholar. The advantages and disadvantages of the different segmentation methods used in these studies were evaluated and reported accuracies were compared. The spread between the reported accuracies was large (0.04 mm - 1.9 mm). Global thresholding was the most commonly used segmentation method with accuracies under 0.6 mm. The disadvantage of this method is the extensive manual post-processing required. Advanced thresholding methods could improve the accuracy to under 0.38 mm. However, such methods are currently not included in commercial software packages. Statistical shape model methods resulted in accuracies from 0.25 mm to 1.9 mm but are only suitable for anatomical structures with moderate anatomical variations. Thresholding remains the most widely used segmentation method in medical additive manufacturing. To improve the accuracy and reduce the costs of patient-specific additive manufactured constructs, more advanced segmentation methods are required. Copyright © 2017 IPEM. Published by Elsevier Ltd. All rights reserved.
Quality evaluation of no-reference MR images using multidirectional filters and image statistics.
Jang, Jinseong; Bang, Kihun; Jang, Hanbyol; Hwang, Dosik
2018-09-01
This study aimed to develop a fully automatic, no-reference image-quality assessment (IQA) method for MR images. New quality-aware features were obtained by applying multidirectional filters to MR images and examining the feature statistics. A histogram of these features was then fitted to a generalized Gaussian distribution function for which the shape parameters yielded different values depending on the type of distortion in the MR image. Standard feature statistics were established through a training process based on high-quality MR images without distortion. Subsequently, the feature statistics of a test MR image were calculated and compared with the standards. The quality score was calculated as the difference between the shape parameters of the test image and the undistorted standard images. The proposed IQA method showed a >0.99 correlation with the conventional full-reference assessment methods; accordingly, this proposed method yielded the best performance among no-reference IQA methods for images containing six types of synthetic, MR-specific distortions. In addition, for authentically distorted images, the proposed method yielded the highest correlation with subjective assessments by human observers, thus demonstrating its superior performance over other no-reference IQAs. Our proposed IQA was designed to consider MR-specific features and outperformed other no-reference IQAs designed mainly for photographic images. Magn Reson Med 80:914-924, 2018. © 2018 International Society for Magnetic Resonance in Medicine. © 2018 International Society for Magnetic Resonance in Medicine.
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. Particularly, that the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Heskes, Tom; Eisinga, Rob; Breitling, Rainer
2014-11-21
The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Both exact calculation and permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy over existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issue. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .
Determining significant material properties: A discovery approach
NASA Technical Reports Server (NTRS)
Karplus, Alan K.
1992-01-01
The following is a laboratory experiment designed to further understanding of materials science. The experiment itself can be informative for persons of any age past elementary school, and even for some in elementary school. The preparation of the plastic samples is readily accomplished by persons with resonable dexterity in the cutting of paper designs. The completion of the statistical Design of Experiments, which uses Yates' Method, requires basic math (addition and subtraction). Interpretive work requires plotting of data and making observations. Knowledge of statistical methods would be helpful. The purpose of this experiment is to acquaint students with the seven classes of recyclable plastics, and provide hands-on learning about the response of these plastics to mechanical tensile loading.
Nonparametric entropy estimation using kernel densities.
Lake, Douglas E
2009-01-01
The entropy of experimental data from the biological and medical sciences provides additional information over summary statistics. Calculating entropy involves estimates of probability density functions, which can be effectively accomplished using kernel density methods. Kernel density estimation has been widely studied and a univariate implementation is readily available in MATLAB. The traditional definition of Shannon entropy is part of a larger family of statistics, called Renyi entropy, which are useful in applications that require a measure of the Gaussianity of data. Of particular note is the quadratic entropy which is related to the Friedman-Tukey (FT) index, a widely used measure in the statistical community. One application where quadratic entropy is very useful is the detection of abnormal cardiac rhythms, such as atrial fibrillation (AF). Asymptotic and exact small-sample results for optimal bandwidth and kernel selection to estimate the FT index are presented and lead to improved methods for entropy estimation.
Tipping points in the arctic: eyeballing or statistical significance?
Carstensen, Jacob; Weydmann, Agata
2012-02-01
Arctic ecosystems have experienced and are projected to experience continued large increases in temperature and declines in sea ice cover. It has been hypothesized that small changes in ecosystem drivers can fundamentally alter ecosystem functioning, and that this might be particularly pronounced for Arctic ecosystems. We present a suite of simple statistical analyses to identify changes in the statistical properties of data, emphasizing that changes in the standard error should be considered in addition to changes in mean properties. The methods are exemplified using sea ice extent, and suggest that the loss rate of sea ice accelerated by factor of ~5 in 1996, as reported in other studies, but increases in random fluctuations, as an early warning signal, were observed already in 1990. We recommend to employ the proposed methods more systematically for analyzing tipping points to document effects of climate change in the Arctic.
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meanings in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly-used data visualization methods including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
Ter Braak, Cajo J F; Peres-Neto, Pedro; Dray, Stéphane
2017-01-01
Statistical testing of trait-environment association from data is a challenge as there is no common unit of observation: the trait is observed on species, the environment on sites and the mediating abundance on species-site combinations. A number of correlation-based methods, such as the community weighted trait means method (CWM), the fourth-corner correlation method and the multivariate method RLQ, have been proposed to estimate such trait-environment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one site-based and the other species-based and by assessing significance by the largest of the two p -values (the p max test). Recently, regression-based methods using generalized linear models (GLM) have been proposed as a promising alternative with statistical inference via site-based resampling. We investigated the performance of this new approach along with approaches that mimicked the p max test using GLM instead of fourth-corner. By simulation using models with additional random variation in the species response to the environment, the site-based resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set as 5%. In addition, predictive modelling of such data using site-based cross-validation very often identified trait-environment interactions that had no predictive value. The problem that we identify is not an "omitted variable bias" problem as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLM-based p max test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. For screening the importance of single trait-environment combinations, the fourth-corner test is shown to give almost the same results as the GLM-based tests in far less computing time.
Angeler, David G; Viedma, Olga; Moreno, José M
2009-11-01
Time lag analysis (TLA) is a distance-based approach used to study temporal dynamics of ecological communities by measuring community dissimilarity over increasing time lags. Despite its increased use in recent years, its performance in comparison with other more direct methods (i.e., canonical ordination) has not been evaluated. This study fills this gap using extensive simulations and real data sets from experimental temporary ponds (true zooplankton communities) and landscape studies (landscape categories as pseudo-communities) that differ in community structure and anthropogenic stress history. Modeling time with a principal coordinate of neighborhood matrices (PCNM) approach, the canonical ordination technique (redundancy analysis; RDA) consistently outperformed the other statistical tests (i.e., TLAs, Mantel test, and RDA based on linear time trends) using all real data. In addition, the RDA-PCNM revealed different patterns of temporal change, and the strength of each individual time pattern, in terms of adjusted variance explained, could be evaluated, It also identified species contributions to these patterns of temporal change. This additional information is not provided by distance-based methods. The simulation study revealed better Type I error properties of the canonical ordination techniques compared with the distance-based approaches when no deterministic component of change was imposed on the communities. The simulation also revealed that strong emphasis on uniform deterministic change and low variability at other temporal scales is needed to result in decreased statistical power of the RDA-PCNM approach relative to the other methods. Based on the statistical performance of and information content provided by RDA-PCNM models, this technique serves ecologists as a powerful tool for modeling temporal change of ecological (pseudo-) communities.
Interpretation of correlations in clinical research.
Hung, Man; Bounsanga, Jerry; Voss, Maren Wright
2017-11-01
Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu
2017-02-01
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.
Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies
Liu, Zhonghua; Lin, Xihong
2017-01-01
Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
Multiple phenotype association tests using summary statistics in genome-wide association studies.
Liu, Zhonghua; Lin, Xihong
2018-03-01
We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
Hu, Yu-Chi J; Grossberg, Michael D; Mageras, Gikas S
2008-01-01
Planning radiotherapy and surgical procedures usually require onerous manual segmentation of anatomical structures from medical images. In this paper we present a semi-automatic and accurate segmentation method to dramatically reduce the time and effort required of expert users. This is accomplished by giving a user an intuitive graphical interface to indicate samples of target and non-target tissue by loosely drawing a few brush strokes on the image. We use these brush strokes to provide the statistical input for a Conditional Random Field (CRF) based segmentation. Since we extract purely statistical information from the user input, we eliminate the need of assumptions on boundary contrast previously used by many other methods, A new feature of our method is that the statistics on one image can be reused on related images without registration. To demonstrate this, we show that boundary statistics provided on a few 2D slices of volumetric medical data, can be propagated through the entire 3D stack of images without using the geometric correspondence between images. In addition, the image segmentation from the CRF can be formulated as a minimum s-t graph cut problem which has a solution that is both globally optimal and fast. The combination of a fast segmentation and minimal user input that is reusable, make this a powerful technique for the segmentation of medical images.
Analysis of Two Methods to Evaluate Antioxidants
ERIC Educational Resources Information Center
Tomasina, Florencia; Carabio, Claudio; Celano, Laura; Thomson, Leonor
2012-01-01
This exercise is intended to introduce undergraduate biochemistry students to the analysis of antioxidants as a biotechnological tool. In addition, some statistical resources will also be used and discussed. Antioxidants play an important metabolic role, preventing oxidative stress-mediated cell and tissue injury. Knowing the antioxidant content…
Comparisons of topological properties in autism for the brain network construction methods
NASA Astrophysics Data System (ADS)
Lee, Min-Hee; Kim, Dong Youn; Lee, Sang Hyeon; Kim, Jin Uk; Chung, Moo K.
2015-03-01
Structural brain networks can be constructed from the white matter fiber tractography of diffusion tensor imaging (DTI), and the structural characteristics of the brain can be analyzed from its networks. When brain networks are constructed by the parcellation method, their network structures change according to the parcellation scale selection and arbitrary thresholding. To overcome these issues, we modified the Ɛ -neighbor construction method proposed by Chung et al. (2011). The purpose of this study was to construct brain networks for 14 control subjects and 16 subjects with autism using both the parcellation and the Ɛ-neighbor construction method and to compare their topological properties between two methods. As the number of nodes increased, connectedness decreased in the parcellation method. However in the Ɛ-neighbor construction method, connectedness remained at a high level even with the rising number of nodes. In addition, statistical analysis for the parcellation method showed significant difference only in the path length. However, statistical analysis for the Ɛ-neighbor construction method showed significant difference with the path length, the degree and the density.
Baseline Estimation and Outlier Identification for Halocarbons
NASA Astrophysics Data System (ADS)
Wang, D.; Schuck, T.; Engel, A.; Gallman, F.
2017-12-01
The aim of this paper is to build a baseline model for halocarbons and to statistically identify the outliers under specific conditions. In this paper, time series of regional CFC-11 and Chloromethane measurements was discussed, which taken over the last 4 years at two locations, including a monitoring station at northwest of Frankfurt am Main (Germany) and Mace Head station (Ireland). In addition to analyzing time series of CFC-11 and Chloromethane, more importantly, a statistical approach of outlier identification is also introduced in this paper in order to make a better estimation of baseline. A second-order polynomial plus harmonics are fitted to CFC-11 and chloromethane mixing ratios data. Measurements with large distance to the fitting curve are regard as outliers and flagged. Under specific requirement, the routine is iteratively adopted without the flagged measurements until no additional outliers are found. Both model fitting and the proposed outlier identification method are realized with the help of a programming language, Python. During the period, CFC-11 shows a gradual downward trend. And there is a slightly upward trend in the mixing ratios of Chloromethane. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The usage of this statistical method has a considerable effect on the results. This method efficiently identifies a series of outliers according to the standard deviation requirements. After removing the outliers, the fitting curves and trend estimates are more reliable.
Ökmen, Burcu Metin; Ökmen, Korgün
2017-11-01
Shoulder pain can be difficult to treat due to its complex anatomic structure, and different treatment methods can be used. We aimed to examine the efficacy of photobiomodulation therapy (PBMT) and suprascapular nerve (SSN)-pulsed radiofrequency (RF) therapy. In this prospective, randomized, controlled, single-blind study, 59 patients with chronic shoulder pain due to impingement syndrome received PBMT (group H) or SSN-pulsed RF therapy (group P) in addition to exercise therapy for 14 sessions over 2 weeks. Records were taken using visual analog scale (VAS), Shoulder Pain and Disability Index (SPADI), and Nottingham Health Profile (NHP) scoring systems for pretreatment (PRT), posttreatment (PST), and PST follow-up at months 1, 3, and 6. There was no statistically significant difference in initial VAS score, SPADI, and NHP values between group H and group P (p > 0.05). Compared to the values of PRT, PST, and PST at months 1, 3, and 6, VAS, SPADI, and NHP values were statistically significantly lower in both groups (p < 0.001). There was no statistically significant difference at all measurement times in VAS, SPADI, and NHP between the two groups. We established that PBMT and SSN-pulsed RF therapy are effective methods, in addition to exercise therapy, in patients with chronic shoulder pain. PBMT seems to be advantageous compared to SSN-pulsed RF therapy, as it is a noninvasive method.
Statistical scaling of geometric characteristics in stochastically generated pore microstructures
Hyman, Jeffrey D.; Guadagnini, Alberto; Winter, C. Larrabee
2015-05-21
In this study, we analyze the statistical scaling of structural attributes of virtual porous microstructures that are stochastically generated by thresholding Gaussian random fields. Characterization of the extent at which randomly generated pore spaces can be considered as representative of a particular rock sample depends on the metrics employed to compare the virtual sample against its physical counterpart. Typically, comparisons against features and/patterns of geometric observables, e.g., porosity and specific surface area, flow-related macroscopic parameters, e.g., permeability, or autocorrelation functions are used to assess the representativeness of a virtual sample, and thereby the quality of the generation method. Here, wemore » rely on manifestations of statistical scaling of geometric observables which were recently observed in real millimeter scale rock samples [13] as additional relevant metrics by which to characterize a virtual sample. We explore the statistical scaling of two geometric observables, namely porosity (Φ) and specific surface area (SSA), of porous microstructures generated using the method of Smolarkiewicz and Winter [42] and Hyman and Winter [22]. Our results suggest that the method can produce virtual pore space samples displaying the symptoms of statistical scaling observed in real rock samples. Order q sample structure functions (statistical moments of absolute increments) of Φ and SSA scale as a power of the separation distance (lag) over a range of lags, and extended self-similarity (linear relationship between log structure functions of successive orders) appears to be an intrinsic property of the generated media. The width of the range of lags where power-law scaling is observed and the Hurst coefficient associated with the variables we consider can be controlled by the generation parameters of the method.« less
The discrimination of sea ice types using SAR backscatter statistics
NASA Technical Reports Server (NTRS)
Shuchman, Robert A.; Wackerman, Christopher C.; Maffett, Andrew L.; Onstott, Robert G.; Sutherland, Laura L.
1989-01-01
X-band (HH) synthetic aperture radar (SAR) data of sea ice collected during the Marginal Ice Zone Experiment in March and April of 1987 was statistically analyzed with respect to discriminating open water, first-year ice, multiyear ice, and Odden. Odden are large expanses of nilas ice that rapidly form in the Greenland Sea and transform into pancake ice. A first-order statistical analysis indicated that mean versus variance can segment out open water and first-year ice, and skewness versus modified skewness can segment the Odden and multilayer categories. In additions to first-order statistics, a model has been generated for the distribution function of the SAR ice data. Segmentation of ice types was also attempted using textural measurements. In this case, the general co-occurency matrix was evaluated. The textural method did not generate better results than the first-order statistical approach.
Intercomparison of attenuation correction algorithms for single-polarized X-band radars
NASA Astrophysics Data System (ADS)
Lengfeld, K.; Berenguer, M.; Sempere Torres, D.
2018-03-01
Attenuation due to liquid water is one of the largest uncertainties in radar observations. The effects of attenuation are generally inversely proportional to the wavelength, i.e. observations from X-band radars are more affected by attenuation than those from C- or S-band systems. On the other hand, X-band radars can measure precipitation fields in higher temporal and spatial resolution and are more mobile and easier to install due to smaller antennas. A first algorithm for attenuation correction in single-polarized systems was proposed by Hitschfeld and Bordan (1954) (HB), but it gets unstable in case of small errors (e.g. in the radar calibration) and strong attenuation. Therefore, methods have been developed that restrict attenuation correction to keep the algorithm stable, using e.g. surface echoes (for space-borne radars) and mountain returns (for ground radars) as a final value (FV), or adjustment of the radar constant (C) or the coefficient α. In the absence of mountain returns, measurements from C- or S-band radars can be used to constrain the correction. All these methods are based on the statistical relation between reflectivity and specific attenuation. Another way to correct for attenuation in X-band radar observations is to use additional information from less attenuated radar systems, e.g. the ratio between X-band and C- or S-band radar measurements. Lengfeld et al. (2016) proposed such a method based isotonic regression of the ratio between X- and C-band radar observations along the radar beam. This study presents a comparison of the original HB algorithm and three algorithms based on the statistical relation between reflectivity and specific attenuation as well as two methods implementing additional information of C-band radar measurements. Their performance in two precipitation events (one mainly convective and the other one stratiform) shows that a restriction of the HB is necessary to avoid instabilities. A comparison with vertically pointing micro rain radars (MRR) reveals good performance of two of the methods based in the statistical k-Z-relation: FV and α. The C algorithm seems to be more sensitive to differences in calibration of the two systems and requires additional information from C- or S-band radars. Furthermore, a study of five months of radar observations examines the long-term performance of each algorithm. From this study conclusions can be drawn that using additional information from less attenuated radar systems lead to best results. The two algorithms that use this additional information eliminate the bias caused by attenuation and preserve the agreement with MRR observations.
A study on the application of topic models to motif finding algorithms.
Basha Gutierrez, Josep; Nakai, Kenta
2016-12-22
Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
Agier, Lydiane; Portengen, Lützen; Chadeau-Hyam, Marc; Basagaña, Xavier; Giorgis-Allemand, Lise; Siroux, Valérie; Robinson, Oliver; Vlaanderen, Jelle; González, Juan R; Nieuwenhuijsen, Mark J; Vineis, Paolo; Vrijheid, Martine; Slama, Rémy; Vermeulen, Roel
2016-12-01
The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.
From fields to objects: A review of geographic boundary analysis
NASA Astrophysics Data System (ADS)
Jacquez, G. M.; Maruca, S.; Fortin, M.-J.
Geographic boundary analysis is a relatively new approach unfamiliar to many spatial analysts. It is best viewed as a technique for defining objects - geographic boundaries - on spatial fields, and for evaluating the statistical significance of characteristics of those boundary objects. This is accomplished using null spatial models representative of the spatial processes expected in the absence of boundary-generating phenomena. Close ties to the object-field dialectic eminently suit boundary analysis to GIS data. The majority of existing spatial methods are field-based in that they describe, estimate, or predict how attributes (variables defining the field) vary through geographic space. Such methods are appropriate for field representations but not object representations. As the object-field paradigm gains currency in geographic information science, appropriate techniques for the statistical analysis of objects are required. The methods reviewed in this paper are a promising foundation. Geographic boundary analysis is clearly a valuable addition to the spatial statistical toolbox. This paper presents the philosophy of, and motivations for geographic boundary analysis. It defines commonly used statistics for quantifying boundaries and their characteristics, as well as simulation procedures for evaluating their significance. We review applications of these techniques, with the objective of making this promising approach accessible to the GIS-spatial analysis community. We also describe the implementation of these methods within geographic boundary analysis software: GEM.
Patton, Charles J.; Kryskalla, Jennifer R.
2011-01-01
In addition to operational details and performance benchmarks for these new DA-AtNaR2 nitrate + nitrite assays, this report also provides results of interference studies for common inorganic and organic matrix constituents at 1, 10, and 100 times their median concentrations in surface-water and groundwater samples submitted annually to the NWQL for nitrate + nitrite analyses. Paired t-test and Wilcoxon signed-rank statistical analyses of results determined by CFA-CdR methods and DA-AtNaR2 methods indicate that nitrate concentration differences between population means or sign ranks were either statistically equivalent to zero at the 95 percent confidence level (p ≥ 0.05) or analytically equivalent to zero-that is, when p < 0.05, concentration differences between population means or medians were less than MDLs.
Statistical Selection of Biological Models for Genome-Wide Association Analyses.
Bi, Wenjian; Kang, Guolian; Pounds, Stanley B
2018-05-24
Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
Relationships between Self-Concept and Resilience Profiles in Young People with Disabilities
ERIC Educational Resources Information Center
Suriá Martínez, Raquel
2016-01-01
Introduction: The present study aims to identify different profiles in self-concept and resilience. In addition, statistically significant differences in self-concept domains among the profiles previously identified are analyzed. Method: The AF5 Self-Concept Questionnaire ("Cuestionario de Autoconcepto AF5") and the Resilience Scale were…
Michigan's forests, 2004: statistics and quality assurance
Scott A. Pugh; Mark H. Hansen; Gary Brand; Ronald E. McRoberts
2010-01-01
The first annual inventory of Michigan's forests was completed in 2004 after 18,916 plots were selected and 10,355 forested plots were visited. This report includes detailed information on forest inventory methods, quality of estimates, and additional tables. An earlier publication presented analyses of the inventoried data (Pugh et al. 2009).
Assessing the influence of abatement efforts and other human activities on ozone levels is complicated by the atmosphere's changeable nature. Two statistical methods, the dynamic linear model(DLM) and the generalized additive model (GAM), are used to estimate ozone trends in the...
A Monte Carlo study of Weibull reliability analysis for space shuttle main engine components
NASA Technical Reports Server (NTRS)
Abernethy, K.
1986-01-01
The incorporation of a number of additional capabilities into an existing Weibull analysis computer program and the results of Monte Carlo computer simulation study to evaluate the usefulness of the Weibull methods using samples with a very small number of failures and extensive censoring are discussed. Since the censoring mechanism inherent in the Space Shuttle Main Engine (SSME) data is hard to analyze, it was decided to use a random censoring model, generating censoring times from a uniform probability distribution. Some of the statistical techniques and computer programs that are used in the SSME Weibull analysis are described. The methods documented in were supplemented by adding computer calculations of approximate (using iteractive methods) confidence intervals for several parameters of interest. These calculations are based on a likelihood ratio statistic which is asymptotically a chisquared statistic with one degree of freedom. The assumptions built into the computer simulations are described. The simulation program and the techniques used in it are described there also. Simulation results are tabulated for various combinations of Weibull shape parameters and the numbers of failures in the samples.
A statistical method to estimate low-energy hadronic cross sections
NASA Astrophysics Data System (ADS)
Balassa, Gábor; Kovács, Péter; Wolf, György
2018-02-01
In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea, when two particles collide a so-called fireball is formed, which after a short time period decays statistically into a specific final state. To calculate the probabilities we use a phase space description extended with quark combinatorial factors and the possibility of more than one fireball formation. In a few simple cases the probability of a specific final state can be calculated analytically, where we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest. In the latter case we used a numerical method to calculate the more complicated final state probabilities. Additionally, we examined the formation of strange and charmed mesons as well, where we used existing data to fit the relevant model parameters.
Statistical assessment of the learning curves of health technologies.
Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T
2001-01-01
(1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
Similar Estimates of Temperature Impacts on Global Wheat Yield by Three Independent Methods
NASA Technical Reports Server (NTRS)
Liu, Bing; Asseng, Senthold; Muller, Christoph; Ewart, Frank; Elliott, Joshua; Lobell, David B.; Martre, Pierre; Ruane, Alex C.; Wallach, Daniel; Jones, James W.;
2016-01-01
The potential impact of global temperature change on global crop yield has recently been assessed with different methods. Here we show that grid-based and point-based simulations and statistical regressions (from historic records), without deliberate adaptation or CO2 fertilization effects, produce similar estimates of temperature impact on wheat yields at global and national scales. With a 1 C global temperature increase, global wheat yield is projected to decline between 4.1% and 6.4%. Projected relative temperature impacts from different methods were similar for major wheat-producing countries China, India, USA and France, but less so for Russia. Point-based and grid-based simulations, and to some extent the statistical regressions, were consistent in projecting that warmer regions are likely to suffer more yield loss with increasing temperature than cooler regions. By forming a multi-method ensemble, it was possible to quantify 'method uncertainty' in addition to model uncertainty. This significantly improves confidence in estimates of climate impacts on global food security.
Similar estimates of temperature impacts on global wheat yield by three independent methods
NASA Astrophysics Data System (ADS)
Liu, Bing; Asseng, Senthold; Müller, Christoph; Ewert, Frank; Elliott, Joshua; Lobell, David B.; Martre, Pierre; Ruane, Alex C.; Wallach, Daniel; Jones, James W.; Rosenzweig, Cynthia; Aggarwal, Pramod K.; Alderman, Phillip D.; Anothai, Jakarat; Basso, Bruno; Biernath, Christian; Cammarano, Davide; Challinor, Andy; Deryng, Delphine; Sanctis, Giacomo De; Doltra, Jordi; Fereres, Elias; Folberth, Christian; Garcia-Vila, Margarita; Gayler, Sebastian; Hoogenboom, Gerrit; Hunt, Leslie A.; Izaurralde, Roberto C.; Jabloun, Mohamed; Jones, Curtis D.; Kersebaum, Kurt C.; Kimball, Bruce A.; Koehler, Ann-Kristin; Kumar, Soora Naresh; Nendel, Claas; O'Leary, Garry J.; Olesen, Jørgen E.; Ottman, Michael J.; Palosuo, Taru; Prasad, P. V. Vara; Priesack, Eckart; Pugh, Thomas A. M.; Reynolds, Matthew; Rezaei, Ehsan E.; Rötter, Reimund P.; Schmid, Erwin; Semenov, Mikhail A.; Shcherbak, Iurii; Stehfest, Elke; Stöckle, Claudio O.; Stratonovitch, Pierre; Streck, Thilo; Supit, Iwan; Tao, Fulu; Thorburn, Peter; Waha, Katharina; Wall, Gerard W.; Wang, Enli; White, Jeffrey W.; Wolf, Joost; Zhao, Zhigan; Zhu, Yan
2016-12-01
The potential impact of global temperature change on global crop yield has recently been assessed with different methods. Here we show that grid-based and point-based simulations and statistical regressions (from historic records), without deliberate adaptation or CO2 fertilization effects, produce similar estimates of temperature impact on wheat yields at global and national scales. With a 1 °C global temperature increase, global wheat yield is projected to decline between 4.1% and 6.4%. Projected relative temperature impacts from different methods were similar for major wheat-producing countries China, India, USA and France, but less so for Russia. Point-based and grid-based simulations, and to some extent the statistical regressions, were consistent in projecting that warmer regions are likely to suffer more yield loss with increasing temperature than cooler regions. By forming a multi-method ensemble, it was possible to quantify `method uncertainty’ in addition to model uncertainty. This significantly improves confidence in estimates of climate impacts on global food security.
Statistical Analysis of the Uncertainty in Pre-Flight Aerodynamic Database of a Hypersonic Vehicle
NASA Astrophysics Data System (ADS)
Huh, Lynn
The objective of the present research was to develop a new method to derive the aerodynamic coefficients and the associated uncertainties for flight vehicles via post- flight inertial navigation analysis using data from the inertial measurement unit. Statistical estimates of vehicle state and aerodynamic coefficients are derived using Monte Carlo simulation. Trajectory reconstruction using the inertial navigation system (INS) is a simple and well used method. However, deriving realistic uncertainties in the reconstructed state and any associated parameters is not so straight forward. Extended Kalman filters, batch minimum variance estimation and other approaches have been used. However, these methods generally depend on assumed physical models, assumed statistical distributions (usually Gaussian) or have convergence issues for non-linear problems. The approach here assumes no physical models, is applicable to any statistical distribution, and does not have any convergence issues. The new approach obtains the statistics directly from a sufficient number of Monte Carlo samples using only the generally well known gyro and accelerometer specifications and could be applied to the systems of non-linear form and non-Gaussian distribution. When redundant data are available, the set of Monte Carlo simulations are constrained to satisfy the redundant data within the uncertainties specified for the additional data. The proposed method was applied to validate the uncertainty in the pre-flight aerodynamic database of the X-43A Hyper-X research vehicle. In addition to gyro and acceleration data, the actual flight data include redundant measurements of position and velocity from the global positioning system (GPS). The criteria derived from the blend of the GPS and INS accuracy was used to select valid trajectories for statistical analysis. The aerodynamic coefficients were derived from the selected trajectories by either direct extraction method based on the equations in dynamics, or by the inquiry of the pre-flight aerodynamic database. After the application of the proposed method to the case of the X-43A Hyper-X research vehicle, it was found that 1) there were consistent differences in the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis, 2) the pre-flight estimation of the pitching moment coefficients was significantly different from the post-flight analysis, 3) the type of distribution of the states from the Monte Carlo simulation were affected by that of the perturbation parameters, 4) the uncertainties in the pre-flight model were overestimated, 5) the range where the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis are in closest agreement is between Mach *.* and *.* and more data points may be needed between Mach * and ** in the pre-flight aerodynamic database, 6) selection criterion for valid trajectories from the Monte Carlo simulations was mostly driven by the horizontal velocity error, 7) the selection criterion must be based on reasonable model to ensure the validity of the statistics from the proposed method, and 8) the results from the proposed method applied to the two different flights with the identical geometry and similar flight profile were consistent.
Fritscher, Karl; Schuler, Benedikt; Link, Thomas; Eckstein, Felix; Suhm, Norbert; Hänni, Markus; Hengg, Clemens; Schubert, Rainer
2008-01-01
Fractures of the proximal femur are one of the principal causes of mortality among elderly persons. Traditional methods for the determination of femoral fracture risk use methods for measuring bone mineral density. However, BMD alone is not sufficient to predict bone failure load for an individual patient and additional parameters have to be determined for this purpose. In this work an approach that uses statistical models of appearance to identify relevant regions and parameters for the prediction of biomechanical properties of the proximal femur will be presented. By using Support Vector Regression the proposed model based approach is capable of predicting two different biomechanical parameters accurately and fully automatically in two different testing scenarios.
Predictive onboard flow control for packet switching satellites
NASA Technical Reports Server (NTRS)
Bobinsky, Eric A.
1992-01-01
We outline two alternate approaches to predicting the onset of congestion in a packet switching satellite, and argue that predictive, rather than reactive, flow control is necessary for the efficient operation of such a system. The first method discussed is based on standard, statistical techniques which are used to periodically calculate a probability of near-term congestion based on arrival rate statistics. If this probability exceeds a present threshold, the satellite would transmit a rate-reduction signal to all active ground stations. The second method discussed would utilize a neural network to periodically predict the occurrence of buffer overflow based on input data which would include, in addition to arrival rates, the distributions of packet lengths, source addresses, and destination addresses.
Hickey, Graeme L; Blackstone, Eugene H
2016-08-01
Clinical risk-prediction models serve an important role in healthcare. They are used for clinical decision-making and measuring the performance of healthcare providers. To establish confidence in a model, external model validation is imperative. When designing such an external model validation study, thought must be given to patient selection, risk factor and outcome definitions, missing data, and the transparent reporting of the analysis. In addition, there are a number of statistical methods available for external model validation. Execution of a rigorous external validation study rests in proper study design, application of suitable statistical methods, and transparent reporting. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
Analysis of conditional genetic effects and variance components in developmental genetics.
Zhu, J
1995-12-01
A genetic model with additive-dominance effects and genotype x environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t-1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects.
Analysis of Conditional Genetic Effects and Variance Components in Developmental Genetics
Zhu, J.
1995-01-01
A genetic model with additive-dominance effects and genotype X environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t - 1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects. PMID:8601500
Timp, Sheila; Karssemeijer, Nico
2004-05-01
Mass segmentation plays a crucial role in computer-aided diagnosis (CAD) systems for classification of suspicious regions as normal, benign, or malignant. In this article we present a robust and automated segmentation technique--based on dynamic programming--to segment mass lesions from surrounding tissue. In addition, we propose an efficient algorithm to guarantee resulting contours to be closed. The segmentation method based on dynamic programming was quantitatively compared with two other automated segmentation methods (region growing and the discrete contour model) on a dataset of 1210 masses. For each mass an overlap criterion was calculated to determine the similarity with manual segmentation. The mean overlap percentage for dynamic programming was 0.69, for the other two methods 0.60 and 0.59, respectively. The difference in overlap percentage was statistically significant. To study the influence of the segmentation method on the performance of a CAD system two additional experiments were carried out. The first experiment studied the detection performance of the CAD system for the different segmentation methods. Free-response receiver operating characteristics analysis showed that the detection performance was nearly identical for the three segmentation methods. In the second experiment the ability of the classifier to discriminate between malignant and benign lesions was studied. For region based evaluation the area Az under the receiver operating characteristics curve was 0.74 for dynamic programming, 0.72 for the discrete contour model, and 0.67 for region growing. The difference in Az values obtained by the dynamic programming method and region growing was statistically significant. The differences between other methods were not significant.
2014-01-01
Background Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. Methods 126 hypothetical trial scenarios were evaluated (126 000 datasets), each with continuous data simulated by using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Results Compared to the unbiased estimates produced by ANCOVA, both ANOVA and CSA are subject to bias, in relation to pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when pretest-posttest correlation ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. Apparently greater power of ANOVA and CSA at certain imbalances is achieved in respect of a biased treatment effect. Conclusions Across a range of correlations between pre- and post-treatment scores and at varying levels and direction of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power. PMID:24712304
NASA Astrophysics Data System (ADS)
Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng
2018-02-01
Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.
Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng
2018-02-01
Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.
2014-01-01
Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
Spectroflurimetric estimation of the new antiviral agent ledipasvir in presence of sofosbuvir
NASA Astrophysics Data System (ADS)
Salama, Fathy M.; Attia, Khalid A.; Abouserie, Ahmed A.; El-Olemy, Ahmed; Abolmagd, Ebrahim
2018-02-01
A spectroflurimetric method has been developed and validated for the selective quantitative determination of ledipasvir in presence of sofosbuvir. In this method the native fluorescence of ledipasvir in ethanol at 405 nm was measured after excitation at 340 nm. The proposed method was validated according to ICH guidelines and show high sensitivity, accuracy and precision. Furthermore this method was successfully applied to the analysis of ledipasvir in pharmaceutical dosage form without interference from sofosbuvir and other additives and the results were statistically compared to a reported method and found no significant difference.
Proper Image Subtraction—Optimal Transient Detection, Photometry, and Hypothesis Testing
NASA Astrophysics Data System (ADS)
Zackay, Barak; Ofek, Eran O.; Gal-Yam, Avishay
2016-10-01
Transient detection and flux measurement via image subtraction stand at the base of time domain astronomy. Due to the varying seeing conditions, the image subtraction process is non-trivial, and existing solutions suffer from a variety of problems. Starting from basic statistical principles, we develop the optimal statistic for transient detection, flux measurement, and any image-difference hypothesis testing. We derive a closed-form statistic that: (1) is mathematically proven to be the optimal transient detection statistic in the limit of background-dominated noise, (2) is numerically stable, (3) for accurately registered, adequately sampled images, does not leave subtraction or deconvolution artifacts, (4) allows automatic transient detection to the theoretical sensitivity limit by providing credible detection significance, (5) has uncorrelated white noise, (6) is a sufficient statistic for any further statistical test on the difference image, and, in particular, allows us to distinguish particle hits and other image artifacts from real transients, (7) is symmetric to the exchange of the new and reference images, (8) is at least an order of magnitude faster to compute than some popular methods, and (9) is straightforward to implement. Furthermore, we present extensions of this method that make it resilient to registration errors, color-refraction errors, and any noise source that can be modeled. In addition, we show that the optimal way to prepare a reference image is the proper image coaddition presented in Zackay & Ofek. We demonstrate this method on simulated data and real observations from the PTF data release 2. We provide an implementation of this algorithm in MATLAB and Python.
Lotfy, Hayam Mahmoud; Hegazy, Maha A; Rezk, Mamdouh R; Omran, Yasmin Rostom
2014-05-21
Two smart and novel spectrophotometric methods namely; absorbance subtraction (AS) and amplitude modulation (AM) were developed and validated for the determination of a binary mixture of timolol maleate (TIM) and dorzolamide hydrochloride (DOR) in presence of benzalkonium chloride without prior separation, using unified regression equation. Additionally, simple, specific, accurate and precise spectrophotometric methods manipulating ratio spectra were developed and validated for simultaneous determination of the binary mixture namely; simultaneous ratio subtraction (SRS), ratio difference (RD), ratio subtraction (RS) coupled with extended ratio subtraction (EXRS), constant multiplication method (CM) and mean centering of ratio spectra (MCR). The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of a reported spectrophotometric method. The statistical comparison showed that there is no significant difference between the proposed methods and the reported one regarding both accuracy and precision. Copyright © 2014 Elsevier B.V. All rights reserved.
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Statistical Correction of Air Temperature Forecasts for City and Road Weather Applications
NASA Astrophysics Data System (ADS)
Mahura, Alexander; Petersen, Claus; Sass, Bent; Gilet, Nicolas
2014-05-01
The method for statistical correction of air /road surface temperatures forecasts was developed based on analysis of long-term time-series of meteorological observations and forecasts (from HIgh Resolution Limited Area Model & Road Conditions Model; 3 km horizontal resolution). It has been tested for May-Aug 2012 & Oct 2012 - Mar 2013, respectively. The developed method is based mostly on forecasted meteorological parameters with a minimal inclusion of observations (covering only a pre-history period). Although the st iteration correction is based taking into account relevant temperature observations, but the further adjustment of air and road temperature forecasts is based purely on forecasted meteorological parameters. The method is model independent, e.g. it can be applied for temperature correction with other types of models having different horizontal resolutions. It is relatively fast due to application of the singular value decomposition method for matrix solution to find coefficients. Moreover, there is always a possibility for additional improvement due to extra tuning of the temperature forecasts for some locations (stations), and in particular, where for example, the MAEs are generally higher compared with others (see Gilet et al., 2014). For the city weather applications, new operationalized procedure for statistical correction of the air temperature forecasts has been elaborated and implemented for the HIRLAM-SKA model runs at 00, 06, 12, and 18 UTCs covering forecast lengths up to 48 hours. The procedure includes segments for extraction of observations and forecast data, assigning these to forecast lengths, statistical correction of temperature, one-&multi-days statistical evaluation of model performance, decision-making on using corrections by stations, interpolation, visualisation and storage/backup. Pre-operational air temperature correction runs were performed for the mainland Denmark since mid-April 2013 and shown good results. Tests also showed that the CPU time required for the operational procedure is relatively short (less than 15 minutes including a large time spent for interpolation). These also showed that in order to start correction of forecasts there is no need to have a long-term pre-historical data (containing forecasts and observations) and, at least, a couple of weeks will be sufficient when a new observational station is included and added to the forecast point. Note for the road weather application, the operationalization of the statistical correction of the road surface temperature forecasts (for the RWM system daily hourly runs covering forecast length up to 5 hours ahead) for the Danish road network (for about 400 road stations) was also implemented, and it is running in a test mode since Sep 2013. The method can also be applied for correction of the dew point temperature and wind speed (as a part of observations/ forecasts at synoptical stations), where these both meteorological parameters are parts of the proposed system of equations. The evaluation of the method performance for improvement of the wind speed forecasts is planned as well, with considering possibilities for the wind direction improvements (which is more complex due to multi-modal types of such data distribution). The method worked for the entire domain of mainland Denmark (tested for 60 synoptical and 395 road stations), and hence, it can be also applied for any geographical point within this domain, as through interpolation to about 100 cities' locations (for Danish national byvejr forecasts). Moreover, we can assume that the same method can be used in other geographical areas. The evaluation for other domains (with a focus on Greenland and Nordic countries) is planned. In addition, a similar approach might be also tested for statistical correction of concentrations of chemical species, but such approach will require additional elaboration and evaluation.
Lefebvre, Alexandre; Rochefort, Gael Y.; Santos, Frédéric; Le Denmat, Dominique; Salmon, Benjamin; Pétillon, Jean-Marc
2016-01-01
Over the last decade, biomedical 3D-imaging tools have gained widespread use in the analysis of prehistoric bone artefacts. While initial attempts to characterise the major categories used in osseous industry (i.e. bone, antler, and dentine/ivory) have been successful, the taxonomic determination of prehistoric artefacts remains to be investigated. The distinction between reindeer and red deer antler can be challenging, particularly in cases of anthropic and/or taphonomic modifications. In addition to the range of destructive physicochemical identification methods available (mass spectrometry, isotopic ratio, and DNA analysis), X-ray micro-tomography (micro-CT) provides convincing non-destructive 3D images and analyses. This paper presents the experimental protocol (sample scans, image processing, and statistical analysis) we have developed in order to identify modern and archaeological antler collections (from Isturitz, France). This original method is based on bone microstructure analysis combined with advanced statistical support vector machine (SVM) classifiers. A combination of six microarchitecture biomarkers (bone volume fraction, trabecular number, trabecular separation, trabecular thickness, trabecular bone pattern factor, and structure model index) were screened using micro-CT in order to characterise internal alveolar structure. Overall, reindeer alveoli presented a tighter mesh than red deer alveoli, and statistical analysis allowed us to distinguish archaeological antler by species with an accuracy of 96%, regardless of anatomical location on the antler. In conclusion, micro-CT combined with SVM classifiers proves to be a promising additional non-destructive method for antler identification, suitable for archaeological artefacts whose degree of human modification and cultural heritage or scientific value has previously made it impossible (tools, ornaments, etc.). PMID:26901355
Precipitate statistics in an Al-Mg-Si-Cu alloy from scanning precession electron diffraction data
NASA Astrophysics Data System (ADS)
Sunde, J. K.; Paulsen, Ø.; Wenner, S.; Holmestad, R.
2017-09-01
The key microstructural feature providing strength to age-hardenable Al alloys is nanoscale precipitates. Alloy development requires a reliable statistical assessment of these precipitates, in order to link the microstructure with material properties. Here, it is demonstrated that scanning precession electron diffraction combined with computational analysis enable the semi-automated extraction of precipitate statistics in an Al-Mg-Si-Cu alloy. Among the main findings is the precipitate number density, which agrees well with a conventional method based on manual counting and measurements. By virtue of its data analysis objectivity, our methodology is therefore seen as an advantageous alternative to existing routines, offering reproducibility and efficiency in alloy statistics. Additional results include improved qualitative information on phase distributions. The developed procedure is generic and applicable to any material containing nanoscale precipitates.
A voxel-based investigation for MRI-only radiotherapy of the brain using ultra short echo times
NASA Astrophysics Data System (ADS)
Edmund, Jens M.; Kjer, Hans M.; Van Leemput, Koen; Hansen, Rasmus H.; Andersen, Jon AL; Andreasen, Daniel
2014-12-01
Radiotherapy (RT) based on magnetic resonance imaging (MRI) as the only modality, so-called MRI-only RT, would remove the systematic registration error between MR and computed tomography (CT), and provide co-registered MRI for assessment of treatment response and adaptive RT. Electron densities, however, need to be assigned to the MRI images for dose calculation and patient setup based on digitally reconstructed radiographs (DRRs). Here, we investigate the geometric and dosimetric performance for a number of popular voxel-based methods to generate a so-called pseudo CT (pCT). Five patients receiving cranial irradiation, each containing a co-registered MRI and CT scan, were included. An ultra short echo time MRI sequence for bone visualization was used. Six methods were investigated for three popular types of voxel-based approaches; (1) threshold-based segmentation, (2) Bayesian segmentation and (3) statistical regression. Each approach contained two methods. Approach 1 used bulk density assignment of MRI voxels into air, soft tissue and bone based on logical masks and the transverse relaxation time T2 of the bone. Approach 2 used similar bulk density assignments with Bayesian statistics including or excluding additional spatial information. Approach 3 used a statistical regression correlating MRI voxels with their corresponding CT voxels. A similar photon and proton treatment plan was generated for a target positioned between the nasal cavity and the brainstem for all patients. The CT agreement with the pCT of each method was quantified and compared with the other methods geometrically and dosimetrically using both a number of reported metrics and introducing some novel metrics. The best geometrical agreement with CT was obtained with the statistical regression methods which performed significantly better than the threshold and Bayesian segmentation methods (excluding spatial information). All methods agreed significantly better with CT than a reference water MRI comparison. The mean dosimetric deviation for photons and protons compared to the CT was about 2% and highest in the gradient dose region of the brainstem. Both the threshold based method and the statistical regression methods showed the highest dosimetrical agreement. Generation of pCTs using statistical regression seems to be the most promising candidate for MRI-only RT of the brain. Further, the total amount of different tissues needs to be taken into account for dosimetric considerations regardless of their correct geometrical position.
Combining heuristic and statistical techniques in landslide hazard assessments
NASA Astrophysics Data System (ADS)
Cepeda, Jose; Schwendtner, Barbara; Quan, Byron; Nadim, Farrokh; Diaz, Manuel; Molina, Giovanni
2014-05-01
As a contribution to the Global Assessment Report 2013 - GAR2013, coordinated by the United Nations International Strategy for Disaster Reduction - UNISDR, a drill-down exercise for landslide hazard assessment was carried out by entering the results of both heuristic and statistical techniques into a new but simple combination rule. The data available for this evaluation included landslide inventories, both historical and event-based. In addition to the application of a heuristic method used in the previous editions of GAR, the availability of inventories motivated the use of statistical methods. The heuristic technique is largely based on the Mora & Vahrson method, which estimates hazard as the product of susceptibility and triggering factors, where classes are weighted based on expert judgment and experience. Two statistical methods were also applied: the landslide index method, which estimates weights of the classes for the susceptibility and triggering factors based on the evidence provided by the density of landslides in each class of the factors; and the weights of evidence method, which extends the previous technique to include both positive and negative evidence of landslide occurrence in the estimation of weights for the classes. One key aspect during the hazard evaluation was the decision on the methodology to be chosen for the final assessment. Instead of opting for a single methodology, it was decided to combine the results of the three implemented techniques using a combination rule based on a normalization of the results of each method. The hazard evaluation was performed for both earthquake- and rainfall-induced landslides. The country chosen for the drill-down exercise was El Salvador. The results indicate that highest hazard levels are concentrated along the central volcanic chain and at the centre of the northern mountains.
Lassere, Marissa N
2008-06-01
There are clear advantages to using biomarkers and surrogate endpoints, but concerns about clinical and statistical validity and systematic methods to evaluate these aspects hinder their efficient application. Section 2 is a systematic, historical review of the biomarker-surrogate endpoint literature with special reference to the nomenclature, the systems of classification and statistical methods developed for their evaluation. In Section 3 an explicit, criterion-based, quantitative, multidimensional hierarchical levels of evidence schema - Biomarker-Surrogacy Evaluation Schema - is proposed to evaluate and co-ordinate the multiple dimensions (biological, epidemiological, statistical, clinical trial and risk-benefit evidence) of the biomarker clinical endpoint relationships. The schema systematically evaluates and ranks the surrogacy status of biomarkers and surrogate endpoints using defined levels of evidence. The schema incorporates the three independent domains: Study Design, Target Outcome and Statistical Evaluation. Each domain has items ranked from zero to five. An additional category called Penalties incorporates additional considerations of biological plausibility, risk-benefit and generalizability. The total score (0-15) determines the level of evidence, with Level 1 the strongest and Level 5 the weakest. The term ;surrogate' is restricted to markers attaining Levels 1 or 2 only. Surrogacy status of markers can then be directly compared within and across different areas of medicine to guide individual, trial-based or drug-development decisions. This schema would facilitate communication between clinical, researcher, regulatory, industry and consumer participants necessary for evaluation of the biomarker-surrogate-clinical endpoint relationship in their different settings.
Hu, Xiangdong; Liu, Yujiang; Qian, Linxue
2017-01-01
Abstract Background: Real-time elastography (RTE) and shear wave elastography (SWE) are noninvasive and easily available imaging techniques that measure the tissue strain, and it has been reported that the sensitivity and the specificity of elastography were better in differentiating between benign and malignant thyroid nodules than conventional technologies. Methods: Relevant articles were searched in multiple databases; the comparison of elasticity index (EI) was conducted with the Review Manager 5.0. Forest plots of the sensitivity and specificity and SROC curve of RTE and SWE were performed with STATA 10.0 software. In addition, sensitivity analysis and bias analysis of the studies were conducted to examine the quality of articles; and to estimate possible publication bias, funnel plot was used and the Egger test was conducted. Results: Finally 22 articles which eventually satisfied the inclusion criteria were included in this study. After eliminating the inefficient, benign and malignant nodules were 2106 and 613, respectively. The meta-analysis suggested that the difference of EI between benign and malignant nodules was statistically significant (SMD = 2.11, 95% CI [1.67, 2.55], P < .00001). The overall sensitivities of RTE and SWE were roughly comparable, whereas the difference of specificities between these 2 methods was statistically significant. In addition, statistically significant difference of AUC between RTE and SWE was observed between RTE and SWE (P < .01). Conclusion: The specificity of RTE was statistically higher than that of SWE; which suggests that compared with SWE, RTE may be more accurate on differentiating benign and malignant thyroid nodules. PMID:29068996
Evaluation of normalization methods in mammalian microRNA-Seq data
Garmire, Lana Xia; Subramaniam, Shankar
2012-01-01
Simple total tag count normalization is inadequate for microRNA sequencing data generated from the next generation sequencing technology. However, so far systematic evaluation of normalization methods on microRNA sequencing data is lacking. We comprehensively evaluate seven commonly used normalization methods including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to the RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Comparing with the models used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. PMID:22532701
Stucki, Sheldon Lee; Biss, David J.
2000-01-01
An analysis was performed using the National Automotive Sampling System Crashworthiness Data System (NASS-CDS) database to compare the injury/fatality rates of variously restrained driver occupants as compared to unrestrained driver occupants in the total database of drivers/frontals, and also by Delta-V. A structured search of the NASS-CDS was done using the SAS® statistical analysis software to extract the data for this analysis and the SUDAAN software package was used to arrive at statistical significance indicators. In addition, this paper goes on to investigate different methods for presenting results of accident database searches including significance results; a risk versus Delta-V format for specific exposures; and, a percent cumulative injury versus Delta-V format to characterize injury trends. These alternative analysis presentation methods are then discussed by example using the present study results. PMID:11558105
Stanley, Jeffrey R.; Adkins, Joshua N.; Slysz, Gordon W.; Monroe, Matthew E.; Purvine, Samuel O.; Karpievitch, Yuliya V.; Anderson, Gordon A.; Smith, Richard D.; Dabney, Alan R.
2011-01-01
Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, as this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT tag Confidence (STAC). STAC additionally provides a Uniqueness Probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download as both a command line and a Windows graphical application. PMID:21692516
Performance analysis of different tuning rules for an isothermal CSTR using integrated EPC and SPC
NASA Astrophysics Data System (ADS)
Roslan, A. H.; Karim, S. F. Abd; Hamzah, N.
2018-03-01
This paper demonstrates the integration of Engineering Process Control (EPC) and Statistical Process Control (SPC) for the control of product concentration of an isothermal CSTR. The objectives of this study are to evaluate the performance of Ziegler-Nichols (Z-N), Direct Synthesis, (DS) and Internal Model Control (IMC) tuning methods and determine the most effective method for this process. The simulation model was obtained from past literature and re-constructed using SIMULINK MATLAB to evaluate the process response. Additionally, the process stability, capability and normality were analyzed using Process Capability Sixpack reports in Minitab. Based on the results, DS displays the best response for having the smallest rise time, settling time, overshoot, undershoot, Integral Time Absolute Error (ITAE) and Integral Square Error (ISE). Also, based on statistical analysis, DS yields as the best tuning method as it exhibits the highest process stability and capability.
Stern, Hal S; Blower, Daniel; Cohen, Michael L; Czeisler, Charles A; Dinges, David F; Greenhouse, Joel B; Guo, Feng; Hanowski, Richard J; Hartenbaum, Natalie P; Krueger, Gerald P; Mallis, Melissa M; Pain, Richard F; Rizzo, Matthew; Sinha, Esha; Small, Dylan S; Stuart, Elizabeth A; Wegman, David H
2018-03-09
This article summarizes the recommendations on data and methodology issues for studying commercial motor vehicle driver fatigue of a National Academies of Sciences, Engineering, and Medicine study. A framework is provided that identifies the various factors affecting driver fatigue and relating driver fatigue to crash risk and long-term driver health. The relevant factors include characteristics of the driver, vehicle, carrier and environment. Limitations of existing data are considered and potential sources of additional data described. Statistical methods that can be used to improve understanding of the relevant relationships from observational data are also described. The recommendations for enhanced data collection and the use of modern statistical methods for causal inference have the potential to enhance our understanding of the relationship of fatigue to highway safety and to long-term driver health. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
Multi-fidelity machine learning models for accurate bandgap predictions of solids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pilania, Ghanshyam; Gubernatis, James E.; Lookman, Turab
Here, we present a multi-fidelity co-kriging statistical learning framework that combines variable-fidelity quantum mechanical calculations of bandgaps to generate a machine-learned model that enables low-cost accurate predictions of the bandgaps at the highest fidelity level. Additionally, the adopted Gaussian process regression formulation allows us to predict the underlying uncertainties as a measure of our confidence in the predictions. In using a set of 600 elpasolite compounds as an example dataset and using semi-local and hybrid exchange correlation functionals within density functional theory as two levels of fidelities, we demonstrate the excellent learning performance of the method against actual high fidelitymore » quantum mechanical calculations of the bandgaps. The presented statistical learning method is not restricted to bandgaps or electronic structure methods and extends the utility of high throughput property predictions in a significant way.« less
Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation
NASA Technical Reports Server (NTRS)
Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.
2016-01-01
A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.
Multi-fidelity machine learning models for accurate bandgap predictions of solids
Pilania, Ghanshyam; Gubernatis, James E.; Lookman, Turab
2016-12-28
Here, we present a multi-fidelity co-kriging statistical learning framework that combines variable-fidelity quantum mechanical calculations of bandgaps to generate a machine-learned model that enables low-cost accurate predictions of the bandgaps at the highest fidelity level. Additionally, the adopted Gaussian process regression formulation allows us to predict the underlying uncertainties as a measure of our confidence in the predictions. In using a set of 600 elpasolite compounds as an example dataset and using semi-local and hybrid exchange correlation functionals within density functional theory as two levels of fidelities, we demonstrate the excellent learning performance of the method against actual high fidelitymore » quantum mechanical calculations of the bandgaps. The presented statistical learning method is not restricted to bandgaps or electronic structure methods and extends the utility of high throughput property predictions in a significant way.« less
Tips and Tricks for Successful Application of Statistical Methods to Biological Data.
Schlenker, Evelyn
2016-01-01
This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
A comparison of linear and nonlinear statistical techniques in performance attribution.
Chan, N H; Genovese, C R
2001-01-01
Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques-model selection, additive models, and neural networks-are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.
A Tool for Estimating Variability in Wood Preservative Treatment Retention
Patricia K. Lebow; Adam M. Taylor; Timothy M. Young
2015-01-01
Composite sampling is standard practice for evaluation of preservative retention levels in preservative-treated wood. Current protocols provide an average retention value but no estimate of uncertainty. Here we describe a statistical method for calculating uncertainty estimates using the standard sampling regime with minimal additional chemical analysis. This tool can...
Histogram deconvolution - An aid to automated classifiers
NASA Technical Reports Server (NTRS)
Lorre, J. J.
1983-01-01
It is shown that N-dimensional histograms are convolved by the addition of noise in the picture domain. Three methods are described which provide the ability to deconvolve such noise-affected histograms. The purpose of the deconvolution is to provide automated classifiers with a higher quality N-dimensional histogram from which to obtain classification statistics.
NASA Technical Reports Server (NTRS)
Djorgovski, George
1993-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.
NASA Technical Reports Server (NTRS)
Djorgovski, Stanislav
1992-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
NASA Astrophysics Data System (ADS)
Jambrina, P. G.; Lara, Manuel; Menéndez, M.; Launay, J.-M.; Aoiz, F. J.
2012-10-01
Cumulative reaction probabilities (CRPs) at various total angular momenta have been calculated for the barrierless reaction S(1D) + H2 → SH + H at total energies up to 1.2 eV using three different theoretical approaches: time-independent quantum mechanics (QM), quasiclassical trajectories (QCT), and statistical quasiclassical trajectories (SQCT). The calculations have been carried out on the widely used potential energy surface (PES) by Ho et al. [J. Chem. Phys. 116, 4124 (2002), 10.1063/1.1431280] as well as on the recent PES developed by Song et al. [J. Phys. Chem. A 113, 9213 (2009), 10.1021/jp903790h]. The results show that the differences between these two PES are relatively minor and mostly related to the different topologies of the well. In addition, the agreement between the three theoretical methodologies is good, even for the highest total angular momenta and energies. In particular, the good accordance between the CRPs obtained with dynamical methods (QM and QCT) and the statistical model (SQCT) indicates that the reaction can be considered statistical in the whole range of energies in contrast with the findings for other prototypical barrierless reactions. In addition, total CRPs and rate coefficients in the range of 20-1000 K have been calculated using the QCT and SQCT methods and have been found somewhat smaller than the experimental total removal rates of S(1D).
NASA Astrophysics Data System (ADS)
Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.
2013-10-01
Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provide little additional benefit.
CORSSA: Community Online Resource for Statistical Seismicity Analysis
NASA Astrophysics Data System (ADS)
Zechar, J. D.; Hardebeck, J. L.; Michael, A. J.; Naylor, M.; Steacy, S.; Wiemer, S.; Zhuang, J.
2011-12-01
Statistical seismology is critical to the understanding of seismicity, the evaluation of proposed earthquake prediction and forecasting methods, and the assessment of seismic hazard. Unfortunately, despite its importance to seismology-especially to those aspects with great impact on public policy-statistical seismology is mostly ignored in the education of seismologists, and there is no central repository for the existing open-source software tools. To remedy these deficiencies, and with the broader goal to enhance the quality of statistical seismology research, we have begun building the Community Online Resource for Statistical Seismicity Analysis (CORSSA, www.corssa.org). We anticipate that the users of CORSSA will range from beginning graduate students to experienced researchers. More than 20 scientists from around the world met for a week in Zurich in May 2010 to kick-start the creation of CORSSA: the format and initial table of contents were defined; a governing structure was organized; and workshop participants began drafting articles. CORSSA materials are organized with respect to six themes, each will contain between four and eight articles. CORSSA now includes seven articles with an additional six in draft form along with forums for discussion, a glossary, and news about upcoming meetings, special issues, and recent papers. Each article is peer-reviewed and presents a balanced discussion, including illustrative examples and code snippets. Topics in the initial set of articles include: introductions to both CORSSA and statistical seismology, basic statistical tests and their role in seismology; understanding seismicity catalogs and their problems; basic techniques for modeling seismicity; and methods for testing earthquake predictability hypotheses. We have also begun curating a collection of statistical seismology software packages.
Statistical image-domain multimaterial decomposition for dual-energy CT.
Xue, Yi; Ruan, Ruoshui; Hu, Xiuhua; Kuang, Yu; Wang, Jing; Long, Yong; Niu, Tianye
2017-03-01
Dual-energy CT (DECT) enhances tissue characterization because of its basis material decomposition capability. In addition to conventional two-material decomposition from DECT measurements, multimaterial decomposition (MMD) is required in many clinical applications. To solve the ill-posed problem of reconstructing multi-material images from dual-energy measurements, additional constraints are incorporated into the formulation, including volume and mass conservation and the assumptions that there are at most three materials in each pixel and various material types among pixels. The recently proposed flexible image-domain MMD method decomposes pixels sequentially into multiple basis materials using a direct inversion scheme which leads to magnified noise in the material images. In this paper, we propose a statistical image-domain MMD method for DECT to suppress the noise. The proposed method applies penalized weighted least-square (PWLS) reconstruction with a negative log-likelihood term and edge-preserving regularization for each material. The statistical weight is determined by a data-based method accounting for the noise variance of high- and low-energy CT images. We apply the optimization transfer principles to design a serial of pixel-wise separable quadratic surrogates (PWSQS) functions which monotonically decrease the cost function. The separability in each pixel enables the simultaneous update of all pixels. The proposed method is evaluated on a digital phantom, Catphan©600 phantom and three patients (pelvis, head, and thigh). We also implement the direct inversion and low-pass filtration methods for a comparison purpose. Compared with the direct inversion method, the proposed method reduces noise standard deviation (STD) in soft tissue by 95.35% in the digital phantom study, by 88.01% in the Catphan©600 phantom study, by 92.45% in the pelvis patient study, by 60.21% in the head patient study, and by 81.22% in the thigh patient study, respectively. The overall volume fraction accuracy is improved by around 6.85%. Compared with the low-pass filtration method, the root-mean-square percentage error (RMSE(%)) of electron densities in the Catphan©600 phantom is decreased by 20.89%. As modulation transfer function (MTF) magnitude decreased to 50%, the proposed method increases the spatial resolution by an overall factor of 1.64 on the digital phantom, and 2.16 on the Catphan©600 phantom. The overall volume fraction accuracy is increased by 6.15%. We proposed a statistical image-domain MMD method using DECT measurements. The method successfully suppresses the magnified noise while faithfully retaining the quantification accuracy and anatomical structure in the decomposed material images. The proposed method is practical and promising for advanced clinical applications using DECT imaging. © 2017 American Association of Physicists in Medicine.
NASA Astrophysics Data System (ADS)
Merey, Hanan A.; El-Mosallamy, Sally S.; Hassan, Nagiba Y.; El-Zeany, Badr A.
2016-05-01
Fluticasone propionate (FLU) and Azelastine hydrochloride (AZE) are co-formulated with phenylethyl alcohol (PEA) and Benzalkonium chloride (BENZ) (as preservatives) in pharmaceutical dosage form for treatment of seasonal allergies. Different spectrophotometric methods were used for the simultaneous determination of cited drugs in the dosage form. Direct spectrophotometric method was used for determining of AZE, while Derivative of double divisor of ratio spectra (DD-RS), Ratio subtraction coupled with ratio difference method (RS-RD) and Mean centering of the ratio spectra (MCR) are used for the determination of FLU. The linearity of the proposed methods was investigated in the range of 5.00-40.00 and 5.00-80.00 μg/mL for FLU and AZE, respectively. The specificity of the developed methods was investigated by analyzing laboratory prepared mixtures containing different ratios of cited drugs in addition to PEA and their pharmaceutical dosage form. The validity of the proposed methods was assessed using the standard addition technique. The obtained results were statistically compared with those obtained by official or the reported method for FLU or AZE, respectively showing no significant difference with respect to accuracy and precision at p = 0.05.
Jia, Erik; Chen, Tianlu
2018-01-01
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with other three imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
NASA Astrophysics Data System (ADS)
Dumedah, Gift; Walker, Jeffrey P.; Chik, Li
2014-07-01
Soil moisture information is critically important for water management operations including flood forecasting, drought monitoring, and groundwater recharge estimation. While an accurate and continuous record of soil moisture is required for these applications, the available soil moisture data, in practice, is typically fraught with missing values. There are a wide range of methods available to infilling hydrologic variables, but a thorough inter-comparison between statistical methods and artificial neural networks has not been made. This study examines 5 statistical methods including monthly averages, weighted Pearson correlation coefficient, a method based on temporal stability of soil moisture, and a weighted merging of the three methods, together with a method based on the concept of rough sets. Additionally, 9 artificial neural networks are examined, broadly categorized into feedforward, dynamic, and radial basis networks. These 14 infilling methods were used to estimate missing soil moisture records and subsequently validated against known values for 13 soil moisture monitoring stations for three different soil layer depths in the Yanco region in southeast Australia. The evaluation results show that the top three highest performing methods are the nonlinear autoregressive neural network, rough sets method, and monthly replacement. A high estimation accuracy (root mean square error (RMSE) of about 0.03 m/m) was found in the nonlinear autoregressive network, due to its regression based dynamic network which allows feedback connections through discrete-time estimation. An equally high accuracy (0.05 m/m RMSE) in the rough sets procedure illustrates the important role of temporal persistence of soil moisture, with the capability to account for different soil moisture conditions.
Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling.
Wu, Hulin; Lu, Tao; Xue, Hongqi; Liang, Hua
2014-04-02
The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with a limitation of the linear regulation effect assumption. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that could flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established and simulation studies are performed to validate the proposed approach. An application example for identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.
Efficient Blockwise Permutation Tests Preserving Exchangeability
Zhou, Chunxiao; Zwilling, Chris E.; Calhoun, Vince D.; Wang, Michelle Y.
2014-01-01
In this paper, we present a new blockwise permutation test approach based on the moments of the test statistic. The method is of importance to neuroimaging studies. In order to preserve the exchangeability condition required in permutation tests, we divide the entire set of data into certain exchangeability blocks. In addition, computationally efficient moments-based permutation tests are performed by approximating the permutation distribution of the test statistic with the Pearson distribution series. This involves the calculation of the first four moments of the permutation distribution within each block and then over the entire set of data. The accuracy and efficiency of the proposed method are demonstrated through simulated experiment on the magnetic resonance imaging (MRI) brain data, specifically the multi-site voxel-based morphometry analysis from structural MRI (sMRI). PMID:25289113
Hensman, James; Lawrence, Neil D; Rattray, Magnus
2013-08-20
Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering.The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
Results of Propellant Mixing Variable Study Using Precise Pressure-Based Burn Rate Calculations
NASA Technical Reports Server (NTRS)
Stefanski, Philip L.
2014-01-01
A designed experiment was conducted in which three mix processing variables (pre-curative addition mix temperature, pre-curative addition mixing time, and mixer speed) were varied to estimate their effects on within-mix propellant burn rate variability. The chosen discriminator for the experiment was the 2-inch diameter by 4-inch long (2x4) Center-Perforated (CP) ballistic evaluation motor. Motor nozzle throat diameters were sized to produce a common targeted chamber pressure. Initial data analysis did not show a statistically significant effect. Because propellant burn rate must be directly related to chamber pressure, a method was developed that showed statistically significant effects on chamber pressure (either maximum or average) by adjustments to the process settings. Burn rates were calculated from chamber pressures and these were then normalized to a common pressure for comparative purposes. The pressure-based method of burn rate determination showed significant reduction in error when compared to results obtained from the Brooks' modification of the propellant web-bisector burn rate determination method. Analysis of effects using burn rates calculated by the pressure-based method showed a significant correlation of within-mix burn rate dispersion to mixing duration and the quadratic of mixing duration. The findings were confirmed in a series of mixes that examined the effects of mixing time on burn rate variation, which yielded the same results.
Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael
2017-11-01
Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. To introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis and to conduct the implementation in statistical software R to reproduce the results. Additionally, the implementation in R is presented to allow the results to be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighted. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Exploring the use of statistical process control methods to assess course changes
NASA Astrophysics Data System (ADS)
Vollstedt, Ann-Marie
This dissertation pertains to the field of Engineering Education. The Department of Mechanical Engineering at the University of Nevada, Reno (UNR) is hosting this dissertation under a special agreement. This study was motivated by the desire to find an improved, quantitative measure of student quality that is both convenient to use and easy to evaluate. While traditional statistical analysis tools such as ANOVA (analysis of variance) are useful, they are somewhat time consuming and are subject to error because they are based on grades, which are influenced by numerous variables, independent of student ability and effort (e.g. inflation and curving). Additionally, grades are currently the only measure of quality in most engineering courses even though most faculty agree that grades do not accurately reflect student quality. Based on a literature search, in this study, quality was defined as content knowledge, cognitive level, self efficacy, and critical thinking. Nineteen treatments were applied to a pair of freshmen classes in an effort in increase the qualities. The qualities were measured via quiz grades, essays, surveys, and online critical thinking tests. Results from the quality tests were adjusted and filtered prior to analysis. All test results were subjected to Chauvenet's criterion in order to detect and remove outlying data. In addition to removing outliers from data sets, it was felt that individual course grades needed adjustment to accommodate for the large portion of the grade that was defined by group work. A new method was developed to adjust grades within each group based on the residual of the individual grades within the group and the portion of the course grade defined by group work. It was found that the grade adjustment method agreed 78% of the time with the manual ii grade changes instructors made in 2009, and also increased the correlation between group grades and individual grades. Using these adjusted grades, Statistical Process Control (SPC) methods were employed to evaluate the impact of the treatments applied to improve the courses. It was determined that using SPC is advantageous because it does not require additional resources and is not affected if a course is curved by adding the same amount of points to each student's grade. It was also determined that SPC results, unlike average grade, correlated well with anecdotal evidence from the instructors concerning how well the students performed in any given year. In addition to application of SPC to evaluate curriculum change, statistical analysis was used to show that course grades correlate with quiz grades, but do not correlate with critical thinking, self efficacy, or cognitive level which implies that treatments need to be implemented to increase these qualities.
Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.
2014-01-01
Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
NASA Astrophysics Data System (ADS)
Grimm, G. W.; Potts, A. J.
2015-12-01
The Coexistence Approach has been used infer palaeoclimates for many Eurasian fossil plant assemblage. However, the theory that underpins the method has never been examined in detail. Here we discuss acknowledged and implicit assumptions, and assess the statistical nature and pseudo-logic of the method. We also compare the Coexistence Approach theory with the active field of species distribution modelling. We argue that the assumptions will inevitably be violated to some degree and that the method has no means to identify and quantify these violations. The lack of a statistical framework makes the method highly vulnerable to the vagaries of statistical outliers and exotic elements. In addition, we find numerous logical inconsistencies, such as how climate shifts are quantified (the use of a "center value" of a coexistence interval) and the ability to reconstruct "extinct" climates from modern plant distributions. Given the problems that have surfaced in species distribution modelling, accurate and precise quantitative reconstructions of palaeoclimates (or even climate shifts) using the nearest-living-relative principle and rectilinear niches (the basis of the method) will not be possible. The Coexistence Approach can be summarised as an exercise that shoe-horns a plant fossil assemblages into coexistence and then naively assumes that this must be the climate. Given the theoretical issues, and methodological issues highlighted elsewhere, we suggest that the method be discontinued and that all past reconstructions be disregarded and revisited using less fallacious methods.
NASA Astrophysics Data System (ADS)
Qi, D.; Majda, A.
2017-12-01
A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.
Confidence Intervals from Realizations of Simulated Nuclear Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Younes, W.; Ratkiewicz, A.; Ressler, J. J.
2017-09-28
Various statistical techniques are discussed that can be used to assign a level of confidence in the prediction of models that depend on input data with known uncertainties and correlations. The particular techniques reviewed in this paper are: 1) random realizations of the input data using Monte-Carlo methods, 2) the construction of confidence intervals to assess the reliability of model predictions, and 3) resampling techniques to impose statistical constraints on the input data based on additional information. These techniques are illustrated with a calculation of the keff value, based on the 235U(n, f) and 239Pu (n, f) cross sections.
Parameter Estimation and Model Validation of Nonlinear Dynamical Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abarbanel, Henry; Gill, Philip
In the performance period of this work under a DOE contract, the co-PIs, Philip Gill and Henry Abarbanel, developed new methods for statistical data assimilation for problems of DOE interest, including geophysical and biological problems. This included numerical optimization algorithms for variational principles, new parallel processing Monte Carlo routines for performing the path integrals of statistical data assimilation. These results have been summarized in the monograph: “Predicting the Future: Completing Models of Observed Complex Systems” by Henry Abarbanel, published by Spring-Verlag in June 2013. Additional results and details have appeared in the peer reviewed literature.
“Magnitude-based Inference”: A Statistical Review
Welsh, Alan H.; Knight, Emma J.
2015-01-01
ABSTRACT Purpose We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID:25051387
Miles, Jeffrey Hilton
2011-05-01
Combustion noise from turbofan engines has become important, as the noise from sources like the fan and jet are reduced. An aligned and un-aligned coherence technique has been developed to determine a threshold level for the coherence and thereby help to separate the coherent combustion noise source from other noise sources measured with far-field microphones. This method is compared with a statistics based coherence threshold estimation method. In addition, the un-aligned coherence procedure at the same time also reveals periodicities, spectral lines, and undamped sinusoids hidden by broadband turbofan engine noise. In calculating the coherence threshold using a statistical method, one may use either the number of independent records or a larger number corresponding to the number of overlapped records used to create the average. Using data from a turbofan engine and a simulation this paper shows that applying the Fisher z-transform to the un-aligned coherence can aid in making the proper selection of samples and produce a reasonable statistics based coherence threshold. Examples are presented showing that the underlying tonal and coherent broad band structure which is buried under random broadband noise and jet noise can be determined. The method also shows the possible presence of indirect combustion noise.
Bringing Clouds into Our Lab! - The Influence of Turbulence on the Early Stage Rain Droplets
NASA Astrophysics Data System (ADS)
Yavuz, Mehmet Altug; Kunnen, Rudie; Heijst, Gertjan; Clercx, Herman
2015-11-01
We are investigating a droplet-laden flow in an air-filled turbulence chamber, forced by speaker-driven air jets. The speakers are running in a random manner; yet they allow us to control and define the statistics of the turbulence. We study the motion of droplets with tunable size (Stokes numbers ~ 0.13 - 9) in a turbulent flow, mimicking the early stages of raindrop formation. 3D Particle Tracking Velocimetry (PTV) together with Laser Induced Fluorescence (LIF) methods are chosen as the experimental method to track the droplets and collect data for statistical analysis. Thereby it is possible to study the spatial distribution of the droplets in turbulence using the so-called Radial Distribution Function (RDF), a statistical measure to quantify the clustering of particles. Additionally, 3D-PTV technique allows us to measure velocity statistics of the droplets and the influence of the turbulence on droplet trajectories, both individually and collectively. In this contribution, we will present the clustering probability quantified by the RDF for different Stokes numbers. We will explain the physics underlying the influence of turbulence on droplet cluster behavior. This study supported by FOM/NWO Netherlands.
Basic statistics (the fundamental concepts).
Lim, Eric
2014-12-01
An appreciation and understanding of statistics is import to all practising clinicians, not simply researchers. This is because mathematics is the fundamental basis to which we base clinical decisions, usually with reference to the benefit in relation to risk. Unless a clinician has a basic understanding of statistics, he or she will never be in a position to question healthcare management decisions that have been handed down from generation to generation, will not be able to conduct research effectively nor evaluate the validity of published evidence (usually making an assumption that most published work is either all good or all bad). This article provides a brief introduction to basic statistical methods and illustrates its use in common clinical scenarios. In addition, pitfalls of incorrect usage have been highlighted. However, it is not meant to be a substitute for formal training or consultation with a qualified and experienced medical statistician prior to starting any research project.
Lotfy, Hayam M; Hegazy, Maha A; Rezk, Mamdouh R; Omran, Yasmin Rostom
2015-09-05
Smart spectrophotometric methods have been applied and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and prednisolone acetate (PA) without preliminary separation. Two novel methods have been developed; the first method depends upon advanced absorbance subtraction (AAS), while the other method relies on advanced amplitude modulation (AAM); in addition to the well established dual wavelength (DW), ratio difference (RD) and constant center coupled with spectrum subtraction (CC-SS) methods. Accuracy, precision and linearity ranges of these methods were determined. Moreover, selectivity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied to the assay of drugs in their pharmaceutical formulations. No interference was observed from common additives and the validity of the methods was tested. The obtained results have been statistically compared to that of official spectrophotometric methods to give a conclusion that there is no significant difference between the proposed methods and the official ones with respect to accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.
Robustly detecting differential expression in RNA sequencing data using observation weights
Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.
2014-01-01
A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lapuyade-Lahorgue, J; Ruan, S; Li, H
Purpose: Multi-tracer PET imaging is getting more attention in radiotherapy by providing additional tumor volume information such as glucose and oxygenation. However, automatic PET-based tumor segmentation is still a very challenging problem. We propose a statistical fusion approach to joint segment the sub-area of tumors from the two tracers FDG and FMISO PET images. Methods: Non-standardized Gamma distributions are convenient to model intensity distributions in PET. As a serious correlation exists in multi-tracer PET images, we proposed a new fusion method based on copula which is capable to represent dependency between different tracers. The Hidden Markov Field (HMF) model ismore » used to represent spatial relationship between PET image voxels and statistical dynamics of intensities for each modality. Real PET images of five patients with FDG and FMISO are used to evaluate quantitatively and qualitatively our method. A comparison between individual and multi-tracer segmentations was conducted to show advantages of the proposed fusion method. Results: The segmentation results show that fusion with Gaussian copula can receive high Dice coefficient of 0.84 compared to that of 0.54 and 0.3 of monomodal segmentation results based on individual segmentation of FDG and FMISO PET images. In addition, high correlation coefficients (0.75 to 0.91) for the Gaussian copula for all five testing patients indicates the dependency between tumor regions in the multi-tracer PET images. Conclusion: This study shows that using multi-tracer PET imaging can efficiently improve the segmentation of tumor region where hypoxia and glucidic consumption are present at the same time. Introduction of copulas for modeling the dependency between two tracers can simultaneously take into account information from both tracers and deal with two pathological phenomena. Future work will be to consider other families of copula such as spherical and archimedian copulas, and to eliminate partial volume effect by considering dependency between neighboring voxels.« less
A novel bi-level meta-analysis approach: applied to biological pathway analysis.
Nguyen, Tin; Tagett, Rebecca; Donato, Michele; Mitrea, Cristina; Draghici, Sorin
2016-02-01
The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenge in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors to correctly identify pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from the authors. sorin@wayne.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan
2015-01-08
Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple-even distinct-traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10(-8)) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10(-7)) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Statistical issues on the analysis of change in follow-up studies in dental research.
Blance, Andrew; Tu, Yu-Kang; Baelum, Vibeke; Gilthorpe, Mark S
2007-12-01
To provide an overview to the problems in study design and associated analyses of follow-up studies in dental research, particularly addressing three issues: treatment-baselineinteractions; statistical power; and nonrandomization. Our previous work has shown that many studies purport an interacion between change (from baseline) and baseline values, which is often based on inappropriate statistical analyses. A priori power calculations are essential for randomized controlled trials (RCTs), but in the pre-test/post-test RCT design it is not well known to dental researchers that the choice of statistical method affects power, and that power is affected by treatment-baseline interactions. A common (good) practice in the analysis of RCT data is to adjust for baseline outcome values using ancova, thereby increasing statistical power. However, an important requirement for ancova is there to be no interaction between the groups and baseline outcome (i.e. effective randomization); the patient-selection process should not cause differences in mean baseline values across groups. This assumption is often violated for nonrandomized (observational) studies and the use of ancova is thus problematic, potentially giving biased estimates, invoking Lord's paradox and leading to difficulties in the interpretation of results. Baseline interaction issues can be overcome by use of statistical methods; not widely practiced in dental research: Oldham's method and multilevel modelling; the latter is preferred for its greater flexibility to deal with more than one follow-up occasion as well as additional covariates To illustrate these three key issues, hypothetical examples are considered from the fields of periodontology, orthodontics, and oral implantology. Caution needs to be exercised when considering the design and analysis of follow-up studies. ancova is generally inappropriate for nonrandomized studies and causal inferences from observational data should be avoided.
NASA Astrophysics Data System (ADS)
Grimm, Guido W.; Potts, Alastair J.
2016-03-01
The Coexistence Approach has been used to infer palaeoclimates for many Eurasian fossil plant assemblages. However, the theory that underpins the method has never been examined in detail. Here we discuss acknowledged and implicit assumptions and assess the statistical nature and pseudo-logic of the method. We also compare the Coexistence Approach theory with the active field of species distribution modelling. We argue that the assumptions will inevitably be violated to some degree and that the method lacks any substantive means to identify or quantify these violations. The absence of a statistical framework makes the method highly vulnerable to the vagaries of statistical outliers and exotic elements. In addition, we find numerous logical inconsistencies, such as how climate shifts are quantified (the use of a "centre value" of a coexistence interval) and the ability to reconstruct "extinct" climates from modern plant distributions. Given the problems that have surfaced in species distribution modelling, accurate and precise quantitative reconstructions of palaeoclimates (or even climate shifts) using the nearest-living-relative principle and rectilinear niches (the basis of the method) will not be possible. The Coexistence Approach can be summarised as an exercise that shoehorns a plant fossil assemblage into coexistence and then assumes that this must be the climate. Given the theoretical issues and methodological issues highlighted elsewhere, we suggest that the method be discontinued and that all past reconstructions be disregarded and revisited using less fallacious methods. We outline six steps for (further) validation of available and future taxon-based methods and advocate developing (semi-quantitative) methods that prioritise robustness over precision.
Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.
Kwak, Il-Youp; Pan, Wei
2017-01-01
To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in R package aSPU, freely and publicly available at: https://cran.r-project.org/web/packages/aSPU/ CONTACT: weip@biostat.umn.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
Ries, Kernell G.; Eng, Ken
2010-01-01
The U.S. Geological Survey, in cooperation with the Maryland Department of the Environment, operated a network of 20 low-flow partial-record stations during 2008 in a region that extends from southwest of Baltimore to the northeastern corner of Maryland to obtain estimates of selected streamflow statistics at the station locations. The study area is expected to face a substantial influx of new residents and businesses as a result of military and civilian personnel transfers associated with the Federal Base Realignment and Closure Act of 2005. The estimated streamflow statistics, which include monthly 85-percent duration flows, the 10-year recurrence-interval minimum base flow, and the 7-day, 10-year low flow, are needed to provide a better understanding of the availability of water resources in the area to be affected by base-realignment activities. Streamflow measurements collected for this study at the low-flow partial-record stations and measurements collected previously for 8 of the 20 stations were related to concurrent daily flows at nearby index streamgages to estimate the streamflow statistics. Three methods were used to estimate the streamflow statistics and two methods were used to select the index streamgages. Of the three methods used to estimate the streamflow statistics, two of them--the Moments and MOVE1 methods--rely on correlating the streamflow measurements at the low-flow partial-record stations with concurrent streamflows at nearby, hydrologically similar index streamgages to determine the estimates. These methods, recommended for use by the U.S. Geological Survey, generally require about 10 streamflow measurements at the low-flow partial-record station. The third method transfers the streamflow statistics from the index streamgage to the partial-record station based on the average of the ratios of the measured streamflows at the partial-record station to the concurrent streamflows at the index streamgage. This method can be used with as few as one pair of streamflow measurements made on a single streamflow recession at the low-flow partial-record station, although additional pairs of measurements will increase the accuracy of the estimates. Errors associated with the two correlation methods generally were lower than the errors associated with the flow-ratio method, but the advantages of the flow-ratio method are that it can produce reasonably accurate estimates from streamflow measurements much faster and at lower cost than estimates obtained using the correlation methods. The two index-streamgage selection methods were (1) selection based on the highest correlation coefficient between the low-flow partial-record station and the index streamgages, and (2) selection based on Euclidean distance, where the Euclidean distance was computed as a function of geographic proximity and the basin characteristics: drainage area, percentage of forested area, percentage of impervious area, and the base-flow recession time constant, t. Method 1 generally selected index streamgages that were significantly closer to the low-flow partial-record stations than method 2. The errors associated with the estimated streamflow statistics generally were lower for method 1 than for method 2, but the differences were not statistically significant. The flow-ratio method for estimating streamflow statistics at low-flow partial-record stations was shown to be independent from the two correlation-based estimation methods. As a result, final estimates were determined for eight low-flow partial-record stations by weighting estimates from the flow-ratio method with estimates from one of the two correlation methods according to the respective variances of the estimates. Average standard errors of estimate for the final estimates ranged from 90.0 to 7.0 percent, with an average value of 26.5 percent. Average standard errors of estimate for the weighted estimates were, on average, 4.3 percent less than the best average standard errors of estima
"Magnitude-based inference": a statistical review.
Welsh, Alan H; Knight, Emma J
2015-04-01
We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.
Horng, Huann-Cheng; Yuan, Chiou-Chung; Chao, Kuan-Chong; Cheng, Ming-Huei; Wang, Peng-Hui
2007-06-01
To evaluate the efficacy and acceptability of the Port-A-Cath (PAC) insertion method with (conventional group as II) and without (modified group as I) the aid of intraoperative fluoroscopy or other localizing devices. A total of 158 women with various kinds of gynecological cancers warranting PAC insertion (n = 86 in group I and n = 72 in group II, respectively) were evaluated. Data for analyses included patient age, main disease, dislocation site, surgical time, complications, and catheter outcome. There was no statistical difference between the two groups in terms of age, main disease, complications, and the experiencing of patent catheters. However, appropriate positioning (100% in group I, and 82% in group II) in the superior vena cava (SVC) showed statistical differences between the two groups (P = 0.001). In addition, the surgical time in group I was statistically shorter than that in group II (P < 0.001). The modified method for inserting the PAC offered the following benefits: including avoiding X-ray exposure for both the operator and the patient, defining the appropriate position in the SVC, and less surgical time. (c) 2007 Wiley-Liss, Inc.
Additive scales in degenerative disease--calculation of effect sizes and clinical judgment.
Riepe, Matthias W; Wilkinson, David; Förstl, Hans; Brieden, Andreas
2011-12-16
The therapeutic efficacy of an intervention is often assessed in clinical trials by scales measuring multiple diverse activities that are added to produce a cumulative global score. Medical communities and health care systems subsequently use these data to calculate pooled effect sizes to compare treatments. This is done because major doubt has been cast over the clinical relevance of statistically significant findings relying on p values with the potential to report chance findings. Hence in an aim to overcome this pooling the results of clinical studies into a meta-analyses with a statistical calculus has been assumed to be a more definitive way of deciding of efficacy. We simulate the therapeutic effects as measured with additive scales in patient cohorts with different disease severity and assess the limitations of an effect size calculation of additive scales which are proven mathematically. We demonstrate that the major problem, which cannot be overcome by current numerical methods, is the complex nature and neurobiological foundation of clinical psychiatric endpoints in particular and additive scales in general. This is particularly relevant for endpoints used in dementia research. 'Cognition' is composed of functions such as memory, attention, orientation and many more. These individual functions decline in varied and non-linear ways. Here we demonstrate that with progressive diseases cumulative values from multidimensional scales are subject to distortion by the limitations of the additive scale. The non-linearity of the decline of function impedes the calculation of effect sizes based on cumulative values from these multidimensional scales. Statistical analysis needs to be guided by boundaries of the biological condition. Alternatively, we suggest a different approach avoiding the error imposed by over-analysis of cumulative global scores from additive scales.
Kamal, Ghulam Mustafa; Wang, Xiaohua; Bin Yuan; Wang, Jie; Sun, Peng; Zhang, Xu; Liu, Maili
2016-09-01
Soy sauce a well known seasoning all over the world, especially in Asia, is available in global market in a wide range of types based on its purpose and the processing methods. Its composition varies with respect to the fermentation processes and addition of additives, preservatives and flavor enhancers. A comprehensive (1)H NMR based study regarding the metabonomic variations of soy sauce to differentiate among different types of soy sauce available on the global market has been limited due to the complexity of the mixture. In present study, (13)C NMR spectroscopy coupled with multivariate statistical data analysis like principle component analysis (PCA), and orthogonal partial least square-discriminant analysis (OPLS-DA) was applied to investigate metabonomic variations among different types of soy sauce, namely super light, super dark, red cooking and mushroom soy sauce. The main additives in soy sauce like glutamate, sucrose and glucose were easily distinguished and quantified using (13)C NMR spectroscopy which were otherwise difficult to be assigned and quantified due to serious signal overlaps in (1)H NMR spectra. The significantly higher concentration of sucrose in dark, red cooking and mushroom flavored soy sauce can directly be linked to the addition of caramel in soy sauce. Similarly, significantly higher level of glutamate in super light as compared to super dark and mushroom flavored soy sauce may come from the addition of monosodium glutamate. The study highlights the potentiality of (13)C NMR based metabonomics coupled with multivariate statistical data analysis in differentiating between the types of soy sauce on the basis of level of additives, raw materials and fermentation procedures. Copyright © 2016 Elsevier B.V. All rights reserved.
Aminocarminic acid in E120-labelled food additives and beverages.
Sabatino, Leonardo; Scordino, Monica; Gargano, Maria; Lazzaro, Francesco; Borzì, Marco A; Traulo, Pasqualino; Gagliano, Giacomo
2012-01-01
An analytical method was developed for investigating aminocarminic acid occurrence in E120-labelled red-coloured-beverages and in E120 additives, with the aim of controlling the purity of the carmine additive in countries where the use of aminocarminic acid is forbidden. The carminic acid and the aminocarminic acid were separated by high-performance liquid chromatography-photodiode array-tandem mass spectrography (HPLC-PDA-MS/MS). The method was statistically validated. The regression lines, ranging from 10 to 100 mg/L, showed r(2 )> 0.9996. Recoveries from 97% to 101% were obtained for the fortification level of 50 mg/L; the relative standard deviations did not exceed 3%. The LODs were below 2 mg/L, whereas the LOQs did not exceed 4 mg/L. The method was successfully applied to 27 samples of commercial E120-labelled red-coloured beverages and E120 additives, collected in Italy during quality control investigations conducted by the Ministry. The results demonstrated that more than 50% of the samples contained aminocarminic acid, evidencing the alarming illicit use of this semi-synthetic carmine acid derivative.
Witches, History, and Microcomputers: A Computer-Assisted Course on the Salem Witch Trials.
ERIC Educational Resources Information Center
Latner, Richard B.
1988-01-01
Describes the addition of a microcomputer component to a Tulane University (Louisiana) undergraduate history course on the Salem witchcraft trials. Discusses the use of a statistical package and a data set to analyze and display data and the enhancement of the active learning approach by introducing students to quantitative methods of historical…
NASA Astrophysics Data System (ADS)
Chavez, Roberto; Lozano, Sergio; Correia, Pedro; Sanz-Rodrigo, Javier; Probst, Oliver
2013-04-01
With the purpose of efficiently and reliably generating long-term wind resource maps for the wind energy industry, the application and verification of a statistical methodology for the climate downscaling of wind fields at surface level is presented in this work. This procedure is based on the combination of the Monte Carlo and the Principal Component Analysis (PCA) statistical methods. Firstly the Monte Carlo method is used to create a huge number of daily-based annual time series, so called climate representative years, by the stratified sampling of a 33-year-long time series corresponding to the available period of the NCAR/NCEP global reanalysis data set (R-2). Secondly the representative years are evaluated such that the best set is chosen according to its capability to recreate the Sea Level Pressure (SLP) temporal and spatial fields from the R-2 data set. The measure of this correspondence is based on the Euclidean distance between the Empirical Orthogonal Functions (EOF) spaces generated by the PCA (Principal Component Analysis) decomposition of the SLP fields from both the long-term and the representative year data sets. The methodology was verified by comparing the selected 365-days period against a 9-year period of wind fields generated by dynamical downscaling the Global Forecast System data with the mesoscale model SKIRON for the Iberian Peninsula. These results showed that, compared to the traditional method of dynamical downscaling any random 365-days period, the error in the average wind velocity by the PCA's representative year was reduced by almost 30%. Moreover the Mean Absolute Errors (MAE) in the monthly and daily wind profiles were also reduced by almost 25% along all SKIRON grid points. These results showed also that the methodology presented maximum error values in the wind speed mean of 0.8 m/s and maximum MAE in the monthly curves of 0.7 m/s. Besides the bulk numbers, this work shows the spatial distribution of the errors across the Iberian domain and additional wind statistics such as the velocity and directional frequency. Additional repetitions were performed to prove the reliability and robustness of this kind-of statistical-dynamical downscaling method.
An advanced probabilistic structural analysis method for implicit performance functions
NASA Technical Reports Server (NTRS)
Wu, Y.-T.; Millwater, H. R.; Cruse, T. A.
1989-01-01
In probabilistic structural analysis, the performance or response functions usually are implicitly defined and must be solved by numerical analysis methods such as finite element methods. In such cases, the most commonly used probabilistic analysis tool is the mean-based, second-moment method which provides only the first two statistical moments. This paper presents a generalized advanced mean value (AMV) method which is capable of establishing the distributions to provide additional information for reliability design. The method requires slightly more computations than the second-moment method but is highly efficient relative to the other alternative methods. In particular, the examples show that the AMV method can be used to solve problems involving non-monotonic functions that result in truncated distributions.
Han, Kyunghwa; Jung, Inkyung
2018-05-01
This review article presents an assessment of trends in statistical methods and an evaluation of their appropriateness in articles published in the Archives of Plastic Surgery (APS) from 2012 to 2017. We reviewed 388 original articles published in APS between 2012 and 2017. We categorized the articles that used statistical methods according to the type of statistical method, the number of statistical methods, and the type of statistical software used. We checked whether there were errors in the description of statistical methods and results. A total of 230 articles (59.3%) published in APS between 2012 and 2017 used one or more statistical method. Within these articles, there were 261 applications of statistical methods with continuous or ordinal outcomes, and 139 applications of statistical methods with categorical outcome. The Pearson chi-square test (17.4%) and the Mann-Whitney U test (14.4%) were the most frequently used methods. Errors in describing statistical methods and results were found in 133 of the 230 articles (57.8%). Inadequate description of P-values was the most common error (39.1%). Among the 230 articles that used statistical methods, 71.7% provided details about the statistical software programs used for the analyses. SPSS was predominantly used in the articles that presented statistical analyses. We found that the use of statistical methods in APS has increased over the last 6 years. It seems that researchers have been paying more attention to the proper use of statistics in recent years. It is expected that these positive trends will continue in APS.
Analysis of Parasite and Other Skewed Counts
Alexander, Neal
2012-01-01
Objective To review methods for the statistical analysis of parasite and other skewed count data. Methods Statistical methods for skewed count data are described and compared, with reference to those used over a ten year period of Tropical Medicine and International Health. Two parasitological datasets are used for illustration. Results Ninety papers were identified, 89 with descriptive and 60 with inferential analysis. A lack of clarity is noted in identifying measures of location, in particular the Williams and geometric mean. The different measures are compared, emphasizing the legitimacy of the arithmetic mean for skewed data. In the published papers, the t test and related methods were often used on untransformed data, which is likely to be invalid. Several approaches to inferential analysis are described, emphasizing 1) non-parametric methods, while noting that they are not simply comparisons of medians, and 2) generalized linear modelling, in particular with the negative binomial distribution. Additional methods, such as the bootstrap, with potential for greater use are described. Conclusions Clarity is recommended when describing transformations and measures of location. It is suggested that non-parametric methods and generalized linear models are likely to be sufficient for most analyses. PMID:22943299
Egbewale, Bolaji E; Lewis, Martyn; Sim, Julius
2014-04-09
Analysis of variance (ANOVA), change-score analysis (CSA) and analysis of covariance (ANCOVA) respond differently to baseline imbalance in randomized controlled trials. However, no empirical studies appear to have quantified the differential bias and precision of estimates derived from these methods of analysis, and their relative statistical power, in relation to combinations of levels of key trial characteristics. This simulation study therefore examined the relative bias, precision and statistical power of these three analyses using simulated trial data. 126 hypothetical trial scenarios were evaluated (126,000 datasets), each with continuous data simulated by using a combination of levels of: treatment effect; pretest-posttest correlation; direction and magnitude of baseline imbalance. The bias, precision and power of each method of analysis were calculated for each scenario. Compared to the unbiased estimates produced by ANCOVA, both ANOVA and CSA are subject to bias, in relation to pretest-posttest correlation and the direction of baseline imbalance. Additionally, ANOVA and CSA are less precise than ANCOVA, especially when pretest-posttest correlation ≥ 0.3. When groups are balanced at baseline, ANCOVA is at least as powerful as the other analyses. Apparently greater power of ANOVA and CSA at certain imbalances is achieved in respect of a biased treatment effect. Across a range of correlations between pre- and post-treatment scores and at varying levels and direction of baseline imbalance, ANCOVA remains the optimum statistical method for the analysis of continuous outcomes in RCTs, in terms of bias, precision and statistical power.
Investigation of vibration characteristics of electric motors
NASA Technical Reports Server (NTRS)
Bakshis, A. K.; Tamoshyunas, Y. K.
1973-01-01
The vibration characteristics of electric motors were analyzed using mathematical statistics methods. The equipment used and the method of conducting the test are described. Curves are developed to show the visualization of the electric motor vibrations in the vertical direction. Additional curves are included to show the amplitude-phase frequency characteristic of dynamic rotor-housing vibrations at the first lug and the same data for the second lug of the electric motor. Mathematical models were created to show the transmission function of the dynamic rotor housing system.
Applications of physical methods in high-frequency futures markets
NASA Astrophysics Data System (ADS)
Bartolozzi, M.; Mellen, C.; Chan, F.; Oliver, D.; Di Matteo, T.; Aste, T.
2007-12-01
In the present work we demonstrate the application of different physical methods to high-frequency or tick-bytick financial time series data. In particular, we calculate the Hurst exponent and inverse statistics for the price time series taken from a range of futures indices. Additionally, we show that in a limit order book the relaxation times of an imbalanced book state with more demand or supply can be described by stretched exponential laws analogous to those seen in many physical systems.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L
2014-10-15
Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Evidence-Based Medicine as a Tool for Undergraduate Probability and Statistics Education.
Masel, J; Humphrey, P T; Blackburn, B; Levine, J A
2015-01-01
Most students have difficulty reasoning about chance events, and misconceptions regarding probability can persist or even strengthen following traditional instruction. Many biostatistics classes sidestep this problem by prioritizing exploratory data analysis over probability. However, probability itself, in addition to statistics, is essential both to the biology curriculum and to informed decision making in daily life. One area in which probability is particularly important is medicine. Given the preponderance of pre health students, in addition to more general interest in medicine, we capitalized on students' intrinsic motivation in this area to teach both probability and statistics. We use the randomized controlled trial as the centerpiece of the course, because it exemplifies the most salient features of the scientific method, and the application of critical thinking to medicine. The other two pillars of the course are biomedical applications of Bayes' theorem and science and society content. Backward design from these three overarching aims was used to select appropriate probability and statistics content, with a focus on eliciting and countering previously documented misconceptions in their medical context. Pretest/posttest assessments using the Quantitative Reasoning Quotient and Attitudes Toward Statistics instruments are positive, bucking several negative trends previously reported in statistics education. © 2015 J. Masel et al. CBE—Life Sciences Education © 2015 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
NASA Astrophysics Data System (ADS)
Hinckley, Sarah; Parada, Carolina; Horne, John K.; Mazur, Michael; Woillez, Mathieu
2016-10-01
Biophysical individual-based models (IBMs) have been used to study aspects of early life history of marine fishes such as recruitment, connectivity of spawning and nursery areas, and marine reserve design. However, there is no consistent approach to validating the spatial outputs of these models. In this study, we hope to rectify this gap. We document additions to an existing individual-based biophysical model for Alaska walleye pollock (Gadus chalcogrammus), some simulations made with this model and methods that were used to describe and compare spatial output of the model versus field data derived from ichthyoplankton surveys in the Gulf of Alaska. We used visual methods (e.g. distributional centroids with directional ellipses), several indices (such as a Normalized Difference Index (NDI), and an Overlap Coefficient (OC), and several statistical methods: the Syrjala method, the Getis-Ord Gi* statistic, and a geostatistical method for comparing spatial indices. We assess the utility of these different methods in analyzing spatial output and comparing model output to data, and give recommendations for their appropriate use. Visual methods are useful for initial comparisons of model and data distributions. Metrics such as the NDI and OC give useful measures of co-location and overlap, but care must be taken in discretizing the fields into bins. The Getis-Ord Gi* statistic is useful to determine the patchiness of the fields. The Syrjala method is an easily implemented statistical measure of the difference between the fields, but does not give information on the details of the distributions. Finally, the geostatistical comparison of spatial indices gives good information of details of the distributions and whether they differ significantly between the model and the data. We conclude that each technique gives quite different information about the model-data distribution comparison, and that some are easy to apply and some more complex. We also give recommendations for a multistep process to validate spatial output from IBMs.
Kim, Sung-Min; Choi, Yosoon
2017-01-01
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168
NASA Astrophysics Data System (ADS)
Bierstedt, Svenja E.; Hünicke, Birgit; Zorita, Eduardo; Ludwig, Juliane
2017-07-01
We statistically analyse the relationship between the structure of migrating dunes in the southern Baltic and the driving wind conditions over the past 26 years, with the long-term aim of using migrating dunes as a proxy for past wind conditions at an interannual resolution. The present analysis is based on the dune record derived from geo-radar measurements by Ludwig et al. (2017). The dune system is located at the Baltic Sea coast of Poland and is migrating from west to east along the coast. The dunes present layers with different thicknesses that can be assigned to absolute dates at interannual timescales and put in relation to seasonal wind conditions. To statistically analyse this record and calibrate it as a wind proxy, we used a gridded regional meteorological reanalysis data set (coastDat2) covering recent decades. The identified link between the dune annual layers and wind conditions was additionally supported by the co-variability between dune layers and observed sea level variations in the southern Baltic Sea. We include precipitation and temperature into our analysis, in addition to wind, to learn more about the dependency between these three atmospheric factors and their common influence on the dune system. We set up a statistical linear model based on the correlation between the frequency of days with specific wind conditions in a given season and dune migration velocities derived for that season. To some extent, the dune records can be seen as analogous to tree-ring width records, and hence we use a proxy validation method usually applied in dendrochronology, cross-validation with the leave-one-out method, when the observational record is short. The revealed correlations between the wind record from the reanalysis and the wind record derived from the dune structure is in the range between 0.28 and 0.63, yielding similar statistical validation skill as dendroclimatological records.
Kim, Sung-Min; Choi, Yosoon
2017-06-18
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z -score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z -scores: high content with a high z -score (HH), high content with a low z -score (HL), low content with a high z -score (LH), and low content with a low z -score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
Statistical methods used in articles published by the Journal of Periodontal and Implant Science.
Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young
2014-12-01
The purposes of this study were to assess the trend of use of statistical methods including parametric and nonparametric methods and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of a total of 198 studies considered). We found an increasing trend towards the application of statistical methods and nonparametric methods in recent periodontal studies and thus, concluded that increased use of complex statistical methodology might be preferred by the researchers in the fields of study covered by JPIS.
Bootstrap data methodology for sequential hybrid model building
NASA Technical Reports Server (NTRS)
Volponi, Allan J. (Inventor); Brotherton, Thomas (Inventor)
2007-01-01
A method for modeling engine operation comprising the steps of: 1. collecting a first plurality of sensory data, 2. partitioning a flight envelope into a plurality of sub-regions, 3. assigning the first plurality of sensory data into the plurality of sub-regions, 4. generating an empirical model of at least one of the plurality of sub-regions, 5. generating a statistical summary model for at least one of the plurality of sub-regions, 6. collecting an additional plurality of sensory data, 7. partitioning the second plurality of sensory data into the plurality of sub-regions, 8. generating a plurality of pseudo-data using the empirical model, and 9. concatenating the plurality of pseudo-data and the additional plurality of sensory data to generate an updated empirical model and an updated statistical summary model for at least one of the plurality of sub-regions.
Takeyoshi, Masahiro; Sawaki, Masakuni; Yamasaki, Kanji; Kimber, Ian
2003-09-30
The murine local lymph node assay (LLNA) is used for the identification of chemicals that have the potential to cause skin sensitization. However, it requires specific facility and handling procedures to accommodate a radioisotopic (RI) endpoint. We have developed non-radioisotopic (non-RI) endpoint of LLNA based on BrdU incorporation to avoid a use of RI. Although this alternative method appears viable in principle, it is somewhat less sensitive than the standard assay. In this study, we report investigations to determine the use of statistical analysis to improve the sensitivity of a non-RI LLNA procedure with alpha-hexylcinnamic aldehyde (HCA) in two separate experiments. Consequently, the alternative non-RI method required HCA concentrations of greater than 25% to elicit a positive response based on the criterion for classification as a skin sensitizer in the standard LLNA. Nevertheless, dose responses to HCA in the alternative method were consistent in both experiments and we examined whether the use of an endpoint based upon the statistical significance of induced changes in LNC turnover, rather than an SI of 3 or greater, might provide for additional sensitivity. The results reported here demonstrate that with HCA at least significant responses were, in each of two experiments, recorded following exposure of mice to 25% of HCA. These data suggest that this approach may be more satisfactory-at least when BrdU incorporation is measured. However, this modification of the LLNA is rather less sensitive than the standard method if employing statistical endpoint. Taken together the data reported here suggest that a modified LLNA in which BrdU is used in place of radioisotope incorporation shows some promise, but that in its present form, even with the use of a statistical endpoint, lacks some of the sensitivity of the standard method. The challenge is to develop strategies for further refinement of this approach.
Building the Community Online Resource for Statistical Seismicity Analysis (CORSSA)
NASA Astrophysics Data System (ADS)
Michael, A. J.; Wiemer, S.; Zechar, J. D.; Hardebeck, J. L.; Naylor, M.; Zhuang, J.; Steacy, S.; Corssa Executive Committee
2010-12-01
Statistical seismology is critical to the understanding of seismicity, the testing of proposed earthquake prediction and forecasting methods, and the assessment of seismic hazard. Unfortunately, despite its importance to seismology - especially to those aspects with great impact on public policy - statistical seismology is mostly ignored in the education of seismologists, and there is no central repository for the existing open-source software tools. To remedy these deficiencies, and with the broader goal to enhance the quality of statistical seismology research, we have begun building the Community Online Resource for Statistical Seismicity Analysis (CORSSA). CORSSA is a web-based educational platform that is authoritative, up-to-date, prominent, and user-friendly. We anticipate that the users of CORSSA will range from beginning graduate students to experienced researchers. More than 20 scientists from around the world met for a week in Zurich in May 2010 to kick-start the creation of CORSSA: the format and initial table of contents were defined; a governing structure was organized; and workshop participants began drafting articles. CORSSA materials are organized with respect to six themes, each containing between four and eight articles. The CORSSA web page, www.corssa.org, officially unveiled on September 6, 2010, debuts with an initial set of approximately 10 to 15 articles available online for viewing and commenting with additional articles to be added over the coming months. Each article will be peer-reviewed and will present a balanced discussion, including illustrative examples and code snippets. Topics in the initial set of articles will include: introductions to both CORSSA and statistical seismology, basic statistical tests and their role in seismology; understanding seismicity catalogs and their problems; basic techniques for modeling seismicity; and methods for testing earthquake predictability hypotheses. A special article will compare and review available statistical seismology software packages.
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep
2017-01-01
Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
A feature refinement approach for statistical interior CT reconstruction
NASA Astrophysics Data System (ADS)
Hu, Zhanli; Zhang, Yunwan; Liu, Jianbo; Ma, Jianhua; Zheng, Hairong; Liang, Dong
2016-07-01
Interior tomography is clinically desired to reduce the radiation dose rendered to patients. In this work, a new statistical interior tomography approach for computed tomography is proposed. The developed design focuses on taking into account the statistical nature of local projection data and recovering fine structures which are lost in the conventional total-variation (TV)—minimization reconstruction. The proposed method falls within the compressed sensing framework of TV minimization, which only assumes that the interior ROI is piecewise constant or polynomial and does not need any additional prior knowledge. To integrate the statistical distribution property of projection data, the objective function is built under the criteria of penalized weighed least-square (PWLS-TV). In the implementation of the proposed method, the interior projection extrapolation based FBP reconstruction is first used as the initial guess to mitigate truncation artifacts and also provide an extended field-of-view. Moreover, an interior feature refinement step, as an important processing operation is performed after each iteration of PWLS-TV to recover the desired structure information which is lost during the TV minimization. Here, a feature descriptor is specifically designed and employed to distinguish structure from noise and noise-like artifacts. A modified steepest descent algorithm is adopted to minimize the associated objective function. The proposed method is applied to both digital phantom and in vivo Micro-CT datasets, and compared to FBP, ART-TV and PWLS-TV. The reconstruction results demonstrate that the proposed method performs better than other conventional methods in suppressing noise, reducing truncated and streak artifacts, and preserving features. The proposed approach demonstrates its potential usefulness for feature preservation of interior tomography under truncated projection measurements.
A feature refinement approach for statistical interior CT reconstruction.
Hu, Zhanli; Zhang, Yunwan; Liu, Jianbo; Ma, Jianhua; Zheng, Hairong; Liang, Dong
2016-07-21
Interior tomography is clinically desired to reduce the radiation dose rendered to patients. In this work, a new statistical interior tomography approach for computed tomography is proposed. The developed design focuses on taking into account the statistical nature of local projection data and recovering fine structures which are lost in the conventional total-variation (TV)-minimization reconstruction. The proposed method falls within the compressed sensing framework of TV minimization, which only assumes that the interior ROI is piecewise constant or polynomial and does not need any additional prior knowledge. To integrate the statistical distribution property of projection data, the objective function is built under the criteria of penalized weighed least-square (PWLS-TV). In the implementation of the proposed method, the interior projection extrapolation based FBP reconstruction is first used as the initial guess to mitigate truncation artifacts and also provide an extended field-of-view. Moreover, an interior feature refinement step, as an important processing operation is performed after each iteration of PWLS-TV to recover the desired structure information which is lost during the TV minimization. Here, a feature descriptor is specifically designed and employed to distinguish structure from noise and noise-like artifacts. A modified steepest descent algorithm is adopted to minimize the associated objective function. The proposed method is applied to both digital phantom and in vivo Micro-CT datasets, and compared to FBP, ART-TV and PWLS-TV. The reconstruction results demonstrate that the proposed method performs better than other conventional methods in suppressing noise, reducing truncated and streak artifacts, and preserving features. The proposed approach demonstrates its potential usefulness for feature preservation of interior tomography under truncated projection measurements.
Mutz, Rüdiger; Daniel, Hans-Dieter
2013-06-01
It is often claimed that psychology students' attitudes towards research methods and statistics affect course enrollment, persistence, achievement, and course climate. However, the inter-institutional variability has been widely neglected in the research on students' attitudes towards research methods and statistics, but it is important for didactic purposes (heterogeneity of the student population). The paper presents a scale based on findings of the social psychology of attitudes (polar and emotion-based concept) in conjunction with a method for capturing beginning university students' attitudes towards research methods and statistics and identifying the proportion of students having positive attitudes at the institutional level. The study based on a re-analysis of a nationwide survey in Germany in August 2000 of all psychology students that enrolled in fall 1999/2000 (N= 1,490) and N= 44 universities. Using multilevel latent-class analysis (MLLCA), the aim was to group students in different student attitude types and at the same time to obtain university segments based on the incidences of the different student attitude types. Four student latent clusters were found that can be ranked on a bipolar attitude dimension. Membership in a cluster was predicted by age, grade point average (GPA) on school-leaving exam, and personality traits. In addition, two university segments were found: universities with an average proportion of students with positive attitudes and universities with a high proportion of students with positive attitudes (excellent segment). As psychology students make up a very heterogeneous group, the use of multiple learning activities as opposed to the classical lecture course is required. © 2011 The British Psychological Society.
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Heart Rate Variability Dynamics for the Prognosis of Cardiovascular Risk
Ramirez-Villegas, Juan F.; Lam-Espinosa, Eric; Ramirez-Moreno, David F.; Calvo-Echeverry, Paulo C.; Agredo-Rodriguez, Wilfredo
2011-01-01
Statistical, spectral, multi-resolution and non-linear methods were applied to heart rate variability (HRV) series linked with classification schemes for the prognosis of cardiovascular risk. A total of 90 HRV records were analyzed: 45 from healthy subjects and 45 from cardiovascular risk patients. A total of 52 features from all the analysis methods were evaluated using standard two-sample Kolmogorov-Smirnov test (KS-test). The results of the statistical procedure provided input to multi-layer perceptron (MLP) neural networks, radial basis function (RBF) neural networks and support vector machines (SVM) for data classification. These schemes showed high performances with both training and test sets and many combinations of features (with a maximum accuracy of 96.67%). Additionally, there was a strong consideration for breathing frequency as a relevant feature in the HRV analysis. PMID:21386966
Potential capabilities for compression of information of certain data processing systems
NASA Technical Reports Server (NTRS)
Khodarev, Y. K.; Yevdokimov, V. P.; Pokras, V. M.
1974-01-01
This article undertakes to study a generalized block diagram of a data collection and processing system of a spacecraft in which a number of sensors or outputs of scientific instruments are cyclically interrogated by a commutator, methods of writing the supplementary information in a frame on the example of a certain hypothetical telemetry system, and the influence of statistics of number of active channels in a frame on frame compression factor. The separation of the data compression factor of the collection and processing system of spacecraft into two parts used in this work allows determination of the compression factor of an active frame depending not only on the statistics of activity of channels in the telemetry frame, but also on the method of introduction of the additional address and time information to each frame.
An adaptive multi-feature segmentation model for infrared image
NASA Astrophysics Data System (ADS)
Zhang, Tingting; Han, Jin; Zhang, Yi; Bai, Lianfa
2016-04-01
Active contour models (ACM) have been extensively applied to image segmentation, conventional region-based active contour models only utilize global or local single feature information to minimize the energy functional to drive the contour evolution. Considering the limitations of original ACMs, an adaptive multi-feature segmentation model is proposed to handle infrared images with blurred boundaries and low contrast. In the proposed model, several essential local statistic features are introduced to construct a multi-feature signed pressure function (MFSPF). In addition, we draw upon the adaptive weight coefficient to modify the level set formulation, which is formed by integrating MFSPF with local statistic features and signed pressure function with global information. Experimental results demonstrate that the proposed method can make up for the inadequacy of the original method and get desirable results in segmenting infrared images.
NASA Astrophysics Data System (ADS)
Mead, J.; Wright, G. B.
2013-12-01
The collection of massive amounts of high quality data from new and greatly improved observing technologies and from large-scale numerical simulations are drastically improving our understanding and modeling of the earth system. However, these datasets are also revealing important knowledge gaps and limitations of our current conceptual models for explaining key aspects of these new observations. These limitations are impeding progress on questions that have both fundamental scientific and societal significance, including climate and weather, natural disaster mitigation, earthquake and volcano dynamics, earth structure and geodynamics, resource exploration, and planetary evolution. New conceptual approaches and numerical methods for characterizing and simulating these systems are needed - methods that can handle processes which vary through a myriad of scales in heterogeneous, complex environments. Additionally, as certain aspects of these systems may be observable only indirectly or not at all, new statistical methods are also needed. This type of research will demand integrating the expertise of geoscientist together with that of mathematicians, statisticians, and computer scientists. If the past is any indicator, this interdisciplinary research will no doubt lead to advances in all these fields in addition to vital improvements in our ability to predict the behavior of the planetary environment. The Consortium for Mathematics in the Geosciences (CMG++) arose from two scientific workshops held at Northwestern and Princeton in 2011 and 2012 with participants from mathematics, statistics, geoscience and computational science. The mission of CMG++ is to accelerate the traditional interaction between people in these disciplines through the promotion of both collaborative research and interdisciplinary education. We will discuss current activities, describe how people can get involved, and solicit input from the broader AGU community.
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
NASA Astrophysics Data System (ADS)
Karali, Anna; Giannakopoulos, Christos; Frias, Maria Dolores; Hatzaki, Maria; Roussos, Anargyros; Casanueva, Ana
2013-04-01
Forest fires have always been present in the Mediterranean ecosystems, thus they constitute a major ecological and socio-economic issue. The last few decades though, the number of forest fires has significantly increased, as well as their severity and impact on the environment. Local fire danger projections are often required when dealing with wild fire research. In the present study the application of statistical downscaling and spatial interpolation methods was performed to the Canadian Fire Weather Index (FWI), in order to assess forest fire risk in Greece. The FWI is used worldwide (including the Mediterranean basin) to estimate the fire danger in a generalized fuel type, based solely on weather observations. The meteorological inputs to the FWI System are noon values of dry-bulb temperature, air relative humidity, 10m wind speed and precipitation during the previous 24 hours. The statistical downscaling methods are based on a statistical model that takes into account empirical relationships between large scale variables (used as predictors) and local scale variables. In the framework of the current study the statistical downscaling portal developed by the Santander Meteorology Group (https://www.meteo.unican.es/downscaling) in the framework of the EU project CLIMRUN (www.climrun.eu) was used to downscale non standard parameters related to forest fire risk. In this study, two different approaches were adopted. Firstly, the analogue downscaling technique was directly performed to the FWI index values and secondly the same downscaling technique was performed indirectly through the meteorological inputs of the index. In both cases, the statistical downscaling portal was used considering the ERA-Interim reanalysis as predictands due to the lack of observations at noon. Additionally, a three-dimensional (3D) interpolation method of position and elevation, based on Thin Plate Splines (TPS) was used, to interpolate the ERA-Interim data used to calculate the index. Results from this method were compared with the statistical downscaling results obtained from the portal. Finally, FWI was computed using weather observations obtained from the Hellenic National Meteorological Service, mainly in the south continental part of Greece and a comparison with the previous results was performed.
Additives and salts for dye-sensitized solar cells electrolytes: what is the best choice?
NASA Astrophysics Data System (ADS)
Bella, Federico; Sacco, Adriano; Pugliese, Diego; Laurenti, Marco; Bianco, Stefano
2014-10-01
A multivariate chemometric approach is proposed for the first time for performance optimization of I-/I3- liquid electrolytes for dye-sensitized solar cells (DSSCs). Over the years the system composed by iodide/triiodide redox shuttle dissolved in organic solvent has been enriched with the addition of different specific cations and chemical compounds to improve the photoelectrochemical behavior of the cell. However, usually such additives act favorably with respect to some of the cell parameters and negatively to others. Moreover, the combined action of different compounds often yields contradictory results, and from the literature it is not possible to identify an optimal recipe. We report here a systematic work, based on a multivariate experimental design, to statistically and quantitatively evaluate the effect of different additives on the photovoltaic performances of the device. The effect of cation size in iodine salts, the iodine/iodide ratio in the electrolyte and the effect of type and concentration of additives are mutually evaluated by means of a Design of Experiment (DoE) approach. Through this statistical method, the optimization of the overall parameters is demonstrated with a limited number of experimental trials. A 25% improvement on the photovoltaic conversion efficiency compared with that obtained with a commercial electrolyte is demonstrated.
NASA Astrophysics Data System (ADS)
Roccia, S.; Gaulard, C.; Étilé, A.; Chakma, R.
2017-07-01
In the context of nuclear orientation, we propose a new method to correct the multipole mixing ratios for asymmetries in the geometry of the setup but also in the detection system. This method is also robust against temperature fluctuations, beam intensity fluctuations and uncertainties in the nuclear structure of the nuclei. Additionally, this method provides a natural way to combine data from different detectors and make good use of all available statistics. We could use this method to demonstrate the accuracy that can be reached with the PolarEx setup now installed at the ALTO facility.
Dittmar, John C.; Pierce, Steven; Rothstein, Rodney; Reid, Robert J. D.
2013-01-01
Genome-wide experiments often measure quantitative differences between treated and untreated cells to identify affected strains. For these studies, statistical models are typically used to determine significance cutoffs. We developed a method termed “CLIK” (Cutoff Linked to Interaction Knowledge) that overlays biological knowledge from the interactome on screen results to derive a cutoff. The method takes advantage of the fact that groups of functionally related interacting genes often respond similarly to experimental conditions and, thus, cluster in a ranked list of screen results. We applied CLIK analysis to five screens of the yeast gene disruption library and found that it defined a significance cutoff that differed from traditional statistics. Importantly, verification experiments revealed that the CLIK cutoff correlated with the position in the rank order where the rate of true positives drops off significantly. In addition, the gene sets defined by CLIK analysis often provide further biological perspectives. For example, applying CLIK analysis retrospectively to a screen for cisplatin sensitivity allowed us to identify the importance of the Hrq1 helicase in DNA crosslink repair. Furthermore, we demonstrate the utility of CLIK to determine optimal treatment conditions by analyzing genome-wide screens at multiple rapamycin concentrations. We show that CLIK is an extremely useful tool for evaluating screen quality, determining screen cutoffs, and comparing results between screens. Furthermore, because CLIK uses previously annotated interaction data to determine biologically informed cutoffs, it provides additional insights into screen results, which supplement traditional statistical approaches. PMID:23589890
Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.
2011-01-01
GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647
NASA Astrophysics Data System (ADS)
McCune, Robert C.; Upadhyay, Vinod; Wang, Yar-Ming; Battocchi, Dante
The potential utility of AC-DC-AC electrochemical methods in comparative measures of corrosion-resisting coating system performance for magnesium alloys under consideration for the USAMP "Magnesium Front End Research and Development" project was previously shown in this forum [1]. Additional studies of this approach using statistically-designed experiments have been conducted with focus on alloy types, pretreatment, topcoat material and topcoat thickness as the variables. Additionally, sample coupons made for these designed experiments were also subjected to a typical automotive cyclic corrosion test cycle (SAE J2334) as well as ASTM B117 for comparison of relative performance. Results of these studies are presented along with advantages and limitations of the proposed methodology.
NASA Astrophysics Data System (ADS)
Sanchez, J.
2018-06-01
In this paper, the application and analysis of the asymptotic approximation method to a single degree-of-freedom has recently been produced. The original concepts are summarized, and the necessary probabilistic concepts are developed and applied to single degree-of-freedom systems. Then, these concepts are united, and the theoretical and computational models are developed. To determine the viability of the proposed method in a probabilistic context, numerical experiments are conducted, and consist of a frequency analysis, analysis of the effects of measurement noise, and a statistical analysis. In addition, two examples are presented and discussed.
Barry, Samantha J; Pham, Tran N; Borman, Phil J; Edwards, Andrew J; Watson, Simon A
2012-01-27
The DMAIC (Define, Measure, Analyse, Improve and Control) framework and associated statistical tools have been applied to both identify and reduce variability observed in a quantitative (19)F solid-state NMR (SSNMR) analytical method. The method had been developed to quantify levels of an additional polymorph (Form 3) in batches of an active pharmaceutical ingredient (API), where Form 1 is the predominant polymorph. In order to validate analyses of the polymorphic form, a single batch of API was used as a standard each time the method was used. The level of Form 3 in this standard was observed to gradually increase over time, the effect not being immediately apparent due to method variability. In order to determine the cause of this unexpected increase and to reduce method variability, a risk-based statistical investigation was performed to identify potential factors which could be responsible for these effects. Factors identified by the risk assessment were investigated using a series of designed experiments to gain a greater understanding of the method. The increase of the level of Form 3 in the standard was primarily found to correlate with the number of repeat analyses, an effect not previously reported in SSNMR literature. Differences in data processing (phasing and linewidth) were found to be responsible for the variability in the method. After implementing corrective actions the variability was reduced such that the level of Form 3 was within an acceptable range of ±1% ww(-1) in fresh samples of API. Copyright © 2011. Published by Elsevier B.V.
Multiple testing and power calculations in genetic association studies.
So, Hon-Cheong; Sham, Pak C
2011-01-01
Modern genetic association studies typically involve multiple single-nucleotide polymorphisms (SNPs) and/or multiple genes. With the development of high-throughput genotyping technologies and the reduction in genotyping cost, investigators can now assay up to a million SNPs for direct or indirect association with disease phenotypes. In addition, some studies involve multiple disease or related phenotypes and use multiple methods of statistical analysis. The combination of multiple genetic loci, multiple phenotypes, and multiple methods of evaluating associations between genotype and phenotype means that modern genetic studies often involve the testing of an enormous number of hypotheses. When multiple hypothesis tests are performed in a study, there is a risk of inflation of the type I error rate (i.e., the chance of falsely claiming an association when there is none). Several methods for multiple-testing correction are in popular use, and they all have strengths and weaknesses. Because no single method is universally adopted or always appropriate, it is important to understand the principles, strengths, and weaknesses of the methods so that they can be applied appropriately in practice. In this article, we review the three principle methods for multiple-testing correction and provide guidance for calculating statistical power.
Institute for Brain and Neural Systems
2009-10-06
to deal with computational complexity when analyzing large amounts of information in visual scenes. It seems natural that in addition to exploring...algorithms using methods from statistical pattern recognition and machine learning. Over the last fifteen years, significant advances had been made in...recognition, robustness to noise and ability to cope with significant variations in lighting conditions. Identifying an occluded target adds another layer of
NASA Technical Reports Server (NTRS)
Crabill, Norman L.
1989-01-01
Data obtained from the digital flight data recorder system of a L 1011 aircraft in 914 flights and 1619 hours of airline revenue operations are presented. Data on conditions with flap deployment and autopilot use are given. In addition, acceleration statistics are presented from 23 hours on nonrevenue flights.
Statistical, Graphical, and Learning Methods for Sensing, Surveillance, and Navigation Systems
2016-06-28
harsh propagation environments. Conventional filtering techniques fail to provide satisfactory performance in many important nonlinear or non...Gaussian scenarios. In addition, there is a lack of a unified methodology for the design and analysis of different filtering techniques. To address...these problems, we have proposed a new filtering methodology called belief condensation (BC) DISTRIBUTION A: Distribution approved for public release
ERIC Educational Resources Information Center
Shahbazzadeh, Somayeh; Beliad, Mohammad Reza
2017-01-01
This study investigates the mediatory role of exercise self-regulation role in the relationship between personality traits and anger management among athletes. The statistical population of this study includes all athlete students of Shar-e Ghods College, among which 260 people were selected as sample using random sampling method. In addition, the…
Shivakumarswamy, Udasimath; Arakeri, Surekha U; Karigowdar, Mahesh H; Yelikar, Br
2012-01-01
The cytological examinations of serous effusions have been well-accepted, and a positive diagnosis is often considered as a definitive diagnosis. It helps in staging, prognosis and management of the patients in malignancies and also gives information about various inflammatory and non-inflammatory lesions. Diagnostic problems arise in everyday practice to differentiate reactive atypical mesothelial cells and malignant cells by the routine conventional smear (CS) method. To compare the morphological features of the CS method with those of the cell block (CB) method and also to assess the utility and sensitivity of the CB method in the cytodiagnosis of pleural effusions. The study was conducted in the cytology section of the Department of Pathology. Sixty pleural fluid samples were subjected to diagnostic evaluation for over a period of 20 months. Along with the conventional smears, cell blocks were prepared by using 10% alcohol-formalin as a fixative agent. Statistical analysis with the 'z test' was performed to identify the cellularity, using the CS and CB methods. Mc. Naemer's χ(2)test was used to identify the additional yield for malignancy by the CB method. Cellularity and additional yield for malignancy was 15% more by the CB method. The CB method provides high cellularity, better architectural patterns, morphological features and an additional yield of malignant cells, and thereby, increases the sensitivity of the cytodiagnosis when compared with the CS method.
Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.
McCallum, Kenneth J; Ionita-Laza, Iuliana
2015-12-01
Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.
Vegas-Sanchez-Ferrero, G; Aja-Fernandez, S; Martin-Fernandez, M; Frangi, A F; Palencia, C
2010-01-01
A novel anisotropic diffusion filter is proposed in this work with application to cardiac ultrasonic images. It includes probabilistic models which describe the probability density function (PDF) of tissues and adapts the diffusion tensor to the image iteratively. For this purpose, a preliminary study is performed in order to select the probability models that best fit the stastitical behavior of each tissue class in cardiac ultrasonic images. Then, the parameters of the diffusion tensor are defined taking into account the statistical properties of the image at each voxel. When the structure tensor of the probability of belonging to each tissue is included in the diffusion tensor definition, a better boundaries estimates can be obtained instead of calculating directly the boundaries from the image. This is the main contribution of this work. Additionally, the proposed method follows the statistical properties of the image in each iteration. This is considered as a second contribution since state-of-the-art methods suppose that noise or statistical properties of the image do not change during the filter process.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Numerically exact full counting statistics of the nonequilibrium Anderson impurity model
NASA Astrophysics Data System (ADS)
Ridley, Michael; Singh, Viveka N.; Gull, Emanuel; Cohen, Guy
2018-03-01
The time-dependent full counting statistics of charge transport through an interacting quantum junction is evaluated from its generating function, controllably computed with the inchworm Monte Carlo method. Exact noninteracting results are reproduced; then, we continue to explore the effect of electron-electron interactions on the time-dependent charge cumulants, first-passage time distributions, and n -electron transfer distributions. We observe a crossover in the noise from Coulomb blockade to Kondo-dominated physics as the temperature is decreased. In addition, we uncover long-tailed spin distributions in the Kondo regime and analyze queuing behavior caused by correlations between single-electron transfer events.
A Ground Flash Fraction Retrieval Algorithm for GLM
NASA Technical Reports Server (NTRS)
Koshak, William J.
2010-01-01
A Bayesian inversion method is introduced for retrieving the fraction of ground flashes in a set of N lightning observed by a satellite lightning imager (such as the Geostationary Lightning Mapper, GLM). An exponential model is applied as a physically reasonable constraint to describe the measured lightning optical parameter distributions. Population statistics (i.e., the mean and variance) are invoked to add additional constraints to the retrieval process. The Maximum A Posteriori (MAP) solution is employed. The approach is tested by performing simulated retrievals, and retrieval error statistics are provided. The approach is feasible for N greater than 2000, and retrieval errors decrease as N is increased.
Numerically exact full counting statistics of the nonequilibrium Anderson impurity model
Ridley, Michael; Singh, Viveka N.; Gull, Emanuel; ...
2018-03-06
The time-dependent full counting statistics of charge transport through an interacting quantum junction is evaluated from its generating function, controllably computed with the inchworm Monte Carlo method. Exact noninteracting results are reproduced; then, we continue to explore the effect of electron-electron interactions on the time-dependent charge cumulants, first-passage time distributions, and n-electron transfer distributions. We observe a crossover in the noise from Coulomb blockade to Kondo-dominated physics as the temperature is decreased. In addition, we uncover long-tailed spin distributions in the Kondo regime and analyze queuing behavior caused by correlations between single-electron transfer events
Abboud, R; Issa, H; Abed-Allah, Y D; Bakraji, E H
2015-11-01
Statistical analysis based on chemical composition, using radioisotope X-ray fluorescence, have been applied on 39 ancient pottery fragments coming from the excavation at Tell Al-Kasra archaeological site, Syria. Three groups were defined by applying Cluster and Factor analysis statistical methods. Thermoluminescence (TL) dating was investigated on three sherds taken from the bathroom (hammam) on the site. Multiple aliquot additive dose (MAAD) was used to estimate the paleodose value, and the gamma spectrometry was used to estimate the dose rate. The average age was found to be 715±36 year. Copyright © 2015 Elsevier Ltd. All rights reserved.
Numerically exact full counting statistics of the nonequilibrium Anderson impurity model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ridley, Michael; Singh, Viveka N.; Gull, Emanuel
The time-dependent full counting statistics of charge transport through an interacting quantum junction is evaluated from its generating function, controllably computed with the inchworm Monte Carlo method. Exact noninteracting results are reproduced; then, we continue to explore the effect of electron-electron interactions on the time-dependent charge cumulants, first-passage time distributions, and n-electron transfer distributions. We observe a crossover in the noise from Coulomb blockade to Kondo-dominated physics as the temperature is decreased. In addition, we uncover long-tailed spin distributions in the Kondo regime and analyze queuing behavior caused by correlations between single-electron transfer events
Using data warehousing and OLAP in public health care.
Hristovski, D; Rogac, M; Markota, M
2000-01-01
The paper describes the possibilities of using data warehousing and OLAP technologies in public health care in general and then our own experience with these technologies gained during the implementation of a data warehouse of outpatient data at the national level. Such a data warehouse serves as a basis for advanced decision support systems based on statistical, OLAP or data mining methods. We used OLAP to enable interactive exploration and analysis of the data. We found out that data warehousing and OLAP are suitable for the domain of public health and that they enable new analytical possibilities in addition to the traditional statistical approaches.
Using data warehousing and OLAP in public health care.
Hristovski, D.; Rogac, M.; Markota, M.
2000-01-01
The paper describes the possibilities of using data warehousing and OLAP technologies in public health care in general and then our own experience with these technologies gained during the implementation of a data warehouse of outpatient data at the national level. Such a data warehouse serves as a basis for advanced decision support systems based on statistical, OLAP or data mining methods. We used OLAP to enable interactive exploration and analysis of the data. We found out that data warehousing and OLAP are suitable for the domain of public health and that they enable new analytical possibilities in addition to the traditional statistical approaches. PMID:11079907
Methodological reporting of randomized trials in five leading Chinese nursing journals.
Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu
2014-01-01
Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.
Permutation-based inference for the AUC: A unified approach for continuous and discontinuous data.
Pauly, Markus; Asendorf, Thomas; Konietschke, Frank
2016-11-01
We investigate rank-based studentized permutation methods for the nonparametric Behrens-Fisher problem, that is, inference methods for the area under the ROC curve. We hereby prove that the studentized permutation distribution of the Brunner-Munzel rank statistic is asymptotically standard normal, even under the alternative. Thus, incidentally providing the hitherto missing theoretical foundation for the Neubert and Brunner studentized permutation test. In particular, we do not only show its consistency, but also that confidence intervals for the underlying treatment effects can be computed by inverting this permutation test. In addition, we derive permutation-based range-preserving confidence intervals. Extensive simulation studies show that the permutation-based confidence intervals appear to maintain the preassigned coverage probability quite accurately (even for rather small sample sizes). For a convenient application of the proposed methods, a freely available software package for the statistical software R has been developed. A real data example illustrates the application. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ma, Yan; Zhang, Wei; Lyman, Stephen; Huang, Yihe
2018-06-01
To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. HCUP SID. A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conditional MI substantially improved statistical inferences for racial health disparities research with the SID. © Health Research and Educational Trust.
2011-01-01
Background Monitoring the time course of mortality by cause is a key public health issue. However, several mortality data production changes may affect cause-specific time trends, thus altering the interpretation. This paper proposes a statistical method that detects abrupt changes ("jumps") and estimates correction factors that may be used for further analysis. Methods The method was applied to a subset of the AMIEHS (Avoidable Mortality in the European Union, toward better Indicators for the Effectiveness of Health Systems) project mortality database and considered for six European countries and 13 selected causes of deaths. For each country and cause of death, an automated jump detection method called Polydect was applied to the log mortality rate time series. The plausibility of a data production change associated with each detected jump was evaluated through literature search or feedback obtained from the national data producers. For each plausible jump position, the statistical significance of the between-age and between-gender jump amplitude heterogeneity was evaluated by means of a generalized additive regression model, and correction factors were deduced from the results. Results Forty-nine jumps were detected by the Polydect method from 1970 to 2005. Most of the detected jumps were found to be plausible. The age- and gender-specific amplitudes of the jumps were estimated when they were statistically heterogeneous, and they showed greater by-age heterogeneity than by-gender heterogeneity. Conclusion The method presented in this paper was successfully applied to a large set of causes of death and countries. The method appears to be an alternative to bridge coding methods when the latter are not systematically implemented because they are time- and resource-consuming. PMID:21929756
[Suicide due to mental diseases based on the Vital Statistics Survey Death Form].
Takizawa, Tohru
2012-06-01
Mental diseases such as schizophrenia and depression put patients at risk for suicide. It is extremely important to understand that one way of preventing suicide is to determine the actual mental state of the individual. The purpose of this study was to analyze the true mental state of suicide victims reported in the vital statistics. This study investigated the vital statistics of 30,299 suicide victims in Japan in 2008. The use of these basic statistics for non-statistical purposes was approved by the Japanese Ministry of Health, Labour and Welfare. The method involved reviewing the Vital Statistics Survey Death Form at the Ministry of Health, Labour and Welfare as well as analyzing their Online Reporting of Vital Statistics. Furthermore, this study was able to validate 29,799 of the 30,299 suicides (98.3%) that occurred in 2008. Mental diseases were validated not only from the "Cause of death" section as marked on the death certificate, but also by information found in sections for "Additional items for death by external cause" and "Other special remarks." RESULTS; From the Vital Statistics Survey Death Form and Online Reporting of Vital Statistics, 2964 individuals with either a mental disease or mental disorder were identified. Of the 2964 identified individuals, 55 had dementia (of which 13 were dementia in Alzheimer's disease), 116 had alcohol dependence/psychotic disorder, 550 had schizophrenia, 101 had bipolar affective disorder, 1,913 has had a depressive episode, 13 had obsessive-compulsive disorder, 22 had adjustment disorders, 14 had eating disorders, 49 had nonorganic sleep disorders, 24 had personality disorder, and 6 had pervasive developmental disorders. In addition, 125 individuals had more than one mental disease. The national police statistics from 2008 show that 1,368 suicide victims had schizophrenia and 6,490 had depression. These figures show quite a difference between the results of this study and the police statistics. Further, there have been controversies regarding autopsies of suicide victims. Thus, further investigation into the cause of death is of great importance.
A comparison of data-driven groundwater vulnerability assessment methods
Sorichetta, Alessandro; Ballabio, Cristiano; Masetti, Marco; Robinson, Gilpin R.; Sterlacchini, Simone
2013-01-01
Increasing availability of geo-environmental data has promoted the use of statistical methods to assess groundwater vulnerability. Nitrate is a widespread anthropogenic contaminant in groundwater and its occurrence can be used to identify aquifer settings vulnerable to contamination. In this study, multivariate Weights of Evidence (WofE) and Logistic Regression (LR) methods, where the response variable is binary, were used to evaluate the role and importance of a number of explanatory variables associated with nitrate sources and occurrence in groundwater in the Milan District (central part of the Po Plain, Italy). The results of these models have been used to map the spatial variation of groundwater vulnerability to nitrate in the region, and we compare the similarities and differences of their spatial patterns and associated explanatory variables. We modify the standard WofE method used in previous groundwater vulnerability studies to a form analogous to that used in LR; this provides a framework to compare the results of both models and reduces the effect of sampling bias on the results of the standard WofE model. In addition, a nonlinear Generalized Additive Model has been used to extend the LR analysis. Both approaches improved discrimination of the standard WofE and LR models, as measured by the c-statistic. Groundwater vulnerability probability outputs, based on rank-order classification of the respective model results, were similar in spatial patterns and identified similar strong explanatory variables associated with nitrate source (population density as a proxy for sewage systems and septic sources) and nitrate occurrence (groundwater depth).
Muselík, Jan; Franc, Aleš; Doležel, Petr; Goněc, Roman; Krondlová, Anna; Lukášová, Ivana
2014-09-01
The article describes the development and production of tablets using direct compression of powder mixtures. The aim was to describe the impact of filler particle size and the time of lubricant addition during mixing on content uniformity according to the Good Manufacturing Practice (GMP) process validation requirements. Processes are regulated by complex directives, forcing the producers to validate, using sophisticated methods, the content uniformity of intermediates as well as final products. Cutting down of production time and material, shortening of analyses, and fast and reliable statistic evaluation of results can reduce the final price without affecting product quality. The manufacturing process of directly compressed tablets containing the low dose active pharmaceutical ingredient (API) warfarin, with content uniformity passing validation criteria, is used as a model example. Statistic methods have proved that the manufacturing process is reproducible. Methods suitable for elucidation of various properties of the final blend, e.g., measurement of electrostatic charge by Faraday pail and evaluation of mutual influences of researched variables by partial least square (PLS) regression, were used. Using these methods, it was proved that the filler with higher particle size increased the content uniformity of both blends and the ensuing tablets. Addition of the lubricant, magnesium stearate, during the blending process improved the content uniformity of blends containing the filler with larger particles. This seems to be caused by reduced sampling error due to the suppression of electrostatic charge.
Meta-analysis and The Cochrane Collaboration: 20 years of the Cochrane Statistical Methods Group
2013-01-01
The Statistical Methods Group has played a pivotal role in The Cochrane Collaboration over the past 20 years. The Statistical Methods Group has determined the direction of statistical methods used within Cochrane reviews, developed guidance for these methods, provided training, and continued to discuss and consider new and controversial issues in meta-analysis. The contribution of Statistical Methods Group members to the meta-analysis literature has been extensive and has helped to shape the wider meta-analysis landscape. In this paper, marking the 20th anniversary of The Cochrane Collaboration, we reflect on the history of the Statistical Methods Group, beginning in 1993 with the identification of aspects of statistical synthesis for which consensus was lacking about the best approach. We highlight some landmark methodological developments that Statistical Methods Group members have contributed to in the field of meta-analysis. We discuss how the Group implements and disseminates statistical methods within The Cochrane Collaboration. Finally, we consider the importance of robust statistical methodology for Cochrane systematic reviews, note research gaps, and reflect on the challenges that the Statistical Methods Group faces in its future direction. PMID:24280020
Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application
Cantor, Rita M.; Lange, Kenneth; Sinsheimer, Janet S.
2010-01-01
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach. PMID:20074509
Petukh, Marharyta; Li, Minghui; Alexov, Emil
2015-07-01
A new methodology termed Single Amino Acid Mutation based change in Binding free Energy (SAAMBE) was developed to predict the changes of the binding free energy caused by mutations. The method utilizes 3D structures of the corresponding protein-protein complexes and takes advantage of both approaches: sequence- and structure-based methods. The method has two components: a MM/PBSA-based component, and an additional set of statistical terms delivered from statistical investigation of physico-chemical properties of protein complexes. While the approach is rigid body approach and does not explicitly consider plausible conformational changes caused by the binding, the effect of conformational changes, including changes away from binding interface, on electrostatics are mimicked with amino acid specific dielectric constants. This provides significant improvement of SAAMBE predictions as indicated by better match against experimentally determined binding free energy changes over 1300 mutations in 43 proteins. The final benchmarking resulted in a very good agreement with experimental data (correlation coefficient 0.624) while the algorithm being fast enough to allow for large-scale calculations (the average time is less than a minute per mutation).
Shulman, Stanley A; Smith, Jerome P
2002-01-01
A method is presented for the evaluation of the bias, variability, and accuracy of gas monitors. This method is based on using the parameters for the fitted response curves of the monitors. Thereby, variability between calibrations, between dates within each calibration period, and between different units can be evaluated at several different standard concentrations. By combining variability information with bias information, accuracy can be assessed. An example using carbon monoxide monitor data is provided. Although the most general statistical software required for these tasks is not available on a spreadsheet, when the same number of dates in a calibration period are evaluated for each monitor unit, the calculations can be done on a spreadsheet. An example of such calculations, together with the formulas needed for their implementation, is provided. In addition, the methods can be extended by use of appropriate statistical models and software to evaluate monitor trends within calibration periods, as well as consider the effects of other variables, such as humidity and temperature, on monitor variability and bias.
The heritability of the functional connectome is robust to common nonlinear registration methods
NASA Astrophysics Data System (ADS)
Hafzalla, George W.; Prasad, Gautam; Baboyan, Vatche G.; Faskowitz, Joshua; Jahanshad, Neda; McMahon, Katie L.; de Zubicaray, Greig I.; Wright, Margaret J.; Braskie, Meredith N.; Thompson, Paul M.
2016-03-01
Nonlinear registration algorithms are routinely used in brain imaging, to align data for inter-subject and group comparisons, and for voxelwise statistical analyses. To understand how the choice of registration method affects maps of functional brain connectivity in a sample of 611 twins, we evaluated three popular nonlinear registration methods: Advanced Normalization Tools (ANTs), Automatic Registration Toolbox (ART), and FMRIB's Nonlinear Image Registration Tool (FNIRT). Using both structural and functional MRI, we used each of the three methods to align the MNI152 brain template, and 80 regions of interest (ROIs), to each subject's T1-weighted (T1w) anatomical image. We then transformed each subject's ROIs onto the associated resting state functional MRI (rs-fMRI) scans and computed a connectivity network or functional connectome for each subject. Given the different degrees of genetic similarity between pairs of monozygotic (MZ) and same-sex dizygotic (DZ) twins, we used structural equation modeling to estimate the additive genetic influences on the elements of the function networks, or their heritability. The functional connectome and derived statistics were relatively robust to nonlinear registration effects.
Numerical solutions of the semiclassical Boltzmann ellipsoidal-statistical kinetic model equation
Yang, Jaw-Yen; Yan, Chin-Yuan; Huang, Juan-Chen; Li, Zhihui
2014-01-01
Computations of rarefied gas dynamical flows governed by the semiclassical Boltzmann ellipsoidal-statistical (ES) kinetic model equation using an accurate numerical method are presented. The semiclassical ES model was derived through the maximum entropy principle and conserves not only the mass, momentum and energy, but also contains additional higher order moments that differ from the standard quantum distributions. A different decoding procedure to obtain the necessary parameters for determining the ES distribution is also devised. The numerical method in phase space combines the discrete-ordinate method in momentum space and the high-resolution shock capturing method in physical space. Numerical solutions of two-dimensional Riemann problems for two configurations covering various degrees of rarefaction are presented and various contours of the quantities unique to this new model are illustrated. When the relaxation time becomes very small, the main flow features a display similar to that of ideal quantum gas dynamics, and the present solutions are found to be consistent with existing calculations for classical gas. The effect of a parameter that permits an adjustable Prandtl number in the flow is also studied. PMID:25104904
Level set method for image segmentation based on moment competition
NASA Astrophysics Data System (ADS)
Min, Hai; Wang, Xiao-Feng; Huang, De-Shuang; Jin, Jing; Wang, Hong-Zhi; Li, Hai
2015-05-01
We propose a level set method for image segmentation which introduces the moment competition and weakly supervised information into the energy functional construction. Different from the region-based level set methods which use force competition, the moment competition is adopted to drive the contour evolution. Here, a so-called three-point labeling scheme is proposed to manually label three independent points (weakly supervised information) on the image. Then the intensity differences between the three points and the unlabeled pixels are used to construct the force arms for each image pixel. The corresponding force is generated from the global statistical information of a region-based method and weighted by the force arm. As a result, the moment can be constructed and incorporated into the energy functional to drive the evolving contour to approach the object boundary. In our method, the force arm can take full advantage of the three-point labeling scheme to constrain the moment competition. Additionally, the global statistical information and weakly supervised information are successfully integrated, which makes the proposed method more robust than traditional methods for initial contour placement and parameter setting. Experimental results with performance analysis also show the superiority of the proposed method on segmenting different types of complicated images, such as noisy images, three-phase images, images with intensity inhomogeneity, and texture images.
NASA Astrophysics Data System (ADS)
Mousavi Anzehaee, Mohammad; Adib, Ahmad; Heydarzadeh, Kobra
2015-10-01
The manner of microtremor data collection and filtering operation and also the method used for processing have a considerable effect on the accuracy of estimation of dynamic soil parameters. In this paper, running variance method was used to improve the automatic detection of data sections infected by local perturbations. In this method, the microtremor data running variance is computed using a sliding window. Then the obtained signal is used to remove the ranges of data affected by perturbations from the original data. Additionally, to determinate the fundamental frequency of a site, this study has proposed a statistical characteristics-based method. Actually, statistical characteristics, such as the probability density graph and the average and the standard deviation of all the frequencies corresponding to the maximum peaks in the H/ V spectra of all data windows, are used to differentiate the real peaks from the false peaks resulting from perturbations. The methods have been applied to the data recorded for the City of Meybod in central Iran. Experimental results show that the applied methods are able to successfully reduce the effects of extensive local perturbations on microtremor data and eventually to estimate the fundamental frequency more accurately compared to other common methods.
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
Statistics of baryon correlation functions in lattice QCD
NASA Astrophysics Data System (ADS)
Wagman, Michael L.; Savage, Martin J.; Nplqcd Collaboration
2017-12-01
A systematic analysis of the structure of single-baryon correlation functions calculated with lattice QCD is performed, with a particular focus on characterizing the structure of the noise associated with quantum fluctuations. The signal-to-noise problem in these correlation functions is shown, as long suspected, to result from a sign problem. The log-magnitude and complex phase are found to be approximately described by normal and wrapped normal distributions respectively. Properties of circular statistics are used to understand the emergence of a large time noise region where standard energy measurements are unreliable. Power-law tails in the distribution of baryon correlation functions, associated with stable distributions and "Lévy flights," are found to play a central role in their time evolution. A new method of analyzing correlation functions is considered for which the signal-to-noise ratio of energy measurements is constant, rather than exponentially degrading, with increasing source-sink separation time. This new method includes an additional systematic uncertainty that can be removed by performing an extrapolation, and the signal-to-noise problem reemerges in the statistics of this extrapolation. It is demonstrated that this new method allows accurate results for the nucleon mass to be extracted from the large-time noise region inaccessible to standard methods. The observations presented here are expected to apply to quantum Monte Carlo calculations more generally. Similar methods to those introduced here may lead to practical improvements in analysis of noisier systems.
Automated detection of hospital outbreaks: A systematic review of methods.
Leclère, Brice; Buckeridge, David L; Boëlle, Pierre-Yves; Astagneau, Pascal; Lepelletier, Didier
2017-01-01
Several automated algorithms for epidemiological surveillance in hospitals have been proposed. However, the usefulness of these methods to detect nosocomial outbreaks remains unclear. The goal of this review was to describe outbreak detection algorithms that have been tested within hospitals, consider how they were evaluated, and synthesize their results. We developed a search query using keywords associated with hospital outbreak detection and searched the MEDLINE database. To ensure the highest sensitivity, no limitations were initially imposed on publication languages and dates, although we subsequently excluded studies published before 2000. Every study that described a method to detect outbreaks within hospitals was included, without any exclusion based on study design. Additional studies were identified through citations in retrieved studies. Twenty-nine studies were included. The detection algorithms were grouped into 5 categories: simple thresholds (n = 6), statistical process control (n = 12), scan statistics (n = 6), traditional statistical models (n = 6), and data mining methods (n = 4). The evaluation of the algorithms was often solely descriptive (n = 15), but more complex epidemiological criteria were also investigated (n = 10). The performance measures varied widely between studies: e.g., the sensitivity of an algorithm in a real world setting could vary between 17 and 100%. Even if outbreak detection algorithms are useful complementary tools for traditional surveillance, the heterogeneity in results among published studies does not support quantitative synthesis of their performance. A standardized framework should be followed when evaluating outbreak detection methods to allow comparison of algorithms across studies and synthesis of results.
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Mitigating the impact of the DESI fiber assignment on galaxy clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burden, Angela; Padmanabhan, Nikhil; Cahn, Robert N.
2017-03-01
We present a simple strategy to mitigate the impact of an incomplete spectroscopic redshift galaxy sample as a result of fiber assignment and survey tiling. The method has been designed for the Dark Energy Spectroscopic Instrument (DESI) galaxy survey but may have applications beyond this. We propose a modification to the usual correlation function that nulls the almost purely angular modes affected by survey incompleteness due to fiber assignment. Predictions of this modified statistic can be calculated given a model of the two point correlation function. The new statistic can be computed with a slight modification to the data cataloguesmore » input to the standard correlation function code and does not incur any additional computational time. Finally we show that the spherically averaged baryon acoustic oscillation signal is not biased by the new statistic.« less
Saadati, Farzaneh; Ahmad Tarmizi, Rohani; Mohd Ayub, Ahmad Fauzi; Abu Bakar, Kamariah
2015-01-01
Because students' ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding within an e-learning system the pedagogical characteristics of learning is 'value added' because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, the i-CAM could significantly promote students' problem-solving performance at the end of each phase. In addition, the combination of the differences in students' test scores were considered to be statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirmed the considerable value of i-CAM in the improvement of statistics learning for non-specialized postgraduate students.
Gray, Alistair; Veale, Jaimie F.; Binson, Diane; Sell, Randell L.
2013-01-01
Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand's Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens. PMID:23840231
Kernel-based whole-genome prediction of complex traits: a review.
Morota, Gota; Gianola, Daniel
2014-01-01
Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
Guisan, Antoine; Edwards, T.C.; Hastie, T.
2002-01-01
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. ?? 2002 Elsevier Science B.V. All rights reserved.
Robust Strategy for Rocket Engine Health Monitoring
NASA Technical Reports Server (NTRS)
Santi, L. Michael
2001-01-01
Monitoring the health of rocket engine systems is essentially a two-phase process. The acquisition phase involves sensing physical conditions at selected locations, converting physical inputs to electrical signals, conditioning the signals as appropriate to establish scale or filter interference, and recording results in a form that is easy to interpret. The inference phase involves analysis of results from the acquisition phase, comparison of analysis results to established health measures, and assessment of health indications. A variety of analytical tools may be employed in the inference phase of health monitoring. These tools can be separated into three broad categories: statistical, rule based, and model based. Statistical methods can provide excellent comparative measures of engine operating health. They require well-characterized data from an ensemble of "typical" engines, or "golden" data from a specific test assumed to define the operating norm in order to establish reliable comparative measures. Statistical methods are generally suitable for real-time health monitoring because they do not deal with the physical complexities of engine operation. The utility of statistical methods in rocket engine health monitoring is hindered by practical limits on the quantity and quality of available data. This is due to the difficulty and high cost of data acquisition, the limited number of available test engines, and the problem of simulating flight conditions in ground test facilities. In addition, statistical methods incur a penalty for disregarding flow complexity and are therefore limited in their ability to define performance shift causality. Rule based methods infer the health state of the engine system based on comparison of individual measurements or combinations of measurements with defined health norms or rules. This does not mean that rule based methods are necessarily simple. Although binary yes-no health assessment can sometimes be established by relatively simple rules, the causality assignment needed for refined health monitoring often requires an exceptionally complex rule base involving complicated logical maps. Structuring the rule system to be clear and unambiguous can be difficult, and the expert input required to maintain a large logic network and associated rule base can be prohibitive.
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
Weighted regularized statistical shape space projection for breast 3D model reconstruction.
Ruiz, Guillermo; Ramon, Eduard; García, Jaime; Sukno, Federico M; Ballester, Miguel A González
2018-07-01
The use of 3D imaging has increased as a practical and useful tool for plastic and aesthetic surgery planning. Specifically, the possibility of representing the patient breast anatomy in a 3D shape and simulate aesthetic or plastic procedures is a great tool for communication between surgeon and patient during surgery planning. For the purpose of obtaining the specific 3D model of the breast of a patient, model-based reconstruction methods can be used. In particular, 3D morphable models (3DMM) are a robust and widely used method to perform 3D reconstruction. However, if additional prior information (i.e., known landmarks) is combined with the 3DMM statistical model, shape constraints can be imposed to improve the 3DMM fitting accuracy. In this paper, we present a framework to fit a 3DMM of the breast to two possible inputs: 2D photos and 3D point clouds (scans). Our method consists in a Weighted Regularized (WR) projection into the shape space. The contribution of each point in the 3DMM shape is weighted allowing to assign more relevance to those points that we want to impose as constraints. Our method is applied at multiple stages of the 3D reconstruction process. Firstly, it can be used to obtain a 3DMM initialization from a sparse set of 3D points. Additionally, we embed our method in the 3DMM fitting process in which more reliable or already known 3D points or regions of points, can be weighted in order to preserve their shape information. The proposed method has been tested in two different input settings: scans and 2D pictures assessing both reconstruction frameworks with very positive results. Copyright © 2018 Elsevier B.V. All rights reserved.
Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas
2015-01-01
Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
Statistical Analyses of Raw Material Data for MTM45-1/CF7442A-36% RW: CMH Cure Cycle
NASA Technical Reports Server (NTRS)
Coroneos, Rula; Pai, Shantaram, S.; Murthy, Pappu
2013-01-01
This report describes statistical characterization of physical properties of the composite material system MTM45-1/CF7442A, which has been tested and is currently being considered for use on spacecraft structures. This composite system is made of 6K plain weave graphite fibers in a highly toughened resin system. This report summarizes the distribution types and statistical details of the tests and the conditions for the experimental data generated. These distributions will be used in multivariate regression analyses to help determine material and design allowables for similar material systems and to establish a procedure for other material systems. Additionally, these distributions will be used in future probabilistic analyses of spacecraft structures. The specific properties that are characterized are the ultimate strength, modulus, and Poisson??s ratio by using a commercially available statistical package. Results are displayed using graphical and semigraphical methods and are included in the accompanying appendixes.
Quantile regression for the statistical analysis of immunological data with many non-detects.
Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth
2012-07-07
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
Automated detection of hospital outbreaks: A systematic review of methods
Buckeridge, David L.; Lepelletier, Didier
2017-01-01
Objectives Several automated algorithms for epidemiological surveillance in hospitals have been proposed. However, the usefulness of these methods to detect nosocomial outbreaks remains unclear. The goal of this review was to describe outbreak detection algorithms that have been tested within hospitals, consider how they were evaluated, and synthesize their results. Methods We developed a search query using keywords associated with hospital outbreak detection and searched the MEDLINE database. To ensure the highest sensitivity, no limitations were initially imposed on publication languages and dates, although we subsequently excluded studies published before 2000. Every study that described a method to detect outbreaks within hospitals was included, without any exclusion based on study design. Additional studies were identified through citations in retrieved studies. Results Twenty-nine studies were included. The detection algorithms were grouped into 5 categories: simple thresholds (n = 6), statistical process control (n = 12), scan statistics (n = 6), traditional statistical models (n = 6), and data mining methods (n = 4). The evaluation of the algorithms was often solely descriptive (n = 15), but more complex epidemiological criteria were also investigated (n = 10). The performance measures varied widely between studies: e.g., the sensitivity of an algorithm in a real world setting could vary between 17 and 100%. Conclusion Even if outbreak detection algorithms are useful complementary tools for traditional surveillance, the heterogeneity in results among published studies does not support quantitative synthesis of their performance. A standardized framework should be followed when evaluating outbreak detection methods to allow comparison of algorithms across studies and synthesis of results. PMID:28441422
NASA Astrophysics Data System (ADS)
Lazoglou, Georgia; Anagnostopoulou, Christina; Tolika, Konstantia; Kolyva-Machera, Fotini
2018-04-01
The increasing trend of the intensity and frequency of temperature and precipitation extremes during the past decades has substantial environmental and socioeconomic impacts. Thus, the objective of the present study is the comparison of several statistical methods of the extreme value theory (EVT) in order to identify which is the most appropriate to analyze the behavior of the extreme precipitation, and high and low temperature events, in the Mediterranean region. The extremes choice was made using both the block maxima and the peaks over threshold (POT) technique and as a consequence both the generalized extreme value (GEV) and generalized Pareto distributions (GPDs) were used to fit them. The results were compared, in order to select the most appropriate distribution for extremes characterization. Moreover, this study evaluates the maximum likelihood estimation, the L-moments and the Bayesian method, based on both graphical and statistical goodness-of-fit tests. It was revealed that the GPD can characterize accurately both precipitation and temperature extreme events. Additionally, GEV distribution with the Bayesian method is proven to be appropriate especially for the greatest values of extremes. Another important objective of this investigation was the estimation of the precipitation and temperature return levels for three return periods (50, 100, and 150 years) classifying the data into groups with similar characteristics. Finally, the return level values were estimated with both GEV and GPD and with the three different estimation methods, revealing that the selected method can affect the return level values for both the parameter of precipitation and temperature.
Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J
2017-05-01
Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.
A comparative study of additive and subtractive manufacturing for dental restorations.
Bae, Eun-Jeong; Jeong, Il-Do; Kim, Woong-Chul; Kim, Ji-Hwan
2017-08-01
Digital systems have recently found widespread application in the fabrication of dental restorations. For the clinical assessment of dental restorations fabricated digitally, it is necessary to evaluate their accuracy. However, studies of the accuracy of inlay restorations fabricated with additive manufacturing are lacking. The purpose of this in vitro study was to evaluate and compare the accuracy of inlay restorations fabricated by using recently introduced additive manufacturing with the accuracy of subtractive methods. The inlay (distal occlusal cavity) shape was fabricated using 3-dimensional image (reference data) software. Specimens were fabricated using 4 different methods (each n=10, total N=40), including 2 additive manufacturing methods, stereolithography apparatus and selective laser sintering; and 2 subtractive methods, wax and zirconia milling. Fabricated specimens were scanned using a dental scanner and then compared by overlapping reference data. The results were statistically analyzed using a 1-way analysis of variance (α=.05). Additionally, the surface morphology of 1 randomly (the first of each specimen) selected specimen from each group was evaluated using a digital microscope. The results of the overlap analysis of the dental restorations indicated that the root mean square (RMS) deviation observed in the restorations fabricated using the additive manufacturing methods were significantly different from those fabricated using the subtractive methods (P<.05). However, no significant differences were found between restorations fabricated using stereolithography apparatus and selective laser sintering, the additive manufacturing methods (P=.466). Similarly, no significant differences were found between wax and zirconia, the subtractive methods (P=.986). The observed RMS values were 106 μm for stereolithography apparatus, 113 μm for selective laser sintering, 116 μm for wax, and 119 μm for zirconia. Microscopic evaluation of the surface revealed a fine linear gap between the layers of restorations fabricated using stereolithography apparatus and a grooved hole with inconsistent weak scratches when fabricated using selective laser sintering. In the wax and zirconia restorations, possible traces of milling bur passes were observed. The results indicate that the accuracy of dental restorations fabricated using the additive manufacturing methods is higher than that of subtractive methods. Therefore, additive manufacturing methods are a viable alternative to subtractive methods. Copyright © 2016 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
Compensation and additivity of anthropogenic mortality: life-history effects and review of methods.
Péron, Guillaume
2013-03-01
Demographic compensation, the increase in average individual performance following a perturbation that reduces population size, and, its opposite, demographic overadditivity (or superadditivity) are central processes in both population ecology and wildlife management. A continuum of population responses to changes in cause-specific mortality exists, of which additivity and complete compensation constitute particular points. The position of a population on that continuum influences its ability to sustain exploitation and predation. Here I describe a method for quantifying where a population is on the continuum. Based on variance-covariance formulae, I describe a simple metric for the rate of compensation-additivity. I synthesize the results from 10 wildlife capture-recapture monitoring programmes from the literature and online databases, reviewing current statistical methods and the treatment of common sources of bias. These results are used to test hypotheses regarding the effects of life-history strategy, population density, average cause-specific mortality and age class on the rate of compensation-additivity. This comparative analysis highlights that long-lived species compensate less than short-lived species and that populations below their carrying capacity compensate less than those above. © 2012 The Authors. Journal of Animal Ecology © 2012 British Ecological Society.
Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.
Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga
2015-10-01
The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present a novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called disjunctive normal shape model (DNSM). DNSM is formed by disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.
Jin, Hong-Ying; Li, Da-Wei; Zhang, Na; Gu, Zhen; Long, Yi-Tao
2015-06-10
We demonstrated a practical method to analyze carbohydrate-protein interaction based on single plasmonic nanoparticles by conventional dark field microscopy (DFM). Protein concanavalin A (ConA) was modified on large sized gold nanoparticles (AuNPs), and dextran was conjugated on small sized AuNPs. As the interaction between ConA and dextran resulted in two kinds of gold nanoparticles coupled together, which caused coupling of plasmonic oscillations, apparent color changes (from green to yellow) of the single AuNPs were observed through DFM. Then, the color information was instantly transformed into a statistic peak wavelength distribution in less than 1 min by a self-developed statistical program (nanoparticleAnalysis). In addition, the interaction between ConA and dextran was proved with biospecific recognition. This approach is high-throughput and real-time, and is a convenient method to analyze carbohydrate-protein interaction at the single nanoparticle level efficiently.
A Method for Retrieving Ground Flash Fraction from Satellite Lightning Imager Data
NASA Technical Reports Server (NTRS)
Koshak, William J.
2009-01-01
A general theory for retrieving the fraction of ground flashes in N lightning observed by a satellite-based lightning imager is provided. An "exponential model" is applied as a physically reasonable constraint to describe the measured optical parameter distributions, and population statistics (i.e., mean, variance) are invoked to add additional constraints to the retrieval process. The retrieval itself is expressed in terms of a Bayesian inference, and the Maximum A Posteriori (MAP) solution is obtained. The approach is tested by performing simulated retrievals, and retrieval error statistics are provided. The ability to retrieve ground flash fraction has important benefits to the atmospheric chemistry community. For example, using the method to partition the existing satellite global lightning climatology into separate ground and cloud flash climatologies will improve estimates of lightning nitrogen oxides (NOx) production; this in turn will improve both regional air quality and global chemistry/climate model predictions.
Mallette, Jennifer R; Casale, John F; Jordan, James; Morello, David R; Beyer, Paul M
2016-03-23
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses ((2)H and (18)O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.
NASA Astrophysics Data System (ADS)
Mallette, Jennifer R.; Casale, John F.; Jordan, James; Morello, David R.; Beyer, Paul M.
2016-03-01
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (2H and 18O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.
Vargas-Rodriguez, Everardo; Guzman-Chavez, Ana Dinora; Baeza-Serrato, Roberto
2018-06-04
In this work, a novel tailored algorithm to enhance the overall sensitivity of gas concentration sensors based on the Direct Absorption Tunable Laser Absorption Spectroscopy (DA-ATLAS) method is presented. By using this algorithm, the sensor sensitivity can be custom-designed to be quasi constant over a much larger dynamic range compared with that obtained by typical methods based on a single statistics feature of the sensor signal output (peak amplitude, area under the curve, mean or RMS). Additionally, it is shown that with our algorithm, an optimal function can be tailored to get a quasi linear relationship between the concentration and some specific statistics features over a wider dynamic range. In order to test the viability of our algorithm, a basic C 2 H 2 sensor based on DA-ATLAS was implemented, and its experimental measurements support the simulated results provided by our algorithm.
Ridge, Lasso and Bayesian additive-dominance genomic models.
Azevedo, Camila Ferreira; de Resende, Marcos Deon Vilela; E Silva, Fabyano Fonseca; Viana, José Marcelo Soriano; Valente, Magno Sávio Ferreira; Resende, Márcio Fernando Ribeiro; Muñoz, Patricio
2015-08-25
A complete approach for genome-wide selection (GWS) involves reliable statistical genetics models and methods. Reports on this topic are common for additive genetic models but not for additive-dominance models. The objective of this paper was (i) to compare the performance of 10 additive-dominance predictive models (including current models and proposed modifications), fitted using Bayesian, Lasso and Ridge regression approaches; and (ii) to decompose genomic heritability and accuracy in terms of three quantitative genetic information sources, namely, linkage disequilibrium (LD), co-segregation (CS) and pedigree relationships or family structure (PR). The simulation study considered two broad sense heritability levels (0.30 and 0.50, associated with narrow sense heritabilities of 0.20 and 0.35, respectively) and two genetic architectures for traits (the first consisting of small gene effects and the second consisting of a mixed inheritance model with five major genes). G-REML/G-BLUP and a modified Bayesian/Lasso (called BayesA*B* or t-BLASSO) method performed best in the prediction of genomic breeding as well as the total genotypic values of individuals in all four scenarios (two heritabilities x two genetic architectures). The BayesA*B*-type method showed a better ability to recover the dominance variance/additive variance ratio. Decomposition of genomic heritability and accuracy revealed the following descending importance order of information: LD, CS and PR not captured by markers, the last two being very close. Amongst the 10 models/methods evaluated, the G-BLUP, BAYESA*B* (-2,8) and BAYESA*B* (4,6) methods presented the best results and were found to be adequate for accurately predicting genomic breeding and total genotypic values as well as for estimating additive and dominance in additive-dominance genomic models.
Cremeens, Joanne; Eiser, Christine; Blades, Mark
2006-08-30
In situations where children are unable or unwilling to respond for themselves, measurement of quality of life (QOL) is often obtained by parent proxy-report. However the relationship between child self and parent proxy-reports has been shown to be poor in some circumstances. Additionally the most appropriate statistical method for comparing ratings between child and parent proxy-reports has not been clearly established. The objectives of this study were to assess the: 1) agreement between child and parent proxy-reports on an established child QOL measure (the PedsQL) using two different statistical methods; 2) effect of chronological age and domain type on agreement between children's and parents' reports on the PedsQL; 3) relationship between parents' own well-being and their ratings of their child's QOL. One hundred and forty-nine healthy children (5.5 - 6.5, 6.5 - 7.5, and 7.5 - 8.5 years) completed the PedsQL. One hundred and three of their parents completed these measures in relation to their child, and a measure of their own QOL (SF-36). Consistency between child and parent proxy-reports on the PedsQL was low, with Intra-Class correlation coefficients ranging from 0.02 to 0.23. Correlations were higher for the oldest age group for Total Score and Psychosocial Health domains, and for the Physical Health domain in the youngest age group. Statistically significant median differences were found between child and parent-reports on all subscales of the PedsQL. The largest median differences were found for the two older age groups. Statistically significant correlations were found between parents' own QOL and their proxy-reports of child QOL across the total sample and within the middle age group. Intra-Class correlation coefficients and median difference testing can provide different information on the relationship between parent proxy-reports and child self-reports. Our findings suggest that differences in the levels of parent-child agreement previously reported may be an artefact of the statistical method used. In addition, levels of agreement can be affected by child age, domains investigated, and parents' own QOL. Further studies are needed to establish the optimal predictors of levels of parent-child agreement.
Adaptive Error Estimation in Linearized Ocean General Circulation Models
NASA Technical Reports Server (NTRS)
Chechelnitsky, Michael Y.
1999-01-01
Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation based methods of adaptive error estimation with low-dimensional models in the North Pacific (5-60 deg N, 132-252 deg E) to TOPEX/POSEIDON (TIP) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced state linear model that describes large scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), where covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al.(1999) with the method of Fu et al.(1993). Most of the model error is explained by the barotropic mode. However, we find that impact of the change in the error statistics on the data assimilation estimates is very small. This is explained by the large representation error, i.e. the dominance of the mesoscale eddies in the T/P signal, which are not part of the 21 by 1" GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simult&neous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words there exist a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.
NASA Technical Reports Server (NTRS)
Hughitt, Brian; Generazio, Edward (Principal Investigator); Nichols, Charles; Myers, Mika (Principal Investigator); Spencer, Floyd (Principal Investigator); Waller, Jess (Principal Investigator); Wladyka, Jordan (Principal Investigator); Aldrin, John; Burke, Eric; Cerecerez, Laura;
2016-01-01
NASA-STD-5009 requires that successful flaw detection by NDE methods be statistically qualified for use on fracture critical metallic components, but does not standardize practices. This task works towards standardizing calculations and record retention with a web-based tool, the NNWG POD Standards Library or NPSL. Test methods will also be standardized with an appropriately flexible appendix to -5009 identifying best practices. Additionally, this appendix will describe how specimens used to qualify NDE systems will be cataloged, stored and protected from corrosion, damage, or loss.
NRC TLD Direct Radiation Monitoring Network. Progress report, October--December 1996
DOE Office of Scientific and Technical Information (OSTI.GOV)
Struckmeyer, R.
This report presents the results of the NRC Direct Radiation Monitoring Network for the fourth quarter of 1996. It provides the ambient radiation levels measured in the vicinity of 74 sites throughout the United States. In addition, it describes the equipment used, monitoring station selection criteria, characterization of the dosimeter response, calibration procedures, statistical methods, intercomparison, and quality assurance program. 3 figs., 4 tabs.
Iarosh, O A; Iarosh, A A
1991-01-01
As many as 300 patients of different age groups underwent a probability statistical analysis of cytosis and CSF protein depending on the outcome of bacterial meningoencephalitis. The clinical and CSF interrelations discovered reflect the function of the blood-brain barrier and can be used as an additional test for predicting the disease outcome.
Statistics of rain-rate estimates for a single attenuating radar
NASA Technical Reports Server (NTRS)
Meneghini, R.
1976-01-01
The effects of fluctuations in return power and the rain-rate/reflectivity relationship, are included in the estimates, as well as errors introduced in the attempt to recover the unattenuated return power. In addition to the Hitschfeld-Bordan correction, two alternative techniques are considered. The performance of the radar is shown to be dependent on the method by which attenuation correction is made.
Modeling and Recovery of Iron (Fe) from Red Mud by Coal Reduction
NASA Astrophysics Data System (ADS)
Zhao, Xiancong; Li, Hongxu; Wang, Lei; Zhang, Lifeng
Recovery of Fe from red mud has been studied using statistically designed experiments. The effects of three factors, namely: reduction temperature, reduction time and proportion of additive on recovery of Fe have been investigated. Experiments have been carried out using orthogonal central composite design and factorial design methods. A model has been obtained through variance analysis at 92.5% confidence level.
Wang, Ling-jia; Kissler, Hermann J; Wang, Xiaojun; Cochet, Olivia; Krzystyniak, Adam; Misawa, Ryosuke; Golab, Karolina; Tibudan, Martin; Grzanka, Jakub; Savari, Omid; Grose, Randall; Kaufman, Dixon B; Millis, Michael; Witkowski, Piotr
2015-01-01
Pancreatic islet mass, represented by islet equivalent (IEQ), is the most important parameter in decision making for clinical islet transplantation. To obtain IEQ, the sample of islets is routinely counted manually under a microscope and discarded thereafter. Islet purity, another parameter in islet processing, is routinely acquired by estimation only. In this study, we validated our digital image analysis (DIA) system developed using the software of Image Pro Plus for islet mass and purity assessment. Application of the DIA allows to better comply with current good manufacturing practice (cGMP) standards. Human islet samples were captured as calibrated digital images for the permanent record. Five trained technicians participated in determination of IEQ and purity by manual counting method and DIA. IEQ count showed statistically significant correlations between the manual method and DIA in all sample comparisons (r >0.819 and p < 0.0001). Statistically significant difference in IEQ between both methods was found only in High purity 100μL sample group (p = 0.029). As far as purity determination, statistically significant differences between manual assessment and DIA measurement was found in High and Low purity 100μL samples (p<0.005), In addition, islet particle number (IPN) and the IEQ/IPN ratio did not differ statistically between manual counting method and DIA. In conclusion, the DIA used in this study is a reliable technique in determination of IEQ and purity. Islet sample preserved as a digital image and results produced by DIA can be permanently stored for verification, technical training and islet information exchange between different islet centers. Therefore, DIA complies better with cGMP requirements than the manual counting method. We propose DIA as a quality control tool to supplement the established standard manual method for islets counting and purity estimation. PMID:24806436
NASA Astrophysics Data System (ADS)
Buchhave, Preben; Velte, Clara M.
2017-08-01
We present a method for converting a time record of turbulent velocity measured at a point in a flow to a spatial velocity record consisting of consecutive convection elements. The spatial record allows computation of dynamic statistical moments such as turbulent kinetic wavenumber spectra and spatial structure functions in a way that completely bypasses the need for Taylor's hypothesis. The spatial statistics agree with the classical counterparts, such as the total kinetic energy spectrum, at least for spatial extents up to the Taylor microscale. The requirements for applying the method are access to the instantaneous velocity magnitude, in addition to the desired flow quantity, and a high temporal resolution in comparison to the relevant time scales of the flow. We map, without distortion and bias, notoriously difficult developing turbulent high intensity flows using three main aspects that distinguish these measurements from previous work in the field: (1) The measurements are conducted using laser Doppler anemometry and are therefore not contaminated by directional ambiguity (in contrast to, e.g., frequently employed hot-wire anemometers); (2) the measurement data are extracted using a correctly and transparently functioning processor and are analysed using methods derived from first principles to provide unbiased estimates of the velocity statistics; (3) the exact mapping proposed herein has been applied to the high turbulence intensity flows investigated to avoid the significant distortions caused by Taylor's hypothesis. The method is first confirmed to produce the correct statistics using computer simulations and later applied to measurements in some of the most difficult regions of a round turbulent jet—the non-equilibrium developing region and the outermost parts of the developed jet. The proposed mapping is successfully validated using corresponding directly measured spatial statistics in the fully developed jet, even in the difficult outer regions of the jet where the average convection velocity is negligible and turbulence intensities increase dramatically. The measurements in the developing region reveal interesting features of an incomplete Richardson-Kolmogorov cascade under development.
NASA Astrophysics Data System (ADS)
Eum, H. I.; Cannon, A. J.
2015-12-01
Climate models are a key provider to investigate impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch of spatial resolution between GCMs and regional applications, in particular a region characterized by complex terrain such as Korean peninsula. Therefore, a downscaling procedure is an essential to assess regional impacts of climate change. Numerous statistical downscaling methods have been used mainly due to the computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. By split sampling scheme, all methods are calibrated with observational station data for 19 years from 1973 to 1991 are and tested for the recent 19 years from 1992 to 2010. To assess skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure an ability of reproducing temporal correlation, distribution, spatial correlation, and extreme events. In addition, we employ Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR and all methods lead to large improvements in representing all performance metrics. According to seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that such result is derived from CFSR output which is recognized as near perfect climate data in climate studies. Therefore, the ranking of this study may be changed when various GCMs are downscaled and evaluated. Nevertheless, it may be informative for end-users (i.e. modelers or water resources managers) to understand and select more suitable downscaling methods corresponding to priorities on regional applications.
Shivakumarswamy, Udasimath; Arakeri, Surekha U; Karigowdar, Mahesh H; Yelikar, BR
2012-01-01
Background: The cytological examinations of serous effusions have been well-accepted, and a positive diagnosis is often considered as a definitive diagnosis. It helps in staging, prognosis and management of the patients in malignancies and also gives information about various inflammatory and non-inflammatory lesions. Diagnostic problems arise in everyday practice to differentiate reactive atypical mesothelial cells and malignant cells by the routine conventional smear (CS) method. Aims: To compare the morphological features of the CS method with those of the cell block (CB) method and also to assess the utility and sensitivity of the CB method in the cytodiagnosis of pleural effusions. Materials and Methods: The study was conducted in the cytology section of the Department of Pathology. Sixty pleural fluid samples were subjected to diagnostic evaluation for over a period of 20 months. Along with the conventional smears, cell blocks were prepared by using 10% alcohol–formalin as a fixative agent. Statistical analysis with the ‘z test’ was performed to identify the cellularity, using the CS and CB methods. Mc. Naemer's χ2test was used to identify the additional yield for malignancy by the CB method. Results: Cellularity and additional yield for malignancy was 15% more by the CB method. Conclusions: The CB method provides high cellularity, better architectural patterns, morphological features and an additional yield of malignant cells, and thereby, increases the sensitivity of the cytodiagnosis when compared with the CS method. PMID:22438610
Alsaggaf, Rotana; O'Hara, Lyndsay M; Stafford, Kristen A; Leekha, Surbhi; Harris, Anthony D
2018-02-01
OBJECTIVE A systematic review of quasi-experimental studies in the field of infectious diseases was published in 2005. The aim of this study was to assess improvements in the design and reporting of quasi-experiments 10 years after the initial review. We also aimed to report the statistical methods used to analyze quasi-experimental data. DESIGN Systematic review of articles published from January 1, 2013, to December 31, 2014, in 4 major infectious disease journals. METHODS Quasi-experimental studies focused on infection control and antibiotic resistance were identified and classified based on 4 criteria: (1) type of quasi-experimental design used, (2) justification of the use of the design, (3) use of correct nomenclature to describe the design, and (4) statistical methods used. RESULTS Of 2,600 articles, 173 (7%) featured a quasi-experimental design, compared to 73 of 2,320 articles (3%) in the previous review (P<.01). Moreover, 21 articles (12%) utilized a study design with a control group; 6 (3.5%) justified the use of a quasi-experimental design; and 68 (39%) identified their design using the correct nomenclature. In addition, 2-group statistical tests were used in 75 studies (43%); 58 studies (34%) used standard regression analysis; 18 (10%) used segmented regression analysis; 7 (4%) used standard time-series analysis; 5 (3%) used segmented time-series analysis; and 10 (6%) did not utilize statistical methods for comparisons. CONCLUSIONS While some progress occurred over the decade, it is crucial to continue improving the design and reporting of quasi-experimental studies in the fields of infection control and antibiotic resistance to better evaluate the effectiveness of important interventions. Infect Control Hosp Epidemiol 2018;39:170-176.
Crans, Gerald G; Shuster, Jonathan J
2008-08-15
The debate as to which statistical methodology is most appropriate for the analysis of the two-sample comparative binomial trial has persisted for decades. Practitioners who favor the conditional methods of Fisher, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two-sided significance level, alpha = 0.05. Additionally, this numerical method is used to define new significance levels alpha(*) = alpha+epsilon, where epsilon is a small positive number, for each n, such that the size of the test is as close as possible to the pre-specified alpha (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example are presented, which demonstrates the statistical advantages of implementing the adjustment to FET (using alpha(*) instead of alpha) in the two-sample comparative binomial trial. 2008 John Wiley & Sons, Ltd
Reconstruction of early phase deformations by integrated magnetic and mesotectonic data evaluation
NASA Astrophysics Data System (ADS)
Sipos, András A.; Márton, Emő; Fodor, László
2018-02-01
Markers of brittle faulting are widely used for recovering past deformation phases. Rocks often have oriented magnetic fabrics, which can be interpreted as connected to ductile deformation before cementation of the sediment. This paper reports a novel statistical procedure for simultaneous evaluation of AMS (Anisotropy of Magnetic Susceptibility) and fault-slip data. The new method analyzes the AMS data, without linearization techniques, so that weak AMS lineation and rotational AMS can be assessed that are beyond the scope of classical methods. This idea is extended to the evaluation of fault-slip data. While the traditional assumptions of stress inversion are not rejected, the method recovers the stress field via statistical hypothesis testing. In addition it provides statistical information needed for the combined evaluation of the AMS and the mesotectonic (0.1 to 10 m) data. In the combined evaluation a statistical test is carried out that helps to decide if the AMS lineation and the mesotectonic markers (in case of repeated deformation of the oldest set of markers) were formed in the same or different deformation phases. If this condition is met, the combined evaluation can improve the precision of the reconstruction. When the two data sets do not have a common solution for the direction of the extension, the deformational origin of the AMS is questionable. In this case the orientation of the stress field responsible for the AMS lineation might be different from that which caused the brittle deformation. Although most of the examples demonstrate the reconstruction of weak deformations in sediments, the new method is readily applicable to investigate the ductile-brittle transition of any rock formation as long as AMS and fault-slip data are available.
Resch, Stephen
2018-01-01
Objectives: In many low- and middle-income countries, the costs of delivering public health programs such as for HIV/AIDS, nutrition, and immunization are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example, via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating the total program costs. This may be due to a lack of awareness/understanding of (1) the technical details regarding uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various potential specific sources that contribute to overall uncertainty. Methods: We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. Results: A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. This method can not only be used when estimating the total cost of delivering established health programs but also to decrease uncertainty when the interest lies in assessing the incremental effect of an intervention. Conclusion: Measures of statistical uncertainty associated with survey-based estimates of program costs, such as standard errors and 95% confidence intervals, provide important contextual information for health policy decision-making and key inputs for the design of future costing studies. Such measures are often not reported, possibly because of technical challenges associated with their calculation and a lack of awareness of appropriate software. Modern statistical analysis methods for survey data, such as calibration, provide a means to exploit additional information that is readily available but was not used in the design of the study to significantly improve the estimation of total cost through the reduction of statistical uncertainty. PMID:29636964
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS).
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M; Khan, Ajmal
2016-01-01
This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher's exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher's exact test, logistic regression, epidemiological statistics, and non-parametric tests. This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design.
Effects of additional data on Bayesian clustering.
Yamazaki, Keisuke
2017-10-01
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.
External Threat Risk Assessment Algorithm (ExTRAA)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Powell, Troy C.
Two risk assessment algorithms and philosophies have been augmented and combined to form a new algorit hm, the External Threat Risk Assessment Algorithm (ExTRAA), that allows for effective and statistically sound analysis of external threat sources in relation to individual attack methods . In addition to the attack method use probability and the attack method employment consequence, t he concept of defining threat sources is added to the risk assessment process. Sample data is tabulated and depicted in radar plots and bar graphs for algorithm demonstration purposes. The largest success of ExTRAA is its ability to visualize the kind ofmore » r isk posed in a given situation using the radar plot method.« less
Determination of caffeic acid in wine using PEDOT film modified electrode.
Bianchini, C; Curulli, A; Pasquali, M; Zane, D
2014-08-01
A novel method using PEDOT (poly(3,4-ethylenedioxy) thiophene) modified electrode was developed for the determination of caffeic acid (CA) in wine. Cyclic voltammetry (CV) with the additions standard method was used to quantify the analyte at PEDOT modified electrodes. PEDOT films were electrodeposited on Platinum electrode (Pt) in aqueous medium by galvanostatic method using sodium poly(styrene-4-sulfonate) (PSS) as electrolyte and surfactant. CV allows detecting the analyte over a wide concentration range (10.0nmoll(-1)-6.5mmoll(-1)). The electrochemical method proposed showed good statistical and analytical parameters as linearity range, LOD, LOQ and sensitivity. Copyright © 2014 Elsevier Ltd. All rights reserved.
Bayesian methods for outliers detection in GNSS time series
NASA Astrophysics Data System (ADS)
Qianqian, Zhang; Qingming, Gui
2013-07-01
This article is concerned with the problem of detecting outliers in GNSS time series based on Bayesian statistical theory. Firstly, a new model is proposed to simultaneously detect different types of outliers based on the conception of introducing different types of classification variables corresponding to the different types of outliers; the problem of outlier detection is converted into the computation of the corresponding posterior probabilities, and the algorithm for computing the posterior probabilities based on standard Gibbs sampler is designed. Secondly, we analyze the reasons of masking and swamping about detecting patches of additive outliers intensively; an unmasking Bayesian method for detecting additive outlier patches is proposed based on an adaptive Gibbs sampler. Thirdly, the correctness of the theories and methods proposed above is illustrated by simulated data and then by analyzing real GNSS observations, such as cycle slips detection in carrier phase data. Examples illustrate that the Bayesian methods for outliers detection in GNSS time series proposed by this paper are not only capable of detecting isolated outliers but also capable of detecting additive outlier patches. Furthermore, it can be successfully used to process cycle slips in phase data, which solves the problem of small cycle slips.
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS)
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M.; Khan, Ajmal
2016-01-01
Objective: This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Methods: Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. Results: A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher’s exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher’s exact test, logistic regression, epidemiological statistics, and non-parametric tests. Conclusion: This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design. PMID:27022365
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.
This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment including surface area, reactive site concentration, reaction rate, and extent can be predicted from field-scale grain size distribution by linearly adding reaction properties for individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in amore » stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The result indicated that the additivity model provided a good prediction of the U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions have to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The result found that the approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2-8mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.« less
NASA Astrophysics Data System (ADS)
Äijälä, Mikko; Heikkinen, Liine; Fröhlich, Roman; Canonaco, Francesco; Prévôt, André S. H.; Junninen, Heikki; Petäjä, Tuukka; Kulmala, Markku; Worsnop, Douglas; Ehn, Mikael
2017-03-01
Mass spectrometric measurements commonly yield data on hundreds of variables over thousands of points in time. Refining and synthesizing this raw data into chemical information necessitates the use of advanced, statistics-based data analytical techniques. In the field of analytical aerosol chemistry, statistical, dimensionality reductive methods have become widespread in the last decade, yet comparable advanced chemometric techniques for data classification and identification remain marginal. Here we present an example of combining data dimensionality reduction (factorization) with exploratory classification (clustering), and show that the results cannot only reproduce and corroborate earlier findings, but also complement and broaden our current perspectives on aerosol chemical classification. We find that applying positive matrix factorization to extract spectral characteristics of the organic component of air pollution plumes, together with an unsupervised clustering algorithm, k-means+ + , for classification, reproduces classical organic aerosol speciation schemes. Applying appropriately chosen metrics for spectral dissimilarity along with optimized data weighting, the source-specific pollution characteristics can be statistically resolved even for spectrally very similar aerosol types, such as different combustion-related anthropogenic aerosol species and atmospheric aerosols with similar degree of oxidation. In addition to the typical oxidation level and source-driven aerosol classification, we were also able to classify and characterize outlier groups that would likely be disregarded in a more conventional analysis. Evaluating solution quality for the classification also provides means to assess the performance of mass spectral similarity metrics and optimize weighting for mass spectral variables. This facilitates algorithm-based evaluation of aerosol spectra, which may prove invaluable for future development of automatic methods for spectra identification and classification. Robust, statistics-based results and data visualizations also provide important clues to a human analyst on the existence and chemical interpretation of data structures. Applying these methods to a test set of data, aerosol mass spectrometric data of organic aerosol from a boreal forest site, yielded five to seven different recurring pollution types from various sources, including traffic, cooking, biomass burning and nearby sawmills. Additionally, three distinct, minor pollution types were discovered and identified as amine-dominated aerosols.
NASA Astrophysics Data System (ADS)
Nemoto, Mitsutaka; Nomura, Yukihiro; Hanaoka, Shohei; Masutani, Yoshitaka; Yoshikawa, Takeharu; Hayashi, Naoto; Yoshioka, Naoki; Ohtomo, Kuni
Anatomical point landmarks as most primitive anatomical knowledge are useful for medical image understanding. In this study, we propose a detection method for anatomical point landmark based on appearance models, which include gray-level statistical variations at point landmarks and their surrounding area. The models are built based on results of Principal Component Analysis (PCA) of sample data sets. In addition, we employed generative learning method by transforming ROI of sample data. In this study, we evaluated our method with 24 data sets of body trunk CT images and obtained 95.8 ± 7.3 % of the average sensitivity in 28 landmarks.
Evaluation of peak-picking algorithms for protein mass spectrometry.
Bauer, Chris; Cramer, Rainer; Schuchhardt, Johannes
2011-01-01
Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
Effects of Mobile Learning in Medical Education: A Counterfactual Evaluation.
Briz-Ponce, Laura; Juanes-Méndez, Juan Antonio; García-Peñalvo, Francisco José; Pereira, Anabela
2016-06-01
The aim of this research is to contribute to the general system education providing new insights and resources. This study performs a quasi-experimental study at University of Salamanca with 30 students to compare results between using an anatomic app for learning and the formal traditional method conducted by a teacher. The findings of the investigation suggest that the performance of learners using mobile apps is statistical better than the students using the traditional method. However, mobile devices should be considered as an additional tool to complement the teachers' explanation and it is necessary to overcome different barriers and challenges to adopt these pedagogical methods at University.
Statistical evaluation of synchronous spike patterns extracted by frequent item set mining
Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja
2013-01-01
We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows to reliably detect active assemblies in massively parallel spike trains. PMID:24167487
Geostatistical applications in environmental remediation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, R.N.; Purucker, S.T.; Lyon, B.F.
1995-02-01
Geostatistical analysis refers to a collection of statistical methods for addressing data that vary in space. By incorporating spatial information into the analysis, geostatistics has advantages over traditional statistical analysis for problems with a spatial context. Geostatistics has a history of success in earth science applications, and its popularity is increasing in other areas, including environmental remediation. Due to recent advances in computer technology, geostatistical algorithms can be executed at a speed comparable to many standard statistical software packages. When used responsibly, geostatistics is a systematic and defensible tool can be used in various decision frameworks, such as the Datamore » Quality Objectives (DQO) process. At every point in the site, geostatistics can estimate both the concentration level and the probability or risk of exceeding a given value. Using these probability maps can assist in identifying clean-up zones. Given any decision threshold and an acceptable level of risk, the probability maps identify those areas that are estimated to be above or below the acceptable risk. Those areas that are above the threshold are of the most concern with regard to remediation. In addition to estimating clean-up zones, geostatistics can assist in designing cost-effective secondary sampling schemes. Those areas of the probability map with high levels of estimated uncertainty are areas where more secondary sampling should occur. In addition, geostatistics has the ability to incorporate soft data directly into the analysis. These data include historical records, a highly correlated secondary contaminant, or expert judgment. The role of geostatistics in environmental remediation is a tool that in conjunction with other methods can provide a common forum for building consensus.« less
Cost-Effectiveness Analysis: a proposal of new reporting standards in statistical analysis
Bang, Heejung; Zhao, Hongwei
2014-01-01
Cost-effectiveness analysis (CEA) is a method for evaluating the outcomes and costs of competing strategies designed to improve health, and has been applied to a variety of different scientific fields. Yet, there are inherent complexities in cost estimation and CEA from statistical perspectives (e.g., skewness, bi-dimensionality, and censoring). The incremental cost-effectiveness ratio that represents the additional cost per one unit of outcome gained by a new strategy has served as the most widely accepted methodology in the CEA. In this article, we call for expanded perspectives and reporting standards reflecting a more comprehensive analysis that can elucidate different aspects of available data. Specifically, we propose that mean and median-based incremental cost-effectiveness ratios and average cost-effectiveness ratios be reported together, along with relevant summary and inferential statistics as complementary measures for informed decision making. PMID:24605979
Vibration Response Models of a Stiffened Aluminum Plate Excited by a Shaker
NASA Technical Reports Server (NTRS)
Cabell, Randolph H.
2008-01-01
Numerical models of structural-acoustic interactions are of interest to aircraft designers and the space program. This paper describes a comparison between two energy finite element codes, a statistical energy analysis code, a structural finite element code, and the experimentally measured response of a stiffened aluminum plate excited by a shaker. Different methods for modeling the stiffeners and the power input from the shaker are discussed. The results show that the energy codes (energy finite element and statistical energy analysis) accurately predicted the measured mean square velocity of the plate. In addition, predictions from an energy finite element code had the best spatial correlation with measured velocities. However, predictions from a considerably simpler, single subsystem, statistical energy analysis model also correlated well with the spatial velocity distribution. The results highlight a need for further work to understand the relationship between modeling assumptions and the prediction results.
Evaluating the decision accuracy and speed of clinical data visualizations.
Pieczkiewicz, David S; Finkelstein, Stanley M
2010-01-01
Clinicians face an increasing volume of biomedical data. Assessing the efficacy of systems that enable accurate and timely clinical decision making merits corresponding attention. This paper discusses the multiple-reader multiple-case (MRMC) experimental design and linear mixed models as means of assessing and comparing decision accuracy and latency (time) for decision tasks in which clinician readers must interpret visual displays of data. These tools can assess and compare decision accuracy and latency (time). These experimental and statistical techniques, used extensively in radiology imaging studies, offer a number of practical and analytic advantages over more traditional quantitative methods such as percent-correct measurements and ANOVAs, and are recommended for their statistical efficiency and generalizability. An example analysis using readily available, free, and commercial statistical software is provided as an appendix. While these techniques are not appropriate for all evaluation questions, they can provide a valuable addition to the evaluative toolkit of medical informatics research.
NASA Technical Reports Server (NTRS)
Hoffer, R. M. (Principal Investigator); Knowlton, D. J.; Dean, M. E.
1981-01-01
A set of training statistics for the 30 meter resolution simulated thematic mapper MSS data was generated based on land use/land cover classes. In addition to this supervised data set, a nonsupervised multicluster block of training statistics is being defined in order to compare the classification results and evaluate the effect of the different training selection methods on classification performance. Two test data sets, defined using a stratified sampling procedure incorporating a grid system with dimensions of 50 lines by 50 columns, and another set based on an analyst supervised set of test fields were used to evaluate the classifications of the TMS data. The supervised training data set generated training statistics, and a per point Gaussian maximum likelihood classification of the 1979 TMS data was obtained. The August 1980 MSS data was radiometrically adjusted. The SAR data was redigitized and the SAR imagery was qualitatively analyzed.
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu
2015-01-01
Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H
2016-01-01
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
Direct statistical modeling and its implications for predictive mapping in mining exploration
NASA Astrophysics Data System (ADS)
Sterligov, Boris; Gumiaux, Charles; Barbanson, Luc; Chen, Yan; Cassard, Daniel; Cherkasov, Sergey; Zolotaya, Ludmila
2010-05-01
Recent advances in geosciences make more and more multidisciplinary data available for mining exploration. This allowed developing methodologies for computing forecast ore maps from the statistical combination of such different input parameters, all based on an inverse problem theory. Numerous statistical methods (e.g. algebraic method, weight of evidence, Siris method, etc) with varying degrees of complexity in their development and implementation, have been proposed and/or adapted for ore geology purposes. In literature, such approaches are often presented through applications on natural examples and the results obtained can present specificities due to local characteristics. Moreover, though crucial for statistical computations, "minimum requirements" needed for input parameters (number of minimum data points, spatial distribution of objects, etc) are often only poorly expressed. From these, problems often arise when one has to choose between one and the other method for her/his specific question. In this study, a direct statistical modeling approach is developed in order to i) evaluate the constraints on the input parameters and ii) test the validity of different existing inversion methods. The approach particularly focused on the analysis of spatial relationships between location of points and various objects (e.g. polygons and /or polylines) which is particularly well adapted to constrain the influence of intrusive bodies - such as a granite - and faults or ductile shear-zones on spatial location of ore deposits (point objects). The method is designed in a way to insure a-dimensionality with respect to scale. In this approach, both spatial distribution and topology of objects (polygons and polylines) can be parametrized by the user (e.g. density of objects, length, surface, orientation, clustering). Then, the distance of points with respect to a given type of objects (polygons or polylines) is given using a probability distribution. The location of points is computed assuming either independency or different grades of dependency between the two probability distributions. The results show that i)polygons surface mean value, polylines length mean value, the number of objects and their clustering are critical and ii) the validity of the different tested inversion methods strongly depends on the relative importance and on the dependency between the parameters used. In addition, this combined approach of direct and inverse modeling offers an opportunity to test the robustness of the inferred distribution point laws with respect to the quality of the input data set.
Upgrades to the REA method for producing probabilistic climate change projections
NASA Astrophysics Data System (ADS)
Xu, Ying; Gao, Xuejie; Giorgi, Filippo
2010-05-01
We present an augmented version of the Reliability Ensemble Averaging (REA) method designed to generate probabilistic climate change information from ensembles of climate model simulations. Compared to the original version, the augmented one includes consideration of multiple variables and statistics in the calculation of the performance-based weights. In addition, the model convergence criterion previously employed is removed. The method is applied to the calculation of changes in mean and variability for temperature and precipitation over different sub-regions of East Asia based on the recently completed CMIP3 multi-model ensemble. Comparison of the new and old REA methods, along with the simple averaging procedure, and the use of different combinations of performance metrics shows that at fine sub-regional scales the choice of weighting is relevant. This is mostly because the models show a substantial spread in performance for the simulation of precipitation statistics, a result that supports the use of model weighting as a useful option to account for wide ranges of quality of models. The REA method, and in particular the upgraded one, provides a simple and flexible framework for assessing the uncertainty related to the aggregation of results from ensembles of models in order to produce climate change information at the regional scale. KEY WORDS: REA method, Climate change, CMIP3
Additive Genetic Variability and the Bayesian Alphabet
Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan
2009-01-01
The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397
NASA Astrophysics Data System (ADS)
Panahi, Nima S.
We studied the problem of understanding and computing the essential features and dynamics of molecular motions through the development of two theories for two different systems. First, we studied the process of the Berry Pseudorotation of PF5 and the rotations it induces in the molecule through its natural and intrinsic geometric nature by setting it in the language of fiber bundles and graph theory. With these tools, we successfully extracted the essentials of the process' loops and induced rotations. The infinite number of pseudorotation loops were broken down into a small set of essential loops called "super loops", with their intrinsic properties and link to the physical movements of the molecule extensively studied. In addition, only the three "self-edge loops" generated any induced rotations, and then only a finite number of classes of them. Second, we studied applying the statistical methods of Principal Components Analysis (PCA) and Principal Coordinate Analysis (PCO) to capture only the most important changes in Argon clusters so as to reduce computational costs and graph the potential energy surface (PES) in three dimensions respectively. Both methods proved successful, but PCA was only partially successful since one will only see advantages for PES database systems much larger than those both currently being studied and those that can be computationally studied in the next few decades to come. In addition, PCA is only needed for the very rare case of a PES database that does not already include Hessian eigenvalues.
Brorsson, C.; Hansen, N. T.; Lage, K.; Bergholdt, R.; Brunak, S.; Pociot, F.
2009-01-01
Aim To develop novel methods for identifying new genes that contribute to the risk of developing type 1 diabetes within the Major Histocompatibility Complex (MHC) region on chromosome 6, independently of the known linkage disequilibrium (LD) between human leucocyte antigen (HLA)-DRB1, -DQA1, -DQB1 genes. Methods We have developed a novel method that combines single nucleotide polymorphism (SNP) genotyping data with protein–protein interaction (ppi) networks to identify disease-associated network modules enriched for proteins encoded from the MHC region. Approximately 2500 SNPs located in the 4 Mb MHC region were analysed in 1000 affected offspring trios generated by the Type 1 Diabetes Genetics Consortium (T1DGC). The most associated SNP in each gene was chosen and genes were mapped to ppi networks for identification of interaction partners. The association testing and resulting interacting protein modules were statistically evaluated using permutation. Results A total of 151 genes could be mapped to nodes within the protein interaction network and their interaction partners were identified. Five protein interaction modules reached statistical significance using this approach. The identified proteins are well known in the pathogenesis of T1D, but the modules also contain additional candidates that have been implicated in β-cell development and diabetic complications. Conclusions The extensive LD within the MHC region makes it important to develop new methods for analysing genotyping data for identification of additional risk genes for T1D. Combining genetic data with knowledge about functional pathways provides new insight into mechanisms underlying T1D. PMID:19143816
Vejai Vekaash, Chitra Janardhanan; Kumar Reddy, Tripuravaram Vinay; Venkatesh, Kondas Vijay
2017-01-01
Aim: This study aims to evaluate the color change in human enamel bleached with three different concentrations of hydrogen peroxide, containing pineapple extract as an additive in two different timings, using reflectance spectrophotometer. Background: The study aimed to investigate the bleaching efficacy on natural teeth using natural enzymes. Materials and Methods: Baseline color values of 10 randomly selected artificially stained incisors were obtained. The specimens were divided into three groups of 20 teeth each: Group 1 – 30% hydrogen peroxide, Group II – 20% hydrogen peroxide, and Group III – 10% hydrogen peroxide. One half of the tooth was bleached with hydrogen peroxide, and other was bleached with hydrogen peroxide and pineapple extract for 20 min (Subgroup A) and 10 min (Subgroup B). Statistical Analysis: The results were statistically analyzed using student's t-test. Results: The mean ΔE values of Group IA (31.62 ± 0.9), Group IIA (29.85 ± 1.2), and Group IIIA (28.65 ± 1.2) showed statistically significant higher values when compared to the mean Δ E values of Group 1A (25.02 ± 1.2), Group IIA (22.86 ± 1.1), and Group IIIA (16.56 ± 1.1). Identical results were obtained in Subgroup B. Conclusion: The addition of pineapple extract to hydrogen peroxide resulted in effective bleaching. PMID:29386782
An appraisal of statistical procedures used in derivation of reference intervals.
Ichihara, Kiyoshi; Boyd, James C
2010-11-01
When conducting studies to derive reference intervals (RIs), various statistical procedures are commonly applied at each step, from the planning stages to final computation of RIs. Determination of the necessary sample size is an important consideration, and evaluation of at least 400 individuals in each subgroup has been recommended to establish reliable common RIs in multicenter studies. Multiple regression analysis allows identification of the most important factors contributing to variation in test results, while accounting for possible confounding relationships among these factors. Of the various approaches proposed for judging the necessity of partitioning reference values, nested analysis of variance (ANOVA) is the likely method of choice owing to its ability to handle multiple groups and being able to adjust for multiple factors. Box-Cox power transformation often has been used to transform data to a Gaussian distribution for parametric computation of RIs. However, this transformation occasionally fails. Therefore, the non-parametric method based on determination of the 2.5 and 97.5 percentiles following sorting of the data, has been recommended for general use. The performance of the Box-Cox transformation can be improved by introducing an additional parameter representing the origin of transformation. In simulations, the confidence intervals (CIs) of reference limits (RLs) calculated by the parametric method were narrower than those calculated by the non-parametric approach. However, the margin of difference was rather small owing to additional variability in parametrically-determined RLs introduced by estimation of parameters for the Box-Cox transformation. The parametric calculation method may have an advantage over the non-parametric method in allowing identification and exclusion of extreme values during RI computation.
Moon, Andres; Smith, Geoffrey H; Kong, Jun; Rogers, Thomas E; Ellis, Carla L; Farris, Alton B Brad
2018-02-01
Renal allograft rejection diagnosis depends on assessment of parameters such as interstitial inflammation; however, studies have shown interobserver variability regarding interstitial inflammation assessment. Since automated image analysis quantitation can be reproducible, we devised customized analysis methods for CD3+ T-cell staining density as a measure of rejection severity and compared them with established commercial methods along with visual assessment. Renal biopsy CD3 immunohistochemistry slides (n = 45), including renal allografts with various degrees of acute cellular rejection (ACR) were scanned for whole slide images (WSIs). Inflammation was quantitated in the WSIs using pathologist visual assessment, commercial algorithms (Aperio nuclear algorithm for CD3+ cells/mm 2 and Aperio positive pixel count algorithm), and customized open source algorithms developed in ImageJ with thresholding/positive pixel counting (custom CD3+%) and identification of pixels fulfilling "maxima" criteria for CD3 expression (custom CD3+ cells/mm 2 ). Based on visual inspections of "markup" images, CD3 quantitation algorithms produced adequate accuracy. Additionally, CD3 quantitation algorithms correlated between each other and also with visual assessment in a statistically significant manner (r = 0.44 to 0.94, p = 0.003 to < 0.0001). Methods for assessing inflammation suggested a progression through the tubulointerstitial ACR grades, with statistically different results in borderline versus other ACR types, in all but the custom methods. Assessment of CD3-stained slides using various open source image analysis algorithms presents salient correlations with established methods of CD3 quantitation. These analysis techniques are promising and highly customizable, providing a form of on-slide "flow cytometry" that can facilitate additional diagnostic accuracy in tissue-based assessments.
NASA Astrophysics Data System (ADS)
Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.
2015-12-01
Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for involved error statistics and model parameterizations, and, in turn, allow more rigorous estimations of the same. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia including east China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. As for the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partition. The 3-D heterogeneity model is further improved by joint inversions of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for the source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structure and source characteristics with rigorously estimated uncertainties from the novel Bayesian methods provide enhanced monitoring and discrimination of seismic events in northeast Asia.
Ben Chaabane, Salim; Fnaiech, Farhat
2014-01-23
Color image segmentation has been so far applied in many areas; hence, recently many different techniques have been developed and proposed. In the medical imaging area, the image segmentation may be helpful to provide assistance to doctor in order to follow-up the disease of a certain patient from the breast cancer processed images. The main objective of this work is to rebuild and also to enhance each cell from the three component images provided by an input image. Indeed, from an initial segmentation obtained using the statistical features and histogram threshold techniques, the resulting segmentation may represent accurately the non complete and pasted cells and enhance them. This allows real help to doctors, and consequently, these cells become clear and easy to be counted. A novel method for color edges extraction based on statistical features and automatic threshold is presented. The traditional edge detector, based on the first and the second order neighborhood, describing the relationship between the current pixel and its neighbors, is extended to the statistical domain. Hence, color edges in an image are obtained by combining the statistical features and the automatic threshold techniques. Finally, on the obtained color edges with specific primitive color, a combination rule is used to integrate the edge results over the three color components. Breast cancer cell images were used to evaluate the performance of the proposed method both quantitatively and qualitatively. Hence, a visual and a numerical assessment based on the probability of correct classification (PC), the false classification (Pf), and the classification accuracy (Sens(%)) are presented and compared with existing techniques. The proposed method shows its superiority in the detection of points which really belong to the cells, and also the facility of counting the number of the processed cells. Computer simulations highlight that the proposed method substantially enhances the segmented image with smaller error rates better than other existing algorithms under the same settings (patterns and parameters). Moreover, it provides high classification accuracy, reaching the rate of 97.94%. Additionally, the segmentation method may be extended to other medical imaging types having similar properties.
NASA Technical Reports Server (NTRS)
Miles, Jeffrey Hilton
2010-01-01
Combustion noise from turbofan engines has become important, as the noise from sources like the fan and jet are reduced. An aligned and un-aligned coherence technique has been developed to determine a threshold level for the coherence and thereby help to separate the coherent combustion noise source from other noise sources measured with far-field microphones. This method is compared with a statistics based coherence threshold estimation method. In addition, the un-aligned coherence procedure at the same time also reveals periodicities, spectral lines, and undamped sinusoids hidden by broadband turbofan engine noise. In calculating the coherence threshold using a statistical method, one may use either the number of independent records or a larger number corresponding to the number of overlapped records used to create the average. Using data from a turbofan engine and a simulation this paper shows that applying the Fisher z-transform to the un-aligned coherence can aid in making the proper selection of samples and produce a reasonable statistics based coherence threshold. Examples are presented showing that the underlying tonal and coherent broad band structure which is buried under random broadband noise and jet noise can be determined. The method also shows the possible presence of indirect combustion noise. Copyright 2011 Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America.
Paixão, Paulo; Gouveia, Luís F; Silva, Nuno; Morais, José A G
2017-03-01
A simulation study is presented, evaluating the performance of the f 2 , the model-independent multivariate statistical distance and the f 2 bootstrap methods in the ability to conclude similarity between two dissolution profiles. Different dissolution profiles, based on the Noyes-Whitney equation and ranging from theoretical f 2 values between 100 and 40, were simulated. Variability was introduced in the dissolution model parameters in an increasing order, ranging from a situation complying with the European guidelines requirements for the use of the f 2 metric to several situations where the f 2 metric could not be used anymore. Results have shown that the f 2 is an acceptable metric when used according to the regulatory requirements, but loses its applicability when variability increases. The multivariate statistical distance presented contradictory results in several of the simulation scenarios, which makes it an unreliable metric for dissolution profile comparisons. The bootstrap f 2 , although conservative in its conclusions is an alternative suitable method. Overall, as variability increases, all of the discussed methods reveal problems that can only be solved by increasing the number of dosage form units used in the comparison, which is usually not practical or feasible. Additionally, experimental corrective measures may be undertaken in order to reduce the overall variability, particularly when it is shown that it is mainly due to the dissolution assessment instead of being intrinsic to the dosage form. Copyright © 2016. Published by Elsevier B.V.
Hybrid perturbation methods based on statistical time series models
NASA Astrophysics Data System (ADS)
San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario
2016-04-01
In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theories, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Struckmeyer, R.
This report presents the results of the NRC Direct Radiation Monitoring Network for the fourth quarter of 1995. It provides the ambient radiation levels measured in the vicinity of 75 sites throughout the United States. In addition, it describes the equipment used, monitoring station selection criteria, characterization of the dosimeter response, calibration procedures, statistical methods, intercomparison, and quality assurance program.
Topics in Statistical Calibration
2014-03-27
on a parametric bootstrap where, instead of sampling directly from the residuals , samples are drawn from a normal distribution. This procedure will...addition to centering them (Davison and Hinkley, 1997). When there are outliers in the residuals , the bootstrap distribution of x̂0 can become skewed or...based and inversion methods using the linear mixed-effects model. Then, a simple parametric bootstrap algorithm is proposed that can be used to either
Use of Unlabeled Samples for Mitigating the Hughes Phenomenon
NASA Technical Reports Server (NTRS)
Landgrebe, David A.; Shahshahani, Behzad M.
1993-01-01
The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.
NASA Astrophysics Data System (ADS)
Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. P.
2013-09-01
We study uncertainty quantification in remote sensing of aerosols in the atmosphere with top of the atmosphere reflectance measurements from the nadir-viewing Ozone Monitoring Instrument (OMI). Focus is on the uncertainty in aerosol model selection of pre-calculated aerosol models and on the statistical modelling of the model inadequacies. The aim is to apply statistical methodologies that improve the uncertainty estimates of the aerosol optical thickness (AOT) retrieval by propagating model selection and model error related uncertainties more realistically. We utilise Bayesian model selection and model averaging methods for the model selection problem and use Gaussian processes to model the smooth systematic discrepancies from the modelled to observed reflectance. The systematic model error is learned from an ensemble of operational retrievals. The operational OMI multi-wavelength aerosol retrieval algorithm OMAERO is used for cloud free, over land pixels of the OMI instrument with the additional Bayesian model selection and model discrepancy techniques. The method is demonstrated with four examples with different aerosol properties: weakly absorbing aerosols, forest fires over Greece and Russia, and Sahara dessert dust. The presented statistical methodology is general; it is not restricted to this particular satellite retrieval application.
Feature selection from a facial image for distinction of sasang constitution.
Koo, Imhoi; Kim, Jong Yeol; Kim, Myoung Geun; Kim, Keun Ho
2009-09-01
Recently, oriental medicine has received attention for providing personalized medicine through consideration of the unique nature and constitution of individual patients. With the eventual goal of globalization, the current trend in oriental medicine research is the standardization by adopting western scientific methods, which could represent a scientific revolution. The purpose of this study is to establish methods for finding statistically significant features in a facial image with respect to distinguishing constitution and to show the meaning of those features. From facial photo images, facial elements are analyzed in terms of the distance, angle and the distance ratios, for which there are 1225, 61 250 and 749 700 features, respectively. Due to the very large number of facial features, it is quite difficult to determine truly meaningful features. We suggest a process for the efficient analysis of facial features including the removal of outliers, control for missing data to guarantee data confidence and calculation of statistical significance by applying ANOVA. We show the statistical properties of selected features according to different constitutions using the nine distances, 10 angles and 10 rates of distance features that are finally established. Additionally, the Sasang constitutional meaning of the selected features is shown here.
Quantitative trait nucleotide analysis using Bayesian model selection.
Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D
2005-10-01
Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
Kanda, Junya
2016-01-01
The Transplant Registry Unified Management Program (TRUMP) made it possible for members of the Japan Society for Hematopoietic Cell Transplantation (JSHCT) to analyze large sets of national registry data on autologous and allogeneic hematopoietic stem cell transplantation. However, as the processes used to collect transplantation information are complex and differed over time, the background of these processes should be understood when using TRUMP data. Previously, information on the HLA locus of patients and donors had been collected using a questionnaire-based free-description method, resulting in some input errors. To correct minor but significant errors and provide accurate HLA matching data, the use of a Stata or EZR/R script offered by the JSHCT is strongly recommended when analyzing HLA data in the TRUMP dataset. The HLA mismatch direction, mismatch counting method, and different impacts of HLA mismatches by stem cell source are other important factors in the analysis of HLA data. Additionally, researchers should understand the statistical analyses specific for hematopoietic stem cell transplantation, such as competing risk, landmark analysis, and time-dependent analysis, to correctly analyze transplant data. The data center of the JSHCT can be contacted if statistical assistance is required.
Using radar imagery for crop discrimination: a statistical and conditional probability study
Haralick, R.M.; Caspall, F.; Simonett, D.S.
1970-01-01
A number of the constraints with which remote sensing must contend in crop studies are outlined. They include sensor, identification accuracy, and congruencing constraints; the nature of the answers demanded of the sensor system; and the complex temporal variances of crops in large areas. Attention is then focused on several methods which may be used in the statistical analysis of multidimensional remote sensing data.Crop discrimination for radar K-band imagery is investigated by three methods. The first one uses a Bayes decision rule, the second a nearest-neighbor spatial conditional probability approach, and the third the standard statistical techniques of cluster analysis and principal axes representation.Results indicate that crop type and percent of cover significantly affect the strength of the radar return signal. Sugar beets, corn, and very bare ground are easily distinguishable, sorghum, alfalfa, and young wheat are harder to distinguish. Distinguishability will be improved if the imagery is examined in time sequence so that changes between times of planning, maturation, and harvest provide additional discriminant tools. A comparison between radar and photography indicates that radar performed surprisingly well in crop discrimination in western Kansas and warrants further study.
Feature Selection from a Facial Image for Distinction of Sasang Constitution
Koo, Imhoi; Kim, Jong Yeol; Kim, Myoung Geun
2009-01-01
Recently, oriental medicine has received attention for providing personalized medicine through consideration of the unique nature and constitution of individual patients. With the eventual goal of globalization, the current trend in oriental medicine research is the standardization by adopting western scientific methods, which could represent a scientific revolution. The purpose of this study is to establish methods for finding statistically significant features in a facial image with respect to distinguishing constitution and to show the meaning of those features. From facial photo images, facial elements are analyzed in terms of the distance, angle and the distance ratios, for which there are 1225, 61 250 and 749 700 features, respectively. Due to the very large number of facial features, it is quite difficult to determine truly meaningful features. We suggest a process for the efficient analysis of facial features including the removal of outliers, control for missing data to guarantee data confidence and calculation of statistical significance by applying ANOVA. We show the statistical properties of selected features according to different constitutions using the nine distances, 10 angles and 10 rates of distance features that are finally established. Additionally, the Sasang constitutional meaning of the selected features is shown here. PMID:19745013
Nasirudin, Radin A.; Mei, Kai; Panchev, Petar; Fehringer, Andreas; Pfeiffer, Franz; Rummeny, Ernst J.; Fiebich, Martin; Noël, Peter B.
2015-01-01
Purpose The exciting prospect of Spectral CT (SCT) using photon-counting detectors (PCD) will lead to new techniques in computed tomography (CT) that take advantage of the additional spectral information provided. We introduce a method to reduce metal artifact in X-ray tomography by incorporating knowledge obtained from SCT into a statistical iterative reconstruction scheme. We call our method Spectral-driven Iterative Reconstruction (SPIR). Method The proposed algorithm consists of two main components: material decomposition and penalized maximum likelihood iterative reconstruction. In this study, the spectral data acquisitions with an energy-resolving PCD were simulated using a Monte-Carlo simulator based on EGSnrc C++ class library. A jaw phantom with a dental implant made of gold was used as an object in this study. A total of three dental implant shapes were simulated separately to test the influence of prior knowledge on the overall performance of the algorithm. The generated projection data was first decomposed into three basis functions: photoelectric absorption, Compton scattering and attenuation of gold. A pseudo-monochromatic sinogram was calculated and used as input in the reconstruction, while the spatial information of the gold implant was used as a prior. The results from the algorithm were assessed and benchmarked with state-of-the-art reconstruction methods. Results Decomposition results illustrate that gold implant of any shape can be distinguished from other components of the phantom. Additionally, the result from the penalized maximum likelihood iterative reconstruction shows that artifacts are significantly reduced in SPIR reconstructed slices in comparison to other known techniques, while at the same time details around the implant are preserved. Quantitatively, the SPIR algorithm best reflects the true attenuation value in comparison to other algorithms. Conclusion It is demonstrated that the combination of the additional information from Spectral CT and statistical reconstruction can significantly improve image quality, especially streaking artifacts caused by the presence of materials with high atomic numbers. PMID:25955019
Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José
2013-11-01
To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011:0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. Introduction of negative marking has led to a significant increase in the reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.
Luczak, Brian B; James, Benjamin T; Girgis, Hani Z
2017-12-06
Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. Published by Oxford University Press.
Visual preference and ecological assessments for designed alternative brownfield rehabilitations.
Lafortezza, Raffaele; Corry, Robert C; Sanesi, Giovanni; Brown, Robert D
2008-11-01
This paper describes an integrative method for quantifying, analyzing, and comparing the effects of alternative rehabilitation approaches with visual preference. The method was applied to a portion of a major industrial area located in southern Italy. Four alternative approaches to rehabilitation (alternative designs) were developed and analyzed. The scenarios consisted of the cleanup of the brownfields plus: (1) the addition of ground cover species; (2) the addition of ground cover species and a few trees randomly distributed; (3) the addition of ground cover species and a few trees in small groups; and (4) the addition of ground cover species and several trees in large groups. The approaches were analyzed and compared to the baseline condition through the use of cost-surface modeling (CSM) and visual preference assessment (VPA). Statistical results showed that alternatives that were more ecologically functional for forest bird species dispersal were also more visually preferable. Some differences were identified based on user groups and location of residence. The results of the study are used to identify implications for enhancing both ecological attributes and visual preferences of rehabilitating landscapes through planning and design.
Statistical alignment: computational properties, homology testing and goodness-of-fit.
Hein, J; Wiuf, C; Knudsen, B; Møller, M B; Wibling, G
2000-09-08
The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model.Firstly, we show how to accelerate the statistical alignment algorithms several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity based alignment, to get good initial guesses of the evolutionary parameters and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis.Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins.Finally, we describe a goodness-of-fit test, that allows testing the proposed insertion-deletion (indel) process inherent to this model and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. Copyright 2000 Academic Press.
Wang, Dan; Singhasemanon, Nan; Goh, Kean S
2016-11-15
Pesticides are routinely monitored in surface waters and resultant data are analyzed to assess whether their uses will damage aquatic eco-systems. However, the utility of the monitoring data is limited because of the insufficiency in the temporal and spatial sampling coverage and the inability to detect and quantify trace concentrations. This study developed a novel assessment procedure that addresses those limitations by combining 1) statistical methods capable of extracting information from concentrations below changing detection limits, 2) statistical resampling techniques that account for uncertainties rooted in the non-detects and insufficient/irregular sampling coverage, and 3) multiple lines of evidence that improve confidence in the final conclusion. This procedure was demonstrated by an assessment on chlorpyrifos monitoring data in surface waters of California's Central Valley (2005-2013). We detected a significant downward trend in the concentrations, which cannot be observed by commonly-used statistical approaches. We assessed that the aquatic risk was low using a probabilistic method that works with non-detects and has the ability to differentiate indicator groups with varying sensitivity. In addition, we showed that the frequency of exceedance over ambient aquatic life water quality criteria was affected by pesticide use, precipitation and irrigation demand in certain periods anteceding the water sampling events. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Hoell, Simon; Omenzetter, Piotr
2017-04-01
The increasing demand for carbon neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments using novel wind turbine blade (WTBs) designs characterized by high flexibilities and lower buckling capacities. To counteract resulting increasing of operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damages using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows controlling the performance by varying the number of samples which represent the current state. This is done for multivariate damage sensitive features defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses and principal component analysis scores from PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where features selections are found with a fast forward procedure. The method is applied to laboratory experiments with a small scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring in WTBs.
GIS and statistical analysis for landslide susceptibility mapping in the Daunia area, Italy
NASA Astrophysics Data System (ADS)
Mancini, F.; Ceppi, C.; Ritrovato, G.
2010-09-01
This study focuses on landslide susceptibility mapping in the Daunia area (Apulian Apennines, Italy) and achieves this by using a multivariate statistical method and data processing in a Geographical Information System (GIS). The Logistic Regression (hereafter LR) method was chosen to produce a susceptibility map over an area of 130 000 ha where small settlements are historically threatened by landslide phenomena. By means of LR analysis, the tendency to landslide occurrences was, therefore, assessed by relating a landslide inventory (dependent variable) to a series of causal factors (independent variables) which were managed in the GIS, while the statistical analyses were performed by means of the SPSS (Statistical Package for the Social Sciences) software. The LR analysis produced a reliable susceptibility map of the investigated area and the probability level of landslide occurrence was ranked in four classes. The overall performance achieved by the LR analysis was assessed by local comparison between the expected susceptibility and an independent dataset extrapolated from the landslide inventory. Of the samples classified as susceptible to landslide occurrences, 85% correspond to areas where landslide phenomena have actually occurred. In addition, the consideration of the regression coefficients provided by the analysis demonstrated that a major role is played by the "land cover" and "lithology" causal factors in determining the occurrence and distribution of landslide phenomena in the Apulian Apennines.
Marginal regression approach for additive hazards models with clustered current status data.
Su, Pei-Fang; Chi, Yunchan
2014-01-15
Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n(1/2) , for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.
RAId_aPS: MS/MS Analysis with Multiple Scoring Functions and Spectrum-Specific Statistics
Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo
2010-01-01
Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Providing an -value calibration protocol, we demonstrated earlier the feasibility of translating either the score or heuristic -value reported by any method into the textbook-defined -value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when changes in experimental setup occur. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific -values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign -values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given a MS/MS spectrum and a scoring function, (iii) reassign -values for a list of candidate peptides given a MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page. PMID:21103371
NASA Astrophysics Data System (ADS)
Boning, Duane S.; Chung, James E.
1998-11-01
Advanced process technology will require more detailed understanding and tighter control of variation in devices and interconnects. The purpose of statistical metrology is to provide methods to measure and characterize variation, to model systematic and random components of that variation, and to understand the impact of variation on both yield and performance of advanced circuits. Of particular concern are spatial or pattern-dependencies within individual chips; such systematic variation within the chip can have a much larger impact on performance than wafer-level random variation. Statistical metrology methods will play an important role in the creation of design rules for advanced technologies. For example, a key issue in multilayer interconnect is the uniformity of interlevel dielectric (ILD) thickness within the chip. For the case of ILD thickness, we describe phases of statistical metrology development and application to understanding and modeling thickness variation arising from chemical-mechanical polishing (CMP). These phases include screening experiments including design of test structures and test masks to gather electrical or optical data, techniques for statistical decomposition and analysis of the data, and approaches to calibrating empirical and physical variation models. These models can be integrated with circuit CAD tools to evaluate different process integration or design rule strategies. One focus for the generation of interconnect design rules are guidelines for the use of "dummy fill" or "metal fill" to improve the uniformity of underlying metal density and thus improve the uniformity of oxide thickness within the die. Trade-offs that can be evaluated via statistical metrology include the improvements to uniformity possible versus the effect of increased capacitance due to additional metal.
Detecting signals of drug–drug interactions in a spontaneous reports database
Thakrar, Bharat T; Grundschober, Sabine Borel; Doessegger, Lucette
2007-01-01
Aims The spontaneous reports database is widely used for detecting signals of ADRs. We have extended the methodology to include the detection of signals of ADRs that are associated with drug–drug interactions (DDI). In particular, we have investigated two different statistical assumptions for detecting signals of DDI. Methods Using the FDA's spontaneous reports database, we investigated two models, a multiplicative and an additive model, to detect signals of DDI. We applied the models to four known DDIs (methotrexate-diclofenac and bone marrow depression, simvastatin-ciclosporin and myopathy, ketoconazole-terfenadine and torsades de pointes, and cisapride-erythromycin and torsades de pointes) and to four drug-event combinations where there is currently no evidence of a DDI (fexofenadine-ketoconazole and torsades de pointes, methotrexade-rofecoxib and bone marrow depression, fluvastatin-ciclosporin and myopathy, and cisapride-azithromycine and torsade de pointes) and estimated the measure of interaction on the two scales. Results The additive model correctly identified all four known DDIs by giving a statistically significant (P< 0.05) positive measure of interaction. The multiplicative model identified the first two of the known DDIs as having a statistically significant or borderline significant (P< 0.1) positive measure of interaction term, gave a nonsignificant positive trend for the third interaction (P= 0.27), and a negative trend for the last interaction. Both models correctly identified the four known non interactions by estimating a negative measure of interaction. Conclusions The spontaneous reports database is a valuable resource for detecting signals of DDIs. In particular, the additive model is more sensitive in detecting such signals. The multiplicative model may further help qualify the strength of the signal detected by the additive model. PMID:17506784
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model
Xu, Bo; Yang, Ziheng
2016-01-01
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902
Køppe, Simo; Dammeyer, Jesper
2014-09-01
The evolution of developmental psychology has been characterized by the use of different quantitative and qualitative methods and procedures. But how does the use of methods and procedures change over time? This study explores the change and development of statistical methods used in articles published in Child Development from 1930 to 2010. The methods used in every article in the first issue of every volume were categorized into four categories. Until 1980 relatively simple statistical methods were used. During the last 30 years there has been an explosive use of more advanced statistical methods employed. The absence of statistical methods or use of simple methods had been eliminated.
A New Statistics-Based Online Baseline Restorer for a High Count-Rate Fully Digital System.
Li, Hongdi; Wang, Chao; Baghaei, Hossain; Zhang, Yuxuan; Ramirez, Rocio; Liu, Shitao; An, Shaohui; Wong, Wai-Hoi
2010-04-01
The goal of this work is to develop a novel, accurate, real-time digital baseline restorer using online statistical processing for a high count-rate digital system such as positron emission tomography (PET). In high count-rate nuclear instrumentation applications, analog signals are DC-coupled for better performance. However, the detectors, pre-amplifiers and other front-end electronics would cause a signal baseline drift in a DC-coupling system, which will degrade the performance of energy resolution and positioning accuracy. Event pileups normally exist in a high-count rate system and the baseline drift will create errors in the event pileup-correction. Hence, a baseline restorer (BLR) is required in a high count-rate system to remove the DC drift ahead of the pileup correction. Many methods have been reported for BLR from classic analog methods to digital filter solutions. However a single channel BLR with analog method can only work under 500 kcps count-rate, and normally an analog front-end application-specific integrated circuits (ASIC) is required for the application involved hundreds BLR such as a PET camera. We have developed a simple statistics-based online baseline restorer (SOBLR) for a high count-rate fully digital system. In this method, we acquire additional samples, excluding the real gamma pulses, from the existing free-running ADC in the digital system, and perform online statistical processing to generate a baseline value. This baseline value will be subtracted from the digitized waveform to retrieve its original pulse with zero-baseline drift. This method can self-track the baseline without a micro-controller involved. The circuit consists of two digital counter/timers, one comparator, one register and one subtraction unit. Simulation shows a single channel works at 30 Mcps count-rate with pileup condition. 336 baseline restorer circuits have been implemented into 12 field-programmable-gate-arrays (FPGA) for our new fully digital PET system.
Zheng, Jie; Rodriguez, Santiago; Laurin, Charles; Baird, Denis; Trela-Larsen, Lea; Erzurumluoglu, Mesut A; Zheng, Yi; White, Jon; Giambartolomei, Claudia; Zabaneh, Delilah; Morris, Richard; Kumari, Meena; Casas, Juan P; Hingorani, Aroon D; Evans, David M; Gaunt, Tom R; Day, Ian N M
2017-01-01
Fine mapping is a widely used approach for identifying the causal variant(s) at disease-associated loci. Standard methods (e.g. multiple regression) require individual level genotypes. Recent fine mapping methods using summary-level data require the pairwise correlation coefficients ([Formula: see text]) of the variants. However, haplotypes rather than pairwise [Formula: see text], are the true biological representation of linkage disequilibrium (LD) among multiple loci. In this article, we present an empirical iterative method, HAPlotype Regional Association analysis Program (HAPRAP), that enables fine mapping using summary statistics and haplotype information from an individual-level reference panel. Simulations with individual-level genotypes show that the results of HAPRAP and multiple regression are highly consistent. In simulation with summary-level data, we demonstrate that HAPRAP is less sensitive to poor LD estimates. In a parametric simulation using Genetic Investigation of ANthropometric Traits height data, HAPRAP performs well with a small training sample size (N < 2000) while other methods become suboptimal. Moreover, HAPRAP's performance is not affected substantially by single nucleotide polymorphisms (SNPs) with low minor allele frequencies. We applied the method to existing quantitative trait and binary outcome meta-analyses (human height, QTc interval and gallbladder disease); all previous reported association signals were replicated and two additional variants were independently associated with human height. Due to the growing availability of summary level data, the value of HAPRAP is likely to increase markedly for future analyses (e.g. functional prediction and identification of instruments for Mendelian randomization). The HAPRAP package and documentation are available at http://apps.biocompute.org.uk/haprap/ CONTACT: : jie.zheng@bristol.ac.uk or tom.gaunt@bristol.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohol and ester. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP), support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) had a better performance of prediction accuracy for both calibration set (94.3%) and validation set (96.2%) than other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role of model training when the prediction accuracy of SVM with polynomial kernel function was 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
Qu, Xin; Hall, Alex; DeAngelis, Anthony M.; ...
2018-01-11
Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable tomore » a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qu, Xin; Hall, Alex; DeAngelis, Anthony M.
Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable tomore » a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.« less
Tosteson, Tor D.; Morden, Nancy E.; Stukel, Therese A.; O'Malley, A. James
2014-01-01
The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival. PMID:25506259
MacKenzie, Todd A; Tosteson, Tor D; Morden, Nancy E; Stukel, Therese A; O'Malley, A James
2014-06-01
The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival.
Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis
Steele, Joe; Bastola, Dhundy
2014-01-01
Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
Statistical Evaluation and Improvement of Methods for Combining Random and Harmonic Loads
NASA Technical Reports Server (NTRS)
Brown, A. M.; McGhee, D. S.
2003-01-01
Structures in many environments experience both random and harmonic excitation. A variety of closed-form techniques has been used in the aerospace industry to combine the loads resulting from the two sources. The resulting combined loads are then used to design for both yield/ultimate strength and high- cycle fatigue capability. This Technical Publication examines the cumulative distribution percentiles obtained using each method by integrating the joint probability density function of the sine and random components. A new Microsoft Excel spreadsheet macro that links with the software program Mathematica to calculate the combined value corresponding to any desired percentile is then presented along with a curve tit to this value. Another Excel macro that calculates the combination using Monte Carlo simulation is shown. Unlike the traditional techniques. these methods quantify the calculated load value with a consistent percentile. Using either of the presented methods can be extremely valuable in probabilistic design, which requires a statistical characterization of the loading. Additionally, since the CDF at high probability levels is very flat, the design value is extremely sensitive to the predetermined percentile; therefore, applying the new techniques can substantially lower the design loading without losing any of the identified structural reliability.
Variance reduction for Fokker–Planck based particle Monte Carlo schemes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorji, M. Hossein, E-mail: gorjih@ifd.mavt.ethz.ch; Andric, Nemanja; Jenny, Patrick
Recently, Fokker–Planck based particle Monte Carlo schemes have been proposed and evaluated for simulations of rarefied gas flows [1–3]. In this paper, the variance reduction for particle Monte Carlo simulations based on the Fokker–Planck model is considered. First, deviational based schemes were derived and reviewed, and it is shown that these deviational methods are not appropriate for practical Fokker–Planck based rarefied gas flow simulations. This is due to the fact that the deviational schemes considered in this study lead either to instabilities in the case of two-weight methods or to large statistical errors if the direct sampling method is applied.more » Motivated by this conclusion, we developed a novel scheme based on correlated stochastic processes. The main idea here is to synthesize an additional stochastic process with a known solution, which is simultaneously solved together with the main one. By correlating the two processes, the statistical errors can dramatically be reduced; especially for low Mach numbers. To assess the methods, homogeneous relaxation, planar Couette and lid-driven cavity flows were considered. For these test cases, it could be demonstrated that variance reduction based on parallel processes is very robust and effective.« less
Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics
Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven
2011-01-01
Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
Robustness of fit indices to outliers and leverage observations in structural equation modeling.
Yuan, Ke-Hai; Zhong, Xiaoling
2013-06-01
Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
[Review of research design and statistical methods in Chinese Journal of Cardiology].
Zhang, Li-jun; Yu, Jin-ming
2009-07-01
To evaluate the research design and the use of statistical methods in Chinese Journal of Cardiology. Peer through the research design and statistical methods in all of the original papers in Chinese Journal of Cardiology from December 2007 to November 2008. The most frequently used research designs are cross-sectional design (34%), prospective design (21%) and experimental design (25%). In all of the articles, 49 (25%) use wrong statistical methods, 29 (15%) lack some sort of statistic analysis, 23 (12%) have inconsistencies in description of methods. There are significant differences between different statistical methods (P < 0.001). The correction rates of multifactor analysis were low and repeated measurement datas were not used repeated measurement analysis. Many problems exist in Chinese Journal of Cardiology. Better research design and correct use of statistical methods are still needed. More strict review by statistician and epidemiologist is also required to improve the literature qualities.
Colon-Berlingeri, Migdalisel; Burrowes, Patricia A.
2011-01-01
Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math–biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology. PMID:21885822
Saadati, Farzaneh; Ahmad Tarmizi, Rohani
2015-01-01
Because students’ ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding within an e-learning system the pedagogical characteristics of learning is ‘value added’ because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, the i-CAM could significantly promote students’ problem-solving performance at the end of each phase. In addition, the combination of the differences in students' test scores were considered to be statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirmed the considerable value of i-CAM in the improvement of statistics learning for non-specialized postgraduate students. PMID:26132553
Evaluating the statistical methodology of randomized trials on dentin hypersensitivity management.
Matranga, Domenica; Matera, Federico; Pizzo, Giuseppe
2017-12-27
The present study aimed to evaluate the characteristics and quality of statistical methodology used in clinical studies on dentin hypersensitivity management. An electronic search was performed for data published from 2009 to 2014 by using PubMed, Ovid/MEDLINE, and Cochrane Library databases. The primary search terms were used in combination. Eligibility criteria included randomized clinical trials that evaluated the efficacy of desensitizing agents in terms of reducing dentin hypersensitivity. A total of 40 studies were considered eligible for assessment of quality statistical methodology. The four main concerns identified were i) use of nonparametric tests in the presence of large samples, coupled with lack of information about normality and equality of variances of the response; ii) lack of P-value adjustment for multiple comparisons; iii) failure to account for interactions between treatment and follow-up time; and iv) no information about the number of teeth examined per patient and the consequent lack of cluster-specific approach in data analysis. Owing to these concerns, statistical methodology was judged as inappropriate in 77.1% of the 35 studies that used parametric methods. Additional studies with appropriate statistical analysis are required to obtain appropriate assessment of the efficacy of desensitizing agents.
Colon-Berlingeri, Migdalisel; Burrowes, Patricia A
2011-01-01
Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math-biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology.
Wang, Guoli; Ebrahimi, Nader
2014-01-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345
Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader
2015-04-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H , such that V ∼ W H . It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H . In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
NASA Astrophysics Data System (ADS)
Lamie, Nesrine T.
2015-10-01
Four, accurate, precise, and sensitive spectrophotometric methods are developed for simultaneous determination of a binary mixture of amlodipine besylate (AM) and atenolol (AT). AM is determined at its λmax 360 nm (0D), while atenolol can be determined by four different methods. Method (A) is absorption factor (AF). Method (B) is the new ratio difference method (RD) which measures the difference in amplitudes between 210 and 226 nm. Method (C) is novel constant center spectrophotometric method (CC). Method (D) is mean centering of the ratio spectra (MCR) at 284 nm. The methods are tested by analyzing synthetic mixtures of the cited drugs and they are applied to their commercial pharmaceutical preparation. The validity of results is assessed by applying standard addition technique. The results obtained are found to agree statistically with those obtained by official methods, showing no significant difference with respect to accuracy and precision.
The spectral analysis of fuel oils using terahertz radiation and chemometric methods
NASA Astrophysics Data System (ADS)
Zhan, Honglei; Zhao, Kun; Zhao, Hui; Li, Qian; Zhu, Shouming; Xiao, Lizhi
2016-10-01
The combustion characteristics of fuel oils are closely related to both engine efficiency and pollutant emissions, and the analysis of oils and their additives is thus important. These oils and additives have been found to generate distinct responses to terahertz (THz) radiation as the result of various molecular vibrational modes. In the present work, THz spectroscopy was employed to identify a number of oils, including lubricants, gasoline and diesel, with different additives. The identities of dozens of these oils could be readily established using statistical models based on principal component analysis. The THz spectra of gasoline, diesel, sulfur and methyl methacrylate (MMA) were acquired and linear fittings were obtained. By using chemometric methods, including back propagation, artificial neural network and support vector machine techniques, typical concentrations of sulfur in gasoline (ppm-grade) could be detected, together with MMA in diesel below 0.5%. The absorption characteristics of the oil additives were also assessed using 2D correlation spectroscopy, and several hidden absorption peaks were discovered. The technique discussed herein should provide a useful new means of analyzing fuel oils with various additives and impurities in a non-destructive manner and therefore will be of benefit to the field of chemical detection and identification.
ERIC Educational Resources Information Center
Barron, Kenneth E.; Apple, Kevin J.
2014-01-01
Coursework in statistics and research methods is a core requirement in most undergraduate psychology programs. However, is there an optimal way to structure and sequence methodology courses to facilitate student learning? For example, should statistics be required before research methods, should research methods be required before statistics, or…
Development of a Research Methods and Statistics Concept Inventory
ERIC Educational Resources Information Center
Veilleux, Jennifer C.; Chapman, Kate M.
2017-01-01
Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…
Skinner, Carl G; Patel, Manish M; Thomas, Jerry D; Miller, Michael A
2011-01-01
Statistical methods are pervasive in medical research and general medical literature. Understanding general statistical concepts will enhance our ability to critically appraise the current literature and ultimately improve the delivery of patient care. This article intends to provide an overview of the common statistical methods relevant to medicine.
Lotfy, Hayam M; Saleh, Sarah S; Hassan, Nagiba Y; Salem, Hesham
2015-01-01
Novel spectrophotometric methods were applied for the determination of the minor component tetryzoline HCl (TZH) in its ternary mixture with ofloxacin (OFX) and prednisolone acetate (PA) in the ratio of (1:5:7.5), and in its binary mixture with sodium cromoglicate (SCG) in the ratio of (1:80). The novel spectrophotometric methods determined the minor component (TZH) successfully in the two selected mixtures by computing the geometrical relationship of either standard addition or subtraction. The novel spectrophotometric methods are: geometrical amplitude modulation (GAM), geometrical induced amplitude modulation (GIAM), ratio H-point standard addition method (RHPSAM) and compensated area under the curve (CAUC). The proposed methods were successfully applied for the determination of the minor component TZH below its concentration range. The methods were validated as per ICH guidelines where accuracy, repeatability, inter-day precision and robustness were found to be within the acceptable limits. The results obtained from the proposed methods were statistically compared with official ones where no significant difference was observed. No difference was observed between the obtained results when compared to the reported HPLC method, which proved that the developed methods could be alternative to HPLC techniques in quality control laboratories. Copyright © 2015 Elsevier B.V. All rights reserved.
Methodological Reporting of Randomized Trials in Five Leading Chinese Nursing Journals
Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu
2014-01-01
Background Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. Methods In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. Results In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34±0.97 (Mean ± SD). No RCT reported descriptions and changes in “trial design,” changes in “outcomes” and “implementation,” or descriptions of the similarity of interventions for “blinding.” Poor reporting was found in detailing the “settings of participants” (13.1%), “type of randomization sequence generation” (1.8%), calculation methods of “sample size” (0.4%), explanation of any interim analyses and stopping guidelines for “sample size” (0.3%), “allocation concealment mechanism” (0.3%), additional analyses in “statistical methods” (2.1%), and targeted subjects and methods of “blinding” (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of “participants,” “interventions,” and definitions of the “outcomes” and “statistical methods.” The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. Conclusions The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods. PMID:25415382
Displaying R spatial statistics on Google dynamic maps with web applications created by Rwui
2012-01-01
Background The R project includes a large variety of packages designed for spatial statistics. Google dynamic maps provide web based access to global maps and satellite imagery. We describe a method for displaying directly the spatial output from an R script on to a Google dynamic map. Methods This is achieved by creating a Java based web application which runs the R script and then displays the results on the dynamic map. In order to make this method easy to implement by those unfamiliar with programming Java based web applications, we have added the method to the options available in the R Web User Interface (Rwui) application. Rwui is an established web application for creating web applications for running R scripts. A feature of Rwui is that all the code for the web application being created is generated automatically so that someone with no knowledge of web programming can make a fully functional web application for running an R script in a matter of minutes. Results Rwui can now be used to create web applications that will display the results from an R script on a Google dynamic map. Results may be displayed as discrete markers and/or as continuous overlays. In addition, users of the web application may select regions of interest on the dynamic map with mouse clicks and the coordinates of the region of interest will automatically be made available for use by the R script. Conclusions This method of displaying R output on dynamic maps is designed to be of use in a number of areas. Firstly it allows statisticians, working in R and developing methods in spatial statistics, to easily visualise the results of applying their methods to real world data. Secondly, it allows researchers who are using R to study health geographics data, to display their results directly onto dynamic maps. Thirdly, by creating a web application for running an R script, a statistician can enable users entirely unfamiliar with R to run R coded statistical analyses of health geographics data. Fourthly, we envisage an educational role for such applications. PMID:22998945
Testing local anisotropy using the method of smoothed residuals I — methodology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Appleby, Stephen; Shafieloo, Arman, E-mail: stephen.appleby@apctp.org, E-mail: arman@apctp.org
2014-03-01
We discuss some details regarding the method of smoothed residuals, which has recently been used to search for anisotropic signals in low-redshift distance measurements (Supernovae). In this short note we focus on some details regarding the implementation of the method, particularly the issue of effectively detecting signals in data that are inhomogeneously distributed on the sky. Using simulated data, we argue that the original method proposed in Colin et al. [1] will not detect spurious signals due to incomplete sky coverage, and that introducing additional Gaussian weighting to the statistic as in [2] can hinder its ability to detect amore » signal. Issues related to the width of the Gaussian smoothing are also discussed.« less
Application of meta-analysis methods for identifying proteomic expression level differences.
Amess, Bob; Kluge, Wolfgang; Schwarz, Emanuel; Haenisch, Frieder; Alsaif, Murtada; Yolken, Robert H; Leweke, F Markus; Guest, Paul C; Bahn, Sabine
2013-07-01
We present new statistical approaches for identification of proteins with expression levels that are significantly changed when applying meta-analysis to two or more independent experiments. We showed that the Euclidean distance measure has reduced risk of false positives compared to the rank product method. Our Ψ-ranking method has advantages over the traditional fold-change approach by incorporating both the fold-change direction as well as the p-value. In addition, the second novel method, Π-ranking, considers the ratio of the fold-change and thus integrates all three parameters. We further improved the latter by introducing our third technique, Σ-ranking, which combines all three parameters in a balanced nonparametric approach. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Detection of Failure in Asynchronous Motor Using Soft Computing Method
NASA Astrophysics Data System (ADS)
Vinoth Kumar, K.; Sony, Kevin; Achenkunju John, Alan; Kuriakose, Anto; John, Ano P.
2018-04-01
This paper investigates the stator short winding failure of asynchronous motor also their effects on motor current spectrums. A fuzzy logic approach i.e., model based technique possibly will help to detect the asynchronous motor failure. Actually, fuzzy logic similar to humanoid intelligent methods besides expected linguistic empowering inferences through vague statistics. The dynamic model is technologically advanced for asynchronous motor by means of fuzzy logic classifier towards investigate the stator inter turn failure in addition open phase failure. A hardware implementation was carried out with LabVIEW for the online-monitoring of faults.
Target attribute-based false alarm rejection in small infrared target detection
NASA Astrophysics Data System (ADS)
Kim, Sungho
2012-11-01
Infrared search and track is an important research area in military applications. Although there are a lot of works on small infrared target detection methods, we cannot apply them in real field due to high false alarm rate caused by clutters. This paper presents a novel target attribute extraction and machine learning-based target discrimination method. Eight kinds of target features are extracted and analyzed statistically. Learning-based classifiers such as SVM and Adaboost are developed and compared with conventional classifiers for real infrared images. In addition, the generalization capability is also inspected for various infrared clutters.
KERNELHR: A program for estimating animal home ranges
Seaman, D.E.; Griffith, B.; Powell, R.A.
1998-01-01
Kernel methods are state of the art for estimating animal home-range area and utilization distribution (UD). The KERNELHR program was developed to provide researchers and managers a tool to implement this extremely flexible set of methods with many variants. KERNELHR runs interactively or from the command line on any personal computer (PC) running DOS. KERNELHR provides output of fixed and adaptive kernel home-range estimates, as well as density values in a format suitable for in-depth statistical and spatial analyses. An additional package of programs creates contour files for plotting in geographic information systems (GIS) and estimates core areas of ranges.
A new method for imaging nuclear threats using cosmic ray muons
NASA Astrophysics Data System (ADS)
Morris, C. L.; Bacon, Jeffrey; Borozdin, Konstantin; Miyadera, Haruo; Perry, John; Rose, Evan; Watson, Scott; White, Tim; Aberle, Derek; Green, J. Andrew; McDuff, George G.; Lukić, Zarija; Milner, Edward C.
2013-08-01
Muon tomography is a technique that uses cosmic ray muons to generate three dimensional images of volumes using information contained in the Coulomb scattering of the muons. Advantages of this technique are the ability of cosmic rays to penetrate significant overburden and the absence of any additional dose delivered to subjects under study above the natural cosmic ray flux. Disadvantages include the relatively long exposure times and poor position resolution and complex algorithms needed for reconstruction. Here we demonstrate a new method for obtaining improved position resolution and statistical precision for objects with spherical symmetry.
A new method for imaging nuclear threats using cosmic ray muons
Morris, C. L.; Bacon, Jeffrey; Borozdin, Konstantin; ...
2013-08-29
Muon tomography is a technique that uses cosmic ray muons to generate three-dimensional images of volumes using information contained in the Coulomb scattering of the muons. Advantages of this technique are the ability of cosmic rays to penetrate significant overburden and the absence of any additional dose delivered to subjects under study beyond the natural cosmic ray flux. Disadvantages include the relatively long exposure times and poor position resolution and complex algorithms needed for reconstruction. Furthermore, we demonstrate a new method for obtaining improved position resolution and statistical precision for objects with spherical symmetry.
Hubacher, David; Akora, Vitalis; Masaba, Rose; Chen, Mario; Veena, Valentine
2014-02-01
The levonorgestrel intrauterine system (LNG IUS) was developed over 30 years ago, but the product is currently too expensive for widespread use in many developing countries. In Kenya, one organization has received donated commodities for 5 years, providing an opportunity to assess impact and potential future role of the product. We reviewed service statistics on insertions of the LNG IUS, copper intrauterine device (IUD), and subdermal implant from 15 mobile outreach teams during the 2011 calendar year. To determine the impact of the LNG IUS introduction, we analyzed changes in uptake and distribution of the copper IUD and subdermal implant by comparing periods of time when the LNG IUS was available with periods when it was not available. In addition, we interviewed 27 clinicians to assess their views of the product and of its future role. When the LNG IUS was not available, intrauterine contraception accounted for 39% of long-acting method provision. The addition of the LNG IUS created a slight rise in intrauterine contraception uptake (to 44%) at the expense of the subdermal implant, but the change was only marginally significant (P = .08) and was largely attributable to the copper IUD. All interviewed providers felt that the LNG IUS would increase uptake of long-acting methods, and 70% felt that the noncontraceptive benefits of the product are important to clients. The LNG IUS was well-received among providers and family planning clients in this population in Kenya. Although important changes in service statistics were not apparent from this analysis (perhaps due to the small quantity of LNG IUS that was available), provider enthusiasm for the product was high. This finding, above all, suggests that a larger-scale introduction effort would have strong support from providers and thus increase the chances of success. Adding another proven and highly acceptable long-acting contraceptive technology to the method mix could have important reproductive health impact.
Vinay, K. B.; Revanasiddappa, H. D.; Raghu, M. S.; Abdulrahman, Sameer. A. M.; Rajendraprasad, N.
2012-01-01
Two simple, selective, and rapid spectrophotometric methods are described for the determination of mycophenolate mofetil (MPM) in pure form and in tablets. Both methods are based on charge-transfer complexation reaction of MPM with p-chloranilic acid (p-CA) or 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ) in dioxane-acetonitrile medium resulting in coloured product measurable at 520 nm (p-CA) or 580 nm (DDQ). Beer's law is obeyed over the concentration ranges of 40–400 and 12–120 μg mL−1 MPM for p-CA and DDQ, respectively, with correlation coefficients (r) of 0.9995 and 0.9947. The apparent molar absorptivity values are calculated to be 1.06 × 103 and 3.87 × 103 L mol−1 cm−1, respectively, and the corresponding Sandell's sensitivities are 0.4106 and 0.1119 μg cm−1. The limits of detection (LOD) and quantification (LOQ) are also reported for both methods. The described methods were successfully applied to the determination of MPM in tablets. Statistical comparison of the results with those of the reference method showed excellent agreement. No interference was observed from the common excipients present in tablets. Both methods were validated statistically for accuracy and precision. The accuracy and reliability of the methods were further ascertained by recovery studies via standard addition procedure. PMID:22567572
Kaneta, Tomohiro; Nakatsuka, Masahiro; Nakamura, Kei; Seki, Takashi; Yamaguchi, Satoshi; Tsuboi, Masahiro; Meguro, Kenichi
2016-01-01
SPECT is an important diagnostic tool for dementia. Recently, statistical analysis of SPECT has been commonly used for dementia research. In this study, we evaluated the accuracy of visual SPECT evaluation and/or statistical analysis for the diagnosis (Dx) of Alzheimer disease (AD) and other forms of dementia in our community-based study "The Osaki-Tajiri Project." Eighty-nine consecutive outpatients with dementia were enrolled and underwent brain perfusion SPECT with 99mTc-ECD. Diagnostic accuracy of SPECT was tested using 3 methods: visual inspection (SPECT Dx), automated diagnostic tool using statistical analysis with easy Z-score imaging system (eZIS Dx), and visual inspection plus eZIS (integrated Dx). Integrated Dx showed the highest sensitivity, specificity, and accuracy, whereas eZIS was the second most accurate method. We also observed that a higher than expected rate of SPECT images indicated false-negative cases of AD. Among these, 50% showed hypofrontality and were diagnosed as frontotemporal lobar degeneration. These cases typically showed regional "hot spots" in the primary sensorimotor cortex (ie, a sensorimotor hot spot sign), which we determined were associated with AD rather than frontotemporal lobar degeneration. We concluded that the diagnostic abilities were improved by the integrated use of visual assessment and statistical analysis. In addition, the detection of a sensorimotor hot spot sign was useful to detect AD when hypofrontality is present and improved the ability to properly diagnose AD.
A new statistical distance scale for planetary nebulae
NASA Astrophysics Data System (ADS)
Ali, Alaa; Ismail, H. A.; Alsolami, Z.
2015-05-01
In the first part of the present article we discuss the consistency among different individual distance methods of Galactic planetary nebulae, while in the second part we develop a new statistical distance scale based on a calibrating sample of well determined distances. A set composed of 315 planetary nebulae with individual distances are extracted from the literature. Inspecting the data set indicates that the accuracy of distances is varying among different individual methods and also among different sources where the same individual method was applied. Therefore, we derive a reliable weighted mean distance for each object by considering the influence of the distance error and the weight of each individual method. The results reveal that the discussed individual methods are consistent with each other, except the gravity method that produces higher distances compared to other individual methods. From the initial data set, we construct a standard calibrating sample consists of 82 objects. This sample is restricted only to the objects with distances determined from at least two different individual methods, except few objects with trusted distances determined from the trigonometric, spectroscopic, and cluster membership methods. In addition to the well determined distances for this sample, it shows a lot of advantages over that used in the prior distance scales. This sample is used to recalibrate the mass-radius and radio surface brightness temperature-radius relationships. An average error of ˜30 % is estimated for the new distance scale. The newly distance scale is compared with the most widely used statistical scales in literature, where the results show that it is roughly similar to the majority of them within ˜±20 % difference. Furthermore, the new scale yields a weighted mean distance to the Galactic center of 7.6±1.35 kpc, which in good agreement with the very recent measure of Malkin 2013.
Sb2Te3 and Its Superlattices: Optimization by Statistical Design.
Behera, Jitendra K; Zhou, Xilin; Ranjan, Alok; Simpson, Robert E
2018-05-02
The objective of this work is to demonstrate the usefulness of fractional factorial design for optimizing the crystal quality of chalcogenide van der Waals (vdW) crystals. We statistically analyze the growth parameters of highly c axis oriented Sb 2 Te 3 crystals and Sb 2 Te 3 -GeTe phase change vdW heterostructured superlattices. The statistical significance of the growth parameters of temperature, pressure, power, buffer materials, and buffer layer thickness was found by fractional factorial design and response surface analysis. Temperature, pressure, power, and their second-order interactions are the major factors that significantly influence the quality of the crystals. Additionally, using tungsten rather than molybdenum as a buffer layer significantly enhances the crystal quality. Fractional factorial design minimizes the number of experiments that are necessary to find the optimal growth conditions, resulting in an order of magnitude improvement in the crystal quality. We highlight that statistical design of experiment methods, which is more commonly used in product design, should be considered more broadly by those designing and optimizing materials.
The discounting model selector: Statistical software for delay discounting applications.
Gilroy, Shawn P; Franck, Christopher T; Hantula, Donald A
2017-05-01
Original, open-source computer software was developed and validated against established delay discounting methods in the literature. The software executed approximate Bayesian model selection methods from user-supplied temporal discounting data and computed the effective delay 50 (ED50) from the best performing model. Software was custom-designed to enable behavior analysts to conveniently apply recent statistical methods to temporal discounting data with the aid of a graphical user interface (GUI). The results of independent validation of the approximate Bayesian model selection methods indicated that the program provided results identical to that of the original source paper and its methods. Monte Carlo simulation (n = 50,000) confirmed that true model was selected most often in each setting. Simulation code and data for this study were posted to an online repository for use by other researchers. The model selection approach was applied to three existing delay discounting data sets from the literature in addition to the data from the source paper. Comparisons of model selected ED50 were consistent with traditional indices of discounting. Conceptual issues related to the development and use of computer software by behavior analysts and the opportunities afforded by free and open-sourced software are discussed and a review of possible expansions of this software are provided. © 2017 Society for the Experimental Analysis of Behavior.
NASA Astrophysics Data System (ADS)
Pardimin, H.; Arcana, N.
2018-01-01
Many types of research in the field of mathematics education apply the Quasi-Experimental method and statistical analysis use t-test. Quasi-experiment has a weakness that is difficult to fulfil “the law of a single independent variable”. T-test also has a weakness that is a generalization of the conclusions obtained is less powerful. This research aimed to find ways to reduce the weaknesses of the Quasi-experimental method and improved the generalization of the research results. The method applied in the research was a non-interactive qualitative method, and the type was concept analysis. Concepts analysed are the concept of statistics, research methods of education, and research reports. The result represented a way to overcome the weaknesses of quasi-Experiments and T-test. In addition, the way was to apply a combination of Factorial Design and Balanced Design, which the authors refer to as Factorial-Balanced Design. The advantages of this design are: (1) almost fulfilling “the low of single independent variable” so no need to test the similarity of the academic ability, (2) the sample size of the experimental group and the control group became larger and equal; so it becomes robust to deal with violations of the assumptions of the ANOVA test.
Genetic algorithm for the optimization of features and neural networks in ECG signals classification
NASA Astrophysics Data System (ADS)
Li, Hongqiang; Yuan, Danyang; Ma, Xiangdong; Cui, Dianyin; Cao, Lu
2017-01-01
Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the feature sets and to optimize the weights and biases of the back propagation neural network (BPNN). Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy of 97.78%. The experimental results based on the established acquisition platform indicated that the GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in the automatic identification of cardiac arrhythmias.
NASA Astrophysics Data System (ADS)
Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen
2013-03-01
A novel method, which we called the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in California electricity market. In addition, a statistic ρAMF -XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in California electricity market express multifractal nature. While, as indicated by the ρAMF -XA statistical test, there is a huge difference in the cross-correlation behavior between the years 1999 and 2000 in California electricity markets.
Level statistics of words: Finding keywords in literary texts and symbolic sequences
NASA Astrophysics Data System (ADS)
Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.
2009-03-01
Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method works also in generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
Comparative analysis on the selection of number of clusters in community detection
NASA Astrophysics Data System (ADS)
Kawamoto, Tatsuro; Kabashima, Yoshiyuki
2018-02-01
We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen
2013-03-01
A novel method, which we called the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in California electricity market. In addition, a statistic ρAMF-XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in California electricity market express multifractal nature. While, as indicated by the ρAMF-XA statistical test, there is a huge difference in the cross-correlation behavior between the years 1999 and 2000 in California electricity markets.
Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.
2017-01-01
Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190
Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L
2017-01-01
Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.
Consolidating DoD Housing and Allowance Data Collection
1991-01-01
data . In addition, the military staff chains of command, unit chains of command, DMDC, and the Navy’s Facilities Support Office (FACSO) become...non-pay section of the form if the Finance Office abandons it. However, the current methods of collecting data are equally risky, and statistical ...minimum standards are rescored as acceptable. The survey data sheets are then mailed to the Navy’s Facility Support Office (FACSO) at Port Hueneme, CA
2010-10-01
bodies becomes greater as surface as- perities wear down (Hutchings, 1992). We characterize friction damage by a change in the friction coefficient...points are such a set, and satisfy an additional constraint in which the skew ( third moment) is minimized, which reduces the average error for a...On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10, 197–208. Hutchings, I. M. (1992). Tribology : friction
Preferred Primary Healthcare Provider Choice Among Insured Persons in Ashanti Region, Ghana
Boachie, Micheal Kofi
2016-01-01
Background: In early 2012, National Health Insurance Scheme (NHIS) members in Ashanti Region were allowed to choose their own primary healthcare providers. This paper investigates the factors that enrolees in the Ashanti Region considered in choosing preferred primary healthcare providers (PPPs) and direction of association of such factors with the choice of PPP. Methods: Using a cross-sectional study design, the study sampled 600 NHIS enrolees in Kumasi Metro area and Kwabre East district. The sampling methods were a combination of simple random and systematic sampling techniques at different stages. Descriptive statistics were used to analyse demographic information and the criteria for selecting PPP. Multinomial logistic regression technique was used to ascertain the direction of association of the factors and the choice of PPP using mission PPPs as the base outcome. Results: Out of the 600 questionnaires administered, 496 were retained for further analysis. The results show that availability of essential drugs (53.63%) and doctors (39.92%), distance or proximity (49.60%), provider reputation (39.52%), waiting time (39.92), additional charges (37.10%), and recommendations (48.79%) were the main criteria adopted by enrolees in selecting PPPs. In the regression, income (-0.0027), availability of doctors (-1.82), additional charges (-2.14) and reputation (-2.09) were statistically significant at 1% in influencing the choice of government PPPs. On the part of private PPPs, availability of drugs (2.59), waiting time (1.45), residence (-2.62), gender (-2.89), and reputation (-2.69) were statistically significant at 1% level. Presence of additional charges (-1.29) was statistically significant at 5% level. Conclusion: Enrolees select their PPPs based on such factors as availability of doctors and essential drugs, reputation, waiting time, income, and their residence. Based on these findings, there is the need for healthcare providers to improve on their quality levels by ensuring constant availability of essential drugs, doctors, and shorter waiting time. However, individual enrolees may value each criterion differently. Thus, not all enrolees may be motivated by same concerns. This requires providers to be circumspect regarding the factors that may attract enrolees. The National Health Insurance Authority (NHIA) should also ensure timely release of funds to help providers procure the necessary medical supplies to ensure quality service PMID:26927586
Quality of reporting statistics in two Indian pharmacology journals
Jaykaran; Yadav, Preeti
2011-01-01
Objective: To evaluate the reporting of the statistical methods in articles published in two Indian pharmacology journals. Materials and Methods: All original articles published since 2002 were downloaded from the journals’ (Indian Journal of Pharmacology (IJP) and Indian Journal of Physiology and Pharmacology (IJPP)) website. These articles were evaluated on the basis of appropriateness of descriptive statistics and inferential statistics. Descriptive statistics was evaluated on the basis of reporting of method of description and central tendencies. Inferential statistics was evaluated on the basis of fulfilling of assumption of statistical methods and appropriateness of statistical tests. Values are described as frequencies, percentage, and 95% confidence interval (CI) around the percentages. Results: Inappropriate descriptive statistics was observed in 150 (78.1%, 95% CI 71.7–83.3%) articles. Most common reason for this inappropriate descriptive statistics was use of mean ± SEM at the place of “mean (SD)” or “mean ± SD.” Most common statistical method used was one-way ANOVA (58.4%). Information regarding checking of assumption of statistical test was mentioned in only two articles. Inappropriate statistical test was observed in 61 (31.7%, 95% CI 25.6–38.6%) articles. Most common reason for inappropriate statistical test was the use of two group test for three or more groups. Conclusion: Articles published in two Indian pharmacology journals are not devoid of statistical errors. PMID:21772766
Statistical inference for the additive hazards model under outcome-dependent sampling.
Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo
2015-09-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.
Statistical inference for the additive hazards model under outcome-dependent sampling
Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo
2015-01-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer. PMID:26379363
Using GIS-based methods and lidar data to estimate rooftop solar technical potential in US cities
NASA Astrophysics Data System (ADS)
Margolis, Robert; Gagnon, Pieter; Melius, Jennifer; Phillips, Caleb; Elmore, Ryan
2017-07-01
We estimate the technical potential of rooftop solar photovoltaics (PV) for select US cities by combining light detection and ranging (lidar) data, a validated analytical method for determining rooftop PV suitability employing geographic information systems, and modeling of PV electricity generation. We find that rooftop PV’s ability to meet estimated city electricity consumption varies widely—from meeting 16% of annual consumption (in Washington, DC) to meeting 88% (in Mission Viejo, CA). Important drivers include average rooftop suitability, household footprint/per-capita roof space, the quality of the solar resource, and the city’s estimated electricity consumption. In addition to city-wide results, we also estimate the ability of aggregations of households to offset their electricity consumption with PV. In a companion article, we will use statistical modeling to extend our results and estimate national rooftop PV technical potential. In addition, our publically available data and methods may help policy makers, utilities, researchers, and others perform customized analyses to meet their specific needs.
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; El-Abasawi, Nasr M.; El-Olemy, Ahmed; Serag, Ahmed
2018-02-01
Five simple spectrophotometric methods were developed for the determination of simeprevir in the presence of its oxidative degradation product namely, ratio difference, mean centering, derivative ratio using the Savitsky-Golay filters, second derivative and continuous wavelet transform. These methods are linear in the range of 2.5-40 μg/mL and validated according to the ICH guidelines. The obtained results of accuracy, repeatability and precision were found to be within the acceptable limits. The specificity of the proposed methods was tested using laboratory prepared mixtures and assessed by applying the standard addition technique. Furthermore, these methods were statistically comparable to RP-HPLC method and good results were obtained. So, they can be used for the routine analysis of simeprevir in quality-control laboratories.
Eisenberg, Dan T A; Kuzawa, Christopher W; Hayes, M Geoffrey
2015-01-01
Telomere length (TL) is commonly measured using quantitative PCR (qPCR). Although, easier than the southern blot of terminal restriction fragments (TRF) TL measurement method, one drawback of qPCR is that it introduces greater measurement error and thus reduces the statistical power of analyses. To address a potential source of measurement error, we consider the effect of well position on qPCR TL measurements. qPCR TL data from 3,638 people run on a Bio-Rad iCycler iQ are reanalyzed here. To evaluate measurement validity, correspondence with TRF, age, and between mother and offspring are examined. First, we present evidence for systematic variation in qPCR TL measurements in relation to thermocycler well position. Controlling for these well-position effects consistently improves measurement validity and yields estimated improvements in statistical power equivalent to increasing sample sizes by 16%. We additionally evaluated the linearity of the relationships between telomere and single copy gene control amplicons and between qPCR and TRF measures. We find that, unlike some previous reports, our data exhibit linear relationships. We introduce the standard error in percent, a superior method for quantifying measurement error as compared to the commonly used coefficient of variation. Using this measure, we find that excluding samples with high measurement error does not improve measurement validity in our study. Future studies using block-based thermocyclers should consider well position effects. Since additional information can be gleaned from well position corrections, rerunning analyses of previous results with well position correction could serve as an independent test of the validity of these results. © 2015 Wiley Periodicals, Inc.
THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES
Song, Chi; Min, Xiaoyi; Zhang, Heping
2016-01-01
The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239
Voitenkov, Voitenkov Vladislav; Andrey, Klimkin; Natalia, Skripchenko; Anastasia, Aksenova
2017-01-01
Context: The diagnosis of polyneuropathy may be challenging at the early stages of the disease. Despite electromyography (EMG) efficacy in the establishment of polyneuropathy diagnosis, in some cases, results are dubious and neurophysiologists may implement additional techniques to ensure that conduction is affected. Aims: The aim of the study was to evaluate motor-evoked potential (MEP) characteristics in children with acute inflammatory demyelinating polyneuropathy (AIDP). Settings and Design: The study was conducted at a pediatric research and clinical center for infectious diseases. Subjects and Methods: Twenty healthy children (7–14 years old) without signs of neurological disorders were enrolled as controls. Thirty-seven patients (8–13 years old) with AIDP were enrolled as the main group. EMG and transcranial magnetic stimulation (TMS) were performed on the 3rd–7th days from the onset of the first symptoms. Statistical Analysis Used: Descriptive statistics and Student's t-test were used. Bonferroni method was applied to implement appropriate corrections for multiple comparisons. Results: Significant differences between children with AIDP and controls on latencies of both cortical and lumbar MEPs were registered. Cortical MEP shapes were disperse in 100% of the cases and lumbar MEPs were disperse in 57% of the cases. Conclusions: Diagnostic TMS on the early stage of the AIDP in children may be implemented as the additional tool. The main finding in this population is lengthening of the latency of cortical and lumbar MEPs. Disperse shape of the lumbar MEPs may be used as the early sign of the acute demyelization. PMID:28904571
Kling, Daniel; Egeland, Thore; Piñero, Mariana Herrera; Vigeland, Magnus Dehli
2017-11-01
Methods and implementations of DNA-based identification are well established in several forensic contexts. However, assessing the statistical power of these methods has been largely overlooked, except in the simplest cases. In this paper we outline general methods for such power evaluation, and apply them to a large set of family reunification cases, where the objective is to decide whether a person of interest (POI) is identical to the missing person (MP) in a family, based on the DNA profile of the POI and available family members. As such, this application closely resembles database searching and disaster victim identification (DVI). If parents or children of the MP are available, they will typically provide sufficient statistical evidence to settle the case. However, if one must resort to more distant relatives, it is not a priori obvious that a reliable conclusion is likely to be reached. In these cases power evaluation can be highly valuable, for instance in the recruitment of additional family members. To assess the power in an identification case, we advocate the combined use of two statistics: the Probability of Exclusion, and the Probability of Exceedance. The former is the probability that the genotypes of a random, unrelated person are incompatible with the available family data. If this is close to 1, it is likely that a conclusion will be achieved regarding general relatedness, but not necessarily the specific relationship. To evaluate the ability to recognize a true match, we use simulations to estimate exceedance probabilities, i.e. the probability that the likelihood ratio will exceed a given threshold, assuming that the POI is indeed the MP. All simulations are done conditionally on available family data. Such conditional simulations have a long history in medical linkage analysis, but to our knowledge this is the first systematic forensic genetics application. Also, for forensic markers mutations cannot be ignored and therefore current models and implementations must be extended. All the tools are freely available in Familias (http://www.familias.no) empowered by the R library paramlink. The above approach is applied to a large and important data set: 'The missing grandchildren of Argentina'. We evaluate the power of 196 families from the DNA reference databank (Banco Nacional de Datos Genéticos, http://www.bndg.gob.ar. As a result we show that 58 of the families have poor statistical power and require additional genetic data to enable a positive identification. Copyright © 2017 Elsevier B.V. All rights reserved.
Mallette, Jennifer R.; Casale, John F.; Jordan, James; Morello, David R.; Beyer, Paul M.
2016-01-01
Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (2H and 18O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions. PMID:27006288
Image compression system and method having optimized quantization tables
NASA Technical Reports Server (NTRS)
Ratnakar, Viresh (Inventor); Livny, Miron (Inventor)
1998-01-01
A digital image compression preprocessor for use in a discrete cosine transform-based digital image compression device is provided. The preprocessor includes a gathering mechanism for determining discrete cosine transform statistics from input digital image data. A computing mechanism is operatively coupled to the gathering mechanism to calculate a image distortion array and a rate of image compression array based upon the discrete cosine transform statistics for each possible quantization value. A dynamic programming mechanism is operatively coupled to the computing mechanism to optimize the rate of image compression array against the image distortion array such that a rate-distortion-optimal quantization table is derived. In addition, a discrete cosine transform-based digital image compression device and a discrete cosine transform-based digital image compression and decompression system are provided. Also, a method for generating a rate-distortion-optimal quantization table, using discrete cosine transform-based digital image compression, and operating a discrete cosine transform-based digital image compression and decompression system are provided.
On vital aid: the why, what and how of validation
Kleywegt, Gerard J.
2009-01-01
Limitations to the data and subjectivity in the structure-determination process may cause errors in macromolecular crystal structures. Appropriate validation techniques may be used to reveal problems in structures, ideally before they are analysed, published or deposited. Additionally, such techniques may be used a posteriori to assess the (relative) merits of a model by potential users. Weak validation methods and statistics assess how well a model reproduces the information that was used in its construction (i.e. experimental data and prior knowledge). Strong methods and statistics, on the other hand, test how well a model predicts data or information that were not used in the structure-determination process. These may be data that were excluded from the process on purpose, general knowledge about macromolecular structure, information about the biological role and biochemical activity of the molecule under study or its mutants or complexes and predictions that are based on the model and that can be tested experimentally. PMID:19171968
Statistics, Computation, and Modeling in Cosmology
NASA Astrophysics Data System (ADS)
Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology
2017-01-01
Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” workings groups, formed as part of the year long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.
Heintze, S D; Zellweger, G; Cavalleri, A; Ferracane, J
2006-02-01
The aim of the study was to evaluate two ceramic materials as possible substitutes for enamel using two wear simulation methods, and to compare both methods with regard to the wear results for different materials. Flat specimens (OHSU n=6, Ivoclar n=8) of one compomer and three composite materials (Dyract AP, Tetric Ceram, Z250, experimental composite) were fabricated and subjected to wear using two different wear testing methods and two pressable ceramic materials as stylus (Empress, experimental ceramic). For the OHSU method, enamel styli of the same dimensions as the ceramic stylus were fabricated additionally. Both wear testing methods differ with regard to loading force, lateral movement of stylus, stylus dimension, number of cycles, thermocycling and abrasive medium. In the OHSU method, the wear facets (mean vertical loss) were measured using a contact profilometer, while in the Ivoclar method (maximal vertical loss) a laser scanner was used for this purpose. Additionally, the vertical loss of the ceramic stylus was quantified for the Ivoclar method. The results obtained from each method were compared by ANOVA and Tukey's test (p<0.05). To compare both wear methods, the log-transformed data were used to establish relative ranks between material/stylus combinations and assessed by applying the Pearson correlation coefficient. The experimental ceramic material generated significantly less wear in Tetric Ceram and Z250 specimens compared to the Empress stylus in the Ivoclar method, whereas with the OHSU method, no difference between the two ceramic antagonists was found with regard to abrasion or attrition. The wear generated by the enamel stylus was not statistically different from that generated by the other two ceramic materials in the OHSU method. With the Ivoclar method, wear of the ceramic stylus was only statistically different when in contact with Tetric Ceram. There was a close correlation between the attrition wear of the OHSU and the wear of the Ivoclar method (Pearson coefficient 0.83, p=0.01). Pressable ceramic materials can be used as a substitute for enamel in wear testing machines. However, material ranking may be affected by the type of ceramic material chosen. The attrition wear of the OHSU method was comparable with the wear generated with the Ivoclar method.
Karimi, Davood; Samei, Golnoosh; Kesch, Claudia; Nir, Guy; Salcudean, Septimiu E
2018-05-15
Most of the existing convolutional neural network (CNN)-based medical image segmentation methods are based on methods that have originally been developed for segmentation of natural images. Therefore, they largely ignore the differences between the two domains, such as the smaller degree of variability in the shape and appearance of the target volume and the smaller amounts of training data in medical applications. We propose a CNN-based method for prostate segmentation in MRI that employs statistical shape models to address these issues. Our CNN predicts the location of the prostate center and the parameters of the shape model, which determine the position of prostate surface keypoints. To train such a large model for segmentation of 3D images using small data (1) we adopt a stage-wise training strategy by first training the network to predict the prostate center and subsequently adding modules for predicting the parameters of the shape model and prostate rotation, (2) we propose a data augmentation method whereby the training images and their prostate surface keypoints are deformed according to the displacements computed based on the shape model, and (3) we employ various regularization techniques. Our proposed method achieves a Dice score of 0.88, which is obtained by using both elastic-net and spectral dropout for regularization. Compared with a standard CNN-based method, our method shows significantly better segmentation performance on the prostate base and apex. Our experiments also show that data augmentation using the shape model significantly improves the segmentation results. Prior knowledge about the shape of the target organ can improve the performance of CNN-based segmentation methods, especially where image features are not sufficient for a precise segmentation. Statistical shape models can also be employed to synthesize additional training data that can ease the training of large CNNs.
Three-dimensional accuracy of different correction methods for cast implant bars
Kwon, Ji-Yung; Kim, Chang-Whe; Lim, Young-Jun; Kwon, Ho-Beom
2014-01-01
PURPOSE The aim of the present study was to evaluate the accuracy of three techniques for correction of cast implant bars. MATERIALS AND METHODS Thirty cast implant bars were fabricated on a metal master model. All cast implant bars were sectioned at 5 mm from the left gold cylinder using a disk of 0.3 mm thickness, and then each group of ten specimens was corrected by gas-air torch soldering, laser welding, and additional casting technique. Three dimensional evaluation including horizontal, vertical, and twisting measurements was based on measurement and comparison of (1) gap distances of the right abutment replica-gold cylinder interface at buccal, distal, lingual side, (2) changes of bar length, and (3) axis angle changes of the right gold cylinders at the step of the post-correction measurements on the three groups with a contact and non-contact coordinate measuring machine. One-way analysis of variance (ANOVA) and paired t-test were performed at the significance level of 5%. RESULTS Gap distances of the cast implant bars after correction procedure showed no statistically significant difference among groups. Changes in bar length between pre-casting and post-correction measurement were statistically significance among groups. Axis angle changes of the right gold cylinders were not statistically significance among groups. CONCLUSION There was no statistical significance among three techniques in horizontal, vertical and axial errors. But, gas-air torch soldering technique showed the most consistent and accurate trend in the correction of implant bar error. However, Laser welding technique, showed a large mean and standard deviation in vertical and twisting measurement and might be technique-sensitive method. PMID:24605205
Guidelines for the design and statistical analysis of experiments in papers submitted to ATLA.
Festing, M F
2001-01-01
In vitro experiments need to be well designed and correctly analysed if they are to achieve their full potential to replace the use of animals in research. An "experiment" is a procedure for collecting scientific data in order to answer a hypothesis, or to provide material for generating new hypotheses, and differs from a survey because the scientist has control over the treatments that can be applied. Most experiments can be classified into one of a few formal designs, the most common being completely randomised, and randomised block designs. These are quite common with in vitro experiments, which are often replicated in time. Some experiments involve a single independent (treatment) variable, while other "factorial" designs simultaneously vary two or more independent variables, such as drug treatment and cell line. Factorial designs often provide additional information at little extra cost. Experiments need to be carefully planned to avoid bias, be powerful yet simple, provide for a valid statistical analysis and, in some cases, have a wide range of applicability. Virtually all experiments need some sort of statistical analysis in order to take account of biological variation among the experimental subjects. Parametric methods using the t test or analysis of variance are usually more powerful than non-parametric methods, provided the underlying assumptions of normality of the residuals and equal variances are approximately valid. The statistical analyses of data from a completely randomised design, and from a randomised-block design are demonstrated in Appendices 1 and 2, and methods of determining sample size are discussed in Appendix 3. Appendix 4 gives a checklist for authors submitting papers to ATLA.
FabricS: A user-friendly, complete and robust software for particle shape-fabric analysis
NASA Astrophysics Data System (ADS)
Moreno Chávez, G.; Castillo Rivera, F.; Sarocchi, D.; Borselli, L.; Rodríguez-Sedano, L. A.
2018-06-01
Shape-fabric is a textural parameter related to the spatial arrangement of elongated particles in geological samples. Its usefulness spans a range from sedimentary petrology to igneous and metamorphic petrology. Independently of the process being studied, when a material flows, the elongated particles are oriented with the major axis in the direction of flow. In sedimentary petrology this information has been used for studies of paleo-flow direction of turbidites, the origin of quartz sediments, and locating ignimbrite vents, among others. In addition to flow direction and its polarity, the method enables flow rheology to be inferred. The use of shape-fabric has been limited due to the difficulties of automatically measuring particles and analyzing them with reliable circular statistics programs. This has dampened interest in the method for a long time. Shape-fabric measurement has increased in popularity since the 1980s thanks to the development of new image analysis techniques and circular statistics software. However, the programs currently available are unreliable, old and are incompatible with newer operating systems, or require programming skills. The goal of our work is to develop a user-friendly program, in the MATLAB environment, with a graphical user interface, that can process images and includes editing functions, and thresholds (elongation and size) for selecting a particle population and analyzing it with reliable circular statistics algorithms. Moreover, the method also has to produce rose diagrams, orientation vectors, and a complete series of statistical parameters. All these requirements are met by our new software. In this paper, we briefly explain the methodology from collection of oriented samples in the field to the minimum number of particles needed to obtain reliable fabric data. We obtained the data using specific statistical tests and taking into account the degree of iso-orientation of the samples and the required degree of reliability. The program has been verified by means of several simulations performed using appropriately designed features and by analyzing real samples.
Statistical appearance models based on probabilistic correspondences.
Krüger, Julia; Ehrhardt, Jan; Handels, Heinz
2017-04-01
Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing of finding landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.
Valcarcel, Alessandra M; Linn, Kristin A; Vandekar, Simon N; Satterthwaite, Theodore D; Muschelli, John; Calabresi, Peter A; Pham, Dzung L; Martin, Melissa Lynne; Shinohara, Russell T
2018-03-08
Magnetic resonance imaging (MRI) is crucial for in vivo detection and characterization of white matter lesions (WMLs) in multiple sclerosis. While WMLs have been studied for over two decades using MRI, automated segmentation remains challenging. Although the majority of statistical techniques for the automated segmentation of WMLs are based on single imaging modalities, recent advances have used multimodal techniques for identifying WMLs. Complementary modalities emphasize different tissue properties, which help identify interrelated features of lesions. Method for Inter-Modal Segmentation Analysis (MIMoSA), a fully automatic lesion segmentation algorithm that utilizes novel covariance features from intermodal coupling regression in addition to mean structure to model the probability lesion is contained in each voxel, is proposed. MIMoSA was validated by comparison with both expert manual and other automated segmentation methods in two datasets. The first included 98 subjects imaged at Johns Hopkins Hospital in which bootstrap cross-validation was used to compare the performance of MIMoSA against OASIS and LesionTOADS, two popular automatic segmentation approaches. For a secondary validation, a publicly available data from a segmentation challenge were used for performance benchmarking. In the Johns Hopkins study, MIMoSA yielded average Sørensen-Dice coefficient (DSC) of .57 and partial AUC of .68 calculated with false positive rates up to 1%. This was superior to performance using OASIS and LesionTOADS. The proposed method also performed competitively in the segmentation challenge dataset. MIMoSA resulted in statistically significant improvements in lesion segmentation performance compared with LesionTOADS and OASIS, and performed competitively in an additional validation study. Copyright © 2018 by the American Society of Neuroimaging.
Evaluation of redundancy analysis to identify signatures of local adaptation.
Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric
2018-05-26
Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Eisinga, Rob; Heskes, Tom; Pelzer, Ben; Te Grotenhuis, Manfred
2017-01-25
The Friedman rank sum test is a widely-used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to such tests rely on large-sample approximations, due to the numerical complexity of computing the exact distribution. These approximate methods lead to inaccurate estimates in the tail of the distribution, which is most relevant for p-value calculation. We propose an efficient, combinatorial exact approach for calculating the probability mass distribution of the rank sum difference statistic for pairwise comparison of Friedman rank sums, and compare exact results with recommended asymptotic approximations. Whereas the chi-squared approximation performs inferiorly to exact computation overall, others, particularly the normal, perform well, except for the extreme tail. Hence exact calculation offers an improvement when small p-values occur following multiple testing correction. Exact inference also enhances the identification of significant differences whenever the observed values are close to the approximate critical value. We illustrate the proposed method in the context of biological machine learning, were Friedman rank sum difference tests are commonly used for the comparison of classifiers over multiple datasets. We provide a computationally fast method to determine the exact p-value of the absolute rank sum difference of a pair of Friedman rank sums, making asymptotic tests obsolete. Calculation of exact p-values is easy to implement in statistical software and the implementation in R is provided in one of the Additional files and is also available at http://www.ru.nl/publish/pages/726696/friedmanrsd.zip .
NASA Astrophysics Data System (ADS)
Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.
2018-01-01
Start in this paper, we assess our traditional elementary statistics education and also we introduce elementary statistics with simulation-based inference. To assess our statistical class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test that serves as an external measure to assess the student’s basic statistical literacy. This test generally represents as an accepted measure of statistical literacy. We also introduce a new teaching method on elementary statistics class. Different from the traditional elementary statistics course, we will introduce a simulation-based inference method to conduct hypothesis testing. From the literature, it has shown that this new teaching method works very well in increasing student’s understanding of statistics.
Load estimator (LOADEST): a FORTRAN program for estimating constituent loads in streams and rivers
Runkel, Robert L.; Crawford, Charles G.; Cohn, Timothy A.
2004-01-01
LOAD ESTimator (LOADEST) is a FORTRAN program for estimating constituent loads in streams and rivers. Given a time series of streamflow, additional data variables, and constituent concentration, LOADEST assists the user in developing a regression model for the estimation of constituent load (calibration). Explanatory variables within the regression model include various functions of streamflow, decimal time, and additional user-specified data variables. The formulated regression model then is used to estimate loads over a user-specified time interval (estimation). Mean load estimates, standard errors, and 95 percent confidence intervals are developed on a monthly and(or) seasonal basis. The calibration and estimation procedures within LOADEST are based on three statistical estimation methods. The first two methods, Adjusted Maximum Likelihood Estimation (AMLE) and Maximum Likelihood Estimation (MLE), are appropriate when the calibration model errors (residuals) are normally distributed. Of the two, AMLE is the method of choice when the calibration data set (time series of streamflow, additional data variables, and concentration) contains censored data. The third method, Least Absolute Deviation (LAD), is an alternative to maximum likelihood estimation when the residuals are not normally distributed. LOADEST output includes diagnostic tests and warnings to assist the user in determining the appropriate estimation method and in interpreting the estimated loads. This report describes the development and application of LOADEST. Sections of the report describe estimation theory, input/output specifications, sample applications, and installation instructions.
Multiple point statistical simulation using uncertain (soft) conditional data
NASA Astrophysics Data System (ADS)
Hansen, Thomas Mejer; Vu, Le Thanh; Mosegaard, Klaus; Cordua, Knud Skou
2018-05-01
Geostatistical simulation methods have been used to quantify spatial variability of reservoir models since the 80s. In the last two decades, state of the art simulation methods have changed from being based on covariance-based 2-point statistics to multiple-point statistics (MPS), that allow simulation of more realistic Earth-structures. In addition, increasing amounts of geo-information (geophysical, geological, etc.) from multiple sources are being collected. This pose the problem of integration of these different sources of information, such that decisions related to reservoir models can be taken on an as informed base as possible. In principle, though difficult in practice, this can be achieved using computationally expensive Monte Carlo methods. Here we investigate the use of sequential simulation based MPS simulation methods conditional to uncertain (soft) data, as a computational efficient alternative. First, it is demonstrated that current implementations of sequential simulation based on MPS (e.g. SNESIM, ENESIM and Direct Sampling) do not account properly for uncertain conditional information, due to a combination of using only co-located information, and a random simulation path. Then, we suggest two approaches that better account for the available uncertain information. The first make use of a preferential simulation path, where more informed model parameters are visited preferentially to less informed ones. The second approach involves using non co-located uncertain information. For different types of available data, these approaches are demonstrated to produce simulation results similar to those obtained by the general Monte Carlo based approach. These methods allow MPS simulation to condition properly to uncertain (soft) data, and hence provides a computationally attractive approach for integration of information about a reservoir model.
Shoari, Niloofar; Dubé, Jean-Sébastien; Chenouri, Shoja'eddin
2015-11-01
In environmental studies, concentration measurements frequently fall below detection limits of measuring instruments, resulting in left-censored data. Some studies employ parametric methods such as the maximum likelihood estimator (MLE), robust regression on order statistic (rROS), and gamma regression on order statistic (GROS), while others suggest a non-parametric approach, the Kaplan-Meier method (KM). Using examples of real data from a soil characterization study in Montreal, we highlight the need for additional investigations that aim at unifying the existing literature. A number of studies have examined this issue; however, those considering data skewness and model misspecification are rare. These aspects are investigated in this paper through simulations. Among other findings, results show that for low skewed data, the performance of different statistical methods is comparable, regardless of the censoring percentage and sample size. For highly skewed data, the performance of the MLE method under lognormal and Weibull distributions is questionable; particularly, when the sample size is small or censoring percentage is high. In such conditions, MLE under gamma distribution, rROS, GROS, and KM are less sensitive to skewness. Related to model misspecification, MLE based on lognormal and Weibull distributions provides poor estimates when the true distribution of data is misspecified. However, the methods of rROS, GROS, and MLE under gamma distribution are generally robust to model misspecifications regardless of skewness, sample size, and censoring percentage. Since the characteristics of environmental data (e.g., type of distribution and skewness) are unknown a priori, we suggest using MLE based on gamma distribution, rROS and GROS. Copyright © 2015 Elsevier Ltd. All rights reserved.
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics
2016-01-01
Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. PMID:27729304
Analysis of Statistical Methods and Errors in the Articles Published in the Korean Journal of Pain
Yim, Kyoung Hoon; Han, Kyoung Ah; Park, Soo Young
2010-01-01
Background Statistical analysis is essential in regard to obtaining objective reliability for medical research. However, medical researchers do not have enough statistical knowledge to properly analyze their study data. To help understand and potentially alleviate this problem, we have analyzed the statistical methods and errors of articles published in the Korean Journal of Pain (KJP), with the intention to improve the statistical quality of the journal. Methods All the articles, except case reports and editorials, published from 2004 to 2008 in the KJP were reviewed. The types of applied statistical methods and errors in the articles were evaluated. Results One hundred and thirty-nine original articles were reviewed. Inferential statistics and descriptive statistics were used in 119 papers and 20 papers, respectively. Only 20.9% of the papers were free from statistical errors. The most commonly adopted statistical method was the t-test (21.0%) followed by the chi-square test (15.9%). Errors of omission were encountered 101 times in 70 papers. Among the errors of omission, "no statistics used even though statistical methods were required" was the most common (40.6%). The errors of commission were encountered 165 times in 86 papers, among which "parametric inference for nonparametric data" was the most common (33.9%). Conclusions We found various types of statistical errors in the articles published in the KJP. This suggests that meticulous attention should be given not only in the applying statistical procedures but also in the reviewing process to improve the value of the article. PMID:20552071
2017 Annual Disability Statistics Supplement
ERIC Educational Resources Information Center
Lauer, E. A; Houtenville, A. J.
2018-01-01
The "Annual Disability Statistics Supplement" is a companion report to the "Annual Disability Statistics Compendium." The "Supplement" presents statistics on the same topics as the "Compendium," with additional categorizations by demographic characteristics including age, gender and race/ethnicity. In…
Wall, Jason; Conrad, Rick; Latham, Kathy; Liu, Eric
2014-03-01
Real-time PCR methods for detecting foodborne pathogens offer the advantages of simplicity and quick time to results compared to traditional culture methods. The addition of a recirculating pooled immunomagnetic separation method prior to real-time PCR analysis increases processing output while reducing both cost and labor. This AOAC Research Institute method modification study validates the MicroSEQ® Salmonella spp. Detection Kit [AOAC Performance Tested Method (PTM) 031001] linked with the Pathatrix® 10-Pooling Salmonella spp. Kit (AOAC PTM 090203C) in diced tomatoes, chocolate, and deli ham. The Pathatrix 10-Pooling protocol represents a method modification of the enrichment portion of the MicroSEQ Salmonella spp. The results of the method modification were compared to standard cultural reference methods for diced tomatoes, chocolate, and deli ham. All three matrixes were analyzed in a paired study design. An additional set of chocolate test portions was analyzed using an alternative enrichment medium in an unpaired study design. For all matrixes tested, there were no statistically significant differences in the number of positive test portions detected by the modified candidate method compared to the appropriate reference method. The MicroSEQ Salmonella spp. protocol linked with the Pathatrix individual or 10-Pooling procedure demonstrated reliability as a rapid, simplified, method for the preparation of samples and subsequent detection of Salmonella in diced tomatoes, chocolate, and deli ham.
NASA Astrophysics Data System (ADS)
Kim, Dae Hoe; Choi, Jae Young; Choi, Seon Hyeong; Ro, Yong Man
2012-03-01
In this study, a novel mammogram enhancement solution is proposed, aiming to improve the quality of subsequent mass segmentation in mammograms. It has been widely accepted that characteristics of masses are usually hyper-dense or uniform density with respect to its background. Also, their core parts are likely to have high-intensity values while the values of intensity tend to be decreased as the distance to core parts increases. Based on the aforementioned observations, we develop a new and effective mammogram enhancement method by combining local statistical measurements and Sliding Band Filtering (SBF). By effectively combining local statistical measurements and SBF, we are able to improve the contrast of the bright and smooth regions (which represent potential mass regions), as well as, at the same time, the regions where their surrounding gradients are converging to the centers of regions of interest. In this study, 89 mammograms were collected from the public MAIS database (DB) to demonstrate the effectiveness of the proposed enhancement solution in terms of improving mass segmentation. As for a segmentation method, widely used contour-based segmentation approach was employed. The contour-based method in conjunction with the proposed enhancement solution achieved overall detection accuracy of 92.4% with a total of 85 correct cases. On the other hand, without using our enhancement solution, overall detection accuracy of the contour-based method was only 78.3%. In addition, experimental results demonstrated the feasibility of our enhancement solution for the purpose of improving detection accuracy on mammograms containing dense parenchymal patterns.
Sun, Xing; Li, Xiaoyun; Chen, Cong; Song, Yang
2013-01-01
Frequent rise of interval-censored time-to-event data in randomized clinical trials (e.g., progression-free survival [PFS] in oncology) challenges statistical researchers in the pharmaceutical industry in various ways. These challenges exist in both trial design and data analysis. Conventional statistical methods treating intervals as fixed points, which are generally practiced by pharmaceutical industry, sometimes yield inferior or even flawed analysis results in extreme cases for interval-censored data. In this article, we examine the limitation of these standard methods under typical clinical trial settings and further review and compare several existing nonparametric likelihood-based methods for interval-censored data, methods that are more sophisticated but robust. Trial design issues involved with interval-censored data comprise another topic to be explored in this article. Unlike right-censored survival data, expected sample size or power for a trial with interval-censored data relies heavily on the parametric distribution of the baseline survival function as well as the frequency of assessments. There can be substantial power loss in trials with interval-censored data if the assessments are very infrequent. Such an additional dependency controverts many fundamental assumptions and principles in conventional survival trial designs, especially the group sequential design (e.g., the concept of information fraction). In this article, we discuss these fundamental changes and available tools to work around their impacts. Although progression-free survival is often used as a discussion point in the article, the general conclusions are equally applicable to other interval-censored time-to-event endpoints.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nedic, Vladimir, E-mail: vnedic@kg.ac.rs; Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs; Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs
2014-11-15
Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. As input variables of the neural network, the proposed structure of the traffic flow and the average speed of the traffic flow are chosen. Themore » output variable of the network is the equivalent noise level in the given time period L{sub eq}. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise using the originally developed user friendly software package. It is shown that the artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate equivalent noise level by means of classical methods, and comparative analysis is given. The results clearly show that ANN approach is superior in traffic noise level prediction to any other statistical method. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed originally designed user friendly software package. • The results are compared with classical statistical methods. • The results are much better predictive capabilities of ANN model.« less
Displaying R spatial statistics on Google dynamic maps with web applications created by Rwui.
Newton, Richard; Deonarine, Andrew; Wernisch, Lorenz
2012-09-24
The R project includes a large variety of packages designed for spatial statistics. Google dynamic maps provide web based access to global maps and satellite imagery. We describe a method for displaying directly the spatial output from an R script on to a Google dynamic map. This is achieved by creating a Java based web application which runs the R script and then displays the results on the dynamic map. In order to make this method easy to implement by those unfamiliar with programming Java based web applications, we have added the method to the options available in the R Web User Interface (Rwui) application. Rwui is an established web application for creating web applications for running R scripts. A feature of Rwui is that all the code for the web application being created is generated automatically so that someone with no knowledge of web programming can make a fully functional web application for running an R script in a matter of minutes. Rwui can now be used to create web applications that will display the results from an R script on a Google dynamic map. Results may be displayed as discrete markers and/or as continuous overlays. In addition, users of the web application may select regions of interest on the dynamic map with mouse clicks and the coordinates of the region of interest will automatically be made available for use by the R script. This method of displaying R output on dynamic maps is designed to be of use in a number of areas. Firstly it allows statisticians, working in R and developing methods in spatial statistics, to easily visualise the results of applying their methods to real world data. Secondly, it allows researchers who are using R to study health geographics data, to display their results directly onto dynamic maps. Thirdly, by creating a web application for running an R script, a statistician can enable users entirely unfamiliar with R to run R coded statistical analyses of health geographics data. Fourthly, we envisage an educational role for such applications.
Yin, Ke; Dou, Xiaomin; Ren, Fei; Chan, Wei-Ping; Chang, Victor Wei-Chung
2018-02-15
Bottom ashes generated from municipal solid waste incineration have gained increasing popularity as alternative construction materials, however, they contains elevated heavy metals posing a challenge for its free usage. Different leaching methods are developed to quantify leaching potential of incineration bottom ashes meanwhile guide its environmentally friendly application. Yet, there are diverse IBA applications while the in situ environment is always complicated, challenging its legislation. In this study, leaching tests were conveyed using batch and column leaching methods with seawater as opposed to deionized water, to unveil the metal leaching potential of IBA subjected to salty environment, which is commonly encountered when using IBA in land reclamation yet not well understood. Statistical analysis for different leaching methods suggested disparate performance between seawater and deionized water primarily ascribed to ionic strength. Impacts of leachant are metal-specific dependent on leaching methods and have a function of intrinsic characteristics of incineration bottom ashes. Leaching performances were further compared on additional perspectives, e.g. leaching approach and liquid to solid ratio, indicating sophisticated leaching potentials dominated by combined geochemistry. It is necessary to develop application-oriented leaching methods with corresponding leaching criteria to preclude discriminations between different applications, e.g., terrestrial applications vs. land reclamation. Copyright © 2017 Elsevier B.V. All rights reserved.
Richter, R; Hartmann, A; Meyer, A E; Rüger, U
1994-01-01
Thomas and Schmitz claim that they "deliver a proof for the effectiveness of humanistic methods" (p. 25) with their study. However, they did not or were not able to verify their claim due to several reasons: The authors did not say if and if so to what extent the treatments carried out within the framework of the TK-regulation were treatments using humanistic methods. The validity of the only criterium used by the authors, the average duration of the inability to work, must be questioned. The inferential statistical treatment of the data is insufficient; a non-parametrical evaluation is necessary. Especially missing are personal details concerning the treatment groups (age, sex, occupation, method, duration and frequency of therapy), which are indispensable for a differentiated interpretation. In addition there are numerous formal faults (wrong quotations, mistakes in tables, unclear terms etc.). In view of this criticism we come to the conclusion that the results are to a large degree worthless, at least until several of our objections have been refuted by further information and adequate inferential statistical methods. This study is especially unsuitable to prove a however defined "effectiveness of out-patient psychotherapies", therefore also not suitable to prove the effectiveness of those treatments conducted within the framework of the TK-regulation and especially not suitable to prove the superiority of humanistic methods in comparison with psychoanalytic methods and behavioural therapy.
Interactive semiautomatic contour delineation using statistical conditional random fields framework.
Hu, Yu-Chi; Grossberg, Michael D; Wu, Abraham; Riaz, Nadeem; Perez, Carmen; Mageras, Gig S
2012-07-01
Contouring a normal anatomical structure during radiation treatment planning requires significant time and effort. The authors present a fast and accurate semiautomatic contour delineation method to reduce the time and effort required of expert users. Following an initial segmentation on one CT slice, the user marks the target organ and nontarget pixels with a few simple brush strokes. The algorithm calculates statistics from this information that, in turn, determines the parameters of an energy function containing both boundary and regional components. The method uses a conditional random field graphical model to define the energy function to be minimized for obtaining an estimated optimal segmentation, and a graph partition algorithm to efficiently solve the energy function minimization. Organ boundary statistics are estimated from the segmentation and propagated to subsequent images; regional statistics are estimated from the simple brush strokes that are either propagated or redrawn as needed on subsequent images. This greatly reduces the user input needed and speeds up segmentations. The proposed method can be further accelerated with graph-based interpolation of alternating slices in place of user-guided segmentation. CT images from phantom and patients were used to evaluate this method. The authors determined the sensitivity and specificity of organ segmentations using physician-drawn contours as ground truth, as well as the predicted-to-ground truth surface distances. Finally, three physicians evaluated the contours for subjective acceptability. Interobserver and intraobserver analysis was also performed and Bland-Altman plots were used to evaluate agreement. Liver and kidney segmentations in patient volumetric CT images show that boundary samples provided on a single CT slice can be reused through the entire 3D stack of images to obtain accurate segmentation. In liver, our method has better sensitivity and specificity (0.925 and 0.995) than region growing (0.897 and 0.995) and level set methods (0.912 and 0.985) as well as shorter mean predicted-to-ground truth distance (2.13 mm) compared to regional growing (4.58 mm) and level set methods (8.55 mm and 4.74 mm). Similar results are observed in kidney segmentation. Physician evaluation of ten liver cases showed that 83% of contours did not need any modification, while 6% of contours needed modifications as assessed by two or more evaluators. In interobserver and intraobserver analysis, Bland-Altman plots showed our method to have better repeatability than the manual method while the delineation time was 15% faster on average. Our method achieves high accuracy in liver and kidney segmentation and considerably reduces the time and labor required for contour delineation. Since it extracts purely statistical information from the samples interactively specified by expert users, the method avoids heuristic assumptions commonly used by other methods. In addition, the method can be expanded to 3D directly without modification because the underlying graphical framework and graph partition optimization method fit naturally with the image grid structure.
Beichel, Reinhard R.; Van Tol, Markus; Ulrich, Ethan J.; Bauer, Christian; Chang, Tangel; Plichta, Kristin A.; Smith, Brian J.; Sunderland, John J.; Graham, Michael M.; Sonka, Milan; Buatti, John M.
2016-01-01
Purpose: The purpose of this work was to develop, validate, and compare a highly computer-aided method for the segmentation of hot lesions in head and neck 18F-FDG PET scans. Methods: A semiautomated segmentation method was developed, which transforms the segmentation problem into a graph-based optimization problem. For this purpose, a graph structure around a user-provided approximate lesion centerpoint is constructed and a suitable cost function is derived based on local image statistics. To handle frequently occurring situations that are ambiguous (e.g., lesions adjacent to each other versus lesion with inhomogeneous uptake), several segmentation modes are introduced that adapt the behavior of the base algorithm accordingly. In addition, the authors present approaches for the efficient interactive local and global refinement of initial segmentations that are based on the “just-enough-interaction” principle. For method validation, 60 PET/CT scans from 59 different subjects with 230 head and neck lesions were utilized. All patients had squamous cell carcinoma of the head and neck. A detailed comparison with the current clinically relevant standard manual segmentation approach was performed based on 2760 segmentations produced by three experts. Results: Segmentation accuracy measured by the Dice coefficient of the proposed semiautomated and standard manual segmentation approach was 0.766 and 0.764, respectively. This difference was not statistically significant (p = 0.2145). However, the intra- and interoperator standard deviations were significantly lower for the semiautomated method. In addition, the proposed method was found to be significantly faster and resulted in significantly higher intra- and interoperator segmentation agreement when compared to the manual segmentation approach. Conclusions: Lack of consistency in tumor definition is a critical barrier for radiation treatment targeting as well as for response assessment in clinical trials and in clinical oncology decision-making. The properties of the authors approach make it well suited for applications in image-guided radiation oncology, response assessment, or treatment outcome prediction. PMID:27277044
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beichel, Reinhard R., E-mail: reinhard-beichel@uiowa.edu; Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, Iowa 52242; Department of Internal Medicine, University of Iowa, Iowa City, Iowa 52242
Purpose: The purpose of this work was to develop, validate, and compare a highly computer-aided method for the segmentation of hot lesions in head and neck 18F-FDG PET scans. Methods: A semiautomated segmentation method was developed, which transforms the segmentation problem into a graph-based optimization problem. For this purpose, a graph structure around a user-provided approximate lesion centerpoint is constructed and a suitable cost function is derived based on local image statistics. To handle frequently occurring situations that are ambiguous (e.g., lesions adjacent to each other versus lesion with inhomogeneous uptake), several segmentation modes are introduced that adapt the behaviormore » of the base algorithm accordingly. In addition, the authors present approaches for the efficient interactive local and global refinement of initial segmentations that are based on the “just-enough-interaction” principle. For method validation, 60 PET/CT scans from 59 different subjects with 230 head and neck lesions were utilized. All patients had squamous cell carcinoma of the head and neck. A detailed comparison with the current clinically relevant standard manual segmentation approach was performed based on 2760 segmentations produced by three experts. Results: Segmentation accuracy measured by the Dice coefficient of the proposed semiautomated and standard manual segmentation approach was 0.766 and 0.764, respectively. This difference was not statistically significant (p = 0.2145). However, the intra- and interoperator standard deviations were significantly lower for the semiautomated method. In addition, the proposed method was found to be significantly faster and resulted in significantly higher intra- and interoperator segmentation agreement when compared to the manual segmentation approach. Conclusions: Lack of consistency in tumor definition is a critical barrier for radiation treatment targeting as well as for response assessment in clinical trials and in clinical oncology decision-making. The properties of the authors approach make it well suited for applications in image-guided radiation oncology, response assessment, or treatment outcome prediction.« less
Bootstrap Methods: A Very Leisurely Look.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Winstead, Wayland H.
The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…
Saad, Ahmed S; Attia, Ali K; Alaraki, Manal S; Elzanfaly, Eman S
2015-11-05
Five different spectrophotometric methods were applied for simultaneous determination of fenbendazole and rafoxanide in their binary mixture; namely first derivative, derivative ratio, ratio difference, dual wavelength and H-point standard addition spectrophotometric methods. Different factors affecting each of the applied spectrophotometric methods were studied and the selectivity of the applied methods was compared. The applied methods were validated as per the ICH guidelines and good accuracy; specificity and precision were proven within the concentration range of 5-50 μg/mL for both drugs. Statistical analysis using one-way ANOVA proved no significant differences among the proposed methods for the determination of the two drugs. The proposed methods successfully determined both drugs in laboratory prepared and commercially available binary mixtures, and were found applicable for the routine analysis in quality control laboratories. Copyright © 2015 Elsevier B.V. All rights reserved.
Statistical models and NMR analysis of polymer microstructure
USDA-ARS?s Scientific Manuscript database
Statistical models can be used in conjunction with NMR spectroscopy to study polymer microstructure and polymerization mechanisms. Thus, Bernoullian, Markovian, and enantiomorphic-site models are well known. Many additional models have been formulated over the years for additional situations. Typica...
Pretreatment of paper tube residuals for improved biogas production.
Teghammar, Anna; Yngvesson, Johan; Lundin, Magnus; Taherzadeh, Mohammad J; Horváth, Ilona Sárvári
2010-02-01
Paper tube residuals, which are lignocellulosic wastes, have been studied as substrate for biogas (methane) production. Steam explosion and nonexplosive hydrothermal pretreatment, in combination with sodium hydroxide and/or hydrogen peroxide, have been used to improve the biogas production. The treatment conditions of temperature, time and addition of NaOH and H(2)O(2) were statistically evaluated for methane production. Explosive pretreatment was more successful than the nonexplosive method, and gave the best results at 220 degrees C, 10 min, with addition of both 2% NaOH and 2% H(2)O(2). Digestion of the pretreated materials at these conditions yielded 493 N ml/g VS methane which was 107% more than the untreated materials. In addition, the initial digestion rate was improved by 132% compared to the untreated samples. The addition of NaOH was, besides the explosion effect, the most important factor to improve the biogas production.
Effects of cigarette tax on cigarette consumption and the Chinese economy
Hu, T; Mao, Z
2002-01-01
Objectives: To analyse a policy dilemma in China on public health versus the tobacco economy through additional cigarette tax. Methods: Using published statistics from 1980 through 1997 to estimate the impact of tobacco production and consumption on government revenue and the entire economy. These estimates relied on the results of estimated price elasticities of the demand for cigarettes in China. Results: Given the estimated price elasticities (-0.54), by introducing an additional 10% increase in cigarette tax per pack (from the current 40% to 50% tax rate), the central government tax revenue would twice exceed total losses in industry revenue, tobacco farmers' income, and local tax revenue. In addition, between 1.44 and 2.16 million lives would be saved by this tax increase. Conclusions: Additional taxation on cigarettes in China would be a desirable public policy for the Chinese government to consider. PMID:12035000
LaBudde, Robert A; Harnly, James M
2012-01-01
A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
Statistical methods for nuclear material management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowen W.M.; Bennett, C.A.
1988-12-01
This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material managementmore » problems.« less
A NEW LOG EVALUATION METHOD TO APPRAISE MESAVERDE RE-COMPLETION OPPORTUNITIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Albert Greer
2003-09-11
Artificial intelligence tools, fuzzy logic and neural networks were used to evaluate the potential of the behind pipe Mesaverde formation in BMG's Mancos formation wells. A fractal geostatistical mapping algorithm was also used to predict Mesaverde production. Additionally, a conventional geological study was conducted. To date one Mesaverde completion has been performed. The Janet No.3 Mesaverde completion was non-economic. Both the AI method and the geostatistical methods predicted the failure of the Janet No.3. The Gavilan No.1 in the Mesaverde was completed during the course of the study and was an extremely good well. This well was not included inmore » the statistical dataset. The AI method predicted very good production while the fractal map predicted a poor producer.« less
NASA Astrophysics Data System (ADS)
Schuckers, Michael E.; Hawley, Anne; Livingstone, Katie; Mramba, Nona
2004-08-01
Confidence intervals are an important way to assess and estimate a parameter. In the case of biometric identification devices, several approaches to confidence intervals for an error rate have been proposed. Here we evaluate six of these methods. To complete this evaluation, we simulate data from a wide variety of parameter values. This data are simulated via a correlated binary distribution. We then determine how well these methods do at what they say they do: capturing the parameter inside the confidence interval. In addition, the average widths of the various confidence intervals are recorded for each set of parameters. The complete results of this simulation are presented graphically for easy comparison. We conclude by making a recommendation regarding which method performs best.
Quality of reporting statistics in two Indian pharmacology journals.
Jaykaran; Yadav, Preeti
2011-04-01
To evaluate the reporting of the statistical methods in articles published in two Indian pharmacology journals. All original articles published since 2002 were downloaded from the journals' (Indian Journal of Pharmacology (IJP) and Indian Journal of Physiology and Pharmacology (IJPP)) website. These articles were evaluated on the basis of appropriateness of descriptive statistics and inferential statistics. Descriptive statistics was evaluated on the basis of reporting of method of description and central tendencies. Inferential statistics was evaluated on the basis of fulfilling of assumption of statistical methods and appropriateness of statistical tests. Values are described as frequencies, percentage, and 95% confidence interval (CI) around the percentages. Inappropriate descriptive statistics was observed in 150 (78.1%, 95% CI 71.7-83.3%) articles. Most common reason for this inappropriate descriptive statistics was use of mean ± SEM at the place of "mean (SD)" or "mean ± SD." Most common statistical method used was one-way ANOVA (58.4%). Information regarding checking of assumption of statistical test was mentioned in only two articles. Inappropriate statistical test was observed in 61 (31.7%, 95% CI 25.6-38.6%) articles. Most common reason for inappropriate statistical test was the use of two group test for three or more groups. Articles published in two Indian pharmacology journals are not devoid of statistical errors.
ERIC Educational Resources Information Center
Hahs-Vaughn, Debbie L.; Acquaye, Hannah; Griffith, Matthew D.; Jo, Hang; Matthews, Ken; Acharya, Parul
2017-01-01
Statistical literacy refers to understanding fundamental statistical concepts. Assessment of statistical literacy can take the forms of tasks that require students to identify, translate, compute, read, and interpret data. In addition, statistical instruction can take many forms encompassing course delivery format such as face-to-face, hybrid,…
Simple Statistical Model to Quantify Maximum Expected EMC in Spacecraft and Avionics Boxes
NASA Technical Reports Server (NTRS)
Trout, Dawn H.; Bremner, Paul
2014-01-01
This study shows cumulative distribution function (CDF) comparisons of composite a fairing electromagnetic field data obtained by computational electromagnetic 3D full wave modeling and laboratory testing. Test and model data correlation is shown. In addition, this presentation shows application of the power balance and extention of this method to predict the variance and maximum exptected mean of the E-field data. This is valuable for large scale evaluations of transmission inside cavities.
United States Air Force Graduate Student Research Program. Program Technical rept. Volume 3
1988-12-01
placing scalp electrodes in a bipolar arrangement. An additional problem we faced in the F4 study was time- loc ::ing the ERP to the eliciting...and Leo Jehl of OEHL for their help with halide and metal analysis that was invalubae to this project. John Barnaby, Will Robinson, Chris Ritter, and...a patient teacher of statistical methods. Dr Eric D. Grassman and Dr Leo Spaccavento explained many technical cardiological points and were generally
Zheng, Mingguo; Chen, Xiaoan
2015-01-01
Correlation analysis is popular in erosion- or earth-related studies, however, few studies compare correlations on a basis of statistical testing, which should be conducted to determine the statistical significance of the observed sample difference. This study aims to statistically determine the erosivity index of single storms, which requires comparison of a large number of dependent correlations between rainfall-runoff factors and soil loss, in the Chinese Loess Plateau. Data observed at four gauging stations and five runoff experimental plots were presented. Based on the Meng’s tests, which is widely used for comparing correlations between a dependent variable and a set of independent variables, two methods were proposed. The first method removes factors that are poorly correlated with soil loss from consideration in a stepwise way, while the second method performs pairwise comparisons that are adjusted using the Bonferroni correction. Among 12 rainfall factors, I 30 (the maximum 30-minute rainfall intensity) has been suggested for use as the rainfall erosivity index, although I 30 is equally correlated with soil loss as factors of I 20, EI 10 (the product of the rainfall kinetic energy, E, and I 10), EI 20 and EI 30 are. Runoff depth (total runoff volume normalized to drainage area) is more correlated with soil loss than all other examined rainfall-runoff factors, including I 30, peak discharge and many combined factors. Moreover, sediment concentrations of major sediment-producing events are independent of all examined rainfall-runoff factors. As a result, introducing additional factors adds little to the prediction accuracy of the single factor of runoff depth. Hence, runoff depth should be the best erosivity index at scales from plots to watersheds. Our findings can facilitate predictions of soil erosion in the Loess Plateau. Our methods provide a valuable tool while determining the predictor among a number of variables in terms of correlations. PMID:25781173
Zheng, Mingguo; Chen, Xiaoan
2015-01-01
Correlation analysis is popular in erosion- or earth-related studies, however, few studies compare correlations on a basis of statistical testing, which should be conducted to determine the statistical significance of the observed sample difference. This study aims to statistically determine the erosivity index of single storms, which requires comparison of a large number of dependent correlations between rainfall-runoff factors and soil loss, in the Chinese Loess Plateau. Data observed at four gauging stations and five runoff experimental plots were presented. Based on the Meng's tests, which is widely used for comparing correlations between a dependent variable and a set of independent variables, two methods were proposed. The first method removes factors that are poorly correlated with soil loss from consideration in a stepwise way, while the second method performs pairwise comparisons that are adjusted using the Bonferroni correction. Among 12 rainfall factors, I30 (the maximum 30-minute rainfall intensity) has been suggested for use as the rainfall erosivity index, although I30 is equally correlated with soil loss as factors of I20, EI10 (the product of the rainfall kinetic energy, E, and I10), EI20 and EI30 are. Runoff depth (total runoff volume normalized to drainage area) is more correlated with soil loss than all other examined rainfall-runoff factors, including I30, peak discharge and many combined factors. Moreover, sediment concentrations of major sediment-producing events are independent of all examined rainfall-runoff factors. As a result, introducing additional factors adds little to the prediction accuracy of the single factor of runoff depth. Hence, runoff depth should be the best erosivity index at scales from plots to watersheds. Our findings can facilitate predictions of soil erosion in the Loess Plateau. Our methods provide a valuable tool while determining the predictor among a number of variables in terms of correlations.
NASA Astrophysics Data System (ADS)
Salvato, Steven Walter
The purpose of this study was to analyze questions within the chapters of a nontraditional general chemistry textbook and the four general chemistry textbooks most widely used by Texas community colleges in order to determine if the questions require higher- or lower-order thinking according to Bloom's taxonomy. The study employed quantitative methods. Bloom's taxonomy (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956) was utilized as the main instrument in the study. Additional tools were used to help classify the questions into the proper category of the taxonomy (McBeath, 1992; Metfessel, Michael, & Kirsner, 1969). The top four general chemistry textbooks used in Texas community colleges and Chemistry: A Project of the American Chemical Society (Bell et al., 2005) were analyzed during the fall semester of 2010 in order to categorize the questions within the chapters into one of the six levels of Bloom's taxonomy. Two coders were used to assess reliability. The data were analyzed using descriptive and inferential methods. The descriptive method involved calculation of the frequencies and percentages of coded questions from the books as belonging to the six categories of the taxonomy. Questions were dichotomized into higher- and lower-order thinking questions. The inferential methods involved chi-square tests of association to determine if there were statistically significant differences among the four traditional college general chemistry textbooks in the proportions of higher- and lower-order questions and if there were statistically significant differences between the nontraditional chemistry textbook and the four traditional general chemistry textbooks. Findings indicated statistically significant differences among the four textbooks frequently used in Texas community colleges in the number of higher- and lower-level questions. Statistically significant differences were also found among the four textbooks and the nontraditional textbook. After the analysis of the data, conclusions were drawn, implications for practice were delineated, and recommendations for future research were given.
Religiousness and preoperative anxiety: a correlational study
Aghamohammadi Kalkhoran, Masoomeh; Karimollahi, Mansoureh
2007-01-01
Background Major life changes are among factors that cause anxiety, and one of these changes is surgery. Emotional reactions to surgery have specific effects on the intensity and velocity as well as the process of physical disease. In addition, they can cause delay in patients recovery. This study is aimed at determining the relationship between religious beliefs and preoperative anxiety. Methods This survey is a correlational study to assess the relationship between religious beliefs and preoperative anxiety of patients undergoing abdominal, orthopaedic, and gynaecologic surgery in educational hospitals. We used the convenience sampling method. The data collection instruments included a questionnaire containing the Spielberger State-Trait Anxiety Inventory (STAI), and another questionnaire formulated by the researcher with queries on religious beliefs and demographic characteristics as well as disease-related information. Analysis of the data was carried out with SPSS software using descriptive and inferential statistics. Results were arranged in three tables. Results The findings showed that almost all the subjects had high level of religiosity and moderate level of anxiety. In addition, there was an inverse relationship between religiosity and intensity of anxiety, though this was not statistically significant. Conclusion The results of this study can be used as evidence for presenting religious counselling and spiritual interventions for individuals undergoing stress. Finally, based on the results of this study, the researcher suggested some recommendations for applying results and conducting further research. PMID:17603897
Hu, Xiangdong; Liu, Yujiang; Qian, Linxue
2017-10-01
Real-time elastography (RTE) and shear wave elastography (SWE) are noninvasive and easily available imaging techniques that measure the tissue strain, and it has been reported that the sensitivity and the specificity of elastography were better in differentiating between benign and malignant thyroid nodules than conventional technologies. Relevant articles were searched in multiple databases; the comparison of elasticity index (EI) was conducted with the Review Manager 5.0. Forest plots of the sensitivity and specificity and SROC curve of RTE and SWE were performed with STATA 10.0 software. In addition, sensitivity analysis and bias analysis of the studies were conducted to examine the quality of articles; and to estimate possible publication bias, funnel plot was used and the Egger test was conducted. Finally 22 articles which eventually satisfied the inclusion criteria were included in this study. After eliminating the inefficient, benign and malignant nodules were 2106 and 613, respectively. The meta-analysis suggested that the difference of EI between benign and malignant nodules was statistically significant (SMD = 2.11, 95% CI [1.67, 2.55], P < .00001). The overall sensitivities of RTE and SWE were roughly comparable, whereas the difference of specificities between these 2 methods was statistically significant. In addition, statistically significant difference of AUC between RTE and SWE was observed between RTE and SWE (P < .01). The specificity of RTE was statistically higher than that of SWE; which suggests that compared with SWE, RTE may be more accurate on differentiating benign and malignant thyroid nodules.
NASA Astrophysics Data System (ADS)
Farmer, W. H.; Archfield, S. A.; Over, T. M.; Kiang, J. E.
2015-12-01
In the United States and across the globe, the majority of stream reaches and rivers are substantially impacted by water use or remain ungaged. The result is large gaps in the availability of natural streamflow records from which to infer hydrologic understanding and inform water resources management. From basin-specific to continent-wide scales, many efforts have been undertaken to develop methods to estimate ungaged streamflow. This work applies and contrasts several statistical models of daily streamflow to more than 1,700 reference-quality streamgages across the conterminous United States using a cross-validation methodology. The variability of streamflow simulation performance across the country exhibits a pattern familiar to other continental scale modeling efforts performed for the United States. For portions of the West Coast and the dense, relatively homogeneous and humid regions of the eastern United States models produce reliable estimates of daily streamflow using many different prediction methods. Model performance for the middle portion of the United States, marked by more heterogeneous and arid conditions, and with larger contributing areas and sparser networks of streamgages, is consistently poor. A discussion of the difficulty of statistical interpolation and regionalization in these regions raises additional questions of data availability and quality, hydrologic process representation and dominance, and intrinsic variability.
Hoskinson, Christine; McCain, Stephanie; Allender, Matthew C
2014-01-01
Body temperature readings can be a useful diagnostic tool for identifying the presence of subclinical disease. Traditionally, rectal or cloacal thermometry has been used to obtain body temperatures. The use of implantable microchips to obtain these temperatures has been studied in a variety of animals, but not yet in avian species. Initially, timepoint one (T₁), nine lorikeets were anesthetized via facemask induction with 5% isoflurane and maintained at 2-3% for microchip placement and body temperature data collection. Body temperature was measured at 0 and 2 min post-anesthetic induction both cloacally, using a Cardell veterinary monitor and also via implantable microchip, utilizing a universal scanner. On two more occasions, timepoints two and three (T₂, T₃), the same nine lorikeets were manually restrained to obtain body temperature readings both cloacally and via microchip, again at minutes 0 and 2. There was no statistical difference between body temperatures, for both methods, at T₁. Microchip temperatures were statistically different than cloacal temperatures at T₂ and T₃. Body temperatures at T₁, were statistically different from those obtained at T₂ and T₃ for both methods. Additional studies are warranted to verify the accuracy of microchip core body temperature readings in avian species. © 2014 Wiley Periodicals, Inc.
Liu, Jen-Pei; Lu, Li-Tien; Liao, C T
2009-09-01
Intermediate precision is one of the most important characteristics for evaluation of precision in assay validation. The current methods for evaluation of within-device precision recommended by the Clinical Laboratory Standard Institute (CLSI) guideline EP5-A2 are based on the point estimator. On the other hand, in addition to point estimators, confidence intervals can provide a range for the within-device precision with a probability statement. Therefore, we suggest a confidence interval approach for assessment of the within-device precision. Furthermore, under the two-stage nested random-effects model recommended by the approved CLSI guideline EP5-A2, in addition to the current Satterthwaite's approximation and the modified large sample (MLS) methods, we apply the technique of generalized pivotal quantities (GPQ) to derive the confidence interval for the within-device precision. The data from the approved CLSI guideline EP5-A2 illustrate the applications of the confidence interval approach and comparison of results between the three methods. Results of a simulation study on the coverage probability and expected length of the three methods are reported. The proposed method of the GPQ-based confidence intervals is also extended to consider the between-laboratories variation for precision assessment.