Sample records for statistical estimation methods

  1. STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA

    EPA Science Inventory

    This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...

  2. The estimation of the measurement results with using statistical methods

    NASA Astrophysics Data System (ADS)

    Velychko, O.; Gordiyenko, T.

    2015-02-01

    The row of international standards and guides describe various statistical methods that apply for a management, control and improvement of processes with the purpose of realization of analysis of the technical measurement results. The analysis of international standards and guides on statistical methods estimation of the measurement results recommendations for those applications in laboratories is described. For realization of analysis of standards and guides the cause-and-effect Ishikawa diagrams concerting to application of statistical methods for estimation of the measurement results are constructed.

  3. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579

  4. Evaluation of Fuzzy-Logic Framework for Spatial Statistics Preserving Methods for Estimation of Missing Precipitation Data

    NASA Astrophysics Data System (ADS)

    El Sharif, H.; Teegavarapu, R. S.

    2012-12-01

    Spatial interpolation methods used for estimation of missing precipitation data at a site seldom check for their ability to preserve site and regional statistics. Such statistics are primarily defined by spatial correlations and other site-to-site statistics in a region. Preservation of site and regional statistics represents a means of assessing the validity of missing precipitation estimates at a site. This study evaluates the efficacy of a fuzzy-logic methodology for infilling missing historical daily precipitation data in preserving site and regional statistics. Rain gauge sites in the state of Kentucky, USA, are used as a case study for evaluation of this newly proposed method in comparison to traditional data infilling techniques. Several error and performance measures will be used to evaluate the methods and trade-offs in accuracy of estimation and preservation of site and regional statistics.

  5. Estimation of selected streamflow statistics for a network of low-flow partial-record stations in areas affected by Base Realignment and Closure (BRAC) in Maryland

    USGS Publications Warehouse

    Ries, Kernell G.; Eng, Ken

    2010-01-01

    The U.S. Geological Survey, in cooperation with the Maryland Department of the Environment, operated a network of 20 low-flow partial-record stations during 2008 in a region that extends from southwest of Baltimore to the northeastern corner of Maryland to obtain estimates of selected streamflow statistics at the station locations. The study area is expected to face a substantial influx of new residents and businesses as a result of military and civilian personnel transfers associated with the Federal Base Realignment and Closure Act of 2005. The estimated streamflow statistics, which include monthly 85-percent duration flows, the 10-year recurrence-interval minimum base flow, and the 7-day, 10-year low flow, are needed to provide a better understanding of the availability of water resources in the area to be affected by base-realignment activities. Streamflow measurements collected for this study at the low-flow partial-record stations and measurements collected previously for 8 of the 20 stations were related to concurrent daily flows at nearby index streamgages to estimate the streamflow statistics. Three methods were used to estimate the streamflow statistics and two methods were used to select the index streamgages. Of the three methods used to estimate the streamflow statistics, two of them--the Moments and MOVE1 methods--rely on correlating the streamflow measurements at the low-flow partial-record stations with concurrent streamflows at nearby, hydrologically similar index streamgages to determine the estimates. These methods, recommended for use by the U.S. Geological Survey, generally require about 10 streamflow measurements at the low-flow partial-record station. The third method transfers the streamflow statistics from the index streamgage to the partial-record station based on the average of the ratios of the measured streamflows at the partial-record station to the concurrent streamflows at the index streamgage. This method can be used with as few as one pair of streamflow measurements made on a single streamflow recession at the low-flow partial-record station, although additional pairs of measurements will increase the accuracy of the estimates. Errors associated with the two correlation methods generally were lower than the errors associated with the flow-ratio method, but the advantages of the flow-ratio method are that it can produce reasonably accurate estimates from streamflow measurements much faster and at lower cost than estimates obtained using the correlation methods. The two index-streamgage selection methods were (1) selection based on the highest correlation coefficient between the low-flow partial-record station and the index streamgages, and (2) selection based on Euclidean distance, where the Euclidean distance was computed as a function of geographic proximity and the basin characteristics: drainage area, percentage of forested area, percentage of impervious area, and the base-flow recession time constant, t. Method 1 generally selected index streamgages that were significantly closer to the low-flow partial-record stations than method 2. The errors associated with the estimated streamflow statistics generally were lower for method 1 than for method 2, but the differences were not statistically significant. The flow-ratio method for estimating streamflow statistics at low-flow partial-record stations was shown to be independent from the two correlation-based estimation methods. As a result, final estimates were determined for eight low-flow partial-record stations by weighting estimates from the flow-ratio method with estimates from one of the two correlation methods according to the respective variances of the estimates. Average standard errors of estimate for the final estimates ranged from 90.0 to 7.0 percent, with an average value of 26.5 percent. Average standard errors of estimate for the weighted estimates were, on average, 4.3 percent less than the best average standard errors of estima

  6. An adaptive state of charge estimation approach for lithium-ion series-connected battery system

    NASA Astrophysics Data System (ADS)

    Peng, Simin; Zhu, Xuelai; Xing, Yinjiao; Shi, Hongbing; Cai, Xu; Pecht, Michael

    2018-07-01

    Due to the incorrect or unknown noise statistics of a battery system and its cell-to-cell variations, state of charge (SOC) estimation of a lithium-ion series-connected battery system is usually inaccurate or even divergent using model-based methods, such as extended Kalman filter (EKF) and unscented Kalman filter (UKF). To resolve this problem, an adaptive unscented Kalman filter (AUKF) based on a noise statistics estimator and a model parameter regulator is developed to accurately estimate the SOC of a series-connected battery system. An equivalent circuit model is first built based on the model parameter regulator that illustrates the influence of cell-to-cell variation on the battery system. A noise statistics estimator is then used to attain adaptively the estimated noise statistics for the AUKF when its prior noise statistics are not accurate or exactly Gaussian. The accuracy and effectiveness of the SOC estimation method is validated by comparing the developed AUKF and UKF when model and measurement statistics noises are inaccurate, respectively. Compared with the UKF and EKF, the developed method shows the highest SOC estimation accuracy.

  7. Transmission overhaul and replacement predictions using Weibull and renewel theory

    NASA Technical Reports Server (NTRS)

    Savage, M.; Lewicki, D. G.

    1989-01-01

    A method to estimate the frequency of transmission overhauls is presented. This method is based on the two-parameter Weibull statistical distribution for component life. A second method is presented to estimate the number of replacement components needed to support the transmission overhaul pattern. The second method is based on renewal theory. Confidence statistics are applied with both methods to improve the statistical estimate of sample behavior. A transmission example is also presented to illustrate the use of the methods. Transmission overhaul frequency and component replacement calculations are included in the example.

  8. [Regression on order statistics and its application in estimating nondetects for food exposure assessment].

    PubMed

    Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

    2009-01-01

    To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment. Regression on order statistics was adopted in analysis of cadmium residual data set from global food contaminant monitoring, the mean residual was estimated basing SAS programming and compared with the results from substitution methods. The results show that ROS method performs better obviously than substitution methods for being robust and convenient for posterior analysis. Regression on order statistics is worth to adopt,but more efforts should be make for details of application of this method.

  9. A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES.

    PubMed

    Zhou, Xiang

    2017-12-01

    Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs-the restricted maximum likelihood estimation method (REML)-suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z -scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.

  10. Consistency of extreme flood estimation approaches

    NASA Astrophysics Data System (ADS)

    Felder, Guido; Paquet, Emmanuel; Penot, David; Zischg, Andreas; Weingartner, Rolf

    2017-04-01

    Estimations of low-probability flood events are frequently used for the planning of infrastructure as well as for determining the dimensions of flood protection measures. There are several well-established methodical procedures to estimate low-probability floods. However, a global assessment of the consistency of these methods is difficult to achieve, the "true value" of an extreme flood being not observable. Anyway, a detailed comparison performed on a given case study brings useful information about the statistical and hydrological processes involved in different methods. In this study, the following three different approaches for estimating low-probability floods are compared: a purely statistical approach (ordinary extreme value statistics), a statistical approach based on stochastic rainfall-runoff simulation (SCHADEX method), and a deterministic approach (physically based PMF estimation). These methods are tested for two different Swiss catchments. The results and some intermediate variables are used for assessing potential strengths and weaknesses of each method, as well as for evaluating the consistency of these methods.

  11. Statistical methods for estimating normal blood chemistry ranges and variance in rainbow trout (Salmo gairdneri), Shasta Strain

    USGS Publications Warehouse

    Wedemeyer, Gary A.; Nelson, Nancy C.

    1975-01-01

    Gaussian and nonparametric (percentile estimate and tolerance interval) statistical methods were used to estimate normal ranges for blood chemistry (bicarbonate, bilirubin, calcium, hematocrit, hemoglobin, magnesium, mean cell hemoglobin concentration, osmolality, inorganic phosphorus, and pH for juvenile rainbow (Salmo gairdneri, Shasta strain) trout held under defined environmental conditions. The percentile estimate and Gaussian methods gave similar normal ranges, whereas the tolerance interval method gave consistently wider ranges for all blood variables except hemoglobin. If the underlying frequency distribution is unknown, the percentile estimate procedure would be the method of choice.

  12. Methods for estimating low-flow statistics for Massachusetts streams

    USGS Publications Warehouse

    Ries, Kernell G.; Friesz, Paul J.

    2000-01-01

    Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. The streamgaging stations had from 2 to 81 years of record, with a mean record length of 37 years. The low-flow partial-record stations had from 8 to 36 streamflow measurements, with a median of 14 measurements. All basin characteristics were determined from digital map data. The basin characteristics that were statistically significant in most of the final regression equations were drainage area, the area of stratified-drift deposits per unit of stream length plus 0.1, mean basin slope, and an indicator variable that was 0 in the eastern region and 1 in the western region of Massachusetts. The equations were developed by use of weighted-least-squares regression analyses, with weights assigned proportional to the years of record and inversely proportional to the variances of the streamflow statistics for the stations. Standard errors of prediction ranged from 70.7 to 17.5 percent for the equations to predict the 7-day, 10-year low flow and 50-percent duration flow, respectively. The equations are not applicable for use in the Southeast Coastal region of the State, or where basin characteristics for the selected ungaged site are outside the ranges of those for the stations used in the regression analyses. A World Wide Web application was developed that provides streamflow statistics for data collection stations from a data base and for ungaged sites by measuring the necessary basin characteristics for the site and solving the regression equations. Output provided by the Web application for ungaged sites includes a map of the drainage-basin boundary determined for the site, the measured basin characteristics, the estimated streamflow statistics, and 90-percent prediction intervals for the estimates. An equation is provided for combining regression and correlation estimates to obtain improved estimates of the streamflow statistics for low-flow partial-record stations. An equation is also provided for combining regression and drainage-area ratio estimates to obtain improved e

  13. Estimating the proportion of true null hypotheses when the statistics are discrete.

    PubMed

    Dialsingh, Isaac; Austin, Stefanie R; Altman, Naomi S

    2015-07-15

    In high-dimensional testing problems π0, the proportion of null hypotheses that are true is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support and the null distribution may depend on an ancillary statistic such as a table margin that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods that perform well with discrete test statistics and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. implemented in R. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective.

    PubMed

    Kruschke, John K; Liddell, Torrin M

    2018-02-01

    In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.

  15. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    PubMed

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  16. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  17. Statistical plant set estimation using Schroeder-phased multisinusoidal input design

    NASA Technical Reports Server (NTRS)

    Bayard, D. S.

    1992-01-01

    A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.

  18. Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches

    PubMed Central

    Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer

    2014-01-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  19. Comparison between deterministic and statistical wavelet estimation methods through predictive deconvolution: Seismic to well tie example from the North Sea

    NASA Astrophysics Data System (ADS)

    de Macedo, Isadora A. S.; da Silva, Carolina B.; de Figueiredo, J. J. S.; Omoboya, Bode

    2017-01-01

    Wavelet estimation as well as seismic-to-well tie procedures are at the core of every seismic interpretation workflow. In this paper we perform a comparative study of wavelet estimation methods for seismic-to-well tie. Two approaches to wavelet estimation are discussed: a deterministic estimation, based on both seismic and well log data, and a statistical estimation, based on predictive deconvolution and the classical assumptions of the convolutional model, which provides a minimum-phase wavelet. Our algorithms, for both wavelet estimation methods introduce a semi-automatic approach to determine the optimum parameters of deterministic wavelet estimation and statistical wavelet estimation and, further, to estimate the optimum seismic wavelets by searching for the highest correlation coefficient between the recorded trace and the synthetic trace, when the time-depth relationship is accurate. Tests with numerical data show some qualitative conclusions, which are probably useful for seismic inversion and interpretation of field data, by comparing deterministic wavelet estimation and statistical wavelet estimation in detail, especially for field data example. The feasibility of this approach is verified on real seismic and well data from Viking Graben field, North Sea, Norway. Our results also show the influence of the washout zones on well log data on the quality of the well to seismic tie.

  20. Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

    PubMed

    Nariai, N; Kim, S; Imoto, S; Miyano, S

    2004-01-01

    We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.

  1. Parameter estimation techniques based on optimizing goodness-of-fit statistics for structural reliability

    NASA Technical Reports Server (NTRS)

    Starlinger, Alois; Duffy, Stephen F.; Palko, Joseph L.

    1993-01-01

    New methods are presented that utilize the optimization of goodness-of-fit statistics in order to estimate Weibull parameters from failure data. It is assumed that the underlying population is characterized by a three-parameter Weibull distribution. Goodness-of-fit tests are based on the empirical distribution function (EDF). The EDF is a step function, calculated using failure data, and represents an approximation of the cumulative distribution function for the underlying population. Statistics (such as the Kolmogorov-Smirnov statistic and the Anderson-Darling statistic) measure the discrepancy between the EDF and the cumulative distribution function (CDF). These statistics are minimized with respect to the three Weibull parameters. Due to nonlinearities encountered in the minimization process, Powell's numerical optimization procedure is applied to obtain the optimum value of the EDF. Numerical examples show the applicability of these new estimation methods. The results are compared to the estimates obtained with Cooper's nonlinear regression algorithm.

  2. Inference on network statistics by restricting to the network space: applications to sexual history data.

    PubMed

    Goyal, Ravi; De Gruttola, Victor

    2018-01-30

    Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  3. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  4. The limits of protein sequence comparison?

    PubMed Central

    Pearson, William R; Sierk, Michael L

    2010-01-01

    Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194

  5. Calculating weighted estimates of peak streamflow statistics

    USGS Publications Warehouse

    Cohn, Timothy A.; Berenbrock, Charles; Kiang, Julie E.; Mason, Jr., Robert R.

    2012-01-01

    According to the Federal guidelines for flood-frequency estimation, the uncertainty of peak streamflow statistics, such as the 1-percent annual exceedance probability (AEP) flow at a streamgage, can be reduced by combining the at-site estimate with the regional regression estimate to obtain a weighted estimate of the flow statistic. The procedure assumes the estimates are independent, which is reasonable in most practical situations. The purpose of this publication is to describe and make available a method for calculating a weighted estimate from the uncertainty or variance of the two independent estimates.

  6. Bootstrap Methods: A Very Leisurely Look.

    ERIC Educational Resources Information Center

    Hinkle, Dennis E.; Winstead, Wayland H.

    The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…

  7. Improving statistical inference on pathogen densities estimated by quantitative molecular methods: malaria gametocytaemia as a case study.

    PubMed

    Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S

    2015-01-16

    Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.

  8. Uncertainties in Estimates of Fleet Average Fuel Economy : A Statistical Evaluation

    DOT National Transportation Integrated Search

    1977-01-01

    Research was performed to assess the current Federal procedure for estimating the average fuel economy of each automobile manufacturer's new car fleet. Test vehicle selection and fuel economy estimation methods were characterized statistically and so...

  9. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  10. A Monte Carlo Study of the Effect of Item Characteristic Curve Estimation on the Accuracy of Three Person-Fit Statistics

    ERIC Educational Resources Information Center

    St-Onge, Christina; Valois, Pierre; Abdous, Belkacem; Germain, Stephane

    2009-01-01

    To date, there have been no studies comparing parametric and nonparametric Item Characteristic Curve (ICC) estimation methods on the effectiveness of Person-Fit Statistics (PFS). The primary aim of this study was to determine if the use of ICCs estimated by nonparametric methods would increase the accuracy of item response theory-based PFS for…

  11. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.

    PubMed

    Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun

    2017-09-21

    The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.

  12. Correcting for Optimistic Prediction in Small Data Sets

    PubMed Central

    Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.

    2014-01-01

    The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Somemore » statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.« less

  14. Estimation of absorption rate constant (ka) following oral administration by Wagner-Nelson, Loo-Riegelman, and statistical moments in the presence of a secondary peak.

    PubMed

    Mahmood, Iftekhar

    2004-01-01

    The objective of this study was to evaluate the performance of Wagner-Nelson, Loo-Reigelman, and statistical moments methods in determining the absorption rate constant(s) in the presence of a secondary peak. These methods were also evaluated when there were two absorption rates without a secondary peak. Different sets of plasma concentration versus time data for a hypothetical drug following one or two compartment models were generated by simulation. The true ka was compared with the ka estimated by Wagner-Nelson, Loo-Riegelman and statistical moments methods. The results of this study indicate that Wagner-Nelson, Loo-Riegelman and statistical moments methods may not be used for the estimation of absorption rate constants in the presence of a secondary peak or when absorption takes place with two absorption rates.

  15. Statistical study of generalized nonlinear phase step estimation methods in phase-shifting interferometry.

    PubMed

    Langoju, Rajesh; Patil, Abhijit; Rastogi, Pramod

    2007-11-20

    Signal processing methods based on maximum-likelihood theory, discrete chirp Fourier transform, and spectral estimation methods have enabled accurate measurement of phase in phase-shifting interferometry in the presence of nonlinear response of the piezoelectric transducer to the applied voltage. We present the statistical study of these generalized nonlinear phase step estimation methods to identify the best method by deriving the Cramér-Rao bound. We also address important aspects of these methods for implementation in practical applications and compare the performance of the best-identified method with other bench marking algorithms in the presence of harmonics and noise.

  16. Adaptive Error Estimation in Linearized Ocean General Circulation Models

    NASA Technical Reports Server (NTRS)

    Chechelnitsky, Michael Y.

    1999-01-01

    Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation based methods of adaptive error estimation with low-dimensional models in the North Pacific (5-60 deg N, 132-252 deg E) to TOPEX/POSEIDON (TIP) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced state linear model that describes large scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), where covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al.(1999) with the method of Fu et al.(1993). Most of the model error is explained by the barotropic mode. However, we find that impact of the change in the error statistics on the data assimilation estimates is very small. This is explained by the large representation error, i.e. the dominance of the mesoscale eddies in the T/P signal, which are not part of the 21 by 1" GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simult&neous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words there exist a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.

  17. Fractal analysis of the short time series in a visibility graph method

    NASA Astrophysics Data System (ADS)

    Li, Ruixue; Wang, Jiang; Yu, Haitao; Deng, Bin; Wei, Xile; Chen, Yingyuan

    2016-05-01

    The aim of this study is to evaluate the performance of the visibility graph (VG) method on short fractal time series. In this paper, the time series of Fractional Brownian motions (fBm), characterized by different Hurst exponent H, are simulated and then mapped into a scale-free visibility graph, of which the degree distributions show the power-law form. The maximum likelihood estimation (MLE) is applied to estimate power-law indexes of degree distribution, and in this progress, the Kolmogorov-Smirnov (KS) statistic is used to test the performance of estimation of power-law index, aiming to avoid the influence of droop head and heavy tail in degree distribution. As a result, we find that the MLE gives an optimal estimation of power-law index when KS statistic reaches its first local minimum. Based on the results from KS statistic, the relationship between the power-law index and the Hurst exponent is reexamined and then amended to meet short time series. Thus, a method combining VG, MLE and KS statistics is proposed to estimate Hurst exponents from short time series. Lastly, this paper also offers an exemplification to verify the effectiveness of the combined method. In addition, the corresponding results show that the VG can provide a reliable estimation of Hurst exponents.

  18. Standard and goodness-of-fit parameter estimation methods for the three-parameter lognormal distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kane, V.E.

    1982-01-01

    A class of goodness-of-fit estimators is found to provide a useful alternative in certain situations to the standard maximum likelihood method which has some undesirable estimation characteristics for estimation from the three-parameter lognormal distribution. The class of goodness-of-fit tests considered include the Shapiro-Wilk and Filliben tests which reduce to a weighted linear combination of the order statistics that can be maximized in estimation problems. The weighted order statistic estimators are compared to the standard procedures in Monte Carlo simulations. Robustness of the procedures are examined and example data sets analyzed.

  19. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  20. Simulation-based estimation of mean and standard deviation for meta-analysis via Approximate Bayesian Computation (ABC).

    PubMed

    Kwon, Deukwoo; Reis, Isildinha M

    2015-08-12

    When conducting a meta-analysis of a continuous outcome, estimated means and standard deviations from the selected studies are required in order to obtain an overall estimate of the mean effect and its confidence interval. If these quantities are not directly reported in the publications, they must be estimated from other reported summary statistics, such as the median, the minimum, the maximum, and quartiles. We propose a simulation-based estimation approach using the Approximate Bayesian Computation (ABC) technique for estimating mean and standard deviation based on various sets of summary statistics found in published studies. We conduct a simulation study to compare the proposed ABC method with the existing methods of Hozo et al. (2005), Bland (2015), and Wan et al. (2014). In the estimation of the standard deviation, our ABC method performs better than the other methods when data are generated from skewed or heavy-tailed distributions. The corresponding average relative error (ARE) approaches zero as sample size increases. In data generated from the normal distribution, our ABC performs well. However, the Wan et al. method is best for estimating standard deviation under normal distribution. In the estimation of the mean, our ABC method is best regardless of assumed distribution. ABC is a flexible method for estimating the study-specific mean and standard deviation for meta-analysis, especially with underlying skewed or heavy-tailed distributions. The ABC method can be applied using other reported summary statistics such as the posterior mean and 95 % credible interval when Bayesian analysis has been employed.

  1. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a common use parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non- normal and heteroscedastic settings. When the assumptions are violated, researchers are looking for alternative such as Kruskal-Wallis under nonparametric or robust method. This study focused on flexible method, S1 statistic for comparing groups using median as the location estimator. S1 statistic was modified by substituting the median with Hodges-Lehmann and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. Bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistic in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The propose procedures show improvement compared to the original statistic especially under extremely skewed distribution.

  2. Ill Posed Problems: Numerical and Statistical Methods for Mildly, Moderately and Severely Ill Posed Problems with Noisy Data.

    DTIC Science & Technology

    1980-02-01

    to estimate f -..ell, -noderately ,-ell, or- poorly. 1 ’The sansitivity *of a rec-ilarized estimate of f to the noise is made explicit. After giving the...AD-A 7 .SA92 925 WISCONSIN UN! V-MADISON DEFT OF STATISTICS F /S 11,’ 1 ILL POSED PRORLEMS: NUMERICAL ANn STATISTICAL METHODS FOR MILOL-ETC(U FEB 80 a...estimate f given z. We first define the 1 intrinsic rank of the problem where jK(tit) f (t)dt is known exactly. This 0 definition is used to provide insight

  3. Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values

    PubMed Central

    Tong, Tiejun; Feng, Zeny; Hilton, Julia S.; Zhao, Hongyu

    2013-01-01

    Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance. PMID:24078762

  4. Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values.

    PubMed

    Tong, Tiejun; Feng, Zeny; Hilton, Julia S; Zhao, Hongyu

    2013-01-01

    Estimating the proportion of true null hypotheses, π 0 , has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π 0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π 0 by incorporating the distribution pattern of the observed p -values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p -values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 - λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.

  5. Nowcasting Cloud Fields for U.S. Air Force Special Operations

    DTIC Science & Technology

    2017-03-01

    application of Bayes’ Rule offers many advantages over Kernel Density Estimation (KDE) and other commonly used statistical post-processing methods...reflectance and probability of cloud. A statistical post-processing technique is applied using Bayesian estimation to train the system from a set of past...nowcasting, low cloud forecasting, cloud reflectance, ISR, Bayesian estimation, statistical post-processing, machine learning 15. NUMBER OF PAGES

  6. Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.

    PubMed

    Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I

    2018-06-26

    The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaption of established statistical techniques, the Kaplan-Meier estimator (K-M), the robust regression on ordered statistic (ROS), and the maximum-likelihood estimator (MLE), to quantify the pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the maximum-likelihood estimator (MLE) is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.

  7. Mortality table construction

    NASA Astrophysics Data System (ADS)

    Sutawanir

    2015-12-01

    Mortality tables play important role in actuarial studies such as life annuities, premium determination, premium reserve, valuation pension plan, pension funding. Some known mortality tables are CSO mortality table, Indonesian Mortality Table, Bowers mortality table, Japan Mortality table. For actuary applications some tables are constructed with different environment such as single decrement, double decrement, and multiple decrement. There exist two approaches in mortality table construction : mathematics approach and statistical approach. Distribution model and estimation theory are the statistical concepts that are used in mortality table construction. This article aims to discuss the statistical approach in mortality table construction. The distributional assumptions are uniform death distribution (UDD) and constant force (exponential). Moment estimation and maximum likelihood are used to estimate the mortality parameter. Moment estimation methods are easier to manipulate compared to maximum likelihood estimation (mle). However, the complete mortality data are not used in moment estimation method. Maximum likelihood exploited all available information in mortality estimation. Some mle equations are complicated and solved using numerical methods. The article focus on single decrement estimation using moment and maximum likelihood estimation. Some extension to double decrement will introduced. Simple dataset will be used to illustrated the mortality estimation, and mortality table.

  8. [Research & development on computer expert system for forensic bones estimation].

    PubMed

    Zhao, Jun-ji; Zhang, Jan-zheng; Liu, Nin-guo

    2005-08-01

    To build an expert system for forensic bones estimation. By using the object oriented method, employing statistical data of forensic anthropology, combining the statistical data frame knowledge representation with productions and also using the fuzzy matching and DS evidence theory method. Software for forensic estimation of sex, age and height with opened knowledge base was designed. This system is reliable and effective, and it would be a good assistant of the forensic technician.

  9. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  10. A smoothed residual based goodness-of-fit statistic for nest-survival models

    Treesearch

    Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell

    2008-01-01

    Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...

  11. Statistical power in parallel group point exposure studies with time-to-event outcomes: an empirical comparison of the performance of randomized controlled trials and the inverse probability of treatment weighting (IPTW) approach.

    PubMed

    Austin, Peter C; Schuster, Tibor; Platt, Robert W

    2015-10-15

    Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.

  12. New U.S. Geological Survey Method for the Assessment of Reserve Growth

    USGS Publications Warehouse

    Klett, Timothy R.; Attanasi, E.D.; Charpentier, Ronald R.; Cook, Troy A.; Freeman, P.A.; Gautier, Donald L.; Le, Phuong A.; Ryder, Robert T.; Schenk, Christopher J.; Tennyson, Marilyn E.; Verma, Mahendra K.

    2011-01-01

    Reserve growth is defined as the estimated increases in quantities of crude oil, natural gas, and natural gas liquids that have the potential to be added to remaining reserves in discovered accumulations through extension, revision, improved recovery efficiency, and additions of new pools or reservoirs. A new U.S. Geological Survey method was developed to assess the reserve-growth potential of technically recoverable crude oil and natural gas to be added to reserves under proven technology currently in practice within the trend or play, or which reasonably can be extrapolated from geologically similar trends or plays. This method currently is in use to assess potential additions to reserves in discovered fields of the United States. The new approach involves (1) individual analysis of selected large accumulations that contribute most to reserve growth, and (2) conventional statistical modeling of reserve growth in remaining accumulations. This report will focus on the individual accumulation analysis. In the past, the U.S. Geological Survey estimated reserve growth by statistical methods using historical recoverable-quantity data. Those statistical methods were based on growth rates averaged by the number of years since accumulation discovery. Accumulations in mature petroleum provinces with volumetrically significant reserve growth, however, bias statistical models of the data; therefore, accumulations with significant reserve growth are best analyzed separately from those with less significant reserve growth. Large (greater than 500 million barrels) and older (with respect to year of discovery) oil accumulations increase in size at greater rates late in their development history in contrast to more recently discovered accumulations that achieve most growth early in their development history. Such differences greatly affect the statistical methods commonly used to forecast reserve growth. The individual accumulation-analysis method involves estimating the in-place petroleum quantity and its uncertainty, as well as the estimated (forecasted) recoverability and its respective uncertainty. These variables are assigned probabilistic distributions and are combined statistically to provide probabilistic estimates of ultimate recoverable quantities. Cumulative production and remaining reserves are then subtracted from the estimated ultimate recoverable quantities to provide potential reserve growth. In practice, results of the two methods are aggregated to various scales, the highest of which includes an entire country or the world total. The aggregated results are reported along with the statistically appropriate uncertainties.

  13. Individualized statistical learning from medical image databases: application to identification of brain lesions.

    PubMed

    Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos

    2014-04-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.

  15. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of trangene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.

  16. Evaluation of Methods Used for Estimating Selected Streamflow Statistics, and Flood Frequency and Magnitude, for Small Basins in North Coastal California

    USGS Publications Warehouse

    Mann, Michael P.; Rizzardo, Jule; Satkowski, Richard

    2004-01-01

    Accurate streamflow statistics are essential to water resource agencies involved in both science and decision-making. When long-term streamflow data are lacking at a site, estimation techniques are often employed to generate streamflow statistics. However, procedures for accurately estimating streamflow statistics often are lacking. When estimation procedures are developed, they often are not evaluated properly before being applied. Use of unevaluated or underevaluated flow-statistic estimation techniques can result in improper water-resources decision-making. The California State Water Resources Control Board (SWRCB) uses two key techniques, a modified rational equation and drainage basin area-ratio transfer, to estimate streamflow statistics at ungaged locations. These techniques have been implemented to varying degrees, but have not been formally evaluated. For estimating peak flows at the 2-, 5-, 10-, 25-, 50-, and 100-year recurrence intervals, the SWRCB uses the U.S. Geological Surveys (USGS) regional peak-flow equations. In this study, done cooperatively by the USGS and SWRCB, the SWRCB estimated several flow statistics at 40 USGS streamflow gaging stations in the north coast region of California. The SWRCB estimates were made without reference to USGS flow data. The USGS used the streamflow data provided by the 40 stations to generate flow statistics that could be compared with SWRCB estimates for accuracy. While some SWRCB estimates compared favorably with USGS statistics, results were subject to varying degrees of error over the region. Flow-based estimation techniques generally performed better than rain-based methods, especially for estimation of December 15 to March 31 mean daily flows. The USGS peak-flow equations also performed well, but tended to underestimate peak flows. The USGS equations performed within reported error bounds, but will require updating in the future as peak-flow data sets grow larger. Little correlation was discovered between estimation errors and geographic locations or various basin characteristics. However, for 25-percentile year mean-daily-flow estimates for December 15 to March 31, the greatest estimation errors were at east San Francisco Bay area stations with mean annual precipitation less than or equal to 30 inches, and estimated 2-year/24-hour rainfall intensity less than 3 inches.

  17. Estimation versus falsification approaches in sport and exercise science.

    PubMed

    Wilkinson, Michael; Winter, Edward M

    2018-05-22

    There has been a recent resurgence in debate about methods for statistical inference in science. The debate addresses statistical concepts and their impact on the value and meaning of analyses' outcomes. In contrast, philosophical underpinnings of approaches and the extent to which analytical tools match philosophical goals of the scientific method have received less attention. This short piece considers application of the scientific method to "what-is-the-influence-of x-on-y" type questions characteristic of sport and exercise science. We consider applications and interpretations of estimation versus falsification based statistical approaches and their value in addressing how much x influences y, and in measurement error and method agreement settings. We compare estimation using magnitude based inference (MBI) with falsification using null hypothesis significance testing (NHST), and highlight the limited value both of falsification and NHST to address problems in sport and exercise science. We recommend adopting an estimation approach, expressing the uncertainty of effects of x on y, and their practical/clinical value against pre-determined effect magnitudes using MBI.

  18. Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

    PubMed Central

    Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

    2015-01-01

    Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830

  19. Statistical inference for remote sensing-based estimates of net deforestation

    Treesearch

    Ronald E. McRoberts; Brian F. Walters

    2012-01-01

    Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...

  20. AN EMPIRICAL BAYES APPROACH TO COMBINING ESTIMATES OF THE VALUE OF A STATISTICAL LIFE FOR ENVIRONMENTAL POLICY ANALYSIS

    EPA Science Inventory

    This analysis updates EPA's standard VSL estimate by using a more comprehensive collection of VSL studies that include studies published between 1992 and 2000, as well as applying a more appropriate statistical method. We provide a pooled effect VSL estimate by applying the empi...

  1. Multi-level methods and approximating distribution functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, D., E-mail: daniel.wilson@dtc.ox.ac.uk; Baker, R. E.

    2016-07-15

    Biochemical reaction networks are often modelled using discrete-state, continuous-time Markov chains. System statistics of these Markov chains usually cannot be calculated analytically and therefore estimates must be generated via simulation techniques. There is a well documented class of simulation techniques known as exact stochastic simulation algorithms, an example of which is Gillespie’s direct method. These algorithms often come with high computational costs, therefore approximate stochastic simulation algorithms such as the tau-leap method are used. However, in order to minimise the bias in the estimates generated using them, a relatively small value of tau is needed, rendering the computational costs comparablemore » to Gillespie’s direct method. The multi-level Monte Carlo method (Anderson and Higham, Multiscale Model. Simul. 10:146–179, 2012) provides a reduction in computational costs whilst minimising or even eliminating the bias in the estimates of system statistics. This is achieved by first crudely approximating required statistics with many sample paths of low accuracy. Then correction terms are added until a required level of accuracy is reached. Recent literature has primarily focussed on implementing the multi-level method efficiently to estimate a single system statistic. However, it is clearly also of interest to be able to approximate entire probability distributions of species counts. We present two novel methods that combine known techniques for distribution reconstruction with the multi-level method. We demonstrate the potential of our methods using a number of examples.« less

  2. Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

    ERIC Educational Resources Information Center

    Ali, Usama S.; Walker, Michael E.

    2014-01-01

    Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…

  3. Empirical best linear unbiased prediction method for small areas with restricted maximum likelihood and bootstrap procedure to estimate the average of household expenditure per capita in Banjar Regency

    NASA Astrophysics Data System (ADS)

    Aminah, Agustin Siti; Pawitan, Gandhi; Tantular, Bertho

    2017-03-01

    So far, most of the data published by Statistics Indonesia (BPS) as data providers for national statistics are still limited to the district level. Less sufficient sample size for smaller area levels to make the measurement of poverty indicators with direct estimation produced high standard error. Therefore, the analysis based on it is unreliable. To solve this problem, the estimation method which can provide a better accuracy by combining survey data and other auxiliary data is required. One method often used for the estimation is the Small Area Estimation (SAE). There are many methods used in SAE, one of them is Empirical Best Linear Unbiased Prediction (EBLUP). EBLUP method of maximum likelihood (ML) procedures does not consider the loss of degrees of freedom due to estimating β with β ^. This drawback motivates the use of the restricted maximum likelihood (REML) procedure. This paper proposed EBLUP with REML procedure for estimating poverty indicators by modeling the average of household expenditures per capita and implemented bootstrap procedure to calculate MSE (Mean Square Error) to compare the accuracy EBLUP method with the direct estimation method. Results show that EBLUP method reduced MSE in small area estimation.

  4. Nonparametric entropy estimation using kernel densities.

    PubMed

    Lake, Douglas E

    2009-01-01

    The entropy of experimental data from the biological and medical sciences provides additional information over summary statistics. Calculating entropy involves estimates of probability density functions, which can be effectively accomplished using kernel density methods. Kernel density estimation has been widely studied and a univariate implementation is readily available in MATLAB. The traditional definition of Shannon entropy is part of a larger family of statistics, called Renyi entropy, which are useful in applications that require a measure of the Gaussianity of data. Of particular note is the quadratic entropy which is related to the Friedman-Tukey (FT) index, a widely used measure in the statistical community. One application where quadratic entropy is very useful is the detection of abnormal cardiac rhythms, such as atrial fibrillation (AF). Asymptotic and exact small-sample results for optimal bandwidth and kernel selection to estimate the FT index are presented and lead to improved methods for entropy estimation.

  5. Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review.

    PubMed

    Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C

    2018-03-07

    Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.

  6. Estimation of Lithological Classification in Taipei Basin: A Bayesian Maximum Entropy Method

    NASA Astrophysics Data System (ADS)

    Wu, Meng-Ting; Lin, Yuan-Chien; Yu, Hwa-Lung

    2015-04-01

    In environmental or other scientific applications, we must have a certain understanding of geological lithological composition. Because of restrictions of real conditions, only limited amount of data can be acquired. To find out the lithological distribution in the study area, many spatial statistical methods used to estimate the lithological composition on unsampled points or grids. This study applied the Bayesian Maximum Entropy (BME method), which is an emerging method of the geological spatiotemporal statistics field. The BME method can identify the spatiotemporal correlation of the data, and combine not only the hard data but the soft data to improve estimation. The data of lithological classification is discrete categorical data. Therefore, this research applied Categorical BME to establish a complete three-dimensional Lithological estimation model. Apply the limited hard data from the cores and the soft data generated from the geological dating data and the virtual wells to estimate the three-dimensional lithological classification in Taipei Basin. Keywords: Categorical Bayesian Maximum Entropy method, Lithological Classification, Hydrogeological Setting

  7. Evaluation of assumptions in soil moisture triple collocation analysis

    USDA-ARS?s Scientific Manuscript database

    Triple collocation analysis (TCA) enables estimation of error variances for three or more products that retrieve or estimate the same geophysical variable using mutually-independent methods. Several statistical assumptions regarding the statistical nature of errors (e.g., mutual independence and ort...

  8. Estimation of distributional parameters for censored trace level water quality data: 2. Verification and applications

    USGS Publications Warehouse

    Helsel, Dennis R.; Gilliom, Robert J.

    1986-01-01

    Estimates of distributional parameters (mean, standard deviation, median, interquartile range) are often desired for data sets containing censored observations. Eight methods for estimating these parameters have been evaluated by R. J. Gilliom and D. R. Helsel (this issue) using Monte Carlo simulations. To verify those findings, the same methods are now applied to actual water quality data. The best method (lowest root-mean-squared error (rmse)) over all parameters, sample sizes, and censoring levels is log probability regression (LR), the method found best in the Monte Carlo simulations. Best methods for estimating moment or percentile parameters separately are also identical to the simulations. Reliability of these estimates can be expressed as confidence intervals using rmse and bias values taken from the simulation results. Finally, a new simulation study shows that best methods for estimating uncensored sample statistics from censored data sets are identical to those for estimating population parameters. Thus this study and the companion study by Gilliom and Helsel form the basis for making the best possible estimates of either population parameters or sample statistics from censored water quality data, and for assessments of their reliability.

  9. Biennial Survey of Education in the United States, 1934-1936. Bulletin, 1937, No. 2. Volume II. Chapter III: Statistics of City School Systems, 1935-36

    ERIC Educational Resources Information Center

    Herlihy, Lester B.; Deffenbaugh, Walter S.

    1938-01-01

    This report presents statistics of city school systems for the school year 1935-36. prior to 1933-34 school statistics for cities included in county unit systems were estimated. Most of these cities are in Florida, Louisiana, Maryland, and West Virginia. Since the method of estimating school statistics for the cities included with the counties in…

  10. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States of Missouri. For streamgages with more than 10 years of record, Kendall’s tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical estimates on one of these streams can be calculated at an ungaged location that has a drainage area that is between 40 percent of the drainage area of the farthest upstream streamgage and within 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40- and 150-percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions with the State representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on the use of estimating low-flow frequency statistics at ungaged locations are dependent on the method used. The first method outlined for use in Missouri, power curve equations, were developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at an ungaged location to the drainage area at a streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. The third method is the use of the regional regression equations. The limits for the use of these equations are based on the ranges of the characteristics used as independent variables and that streams must be affected minimally by anthropogenic activities.

  11. Output power fluctuations due to different weights of macro particles used in particle-in-cell simulations of Cerenkov devices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bao, Rong; Li, Yongdong; Liu, Chunliang

    2016-07-15

    The output power fluctuations caused by weights of macro particles used in particle-in-cell (PIC) simulations of a backward wave oscillator and a travelling wave tube are statistically analyzed. It is found that the velocities of electrons passed a specific slow-wave structure form a specific electron velocity distribution. The electron velocity distribution obtained in PIC simulation with a relative small weight of macro particles is considered as an initial distribution. By analyzing this initial distribution with a statistical method, the estimations of the output power fluctuations caused by different weights of macro particles are obtained. The statistical method is verified bymore » comparing the estimations with the simulation results. The fluctuations become stronger with increasing weight of macro particles, which can also be determined reversely from estimations of the output power fluctuations. With the weights of macro particles optimized by the statistical method, the output power fluctuations in PIC simulations are relatively small and acceptable.« less

  12. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  13. New methods of testing nonlinear hypothesis using iterative NLLS estimator

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper discusses the method of testing nonlinear hypothesis using iterative Nonlinear Least Squares (NLLS) estimator. Takeshi Amemiya [1] explained this method. However in the present research paper, a modified Wald test statistic due to Engle, Robert [6] is proposed to test the nonlinear hypothesis using iterative NLLS estimator. An alternative method for testing nonlinear hypothesis using iterative NLLS estimator based on nonlinear hypothesis using iterative NLLS estimator based on nonlinear studentized residuals has been proposed. In this research article an innovative method of testing nonlinear hypothesis using iterative restricted NLLS estimator is derived. Pesaran and Deaton [10] explained the methods of testing nonlinear hypothesis. This paper uses asymptotic properties of nonlinear least squares estimator proposed by Jenrich [8]. The main purpose of this paper is to provide very innovative methods of testing nonlinear hypothesis using iterative NLLS estimator, iterative NLLS estimator based on nonlinear studentized residuals and iterative restricted NLLS estimator. Eakambaram et al. [12] discussed least absolute deviation estimations versus nonlinear regression model with heteroscedastic errors and also they studied the problem of heteroscedasticity with reference to nonlinear regression models with suitable illustration. William Grene [13] examined the interaction effect in nonlinear models disused by Ai and Norton [14] and suggested ways to examine the effects that do not involve statistical testing. Peter [15] provided guidelines for identifying composite hypothesis and addressing the probability of false rejection for multiple hypotheses.

  14. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.

  15. The suitability of using death certificates as a data source for cancer mortality assessment in Turkey

    PubMed Central

    Ulus, Tumer; Yurtseven, Eray; Cavdar, Sabanur; Erginoz, Ethem; Erdogan, M. Sarper

    2012-01-01

    Aim To compare the quality of the 2008 cancer mortality data of the Istanbul Directorate of Cemeteries (IDC) with the 2008 data of International Agency for Research on Cancer (IARC) and Turkish Statistical Institute (TUIK), and discuss the suitability of using this databank for estimations of cancer mortality in the future. Methods We used 2008 and 2010 death records of the IDC and compared it to TUIK and IARC data. Results According to the WHO statistics, in Turkey in 2008 there were 67 255 estimated cancer deaths. As the population of Turkey was 71 517 100, the cancer mortality rate was 9.4 per 10 000. According to the IDC statistics, the cancer mortality rate in Istanbul in 2008 was 5.97 per 10 000. Conclusion IDC estimates were higher than WHO estimates probably because WHO bases its estimates on a sample group and because of the restrictions of IDC data collection method. Death certificates could be a reliable and accurate data source for mortality statistics if the problems of data collection are solved. PMID:23100210

  16. Individualized Statistical Learning from Medical Image Databases: Application to Identification of Brain Lesions

    PubMed Central

    Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos

    2014-01-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564

  17. Multivariate statistical approach to estimate mixing proportions for unknown end members

    USGS Publications Warehouse

    Valder, Joshua F.; Long, Andrew J.; Davis, Arden D.; Kenner, Scott J.

    2012-01-01

    A multivariate statistical method is presented, which includes principal components analysis (PCA) and an end-member mixing model to estimate unknown end-member hydrochemical compositions and the relative mixing proportions of those end members in mixed waters. PCA, together with the Hotelling T2 statistic and a conceptual model of groundwater flow and mixing, was used in selecting samples that best approximate end members, which then were used as initial values in optimization of the end-member mixing model. This method was tested on controlled datasets (i.e., true values of estimates were known a priori) and found effective in estimating these end members and mixing proportions. The controlled datasets included synthetically generated hydrochemical data, synthetically generated mixing proportions, and laboratory analyses of sample mixtures, which were used in an evaluation of the effectiveness of this method for potential use in actual hydrological settings. For three different scenarios tested, correlation coefficients (R2) for linear regression between the estimated and known values ranged from 0.968 to 0.993 for mixing proportions and from 0.839 to 0.998 for end-member compositions. The method also was applied to field data from a study of end-member mixing in groundwater as a field example and partial method validation.

  18. Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

    PubMed

    Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

    2015-03-01

    Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  19. A REVIEW OF STATISTICAL METHODS FOR THE METEOROLOGICAL ADJUSTMENT OF TROPOSPHERIC OZONE

    EPA Science Inventory

    A variety of statistical methods for meteorological adjustment of ozone have been proposed in the literature over the last decade for purposes of forecasting, estimating ozone time trends, or investigating underlying mechanisms from an empirical perspective. The methods can be...

  20. State Estimation Using Dependent Evidence Fusion: Application to Acoustic Resonance-Based Liquid Level Measurement.

    PubMed

    Xu, Xiaobin; Li, Zhenghui; Li, Guo; Zhou, Zhe

    2017-04-21

    Estimating the state of a dynamic system via noisy sensor measurement is a common problem in sensor methods and applications. Most state estimation methods assume that measurement noise and state perturbations can be modeled as random variables with known statistical properties. However in some practical applications, engineers can only get the range of noises, instead of the precise statistical distributions. Hence, in the framework of Dempster-Shafer (DS) evidence theory, a novel state estimatation method by fusing dependent evidence generated from state equation, observation equation and the actual observations of the system states considering bounded noises is presented. It can be iteratively implemented to provide state estimation values calculated from fusion results at every time step. Finally, the proposed method is applied to a low-frequency acoustic resonance level gauge to obtain high-accuracy measurement results.

  1. Pattern statistics on Markov chains and sensitivity to parameter estimation

    PubMed Central

    Nuel, Grégory

    2006-01-01

    Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). Results: In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. Conclusion: We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916

  2. Pattern statistics on Markov chains and sensitivity to parameter estimation.

    PubMed

    Nuel, Grégory

    2006-10-17

    In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of sigma, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.

  3. Method and system for efficient video compression with low-complexity encoder

    NASA Technical Reports Server (NTRS)

    Chen, Jun (Inventor); He, Dake (Inventor); Sheinin, Vadim (Inventor); Jagmohan, Ashish (Inventor); Lu, Ligang (Inventor)

    2012-01-01

    Disclosed are a method and system for video compression, wherein the video encoder has low computational complexity and high compression efficiency. The disclosed system comprises a video encoder and a video decoder, wherein the method for encoding includes the steps of converting a source frame into a space-frequency representation; estimating conditional statistics of at least one vector of space-frequency coefficients; estimating encoding rates based on the said conditional statistics; and applying Slepian-Wolf codes with the said computed encoding rates. The preferred method for decoding includes the steps of; generating a side-information vector of frequency coefficients based on previously decoded source data, encoder statistics, and previous reconstructions of the source frequency vector; and performing Slepian-Wolf decoding of at least one source frequency vector based on the generated side-information, the Slepian-Wolf code bits and the encoder statistics.

  4. ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.

    PubMed

    Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka

    2015-01-01

    Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.

  5. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    PubMed

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task specially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Estimating survival probabilities by exposure levels: utilizing vital statistics and complex survey data with mortality follow-up.

    PubMed

    Landsman, V; Lou, W Y W; Graubard, B I

    2015-05-20

    We present a two-step approach for estimating hazard rates and, consequently, survival probabilities, by levels of general categorical exposure. The resulting estimator utilizes three sources of data: vital statistics data and census data are used at the first step to estimate the overall hazard rate for a given combination of gender and age group, and cohort data constructed from a nationally representative complex survey with linked mortality records, are used at the second step to divide the overall hazard rate by exposure levels. We present an explicit expression for the resulting estimator and consider two methods for variance estimation that account for complex multistage sample design: (1) the leaving-one-out jackknife method, and (2) the Taylor linearization method, which provides an analytic formula for the variance estimator. The methods are illustrated with smoking and all-cause mortality data from the US National Health Interview Survey Linked Mortality Files, and the proposed estimator is compared with a previously studied crude hazard rate estimator that uses survey data only. The advantages of a two-step approach and possible extensions of the proposed estimator are discussed. Copyright © 2015 John Wiley & Sons, Ltd.

  7. A Statistical Method of Evaluating the Pronunciation Proficiency/Intelligibility of English Presentations by Japanese Speakers

    ERIC Educational Resources Information Center

    Kibishi, Hiroshi; Hirabayashi, Kuniaki; Nakagawa, Seiichi

    2015-01-01

    In this paper, we propose a statistical evaluation method of pronunciation proficiency and intelligibility for presentations made in English by native Japanese speakers. We statistically analyzed the actual utterances of speakers to find combinations of acoustic and linguistic features with high correlation between the scores estimated by the…

  8. Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application.

    PubMed

    Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D

    2016-06-15

    The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  9. Representation of Probability Density Functions from Orbit Determination using the Particle Filter

    NASA Technical Reports Server (NTRS)

    Mashiku, Alinda K.; Garrison, James; Carpenter, J. Russell

    2012-01-01

    Statistical orbit determination enables us to obtain estimates of the state and the statistical information of its region of uncertainty. In order to obtain an accurate representation of the probability density function (PDF) that incorporates higher order statistical information, we propose the use of nonlinear estimation methods such as the Particle Filter. The Particle Filter (PF) is capable of providing a PDF representation of the state estimates whose accuracy is dependent on the number of particles or samples used. For this method to be applicable to real case scenarios, we need a way of accurately representing the PDF in a compressed manner with little information loss. Hence we propose using the Independent Component Analysis (ICA) as a non-Gaussian dimensional reduction method that is capable of maintaining higher order statistical information obtained using the PF. Methods such as the Principal Component Analysis (PCA) are based on utilizing up to second order statistics, hence will not suffice in maintaining maximum information content. Both the PCA and the ICA are applied to two scenarios that involve a highly eccentric orbit with a lower apriori uncertainty covariance and a less eccentric orbit with a higher a priori uncertainty covariance, to illustrate the capability of the ICA in relation to the PCA.

  10. Estimating weak ratiometric signals in imaging data. II. Meta-analysis with multiple, dual-channel datasets.

    PubMed

    Sornborger, Andrew; Broder, Josef; Majumder, Anirban; Srinivasamoorthy, Ganesh; Porter, Erika; Reagin, Sean S; Keith, Charles; Lauderdale, James D

    2008-09-01

    Ratiometric fluorescent indicators are used for making quantitative measurements of a variety of physiological variables. Their utility is often limited by noise. This is the second in a series of papers describing statistical methods for denoising ratiometric data with the aim of obtaining improved quantitative estimates of variables of interest. Here, we outline a statistical optimization method that is designed for the analysis of ratiometric imaging data in which multiple measurements have been taken of systems responding to the same stimulation protocol. This method takes advantage of correlated information across multiple datasets for objectively detecting and estimating ratiometric signals. We demonstrate our method by showing results of its application on multiple, ratiometric calcium imaging experiments.

  11. A robust bayesian estimate of the concordance correlation coefficient.

    PubMed

    Feng, Dai; Baumgartner, Richard; Svetnik, Vladimir

    2015-01-01

    A need for assessment of agreement arises in many situations including statistical biomarker qualification or assay or method validation. Concordance correlation coefficient (CCC) is one of the most popular scaled indices reported in evaluation of agreement. Robust methods for CCC estimation currently present an important statistical challenge. Here, we propose a novel Bayesian method of robust estimation of CCC based on multivariate Student's t-distribution and compare it with its alternatives. Furthermore, we extend the method to practically relevant settings, enabling incorporation of confounding covariates and replications. The superiority of the new approach is demonstrated using simulation as well as real datasets from biomarker application in electroencephalography (EEG). This biomarker is relevant in neuroscience for development of treatments for insomnia.

  12. Statistical methods for thermonuclear reaction rates and nucleosynthesis simulations

    NASA Astrophysics Data System (ADS)

    Iliadis, Christian; Longland, Richard; Coc, Alain; Timmes, F. X.; Champagne, Art E.

    2015-03-01

    Rigorous statistical methods for estimating thermonuclear reaction rates and nucleosynthesis are becoming increasingly established in nuclear astrophysics. The main challenge being faced is that experimental reaction rates are highly complex quantities derived from a multitude of different measured nuclear parameters (e.g., astrophysical S-factors, resonance energies and strengths, particle and γ-ray partial widths). We discuss the application of the Monte Carlo method to two distinct, but related, questions. First, given a set of measured nuclear parameters, how can one best estimate the resulting thermonuclear reaction rates and associated uncertainties? Second, given a set of appropriate reaction rates, how can one best estimate the abundances from nucleosynthesis (i.e., reaction network) calculations? The techniques described here provide probability density functions that can be used to derive statistically meaningful reaction rates and final abundances for any desired coverage probability. Examples are given for applications to s-process neutron sources, core-collapse supernovae, classical novae, and Big Bang nucleosynthesis.

  13. An Introduction to Confidence Intervals for Both Statistical Estimates and Effect Sizes.

    ERIC Educational Resources Information Center

    Capraro, Mary Margaret

    This paper summarizes methods of estimating confidence intervals, including classical intervals and intervals for effect sizes. The recent American Psychological Association (APA) Task Force on Statistical Inference report suggested that confidence intervals should always be reported, and the fifth edition of the APA "Publication Manual"…

  14. An evaluation of flow-stratified sampling for estimating suspended sediment loads

    Treesearch

    Robert B. Thomas; Jack Lewis

    1995-01-01

    Abstract - Flow-stratified sampling is a new method for sampling water quality constituents such as suspended sediment to estimate loads. As with selection-at-list-time (SALT) and time-stratified sampling, flow-stratified sampling is a statistical method requiring random sampling, and yielding unbiased estimates of load and variance. It can be used to estimate event...

  15. Peaks Over Threshold (POT): A methodology for automatic threshold estimation using goodness of fit p-value

    NASA Astrophysics Data System (ADS)

    Solari, Sebastián.; Egüen, Marta; Polo, María. José; Losada, Miguel A.

    2017-04-01

    Threshold estimation in the Peaks Over Threshold (POT) method and the impact of the estimation method on the calculation of high return period quantiles and their uncertainty (or confidence intervals) are issues that are still unresolved. In the past, methods based on goodness of fit tests and EDF-statistics have yielded satisfactory results, but their use has not yet been systematized. This paper proposes a methodology for automatic threshold estimation, based on the Anderson-Darling EDF-statistic and goodness of fit test. When combined with bootstrapping techniques, this methodology can be used to quantify both the uncertainty of threshold estimation and its impact on the uncertainty of high return period quantiles. This methodology was applied to several simulated series and to four precipitation/river flow data series. The results obtained confirmed its robustness. For the measured series, the estimated thresholds corresponded to those obtained by nonautomatic methods. Moreover, even though the uncertainty of the threshold estimation was high, this did not have a significant effect on the width of the confidence intervals of high return period quantiles.

  16. The magnitude of variability produced by methods used to estimate annual stormwater contaminant loads for highly urbanised catchments.

    PubMed

    Beck, H J; Birch, G F

    2013-06-01

    Stormwater contaminant loading estimates using event mean concentration (EMC), rainfall/runoff relationship calculations and computer modelling (Model of Urban Stormwater Infrastructure Conceptualisation--MUSIC) demonstrated high variability in common methods of water quality assessment. Predictions of metal, nutrient and total suspended solid loadings for three highly urbanised catchments in Sydney estuary, Australia, varied greatly within and amongst methods tested. EMC and rainfall/runoff relationship calculations produced similar estimates (within 1 SD) in a statistically significant number of trials; however, considerable variability within estimates (∼50 and ∼25 % relative standard deviation, respectively) questions the reliability of these methods. Likewise, upper and lower default inputs in a commonly used loading model (MUSIC) produced an extensive range of loading estimates (3.8-8.3 times above and 2.6-4.1 times below typical default inputs, respectively). Default and calibrated MUSIC simulations produced loading estimates that agreed with EMC and rainfall/runoff calculations in some trials (4-10 from 18); however, they were not frequent enough to statistically infer that these methods produced the same results. Great variance within and amongst mean annual loads estimated by common methods of water quality assessment has important ramifications for water quality managers requiring accurate estimates of the quantities and nature of contaminants requiring treatment.

  17. Bayesian approach to inverse statistical mechanics.

    PubMed

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  18. Bayesian approach to inverse statistical mechanics

    NASA Astrophysics Data System (ADS)

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  19. Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding

    PubMed Central

    Tosteson, Tor D.; Morden, Nancy E.; Stukel, Therese A.; O'Malley, A. James

    2014-01-01

    The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival. PMID:25506259

  20. Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding.

    PubMed

    MacKenzie, Todd A; Tosteson, Tor D; Morden, Nancy E; Stukel, Therese A; O'Malley, A James

    2014-06-01

    The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival.

  1. Investigating the Stability of Four Methods for Estimating Item Bias.

    ERIC Educational Resources Information Center

    Perlman, Carole L.; And Others

    The reliability of item bias estimates was studied for four methods: (1) the transformed delta method; (2) Shepard's modified delta method; (3) Rasch's one-parameter residual analysis; and (4) the Mantel-Haenszel procedure. Bias statistics were computed for each sample using all methods. Data were from administration of multiple-choice items from…

  2. A Comparison of Normal and Elliptical Estimation Methods in Structural Equation Models.

    ERIC Educational Resources Information Center

    Schumacker, Randall E.; Cheevatanarak, Suchittra

    Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean square error of approximation values using normal and elliptical estimation methods. Three research conditions were imposed on the simulated data: sample size, population contamination percent, and kurtosis. A Bentler-Weeks structural model established the…

  3. Comparison of Fatigue Life Estimation Using Equivalent Linearization and Time Domain Simulation Methods

    NASA Technical Reports Server (NTRS)

    Mei, Chuh; Dhainaut, Jean-Michel

    2000-01-01

    The Monte Carlo simulation method in conjunction with the finite element large deflection modal formulation are used to estimate fatigue life of aircraft panels subjected to stationary Gaussian band-limited white-noise excitations. Ten loading cases varying from 106 dB to 160 dB OASPL with bandwidth 1024 Hz are considered. For each load case, response statistics are obtained from an ensemble of 10 response time histories. The finite element nonlinear modal procedure yields time histories, probability density functions (PDF), power spectral densities and higher statistical moments of the maximum deflection and stress/strain. The method of moments of PSD with Dirlik's approach is employed to estimate the panel fatigue life.

  4. A Nonparametric Geostatistical Method For Estimating Species Importance

    Treesearch

    Andrew J. Lister; Rachel Riemann; Michael Hoppus

    2001-01-01

    Parametric statistical methods are not always appropriate for conducting spatial analyses of forest inventory data. Parametric geostatistical methods such as variography and kriging are essentially averaging procedures, and thus can be affected by extreme values. Furthermore, non normal distributions violate the assumptions of analyses in which test statistics are...

  5. Inferring epidemiological dynamics of infectious diseases using Tajima's D statistic on nucleotide sequences of pathogens.

    PubMed

    Kim, Kiyeon; Omori, Ryosuke; Ito, Kimihito

    2017-12-01

    The estimation of the basic reproduction number is essential to understand epidemic dynamics, and time series data of infected individuals are usually used for the estimation. However, such data are not always available. Methods to estimate the basic reproduction number using genealogy constructed from nucleotide sequences of pathogens have been proposed so far. Here, we propose a new method to estimate epidemiological parameters of outbreaks using the time series change of Tajima's D statistic on the nucleotide sequences of pathogens. To relate the time evolution of Tajima's D to the number of infected individuals, we constructed a parsimonious mathematical model describing both the transmission process of pathogens among hosts and the evolutionary process of the pathogens. As a case study we applied this method to the field data of nucleotide sequences of pandemic influenza A (H1N1) 2009 viruses collected in Argentina. The Tajima's D-based method estimated basic reproduction number to be 1.55 with 95% highest posterior density (HPD) between 1.31 and 2.05, and the date of epidemic peak to be 10th July with 95% HPD between 22nd June and 9th August. The estimated basic reproduction number was consistent with estimation by birth-death skyline plot and estimation using the time series of the number of infected individuals. These results suggested that Tajima's D statistic on nucleotide sequences of pathogens could be useful to estimate epidemiological parameters of outbreaks. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  6. Small sample estimation of the reliability function for technical products

    NASA Astrophysics Data System (ADS)

    Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.

    2017-12-01

    It is demonstrated that, in the absence of big statistic samples obtained as a result of testing complex technical products for failure, statistic estimation of the reliability function of initial elements can be made by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for the implementation of the moments method with the use of only the moments at which the failures of initial elements occur.

  7. Estimating procedure times for surgeries by determining location parameters for the lognormal model.

    PubMed

    Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H

    2004-05-01

    We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.

  8. Estimation of diagnostic test accuracy without full verification: a review of latent class methods

    PubMed Central

    Collins, John; Huynh, Minh

    2014-01-01

    The performance of a diagnostic test is best evaluated against a reference test that is without error. For many diseases, this is not possible, and an imperfect reference test must be used. However, diagnostic accuracy estimates may be biased if inaccurately verified status is used as the truth. Statistical models have been developed to handle this situation by treating disease as a latent variable. In this paper, we conduct a systematized review of statistical methods using latent class models for estimating test accuracy and disease prevalence in the absence of complete verification. PMID:24910172

  9. Correction of stream quality trends for the effects of laboratory measurement bias

    USGS Publications Warehouse

    Alexander, Richard B.; Smith, Richard A.; Schwarz, Gregory E.

    1993-01-01

    We present a statistical model relating measurements of water quality to associated errors in laboratory methods. Estimation of the model allows us to correct trends in water quality for long-term and short-term variations in laboratory measurement errors. An illustration of the bias correction method for a large national set of stream water quality and quality assurance data shows that reductions in the bias of estimates of water quality trend slopes are achieved at the expense of increases in the variance of these estimates. Slight improvements occur in the precision of estimates of trend in bias by using correlative information on bias and water quality to estimate random variations in measurement bias. The results of this investigation stress the need for reliable, long-term quality assurance data and efficient statistical methods to assess the effects of measurement errors on the detection of water quality trends.

  10. Multiscale Structure of UXO Site Characterization: Spatial Estimation and Uncertainty Quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ostrouchov, George; Doll, William E.; Beard, Les P.

    2009-01-01

    Unexploded ordnance (UXO) site characterization must consider both how the contamination is generated and how we observe that contamination. Within the generation and observation processes, dependence structures can be exploited at multiple scales. We describe a conceptual site characterization process, the dependence structures available at several scales, and consider their statistical estimation aspects. It is evident that most of the statistical methods that are needed to address the estimation problems are known but their application-specific implementation may not be available. We demonstrate estimation at one scale and propose a representation for site contamination intensity that takes full account of uncertainty,more » is flexible enough to answer regulatory requirements, and is a practical tool for managing detailed spatial site characterization and remediation. The representation is based on point process spatial estimation methods that require modern computational resources for practical application. These methods have provisions for including prior and covariate information.« less

  11. Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis.

    PubMed

    Austin, Peter C

    2016-12-30

    Propensity score methods are used to reduce the effects of observed confounding when using observational data to estimate the effects of treatments or exposures. A popular method of using the propensity score is inverse probability of treatment weighting (IPTW). When using this method, a weight is calculated for each subject that is equal to the inverse of the probability of receiving the treatment that was actually received. These weights are then incorporated into the analyses to minimize the effects of observed confounding. Previous research has found that these methods result in unbiased estimation when estimating the effect of treatment on survival outcomes. However, conventional methods of variance estimation were shown to result in biased estimates of standard error. In this study, we conducted an extensive set of Monte Carlo simulations to examine different methods of variance estimation when using a weighted Cox proportional hazards model to estimate the effect of treatment. We considered three variance estimation methods: (i) a naïve model-based variance estimator; (ii) a robust sandwich-type variance estimator; and (iii) a bootstrap variance estimator. We considered estimation of both the average treatment effect and the average treatment effect in the treated. We found that the use of a bootstrap estimator resulted in approximately correct estimates of standard errors and confidence intervals with the correct coverage rates. The other estimators resulted in biased estimates of standard errors and confidence intervals with incorrect coverage rates. Our simulations were informed by a case study examining the effect of statin prescribing on mortality. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

  12. Methods and statistics for combining motif match scores.

    PubMed

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http:/(/)www.sdsc.edu/MEME.

  13. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  14. Methods for estimating flow-duration and annual mean-flow statistics for ungaged streams in Oklahoma

    USGS Publications Warehouse

    Esralew, Rachel A.; Smith, S. Jerrod

    2010-01-01

    Flow statistics can be used to provide decision makers with surface-water information needed for activities such as water-supply permitting, flow regulation, and other water rights issues. Flow statistics could be needed at any location along a stream. Most often, streamflow statistics are needed at ungaged sites, where no flow data are available to compute the statistics. Methods are presented in this report for estimating flow-duration and annual mean-flow statistics for ungaged streams in Oklahoma. Flow statistics included the (1) annual (period of record), (2) seasonal (summer-autumn and winter-spring), and (3) 12 monthly duration statistics, including the 20th, 50th, 80th, 90th, and 95th percentile flow exceedances, and the annual mean-flow (mean of daily flows for the period of record). Flow statistics were calculated from daily streamflow information collected from 235 streamflow-gaging stations throughout Oklahoma and areas in adjacent states. A drainage-area ratio method is the preferred method for estimating flow statistics at an ungaged location that is on a stream near a gage. The method generally is reliable only if the drainage-area ratio of the two sites is between 0.5 and 1.5. Regression equations that relate flow statistics to drainage-basin characteristics were developed for the purpose of estimating selected flow-duration and annual mean-flow statistics for ungaged streams that are not near gaging stations on the same stream. Regression equations were developed from flow statistics and drainage-basin characteristics for 113 unregulated gaging stations. Separate regression equations were developed by using U.S. Geological Survey streamflow-gaging stations in regions with similar drainage-basin characteristics. These equations can increase the accuracy of regression equations used for estimating flow-duration and annual mean-flow statistics at ungaged stream locations in Oklahoma. Streamflow-gaging stations were grouped by selected drainage-basin characteristics by using a k-means cluster analysis. Three regions were identified for Oklahoma on the basis of the clustering of gaging stations and a manual delineation of distinguishable hydrologic and geologic boundaries: Region 1 (western Oklahoma excluding the Oklahoma and Texas Panhandles), Region 2 (north- and south-central Oklahoma), and Region 3 (eastern and central Oklahoma). A total of 228 regression equations (225 flow-duration regressions and three annual mean-flow regressions) were developed using ordinary least-squares and left-censored (Tobit) multiple-regression techniques. These equations can be used to estimate 75 flow-duration statistics and annual mean-flow for ungaged streams in the three regions. Drainage-basin characteristics that were statistically significant independent variables in the regression analyses were (1) contributing drainage area; (2) station elevation; (3) mean drainage-basin elevation; (4) channel slope; (5) percentage of forested canopy; (6) mean drainage-basin hillslope; (7) soil permeability; and (8) mean annual, seasonal, and monthly precipitation. The accuracy of flow-duration regression equations generally decreased from high-flow exceedance (low-exceedance probability) to low-flow exceedance (high-exceedance probability) . This decrease may have happened because a greater uncertainty exists for low-flow estimates and low-flow is largely affected by localized geology that was not quantified by the drainage-basin characteristics selected. The standard errors of estimate of regression equations for Region 1 (western Oklahoma) were substantially larger than those standard errors for other regions, especially for low-flow exceedances. These errors may be a result of greater variability in low flow because of increased irrigation activities in this region. Regression equations may not be reliable for sites where the drainage-basin characteristics are outside the range of values of independent vari

  15. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance

    PubMed Central

    2018-01-01

    ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. PMID:29475868

  16. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance.

    PubMed

    Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid

    2018-05-01

    To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. Copyright © 2018 Shakeri et al.

  17. An empirical Bayes method for updating inferences in analysis of quantitative trait loci using information from related genome scans.

    PubMed

    Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B

    2006-08-01

    Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their scan most merit confidence and follow-up via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase confidence in objectivity would be a benefit. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used for the estimation of QTL location and effect. We evaluate the performance of our method by using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than results from any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.

  18. Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory

    DTIC Science & Technology

    2016-05-12

    valued times series from a sample. (A practical algorithm to compute the estimator is a work in progress.) Third, finitely-valued spatial processes...ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 mathematical statistics; time series ; Markov chains; random...proved. Second, a statistical method is developed to estimate the memory depth of discrete- time and continuously-valued times series from a sample. (A

  19. Comparison of Time-to-First Event and Recurrent Event Methods in Randomized Clinical Trials.

    PubMed

    Claggett, Brian; Pocock, Stuart; Wei, L J; Pfeffer, Marc A; McMurray, John J V; Solomon, Scott D

    2018-03-27

    Background -Most Phase-3 trials feature time-to-first event endpoints for their primary and/or secondary analyses. In chronic diseases where a clinical event can occur more than once, recurrent-event methods have been proposed to more fully capture disease burden and have been assumed to improve statistical precision and power compared to conventional "time-to-first" methods. Methods -To better characterize factors that influence statistical properties of recurrent-events and time-to-first methods in the evaluation of randomized therapy, we repeatedly simulated trials with 1:1 randomization of 4000 patients to active vs control therapy, with true patient-level risk reduction of 20% (i.e. RR=0.80). For patients who discontinued active therapy after a first event, we assumed their risk reverted subsequently to their original placebo-level risk. Through simulation, we varied a) the degree of between-patient heterogeneity of risk and b) the extent of treatment discontinuation. Findings were compared with those from actual randomized clinical trials. Results -As the degree of between-patient heterogeneity of risk was increased, both time-to-first and recurrent-events methods lost statistical power to detect a true risk reduction and confidence intervals widened. The recurrent-events analyses continued to estimate the true RR=0.80 as heterogeneity increased, while the Cox model produced estimates that were attenuated. The power of recurrent-events methods declined as the rate of study drug discontinuation post-event increased. Recurrent-events methods provided greater power than time-to-first methods in scenarios where drug discontinuation was ≤30% following a first event, lesser power with drug discontinuation rates of ≥60%, and comparable power otherwise. We confirmed in several actual trials in chronic heart failure that treatment effect estimates were attenuated when estimated via the Cox model and that increased statistical power from recurrent-events methods was most pronounced in trials with lower treatment discontinuation rates. Conclusions -We find that the statistical power of both recurrent-events and time-to-first methods are reduced by increasing heterogeneity of patient risk, a parameter not included in conventional power and sample size formulas. Data from real clinical trials are consistent with simulation studies, confirming that the greatest statistical gains from use of recurrent-events methods occur in the presence of high patient heterogeneity and low rates of study drug discontinuation.

  20. How Valid are Estimates of Occupational Illness?

    ERIC Educational Resources Information Center

    Hilaski, Harvey J.; Wang, Chao Ling

    1982-01-01

    Examines some of the methods of estimating occupational diseases and suggests that a consensus on the adequacy and reliability of estimates by the Bureau of Labor Statistics and others is not likely. (SK)

  1. Individual snag detection using neighborhood attribute filtered airborne lidar data

    Treesearch

    Brian M. Wing; Martin W. Ritchie; Kevin Boston; Warren B. Cohen; Michael J. Olsen

    2015-01-01

    The ability to estimate and monitor standing dead trees (snags) has been difficult due to their irregular and sparse distribution, often requiring intensive sampling methods to obtain statistically significant estimates. This study presents a new method for estimating and monitoring snags using neighborhood attribute filtered airborne discrete-return lidar data. The...

  2. Estimating Suicide Rates in Developing Nations: A Low-Cost Newspaper Capture-Recapture Approach in Cambodia.

    PubMed

    Harris, Keith M; Thandrayen, Joanne; Samphoas, Chien; Se, Pros; Lewchalermwongse, Boontriga; Ratanashevorn, Rattanakorn; Perry, Megan L; Britts, Choloe

    2016-04-01

    This study tested a low-cost method for estimating suicide rates in developing nations that lack adequate statistics. Data comprised reported suicides from Cambodia's 2 largest newspapers. Capture-recapture modeling estimated a suicide rate of 3.8/100 000 (95% CI = 2.5-6.7) for 2012. That compares to World Health Organization estimates of 1.3 to 9.4/100 000 and a Cambodian government estimate of 3.5/100 000. Suicide rates of males were twice that of females, and rates of those <40 years were twice that of those ≥40 years. Capture-recapture modeling with newspaper reports proved a reasonable method for estimating suicide rates for countries with inadequate official data. These methods are low-cost and can be applied to regions with at least 2 newspapers with overlapping reports. Means to further improve this approach are discussed. These methods are applicable to both recent and historical data, which can benefit epidemiological work, and may also be applicable to homicides and other statistics. © 2016 APJPH.

  3. An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation

    PubMed Central

    Wang, Yikai; Kang, Jian; Kemmer, Phebe B.; Guo, Ying

    2016-01-01

    Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promises in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package “DensParcorr” can be downloaded from CRAN for implementing the proposed statistical methods. PMID:27242395

  4. An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation.

    PubMed

    Wang, Yikai; Kang, Jian; Kemmer, Phebe B; Guo, Ying

    2016-01-01

    Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promises in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package "DensParcorr" can be downloaded from CRAN for implementing the proposed statistical methods.

  5. Specialized Finite Set Statistics (FISST)-Based Estimation Methods to Enhance Space Situational Awareness in Medium Earth Orbit (MEO) and Geostationary Earth Orbit (GEO)

    DTIC Science & Technology

    2016-08-17

    Research Laboratory AFRL /RVSV Space Vehicles Directorate 3550 Aberdeen Ave, SE 11. SPONSOR/MONITOR’S REPORT Kirtland AFB, NM 87117-5776 NUMBER(S) AFRL -RV...1 cy AFRL /RVIL Kirtland AFB, NM 87117-5776 2 cys Official Record Copy AFRL /RVSV/Richard S. Erwin 1 cy... AFRL -RV-PS- AFRL -RV-PS- TR-2016-0114 TR-2016-0114 SPECIALIZED FINITE SET STATISTICS (FISST)- BASED ESTIMATION METHODS TO ENHANCE SPACE SITUATIONAL

  6. A model and variance reduction method for computing statistical outputs of stochastic elliptic partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk

    We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basismore » approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.« less

  7. Can statistical linkage of missing variables reduce bias in treatment effect estimates in comparative effectiveness research studies?

    PubMed

    Crown, William; Chang, Jessica; Olson, Melvin; Kahler, Kristijan; Swindle, Jason; Buzinec, Paul; Shah, Nilay; Borah, Bijan

    2015-09-01

    Missing data, particularly missing variables, can create serious analytic challenges in observational comparative effectiveness research studies. Statistical linkage of datasets is a potential method for incorporating missing variables. Prior studies have focused upon the bias introduced by imperfect linkage. This analysis uses a case study of hepatitis C patients to estimate the net effect of statistical linkage on bias, also accounting for the potential reduction in missing variable bias. The results show that statistical linkage can reduce bias while also enabling parameter estimates to be obtained for the formerly missing variables. The usefulness of statistical linkage will vary depending upon the strength of the correlations of the missing variables with the treatment variable, as well as the outcome variable of interest.

  8. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  9. Estimation of body mass index from the metrics of the first metatarsal

    NASA Astrophysics Data System (ADS)

    Dunn, Tyler E.

    Estimation of the biological profile from as many skeletal elements as possible is a necessity in both forensic and bioarchaeological contexts; this includes non-standard aspects of the biological profile, such as body mass index (BMI). BMI is a measure that allows for understanding of the composition of an individual and is traditionally divided into four groups: underweight, normal weight, overweight, and obese. BMI estimation incorporates both estimation of stature and body mass. The estimation of stature from skeletal elements is commonly included into the standard biological profile but the estimation of body mass needs to be further statistically validated to be consistently included. The bones of the foot, specifically the first metatarsal, may have the ability to estimate BMI given an allometric relationship to stature and the mechanical relationship to body mass. There are two commonly used methods for stature estimation, the anatomical method and the regression method. The anatomical method takes into account all of the skeletal elements that contribute to stature while the regression method relies on the allometric relationship between a skeletal element and living stature. A correlation between the metrics of the first metatarsal and living stature has been observed, and proposed as a method for valid stature estimation from the boney foot (Byers et al., 1989). Body mass estimation from skeletal elements relies on two theoretical frameworks: the morphometric and the mechanical approaches. The morphometric approach relies on the size relationship of the individual to body mass; the basic relationship between volume, density, and weight allows for body mass estimation. The body is thought of as a cylinder, and in order to understand the volume of this cylinder the diameter is needed. A commonly used proxy for this in the human body is skeletal bi-iliac breadth from rearticulated pelvic girdle. The mechanical method of body mass estimation relies on the ideas of biomechanical bone remodeling; the elements of the skeleton that are under higher forces, including weight, will remodel to minimize stress. A commonly used metric for the mechanical method of body mass estimation is the diameter of the head of the femur. The foot experiences nearly the entire weight force of the individual at any point in the gait cycle and is subject to the biomechanical remodeling that this force would induce. Therefore, the application of the mechanical framework for body mass estimation could stand true for the elements of the foot. The morphometric and mechanical approaches have been validated against one another on a large, geographically disparate population (Auerbach and Ruff, 2004), but have yet to be validated on a sample of known body mass. DeGroote and Humphrey (2011) test the ability of the first metatarsal to estimate femoral head diameter, body mass, and femoral length. The estimated femoral head diameter from the first metatarsal is used to estimate body mass via the morphometric approach and the femoral length is used to estimate living stature. The authors find that body mass and stature estimation methods from more commonly used skeletal elements compared well with the methods developed from the first metatarsal. This study examines 388 `White' individuals from the William M. Bass donated skeletal collection to test the reliability of the body mass estimates from femoral head diameter and bi-iliac breadth, stature from maximum femoral length, and body mass and stature from the metrics of the first metatarsal. This sample included individuals from all four of the BMI classes. This study finds that all of the skeletal indicators compare well with one another; there is no statistical difference in the stature estimates from the first metatarsal and the maximum length of the femur, and there is no statistical between all three of the body mass estimation methods. When compared to the forensic estimates of stature neither of the tested methods had statistical difference. Conversely, when the body mass estimates are compared to forensic body mass there was a statistical difference and when further investigated the most difference in the body mass estimates was in the extremes of body mass (the underweight and obese categories). These findings indicate that the estimation of stature from both the maximum femoral length and the metrics of the metatarsal are accurate methods. Furthermore, the estimation of body mass is accurate when the individual is in the middle range of the BMI spectrum while these methods for outlying individuals are inaccurate. These findings have implications for the application of stature and body mass estimation in the fields of bioarchaeology, forensic anthropology, and paleoanthropology.

  10. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.

  11. Estimation of Global Network Statistics from Incomplete Data

    PubMed Central

    Bliss, Catherine A.; Danforth, Christopher M.; Dodds, Peter Sheridan

    2014-01-01

    Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183

  12. Is using multiple imputation better than complete case analysis for estimating a prevalence (risk) difference in randomized controlled trials when binary outcome observations are missing?

    PubMed

    Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian

    2016-07-22

    Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.

  13. [Flavouring estimation of quality of grape wines with use of methods of mathematical statistics].

    PubMed

    Yakuba, Yu F; Khalaphyan, A A; Temerdashev, Z A; Bessonov, V V; Malinkin, A D

    2016-01-01

    The questions of forming of wine's flavour integral estimation during the tasting are discussed, the advantages and disadvantages of the procedures are declared. As investigating materials we used the natural white and red wines of Russian manufactures, which were made with the traditional technologies from Vitis Vinifera, straight hybrids, blending and experimental wines (more than 300 different samples). The aim of the research was to set the correlation between the content of wine's nonvolatile matter and wine's tasting quality rating by mathematical statistics methods. The content of organic acids, amino acids and cations in wines were considered as the main factors influencing on the flavor. Basically, they define the beverage's quality. The determination of those components in wine's samples was done by the electrophoretic method «CAPEL». Together with the analytical checking of wine's samples quality the representative group of specialists simultaneously carried out wine's tasting estimation using 100 scores system. The possibility of statistical modelling of correlation of wine's tasting estimation based on analytical data of amino acids and cations determination reasonably describing the wine's flavour was examined. The statistical modelling of correlation between the wine's tasting estimation and the content of major cations (ammonium, potassium, sodium, magnesium, calcium), free amino acids (proline, threonine, arginine) and the taking into account the level of influence on flavour and analytical valuation within fixed limits of quality accordance were done with Statistica. Adequate statistical models which are able to predict tasting estimation that is to determine the wine's quality using the content of components forming the flavour properties have been constructed. It is emphasized that along with aromatic (volatile) substances the nonvolatile matter - mineral substances and organic substances - amino acids such as proline, threonine, arginine influence on wine's flavour properties. It has been shown the nonvolatile components contribute in organoleptic and flavour quality estimation of wines as aromatic volatile substances but they take part in forming the expert's evaluation.

  14. Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

    NASA Technical Reports Server (NTRS)

    Hailperin, Max

    1993-01-01

    This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system load ing, resulting in fewer object migration than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive methods.

  15. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.

  16. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  17. Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

    PubMed

    Wang, Zuozhen

    2018-01-01

    Bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively smaller sample. In this paper, sample size estimation to compare two parallel-design arms for continuous data by bootstrap procedure are presented for various test types (inequality, non-inferiority, superiority, and equivalence), respectively. Meanwhile, sample size calculation by mathematical formulas (normal distribution assumption) for the identical data are also carried out. Consequently, power difference between the two calculation methods is acceptably small for all the test types. It shows that the bootstrap procedure is a credible technique for sample size estimation. After that, we compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the feature of the data, the nonparametric statistical method of Wilcoxon test was applied to compare the two groups in the data during the process of bootstrap power estimation. As a result, the power estimated by normal distribution-based formula is far larger than that by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation at the beginning, and that the same statistical method as used in the subsequent statistical analysis is employed for each bootstrap sample during the course of bootstrap sample size estimation, provided there is historical true data available that can be well representative of the population to which the proposed trial is planning to extrapolate.

  18. Estimation of Return Values of Wave Height: Consequences of Missing Observations

    ERIC Educational Resources Information Center

    Ryden, Jesper

    2008-01-01

    Extreme-value statistics is often used to estimate so-called return values (actually related to quantiles) for environmental quantities like wind speed or wave height. A basic method for estimation is the method of block maxima which consists in partitioning observations in blocks, where maxima from each block could be considered independent.…

  19. BAYESIAN ESTIMATION OF THERMONUCLEAR REACTION RATES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Iliadis, C.; Anderson, K. S.; Coc, A.

    The problem of estimating non-resonant astrophysical S -factors and thermonuclear reaction rates, based on measured nuclear cross sections, is of major interest for nuclear energy generation, neutrino physics, and element synthesis. Many different methods have been applied to this problem in the past, almost all of them based on traditional statistics. Bayesian methods, on the other hand, are now in widespread use in the physical sciences. In astronomy, for example, Bayesian statistics is applied to the observation of extrasolar planets, gravitational waves, and Type Ia supernovae. However, nuclear physics, in particular, has been slow to adopt Bayesian methods. We presentmore » astrophysical S -factors and reaction rates based on Bayesian statistics. We develop a framework that incorporates robust parameter estimation, systematic effects, and non-Gaussian uncertainties in a consistent manner. The method is applied to the reactions d(p, γ ){sup 3}He, {sup 3}He({sup 3}He,2p){sup 4}He, and {sup 3}He( α , γ ){sup 7}Be, important for deuterium burning, solar neutrinos, and Big Bang nucleosynthesis.« less

  20. Solutions to Some Nonlinear Equations from Nonmetric Data.

    ERIC Educational Resources Information Center

    Rule, Stanley J.

    1979-01-01

    A method to provide estimates of parameters of specified nonlinear equations from ordinal data generated from a crossed design is presented. The statistical basis for the method, called NOPE (nonmetric parameter estimation), as well as examples using artifical data, are presented. (Author/JKS)

  1. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.

  2. DETECTORS AND EXPERIMENTAL METHODS: Heuristic approach for peak regions estimation in gamma-ray spectra measured by a NaI detector

    NASA Astrophysics Data System (ADS)

    Zhu, Meng-Hua; Liu, Liang-Gang; You, Zhong; Xu, Ao-Ao

    2009-03-01

    In this paper, a heuristic approach based on Slavic's peak searching method has been employed to estimate the width of peak regions for background removing. Synthetic and experimental data are used to test this method. With the estimated peak regions using the proposed method in the whole spectrum, we find it is simple and effective enough to be used together with the Statistics-sensitive Nonlinear Iterative Peak-Clipping method.

  3. Weighted analysis methods for mapped plot forest inventory data: Tables, regressions, maps and graphs

    Treesearch

    Paul C. Van Deusen; Linda S. Heath

    2010-01-01

    Weighted estimation methods for analysis of mapped plot forest inventory data are discussed. The appropriate weighting scheme can vary depending on the type of analysis and graphical display. Both statistical issues and user expectations need to be considered in these methods. A weighting scheme is proposed that balances statistical considerations and the logical...

  4. Combined Uncertainty and A-Posteriori Error Bound Estimates for General CFD Calculations: Theory and Software Implementation

    NASA Technical Reports Server (NTRS)

    Barth, Timothy J.

    2014-01-01

    This workshop presentation discusses the design and implementation of numerical methods for the quantification of statistical uncertainty, including a-posteriori error bounds, for output quantities computed using CFD methods. Hydrodynamic realizations often contain numerical error arising from finite-dimensional approximation (e.g. numerical methods using grids, basis functions, particles) and statistical uncertainty arising from incomplete information and/or statistical characterization of model parameters and random fields. The first task at hand is to derive formal error bounds for statistics given realizations containing finite-dimensional numerical error [1]. The error in computed output statistics contains contributions from both realization error and the error resulting from the calculation of statistics integrals using a numerical method. A second task is to devise computable a-posteriori error bounds by numerically approximating all terms arising in the error bound estimates. For the same reason that CFD calculations including error bounds but omitting uncertainty modeling are only of limited value, CFD calculations including uncertainty modeling but omitting error bounds are only of limited value. To gain maximum value from CFD calculations, a general software package for uncertainty quantification with quantified error bounds has been developed at NASA. The package provides implementations for a suite of numerical methods used in uncertainty quantification: Dense tensorization basis methods [3] and a subscale recovery variant [1] for non-smooth data, Sparse tensorization methods[2] utilizing node-nested hierarchies, Sampling methods[4] for high-dimensional random variable spaces.

  5. Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics

    PubMed Central

    2011-01-01

    Background Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well known Q and I2 statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed, that are based on a 'generalised' Q statistic. Methods We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results Differing results were obtained when the standard Q and I2 statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747

  6. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.

    PubMed

    Westgate, Philip M; Burchett, Woodrow W

    2017-03-15

    The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Appropriateness of the probability approach with a nutrient status biomarker to assess population inadequacy: a study using vitamin D123

    PubMed Central

    Carriquiry, Alicia L; Bailey, Regan L; Sempos, Christopher T; Yetley, Elizabeth A

    2013-01-01

    Background: There are questions about the appropriate method for the accurate estimation of the population prevalence of nutrient inadequacy on the basis of a biomarker of nutrient status (BNS). Objective: We determined the applicability of a statistical probability method to a BNS, specifically serum 25-hydroxyvitamin D [25(OH)D]. The ability to meet required statistical assumptions was the central focus. Design: Data on serum 25(OH)D concentrations in adults aged 19–70 y from the 2005–2006 NHANES were used (n = 3871). An Institute of Medicine report provided reference values. We analyzed key assumptions of symmetry, differences in variance, and the independence of distributions. We also corrected observed distributions for within-person variability (WPV). Estimates of vitamin D inadequacy were determined. Results: We showed that the BNS [serum 25(OH)D] met the criteria to use the method for the estimation of the prevalence of inadequacy. The difference between observations corrected compared with uncorrected for WPV was small for serum 25(OH)D but, nonetheless, showed enhanced accuracy because of correction. The method estimated a 19% prevalence of inadequacy in this sample, whereas misclassification inherent in the use of the more traditional 97.5th percentile high-end cutoff inflated the prevalence of inadequacy (36%). Conclusions: When the prevalence of nutrient inadequacy for a population is estimated by using serum 25(OH)D as an example of a BNS, a statistical probability method is appropriate and more accurate in comparison with a high-end cutoff. Contrary to a common misunderstanding, the method does not overlook segments of the population. The accuracy of population estimates of inadequacy is enhanced by the correction of observed measures for WPV. PMID:23097269

  8. A chance constraint estimation approach to optimizing resource management under uncertainty

    Treesearch

    Michael Bevers

    2007-01-01

    Chance-constrained optimization is an important method for managing risk arising from random variations in natural resource systems, but the probabilistic formulations often pose mathematical programming problems that cannot be solved with exact methods. A heuristic estimation method for these problems is presented that combines a formulation for order statistic...

  9. Improving Non-Destructive Concrete Strength Tests Using Support Vector Machines

    PubMed Central

    Shih, Yi-Fan; Wang, Yu-Ren; Lin, Kuo-Liang; Chen, Chin-Wen

    2015-01-01

    Non-destructive testing (NDT) methods are important alternatives when destructive tests are not feasible to examine the in situ concrete properties without damaging the structure. The rebound hammer test and the ultrasonic pulse velocity test are two popular NDT methods to examine the properties of concrete. The rebound of the hammer depends on the hardness of the test specimen and ultrasonic pulse travelling speed is related to density, uniformity, and homogeneity of the specimen. Both of these two methods have been adopted to estimate the concrete compressive strength. Statistical analysis has been implemented to establish the relationship between hammer rebound values/ultrasonic pulse velocities and concrete compressive strength. However, the estimated results can be unreliable. As a result, this research proposes an Artificial Intelligence model using support vector machines (SVMs) for the estimation. Data from 95 cylinder concrete samples are collected to develop and validate the model. The results show that combined NDT methods (also known as SonReb method) yield better estimations than single NDT methods. The results also show that the SVMs model is more accurate than the statistical regression model. PMID:28793627

  10. Estimation of descriptive statistics for multiply censored water quality data

    USGS Publications Warehouse

    Helsel, Dennis R.; Cohn, Timothy A.

    1988-01-01

    This paper extends the work of Gilliom and Helsel (1986) on procedures for estimating descriptive statistics of water quality data that contain “less than” observations. Previously, procedures were evaluated when only one detection limit was present. Here we investigate the performance of estimators for data that have multiple detection limits. Probability plotting and maximum likelihood methods perform substantially better than simple substitution procedures now commonly in use. Therefore simple substitution procedures (e.g., substitution of the detection limit) should be avoided. Probability plotting methods are more robust than maximum likelihood methods to misspecification of the parent distribution and their use should be encouraged in the typical situation where the parent distribution is unknown. When utilized correctly, less than values frequently contain nearly as much information for estimating population moments and quantiles as would the same observations had the detection limit been below them.

  11. On an additive partial correlation operator and nonparametric estimation of graphical models.

    PubMed

    Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu

    2016-09-01

    We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.

  12. On an additive partial correlation operator and nonparametric estimation of graphical models

    PubMed Central

    Li, Bing; Zhao, Hongyu

    2016-01-01

    Abstract We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance. PMID:29422689

  13. Assessing artificial neural networks and statistical methods for infilling missing soil moisture records

    NASA Astrophysics Data System (ADS)

    Dumedah, Gift; Walker, Jeffrey P.; Chik, Li

    2014-07-01

    Soil moisture information is critically important for water management operations including flood forecasting, drought monitoring, and groundwater recharge estimation. While an accurate and continuous record of soil moisture is required for these applications, the available soil moisture data, in practice, is typically fraught with missing values. There are a wide range of methods available to infilling hydrologic variables, but a thorough inter-comparison between statistical methods and artificial neural networks has not been made. This study examines 5 statistical methods including monthly averages, weighted Pearson correlation coefficient, a method based on temporal stability of soil moisture, and a weighted merging of the three methods, together with a method based on the concept of rough sets. Additionally, 9 artificial neural networks are examined, broadly categorized into feedforward, dynamic, and radial basis networks. These 14 infilling methods were used to estimate missing soil moisture records and subsequently validated against known values for 13 soil moisture monitoring stations for three different soil layer depths in the Yanco region in southeast Australia. The evaluation results show that the top three highest performing methods are the nonlinear autoregressive neural network, rough sets method, and monthly replacement. A high estimation accuracy (root mean square error (RMSE) of about 0.03 m/m) was found in the nonlinear autoregressive network, due to its regression based dynamic network which allows feedback connections through discrete-time estimation. An equally high accuracy (0.05 m/m RMSE) in the rough sets procedure illustrates the important role of temporal persistence of soil moisture, with the capability to account for different soil moisture conditions.

  14. CADDIS Volume 4. Data Analysis: Biological and Environmental Data Requirements

    EPA Pesticide Factsheets

    Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.

  15. Analysis models for the estimation of oceanic fields

    NASA Technical Reports Server (NTRS)

    Carter, E. F.; Robinson, A. R.

    1987-01-01

    A general model for statistically optimal estimates is presented for dealing with scalar, vector and multivariate datasets. The method deals with anisotropic fields and treats space and time dependence equivalently. Problems addressed include the analysis, or the production of synoptic time series of regularly gridded fields from irregular and gappy datasets, and the estimate of fields by compositing observations from several different instruments and sampling schemes. Technical issues are discussed, including the convergence of statistical estimates, the choice of representation of the correlations, the influential domain of an observation, and the efficiency of numerical computations.

  16. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerab!e progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted

  17. A REVIEW OF STATISTICAL METHODS FOR THE METEOROLOGICAL ADJUSTMENT OF TROPOSPHERIC OZONE. (R825173)

    EPA Science Inventory

    Abstract

    A variety of statistical methods for meteorological adjustment of ozone have been proposed in the literature over the last decade for purposes of forecasting, estimating ozone time trends, or investigating underlying mechanisms from an empirical perspective. T...

  18. A Statistical Method of Identifying Interactions in Neuron–Glia Systems Based on Functional Multicell Ca2+ Imaging

    PubMed Central

    Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin

    2014-01-01

    Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions in our interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We only accepted addition or removal when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874

  19. A comparison of performance of automatic cloud coverage assessment algorithm for Formosat-2 image using clustering-based and spatial thresholding methods

    NASA Astrophysics Data System (ADS)

    Hsu, Kuo-Hsien

    2012-11-01

    Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.

  20. Fresh Biomass Estimation in Heterogeneous Grassland Using Hyperspectral Measurements and Multivariate Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.

    2014-12-01

    Accurate estimation of grassland biomass at their peak productivity can provide crucial information regarding the functioning and productivity of the rangelands. Hyperspectral remote sensing has proved to be valuable for estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to large amount of correlated hyper-spectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and Least-Squared Support Vector Machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods were assessed using cross validated R2 and RMSE. The best model performance was obtained using LS_SVM and then PLSR both calibrated with first derivative reflectance dataset with R2cv = 0.88 & 0.86 and RMSEcv= 1.15 & 1.07 respectively. The weakest prediction accuracy was appeared when PCR were used (R2cv = 0.31 and RMSEcv= 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.

  1. The statistical analysis of circadian phase and amplitude in constant-routine core-temperature data

    NASA Technical Reports Server (NTRS)

    Brown, E. N.; Czeisler, C. A.

    1992-01-01

    Accurate estimation of the phases and amplitude of the endogenous circadian pacemaker from constant-routine core-temperature series is crucial for making inferences about the properties of the human biological clock from data collected under this protocol. This paper presents a set of statistical methods based on a harmonic-regression-plus-correlated-noise model for estimating the phases and the amplitude of the endogenous circadian pacemaker from constant-routine core-temperature data. The methods include a Bayesian Monte Carlo procedure for computing the uncertainty in these circadian functions. We illustrate the techniques with a detailed study of a single subject's core-temperature series and describe their relationship to other statistical methods for circadian data analysis. In our laboratory, these methods have been successfully used to analyze more than 300 constant routines and provide a highly reliable means of extracting phase and amplitude information from core-temperature data.

  2. R package to estimate intracluster correlation coefficient with confidence interval for binary data.

    PubMed

    Chakraborty, Hrishikesh; Hossain, Akhtar

    2018-03-01

    The Intracluster Correlation Coefficient (ICC) is a major parameter of interest in cluster randomized trials that measures the degree to which responses within the same cluster are correlated. There are several types of ICC estimators and its confidence intervals (CI) suggested in the literature for binary data. Studies have compared relative weaknesses and advantages of ICC estimators as well as its CI for binary data and suggested situations where one is advantageous in practical research. The commonly used statistical computing systems currently facilitate estimation of only a very few variants of ICC and its CI. To address the limitations of current statistical packages, we developed an R package, ICCbin, to facilitate estimating ICC and its CI for binary responses using different methods. The ICCbin package is designed to provide estimates of ICC in 16 different ways including analysis of variance methods, moments based estimation, direct probabilistic methods, correlation based estimation, and resampling method. CI of ICC is estimated using 5 different methods. It also generates cluster binary data using exchangeable correlation structure. ICCbin package provides two functions for users. The function rcbin() generates cluster binary data and the function iccbin() estimates ICC and it's CI. The users can choose appropriate ICC and its CI estimate from the wide selection of estimates from the outputs. The R package ICCbin presents very flexible and easy to use ways to generate cluster binary data and to estimate ICC and it's CI for binary response using different methods. The package ICCbin is freely available for use with R from the CRAN repository (https://cran.r-project.org/package=ICCbin). We believe that this package can be a very useful tool for researchers to design cluster randomized trials with binary outcome. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    ERIC Educational Resources Information Center

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…

  4. ASYMPTOTIC DISTRIBUTION OF ΔAUC, NRIs, AND IDI BASED ON THEORY OF U-STATISTICS

    PubMed Central

    Demler, Olga V.; Pencina, Michael J.; Cook, Nancy R.; D’Agostino, Ralph B.

    2017-01-01

    The change in AUC (ΔAUC), the IDI, and NRI are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ΔAUC. PMID:28627112

  5. Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U-statistics.

    PubMed

    Demler, Olga V; Pencina, Michael J; Cook, Nancy R; D'Agostino, Ralph B

    2017-09-20

    The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Quantitative comparisons of three automated methods for estimating intracranial volume: A study of 270 longitudinal magnetic resonance images.

    PubMed

    Shang, Xiaoyan; Carlson, Michelle C; Tang, Xiaoying

    2018-04-30

    Total intracranial volume (TIV) is often used as a measure of brain size to correct for individual variability in magnetic resonance imaging (MRI) based morphometric studies. An adjustment of TIV can greatly increase the statistical power of brain morphometry methods. As such, an accurate and precise TIV estimation is of great importance in MRI studies. In this paper, we compared three automated TIV estimation methods (multi-atlas likelihood fusion (MALF), Statistical Parametric Mapping 8 (SPM8) and FreeSurfer (FS)) using longitudinal T1-weighted MR images in a cohort of 70 older participants at elevated sociodemographic risk for Alzheimer's disease. Statistical group comparisons in terms of four different metrics were performed. Furthermore, sex, education level, and intervention status were investigated separately for their impacts on the TIV estimation performance of each method. According to our experimental results, MALF was the least susceptible to atrophy, while SPM8 and FS suffered a loss in precision. In group-wise analysis, MALF was the least sensitive method to group variation, whereas SPM8 was particularly sensitive to sex and FS was unstable with respect to education level. In terms of effectiveness, both MALF and SPM8 delivered a user-friendly performance, while FS was relatively computationally intensive. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. rpsftm: An R Package for Rank Preserving Structural Failure Time Models

    PubMed Central

    Allison, Annabel; White, Ian R; Bond, Simon

    2018-01-01

    Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ, is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z(ψ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm. PMID:29564164

  8. rpsftm: An R Package for Rank Preserving Structural Failure Time Models.

    PubMed

    Allison, Annabel; White, Ian R; Bond, Simon

    2017-12-04

    Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ , is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z ( ψ ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm.

  9. Methods for estimating aboveground biomass and its components for Douglas-fir and lodgepole pine trees

    Treesearch

    K.P. Poudel; H. Temesgen

    2016-01-01

    Estimating aboveground biomass and its components requires sound statistical formulation and evaluation. Using data collected from 55 destructively sampled trees in different parts of Oregon, we evaluated the performance of three groups of methods to estimate total aboveground biomass and (or) its components based on the bias and root mean squared error (RMSE) that...

  10. Comparison of Efficiency of Jackknife and Variance Component Estimators of Standard Errors. Program Statistics Research. Technical Report.

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    Large scale surveys usually employ a complex sampling design and as a consequence, no standard methods for estimation of the standard errors associated with the estimates of population means are available. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…

  11. Similar negative impacts of temperature on global wheat yield estimated by three independent methods

    USDA-ARS?s Scientific Manuscript database

    The potential impact of global temperature change on global wheat production has recently been assessed with different methods, scaling and aggregation approaches. Here we show that grid-based simulations, point-based simulations, and statistical regressions produce similar estimates of temperature ...

  12. Covariance estimation in Terms of Stokes Parameters with Application to Vector Sensor Imaging

    DTIC Science & Technology

    2016-12-15

    S. Klein, “HF Vector Sensor for Radio Astronomy : Ground Testing Results,” in AIAA SPACE 2016, ser. AIAA SPACE Forum, American Institute of... astronomy ,” in 2016 IEEE Aerospace Conference, Mar. 2016, pp. 1–17. doi: 10.1109/ AERO.2016.7500688. [4] K.-C. Ho, K.-C. Tan, and B. T. G. Tan, “Estimation of...Statistical Imaging in Radio Astronomy via an Expectation-Maximization Algorithm for Structured Covariance Estimation,” in Statistical Methods in Imaging: IN

  13. Maximum a posteriori decoder for digital communications

    NASA Technical Reports Server (NTRS)

    Altes, Richard A. (Inventor)

    1997-01-01

    A system and method for decoding by identification of the most likely phase coded signal corresponding to received data. The present invention has particular application to communication with signals that experience spurious random phase perturbations. The generalized estimator-correlator uses a maximum a posteriori (MAP) estimator to generate phase estimates for correlation with incoming data samples and for correlation with mean phases indicative of unique hypothesized signals. The result is a MAP likelihood statistic for each hypothesized transmission, wherein the highest value statistic identifies the transmitted signal.

  14. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses

    PubMed Central

    Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah

    2015-01-01

    Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481

  15. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data.

    PubMed

    Carmichael, Owen; Sakhanenko, Lyudmila

    2015-05-15

    We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way.

  16. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data

    PubMed Central

    Carmichael, Owen; Sakhanenko, Lyudmila

    2015-01-01

    We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way. PMID:25937674

  17. CADDIS Volume 4. Data Analysis: Predicting Environmental Conditions from Biological Observations (PECBO Appendix)

    EPA Pesticide Factsheets

    Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.

  18. Extreme Quantile Estimation in Binary Response Models

    DTIC Science & Technology

    1990-03-01

    in Cancer Research," Biometria , VoL 66, pp. 307-316. Hsi, B.P. [1969], ’The Multiple Sample Up-and-Down Method in Bioassay," Journal of the American...New Method of Estimation," Biometria , VoL 53, pp. 439-454. Wetherill, G.B. [1976], Sequential Methods in Statistics, London: Chapman and Hall. Wu, C.FJ

  19. Bayesian estimation of the transmissivity spatial structure from pumping test data

    NASA Astrophysics Data System (ADS)

    Demir, Mehmet Taner; Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier

    2017-06-01

    Estimating the statistical parameters (mean, variance, and integral scale) that define the spatial structure of the transmissivity or hydraulic conductivity fields is a fundamental step for the accurate prediction of subsurface flow and contaminant transport. In practice, the determination of the spatial structure is a challenge because of spatial heterogeneity and data scarcity. In this paper, we describe a novel approach that uses time drawdown data from multiple pumping tests to determine the transmissivity statistical spatial structure. The method builds on the pumping test interpretation procedure of Copty et al. (2011) (Continuous Derivation method, CD), which uses the time-drawdown data and its time derivative to estimate apparent transmissivity values as a function of radial distance from the pumping well. A Bayesian approach is then used to infer the statistical parameters of the transmissivity field by combining prior information about the parameters and the likelihood function expressed in terms of radially-dependent apparent transmissivities determined from pumping tests. A major advantage of the proposed Bayesian approach is that the likelihood function is readily determined from randomly generated multiple realizations of the transmissivity field, without the need to solve the groundwater flow equation. Applying the method to synthetically-generated pumping test data, we demonstrate that, through a relatively simple procedure, information on the spatial structure of the transmissivity may be inferred from pumping tests data. It is also shown that the prior parameter distribution has a significant influence on the estimation procedure, given the non-uniqueness of the estimation procedure. Results also indicate that the reliability of the estimated transmissivity statistical parameters increases with the number of available pumping tests.

  20. Estimation of In Situ Stresses with Hydro-Fracturing Tests and a Statistical Method

    NASA Astrophysics Data System (ADS)

    Lee, Hikweon; Ong, See Hong

    2018-03-01

    At great depths, where borehole-based field stress measurements such as hydraulic fracturing are challenging due to difficult downhole conditions or prohibitive costs, in situ stresses can be indirectly estimated using wellbore failures such as borehole breakouts and/or drilling-induced tensile failures detected by an image log. As part of such efforts, a statistical method has been developed in which borehole breakouts detected on an image log are used for this purpose (Song et al. in Proceedings on the 7th international symposium on in situ rock stress, 2016; Song and Chang in J Geophys Res Solid Earth 122:4033-4052, 2017). The method employs a grid-searching algorithm in which the least and maximum horizontal principal stresses ( S h and S H) are varied, and the corresponding simulated depth-related breakout width distribution as a function of the breakout angle ( θ B = 90° - half of breakout width) is compared to that observed along the borehole to determine a set of S h and S H having the lowest misfit between them. An important advantage of the method is that S h and S H can be estimated simultaneously in vertical wells. To validate the statistical approach, the method is applied to a vertical hole where a set of field hydraulic fracturing tests have been carried out. The stress estimations using the proposed method were found to be in good agreement with the results interpreted from the hydraulic fracturing test measurements.

  1. Sample Size Estimation: The Easy Way

    ERIC Educational Resources Information Center

    Weller, Susan C.

    2015-01-01

    This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…

  2. Box-Counting Dimension Revisited: Presenting an Efficient Method of Minimizing Quantization Error and an Assessment of the Self-Similarity of Structural Root Systems

    PubMed Central

    Bouda, Martin; Caplan, Joshua S.; Saiers, James E.

    2016-01-01

    Fractal dimension (FD), estimated by box-counting, is a metric used to characterize plant anatomical complexity or space-filling characteristic for a variety of purposes. The vast majority of published studies fail to evaluate the assumption of statistical self-similarity, which underpins the validity of the procedure. The box-counting procedure is also subject to error arising from arbitrary grid placement, known as quantization error (QE), which is strictly positive and varies as a function of scale, making it problematic for the procedure's slope estimation step. Previous studies either ignore QE or employ inefficient brute-force grid translations to reduce it. The goals of this study were to characterize the effect of QE due to translation and rotation on FD estimates, to provide an efficient method of reducing QE, and to evaluate the assumption of statistical self-similarity of coarse root datasets typical of those used in recent trait studies. Coarse root systems of 36 shrubs were digitized in 3D and subjected to box-counts. A pattern search algorithm was used to minimize QE by optimizing grid placement and its efficiency was compared to the brute force method. The degree of statistical self-similarity was evaluated using linear regression residuals and local slope estimates. QE, due to both grid position and orientation, was a significant source of error in FD estimates, but pattern search provided an efficient means of minimizing it. Pattern search had higher initial computational cost but converged on lower error values more efficiently than the commonly employed brute force method. Our representations of coarse root system digitizations did not exhibit details over a sufficient range of scales to be considered statistically self-similar and informatively approximated as fractals, suggesting a lack of sufficient ramification of the coarse root systems for reiteration to be thought of as a dominant force in their development. FD estimates did not characterize the scaling of our digitizations well: the scaling exponent was a function of scale. Our findings serve as a caution against applying FD under the assumption of statistical self-similarity without rigorously evaluating it first. PMID:26925073

  3. On the statistical significance of excess events: Remarks of caution and the need for a standard method of calculation

    NASA Technical Reports Server (NTRS)

    Staubert, R.

    1985-01-01

    Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.

  4. Nebraska's forests, 2005: statistics, methods, and quality assurance

    Treesearch

    Patrick D. Miles; Dacia M. Meneguzzo; Charles J. Barnett

    2011-01-01

    The first full annual inventory of Nebraska's forests was completed in 2005 after 8,335 plots were selected and 274 forested plots were visited and measured. This report includes detailed information on forest inventory methods, and data quality estimates. Tables of various important resource statistics are presented. Detailed analysis of the inventory data are...

  5. Kansas's forests, 2005: statistics, methods, and quality assurance

    Treesearch

    Patrick D. Miles; W. Keith Moser; Charles J. Barnett

    2011-01-01

    The first full annual inventory of Kansas's forests was completed in 2005 after 8,868 plots were selected and 468 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of Kansas inventory is presented...

  6. Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS).

    PubMed

    Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M; Khan, Ajmal

    2016-01-01

    This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher's exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher's exact test, logistic regression, epidemiological statistics, and non-parametric tests. This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design.

  7. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  8. Covariate analysis of bivariate survival data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, L.E.

    1992-01-01

    The methods developed are used to analyze the effects of covariates on bivariate survival data when censoring and ties are present. The proposed method provides models for bivariate survival data that include differential covariate effects and censored observations. The proposed models are based on an extension of the univariate Buckley-James estimators which replace censored data points by their expected values, conditional on the censoring time and the covariates. For the bivariate situation, it is necessary to determine the expectation of the failure times for one component conditional on the failure or censoring time of the other component. Two different methodsmore » have been developed to estimate these expectations. In the semiparametric approach these expectations are determined from a modification of Burke's estimate of the bivariate empirical survival function. In the parametric approach censored data points are also replaced by their conditional expected values where the expected values are determined from a specified parametric distribution. The model estimation will be based on the revised data set, comprised of uncensored components and expected values for the censored components. The variance-covariance matrix for the estimated covariate parameters has also been derived for both the semiparametric and parametric methods. Data from the Demographic and Health Survey was analyzed by these methods. The two outcome variables are post-partum amenorrhea and breastfeeding; education and parity were used as the covariates. Both the covariate parameter estimates and the variance-covariance estimates for the semiparametric and parametric models will be compared. In addition, a multivariate test statistic was used in the semiparametric model to examine contrasts. The significance of the statistic was determined from a bootstrap distribution of the test statistic.« less

  9. Estimating and comparing microbial diversity in the presence of sequencing errors

    PubMed Central

    Chiu, Chun-Huo

    2016-01-01

    Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872

  10. Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

    PubMed Central

    2011-01-01

    Background The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. PMID:21834981

  11. Median nitrate concentrations in groundwater in the New Jersey Highlands Region estimated using regression models and land-surface characteristics

    USGS Publications Warehouse

    Baker, Ronald J.; Chepiga, Mary M.; Cauller, Stephen J.

    2015-01-01

    The Kaplan-Meier method of estimating summary statistics from left-censored data was applied in order to include nondetects (left-censored data) in median nitrate-concentration calculations. Median concentrations also were determined using three alternative methods of handling nondetects. Treatment of the 23 percent of samples that were nondetects had little effect on estimated median nitrate concentrations because method detection limits were mostly less than median values.

  12. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting.

    PubMed

    Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F

    2010-07-19

    A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic.

  13. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

    PubMed Central

    2010-01-01

    Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. Conclusions The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic. PMID:20642827

  14. High throughput nonparametric probability density estimation.

    PubMed

    Farmer, Jenny; Jacobs, Donald

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.

  15. High throughput nonparametric probability density estimation

    PubMed Central

    Farmer, Jenny

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803

  16. A statistical method for estimating rates of soil development and ages of geologic deposits: A design for soil-chronosequence studies

    USGS Publications Warehouse

    Switzer, P.; Harden, J.W.; Mark, R.K.

    1988-01-01

    A statistical method for estimating rates of soil development in a given region based on calibration from a series of dated soils is used to estimate ages of soils in the same region that are not dated directly. The method is designed specifically to account for sampling procedures and uncertainties that are inherent in soil studies. Soil variation and measurement error, uncertainties in calibration dates and their relation to the age of the soil, and the limited number of dated soils are all considered. Maximum likelihood (ML) is employed to estimate a parametric linear calibration curve, relating soil development to time or age on suitably transformed scales. Soil variation on a geomorphic surface of a certain age is characterized by replicate sampling of soils on each surface; such variation is assumed to have a Gaussian distribution. The age of a geomorphic surface is described by older and younger bounds. This technique allows age uncertainty to be characterized by either a Gaussian distribution or by a triangular distribution using minimum, best-estimate, and maximum ages. The calibration curve is taken to be linear after suitable (in certain cases logarithmic) transformations, if required, of the soil parameter and age variables. Soil variability, measurement error, and departures from linearity are described in a combined fashion using Gaussian distributions with variances particular to each sampled geomorphic surface and the number of sample replicates. Uncertainty in age of a geomorphic surface used for calibration is described using three parameters by one of two methods. In the first method, upper and lower ages are specified together with a coverage probability; this specification is converted to a Gaussian distribution with the appropriate mean and variance. In the second method, "absolute" older and younger ages are specified together with a most probable age; this specification is converted to an asymmetric triangular distribution with mode at the most probable age. The statistical variability of the ML-estimated calibration curve is assessed by a Monte Carlo method in which simulated data sets repeatedly are drawn from the distributional specification; calibration parameters are reestimated for each such simulation in order to assess their statistical variability. Several examples are used for illustration. The age of undated soils in a related setting may be estimated from the soil data using the fitted calibration curve. A second simulation to assess age estimate variability is described and applied to the examples. ?? 1988 International Association for Mathematical Geology.

  17. Challenges in Species Tree Estimation Under the Multispecies Coalescent Model

    PubMed Central

    Xu, Bo; Yang, Ziheng

    2016-01-01

    The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902

  18. Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments

    NASA Technical Reports Server (NTRS)

    Abbey, Craig K.; Eckstein, Miguel P.

    2002-01-01

    We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.

  19. On Correlated-noise Analyses Applied to Exoplanet Light Curves

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Loredo, Thomas J.; Lust, Nate B.; Blecic, Jasmina; Stemm, Madison

    2017-01-01

    Time-correlated noise is a significant source of uncertainty when modeling exoplanet light-curve data. A correct assessment of correlated noise is fundamental to determine the true statistical significance of our findings. Here, we review three of the most widely used correlated-noise estimators in the exoplanet field, the time-averaging, residual-permutation, and wavelet-likelihood methods. We argue that the residual-permutation method is unsound in estimating the uncertainty of parameter estimates. We thus recommend to refrain from this method altogether. We characterize the behavior of the time averaging’s rms-versus-bin-size curves at bin sizes similar to the total observation duration, which may lead to underestimated uncertainties. For the wavelet-likelihood method, we note errors in the published equations and provide a list of corrections. We further assess the performance of these techniques by injecting and retrieving eclipse signals into synthetic and real Spitzer light curves, analyzing the results in terms of the relative-accuracy and coverage-fraction statistics. Both the time-averaging and wavelet-likelihood methods significantly improve the estimate of the eclipse depth over a white-noise analysis (a Markov-chain Monte Carlo exploration assuming uncorrelated noise). However, the corrections are not perfect when retrieving the eclipse depth from Spitzer data sets, these methods covered the true (injected) depth within the 68% credible region in only ˜45%-65% of the trials. Lastly, we present our open-source model-fitting tool, Multi-Core Markov-Chain Monte Carlo (MC3). This package uses Bayesian statistics to estimate the best-fitting values and the credible regions for the parameters for a (user-provided) model. MC3 is a Python/C code, available at https://github.com/pcubillos/MCcubed.

  20. Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

    NASA Technical Reports Server (NTRS)

    Hailperin, M.

    1993-01-01

    This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that the authors' techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. The authors' method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability.

  1. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330

  2. Adaptive Distributed Video Coding with Correlation Estimation using Expectation Propagation

    PubMed Central

    Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2013-01-01

    Distributed video coding (DVC) is rapidly increasing in popularity by the way of shifting the complexity from encoder to decoder, whereas no compression performance degrades, at least in theory. In contrast with conventional video codecs, the inter-frame correlation in DVC is explored at decoder based on the received syndromes of Wyner-Ziv (WZ) frame and side information (SI) frame generated from other frames available only at decoder. However, the ultimate decoding performances of DVC are based on the assumption that the perfect knowledge of correlation statistic between WZ and SI frames should be available at decoder. Therefore, the ability of obtaining a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, the existing correlation estimation methods in DVC can be classified into two main types: pre-estimation where estimation starts before decoding and on-the-fly (OTF) estimation where estimation can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamical, OTF estimation methods usually outperforms pre-estimation techniques with the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance but with significantly low complexity comparing with sampling method. PMID:23750314

  3. Adaptive distributed video coding with correlation estimation using expectation propagation

    NASA Astrophysics Data System (ADS)

    Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2012-10-01

    Distributed video coding (DVC) is rapidly increasing in popularity by the way of shifting the complexity from encoder to decoder, whereas no compression performance degrades, at least in theory. In contrast with conventional video codecs, the inter-frame correlation in DVC is explored at decoder based on the received syndromes of Wyner-Ziv (WZ) frame and side information (SI) frame generated from other frames available only at decoder. However, the ultimate decoding performances of DVC are based on the assumption that the perfect knowledge of correlation statistic between WZ and SI frames should be available at decoder. Therefore, the ability of obtaining a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, the existing correlation estimation methods in DVC can be classified into two main types: pre-estimation where estimation starts before decoding and on-the-fly (OTF) estimation where estimation can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamical, OTF estimation methods usually outperforms pre-estimation techniques with the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance but with significantly low complexity comparing with sampling method.

  4. Adaptive Distributed Video Coding with Correlation Estimation using Expectation Propagation.

    PubMed

    Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2012-10-15

    Distributed video coding (DVC) is rapidly increasing in popularity by the way of shifting the complexity from encoder to decoder, whereas no compression performance degrades, at least in theory. In contrast with conventional video codecs, the inter-frame correlation in DVC is explored at decoder based on the received syndromes of Wyner-Ziv (WZ) frame and side information (SI) frame generated from other frames available only at decoder. However, the ultimate decoding performances of DVC are based on the assumption that the perfect knowledge of correlation statistic between WZ and SI frames should be available at decoder. Therefore, the ability of obtaining a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, the existing correlation estimation methods in DVC can be classified into two main types: pre-estimation where estimation starts before decoding and on-the-fly (OTF) estimation where estimation can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamical, OTF estimation methods usually outperforms pre-estimation techniques with the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance but with significantly low complexity comparing with sampling method.

  5. Methods for estimating selected low-flow statistics and development of annual flow-duration statistics for Ohio

    USGS Publications Warehouse

    Koltun, G.F.; Kula, Stephanie P.

    2013-01-01

    This report presents the results of a study to develop methods for estimating selected low-flow statistics and for determining annual flow-duration statistics for Ohio streams. Regression techniques were used to develop equations for estimating 10-year recurrence-interval (10-percent annual-nonexceedance probability) low-flow yields, in cubic feet per second per square mile, with averaging periods of 1, 7, 30, and 90-day(s), and for estimating the yield corresponding to the long-term 80-percent duration flow. These equations, which estimate low-flow yields as a function of a streamflow-variability index, are based on previously published low-flow statistics for 79 long-term continuous-record streamgages with at least 10 years of data collected through water year 1997. When applied to the calibration dataset, average absolute percent errors for the regression equations ranged from 15.8 to 42.0 percent. The regression results have been incorporated into the U.S. Geological Survey (USGS) StreamStats application for Ohio (http://water.usgs.gov/osw/streamstats/ohio.html) in the form of a yield grid to facilitate estimation of the corresponding streamflow statistics in cubic feet per second. Logistic-regression equations also were developed and incorporated into the USGS StreamStats application for Ohio for selected low-flow statistics to help identify occurrences of zero-valued statistics. Quantiles of daily and 7-day mean streamflows were determined for annual and annual-seasonal (September–November) periods for each complete climatic year of streamflow-gaging station record for 110 selected streamflow-gaging stations with 20 or more years of record. The quantiles determined for each climatic year were the 99-, 98-, 95-, 90-, 80-, 75-, 70-, 60-, 50-, 40-, 30-, 25-, 20-, 10-, 5-, 2-, and 1-percent exceedance streamflows. Selected exceedance percentiles of the annual-exceedance percentiles were subsequently computed and tabulated to help facilitate consideration of the annual risk of exceedance or nonexceedance of annual and annual-seasonal-period flow-duration values. The quantiles are based on streamflow data collected through climatic year 2008.

  6. Estimating the mass variance in neutron multiplicity counting-A comparison of approaches

    NASA Astrophysics Data System (ADS)

    Dubi, C.; Croft, S.; Favalli, A.; Ocherashvili, A.; Pedersen, B.

    2017-12-01

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α , n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  7. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubi, C.; Croft, S.; Favalli, A.

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  8. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE PAGES

    Dubi, C.; Croft, S.; Favalli, A.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  9. Bayesian methods in reliability

    NASA Astrophysics Data System (ADS)

    Sander, P.; Badoux, R.

    1991-11-01

    The present proceedings from a course on Bayesian methods in reliability encompasses Bayesian statistical methods and their computational implementation, models for analyzing censored data from nonrepairable systems, the traits of repairable systems and growth models, the use of expert judgment, and a review of the problem of forecasting software reliability. Specific issues addressed include the use of Bayesian methods to estimate the leak rate of a gas pipeline, approximate analyses under great prior uncertainty, reliability estimation techniques, and a nonhomogeneous Poisson process. Also addressed are the calibration sets and seed variables of expert judgment systems for risk assessment, experimental illustrations of the use of expert judgment for reliability testing, and analyses of the predictive quality of software-reliability growth models such as the Weibull order statistics.

  10. Similar Estimates of Temperature Impacts on Global Wheat Yield by Three Independent Methods

    NASA Technical Reports Server (NTRS)

    Liu, Bing; Asseng, Senthold; Muller, Christoph; Ewart, Frank; Elliott, Joshua; Lobell, David B.; Martre, Pierre; Ruane, Alex C.; Wallach, Daniel; Jones, James W.; hide

    2016-01-01

    The potential impact of global temperature change on global crop yield has recently been assessed with different methods. Here we show that grid-based and point-based simulations and statistical regressions (from historic records), without deliberate adaptation or CO2 fertilization effects, produce similar estimates of temperature impact on wheat yields at global and national scales. With a 1 C global temperature increase, global wheat yield is projected to decline between 4.1% and 6.4%. Projected relative temperature impacts from different methods were similar for major wheat-producing countries China, India, USA and France, but less so for Russia. Point-based and grid-based simulations, and to some extent the statistical regressions, were consistent in projecting that warmer regions are likely to suffer more yield loss with increasing temperature than cooler regions. By forming a multi-method ensemble, it was possible to quantify 'method uncertainty' in addition to model uncertainty. This significantly improves confidence in estimates of climate impacts on global food security.

  11. Similar estimates of temperature impacts on global wheat yield by three independent methods

    NASA Astrophysics Data System (ADS)

    Liu, Bing; Asseng, Senthold; Müller, Christoph; Ewert, Frank; Elliott, Joshua; Lobell, David B.; Martre, Pierre; Ruane, Alex C.; Wallach, Daniel; Jones, James W.; Rosenzweig, Cynthia; Aggarwal, Pramod K.; Alderman, Phillip D.; Anothai, Jakarat; Basso, Bruno; Biernath, Christian; Cammarano, Davide; Challinor, Andy; Deryng, Delphine; Sanctis, Giacomo De; Doltra, Jordi; Fereres, Elias; Folberth, Christian; Garcia-Vila, Margarita; Gayler, Sebastian; Hoogenboom, Gerrit; Hunt, Leslie A.; Izaurralde, Roberto C.; Jabloun, Mohamed; Jones, Curtis D.; Kersebaum, Kurt C.; Kimball, Bruce A.; Koehler, Ann-Kristin; Kumar, Soora Naresh; Nendel, Claas; O'Leary, Garry J.; Olesen, Jørgen E.; Ottman, Michael J.; Palosuo, Taru; Prasad, P. V. Vara; Priesack, Eckart; Pugh, Thomas A. M.; Reynolds, Matthew; Rezaei, Ehsan E.; Rötter, Reimund P.; Schmid, Erwin; Semenov, Mikhail A.; Shcherbak, Iurii; Stehfest, Elke; Stöckle, Claudio O.; Stratonovitch, Pierre; Streck, Thilo; Supit, Iwan; Tao, Fulu; Thorburn, Peter; Waha, Katharina; Wall, Gerard W.; Wang, Enli; White, Jeffrey W.; Wolf, Joost; Zhao, Zhigan; Zhu, Yan

    2016-12-01

    The potential impact of global temperature change on global crop yield has recently been assessed with different methods. Here we show that grid-based and point-based simulations and statistical regressions (from historic records), without deliberate adaptation or CO2 fertilization effects, produce similar estimates of temperature impact on wheat yields at global and national scales. With a 1 °C global temperature increase, global wheat yield is projected to decline between 4.1% and 6.4%. Projected relative temperature impacts from different methods were similar for major wheat-producing countries China, India, USA and France, but less so for Russia. Point-based and grid-based simulations, and to some extent the statistical regressions, were consistent in projecting that warmer regions are likely to suffer more yield loss with increasing temperature than cooler regions. By forming a multi-method ensemble, it was possible to quantify `method uncertainty’ in addition to model uncertainty. This significantly improves confidence in estimates of climate impacts on global food security.

  12. Bulk tank somatic cell counts analyzed by statistical process control tools to identify and monitor subclinical mastitis incidence.

    PubMed

    Lukas, J M; Hawkins, D M; Kinsel, M L; Reneau, J K

    2005-11-01

    The objective of this study was to examine the relationship between monthly Dairy Herd Improvement (DHI) subclinical mastitis and new infection rate estimates and daily bulk tank somatic cell count (SCC) summarized by statistical process control tools. Dairy Herd Improvement Association test-day subclinical mastitis and new infection rate estimates along with daily or every other day bulk tank SCC data were collected for 12 mo of 2003 from 275 Upper Midwest dairy herds. Herds were divided into 5 herd production categories. A linear score [LNS = ln(BTSCC/100,000)/0.693147 + 3] was calculated for each individual bulk tank SCC. For both the raw SCC and the transformed data, the mean and sigma were calculated using the statistical quality control individual measurement and moving range chart procedure of Statistical Analysis System. One hundred eighty-three herds of the 275 herds from the study data set were then randomly selected and the raw (method 1) and transformed (method 2) bulk tank SCC mean and sigma were used to develop models for predicting subclinical mastitis and new infection rate estimates. Herd production category was also included in all models as 5 dummy variables. Models were validated by calculating estimates of subclinical mastitis and new infection rates for the remaining 92 herds and plotting them against observed values of each of the dependents. Only herd production category and bulk tank SCC mean were significant and remained in the final models. High R2 values (0.83 and 0.81 for methods 1 and 2, respectively) indicated a strong correlation between the bulk tank SCC and herd's subclinical mastitis prevalence. The standard errors of the estimate were 4.02 and 4.28% for methods 1 and 2, respectively, and decreased with increasing herd production. As a case study, Shewhart Individual Measurement Charts were plotted from the bulk tank SCC to identify shifts in mastitis incidence. Four of 5 charts examined signaled a change in bulk tank SCC before the DHI test day identified the change in subclinical mastitis prevalence. It can be concluded that applying statistical process control tools to daily bulk tank SCC can be used to estimate subclinical mastitis prevalence in the herd and observe for change in the subclinical mastitis status. Single DHI test day estimates of new infection rate were insufficient to accurately describe its dynamics.

  13. Statistical lamb wave localization based on extreme value theory

    NASA Astrophysics Data System (ADS)

    Harley, Joel B.

    2018-04-01

    Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represent an estimated damage location. Yet, it is often unclear if this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant rate of false alarm statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increase. With seventeen sensors, both methods successfully detect damage with statistical significance.

  14. Calibration of remotely sensed proportion or area estimates for misclassification error

    Treesearch

    Raymond L. Czaplewski; Glenn P. Catts

    1992-01-01

    Classifications of remotely sensed data contain misclassification errors that bias areal estimates. Monte Carlo techniques were used to compare two statistical methods that correct or calibrate remotely sensed areal estimates for misclassification bias using reference data from an error matrix. The inverse calibration estimator was consistently superior to the...

  15. Analysis of variance to assess statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes.

    PubMed

    Makeyev, Oleksandr; Joe, Cody; Lee, Colin; Besio, Walter G

    2017-07-01

    Concentric ring electrodes have shown promise in non-invasive electrophysiological measurement demonstrating their superiority to conventional disc electrodes, in particular, in accuracy of Laplacian estimation. Recently, we have proposed novel variable inter-ring distances concentric ring electrodes. Analytic and finite element method modeling results for linearly increasing distances electrode configurations suggested they may decrease the truncation error resulting in more accurate Laplacian estimates compared to currently used constant inter-ring distances configurations. This study assesses statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes. Full factorial design of analysis of variance was used with one categorical and two numerical factors: the inter-ring distances, the electrode diameter, and the number of concentric rings in the electrode. The response variables were the Relative Error and the Maximum Error of Laplacian estimation computed using a finite element method model for each of the combinations of levels of three factors. Effects of the main factors and their interactions on Relative Error and Maximum Error were assessed and the obtained results suggest that all three factors have statistically significant effects in the model confirming the potential of using inter-ring distances as a means of improving accuracy of Laplacian estimation.

  16. Bayesian statistics and Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Koch, K. R.

    2018-03-01

    The Bayesian approach allows an intuitive way to derive the methods of statistics. Probability is defined as a measure of the plausibility of statements or propositions. Three rules are sufficient to obtain the laws of probability. If the statements refer to the numerical values of variables, the so-called random variables, univariate and multivariate distributions follow. They lead to the point estimation by which unknown quantities, i.e. unknown parameters, are computed from measurements. The unknown parameters are random variables, they are fixed quantities in traditional statistics which is not founded on Bayes' theorem. Bayesian statistics therefore recommends itself for Monte Carlo methods, which generate random variates from given distributions. Monte Carlo methods, of course, can also be applied in traditional statistics. The unknown parameters, are introduced as functions of the measurements, and the Monte Carlo methods give the covariance matrix and the expectation of these functions. A confidence region is derived where the unknown parameters are situated with a given probability. Following a method of traditional statistics, hypotheses are tested by determining whether a value for an unknown parameter lies inside or outside the confidence region. The error propagation of a random vector by the Monte Carlo methods is presented as an application. If the random vector results from a nonlinearly transformed vector, its covariance matrix and its expectation follow from the Monte Carlo estimate. This saves a considerable amount of derivatives to be computed, and errors of the linearization are avoided. The Monte Carlo method is therefore efficient. If the functions of the measurements are given by a sum of two or more random vectors with different multivariate distributions, the resulting distribution is generally not known. TheMonte Carlo methods are then needed to obtain the covariance matrix and the expectation of the sum.

  17. Real-time Mainshock Forecast by Statistical Discrimination of Foreshock Clusters

    NASA Astrophysics Data System (ADS)

    Nomura, S.; Ogata, Y.

    2016-12-01

    Foreshock discremination is one of the most effective ways for short-time forecast of large main shocks. Though many large earthquakes accompany their foreshocks, discreminating them from enormous small earthquakes is difficult and only probabilistic evaluation from their spatio-temporal features and magnitude evolution may be available. Logistic regression is the statistical learning method best suited to such binary pattern recognition problems where estimates of a-posteriori probability of class membership are required. Statistical learning methods can keep learning discreminating features from updating catalog and give probabilistic recognition of forecast in real time. We estimated a non-linear function of foreshock proportion by smooth spline bases and evaluate the possibility of foreshocks by the logit function. In this study, we classified foreshocks from earthquake catalog by the Japan Meteorological Agency by single-link clustering methods and learned spatial and temporal features of foreshocks by the probability density ratio estimation. We use the epicentral locations, time spans and difference in magnitudes for learning and forecasting. Magnitudes of main shocks are also predicted our method by incorporating b-values into our method. We discuss the spatial pattern of foreshocks from the classifier composed by our model. We also implement a back test to validate predictive performance of the model by this catalog.

  18. Use of statistical and neural net approaches in predicting toxicity of chemicals.

    PubMed

    Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D

    2000-01-01

    Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.

  19. A new statistical approach to climate change detection and attribution

    NASA Astrophysics Data System (ADS)

    Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe

    2017-01-01

    We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the " models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01± 0.02 K).

  20. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.

  1. Trends in study design and the statistical methods employed in a leading general medicine journal.

    PubMed

    Gosho, M; Sato, Y; Nagashima, K; Takahashi, S

    2018-02-01

    Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. These methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel design, such as adaptive dose selection and sample size re-estimation, was sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.

  2. Kappa statistic for the clustered dichotomous responses from physicians and patients

    PubMed Central

    Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L.; Cai, Jianwen

    2013-01-01

    The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation result demonstrates that the proposed bootstrap method produces better estimate of the standard error and better coverage performance compared to the asymptotic standard error estimate that ignores dependence among patients within physicians with at least a moderately large number of clusters. An example of an application to a coronary heart disease prevention study is presented. PMID:23533082

  3. Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS)

    PubMed Central

    Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M.; Khan, Ajmal

    2016-01-01

    Objective: This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Methods: Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. Results: A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher’s exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher’s exact test, logistic regression, epidemiological statistics, and non-parametric tests. Conclusion: This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design. PMID:27022365

  4. Performance of Bootstrapping Approaches To Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Nevitt, Jonathan; Hancock, Gregory R.

    2001-01-01

    Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…

  5. United States Census 2000 Population with Bridged Race Categories. Vital and Health Statistics. Data Evaluation and Methods Research.

    ERIC Educational Resources Information Center

    Ingram, Deborah D.; Parker, Jennifer D.; Schenker, Nathaniel; Weed, James A.; Hamilton, Brady; Arias, Elizabeth; Madans, Jennifer H.

    This report documents the National Center for Health Statistics' (NCHS) methods for bridging the Census 2000 multiple-race resident population to single-race categories and describing bridged race resident population estimates. Data came from the pooled 1997-2000 National Health Interview Surveys. The bridging models included demographic and…

  6. North Dakota's forests, 2005: statistics, methods, and quality assurance

    Treesearch

    Patrick D. Miles; David E. Haugen; Charles J. Barnett

    2011-01-01

    The first full annual inventory of North Dakota's forests was completed in 2005 after 7,622 plots were selected and 164 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of the North Dakota...

  7. Illinois' Forests, 2005: Statistics, Methods, and Quality Assurance

    Treesearch

    Susan J. Crocker; Charles J. Barnett; Mark A. Hatfield

    2013-01-01

    The first full annual inventory of Illinois' forests was completed in 2005. This report contains 1) descriptive information on methods, statistics, and quality assurance of data collection, 2) a glossary of terms, 3) tables that summarize quality assurance, and 4) a core set of tabular estimates for a variety of forest resources. A detailed analysis of inventory...

  8. South Dakota's forests, 2005: statistics, methods, and quality assurance

    Treesearch

    Patrick D. Miles; Ronald J. Piva; Charles J. Barnett

    2011-01-01

    The first full annual inventory of South Dakota's forests was completed in 2005 after 8,302 plots were selected and 325 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of the South Dakota...

  9. Volatility measurement with directional change in Chinese stock market: Statistical property and investment strategy

    NASA Astrophysics Data System (ADS)

    Ma, Junjun; Xiong, Xiong; He, Feng; Zhang, Wei

    2017-04-01

    The stock price fluctuation is studied in this paper with intrinsic time perspective. The event, directional change (DC) or overshoot, are considered as time scale of price time series. With this directional change law, its corresponding statistical properties and parameter estimation is tested in Chinese stock market. Furthermore, a directional change trading strategy is proposed for invest in the market portfolio in Chinese stock market, and both in-sample and out-of-sample performance are compared among the different method of model parameter estimation. We conclude that DC method can capture important fluctuations in Chinese stock market and gain profit due to the statistical property that average upturn overshoot size is bigger than average downturn directional change size. The optimal parameter of DC method is not fixed and we obtained 1.8% annual excess return with this DC-based trading strategy.

  10. A Statistical Framework for the Functional Analysis of Metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharon, Itai; Pati, Amrita; Markowitz, Victor

    2008-10-01

    Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements.more » They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.« less

  11. Optimal allocation of testing resources for statistical simulations

    NASA Astrophysics Data System (ADS)

    Quintana, Carolina; Millwater, Harry R.; Singh, Gulshan; Golden, Patrick

    2015-07-01

    Statistical estimates from simulation involve uncertainty caused by the variability in the input random variables due to limited data. Allocating resources to obtain more experimental data of the input variables to better characterize their probability distributions can reduce the variance of statistical estimates. The methodology proposed determines the optimal number of additional experiments required to minimize the variance of the output moments given single or multiple constraints. The method uses multivariate t-distribution and Wishart distribution to generate realizations of the population mean and covariance of the input variables, respectively, given an amount of available data. This method handles independent and correlated random variables. A particle swarm method is used for the optimization. The optimal number of additional experiments per variable depends on the number and variance of the initial data, the influence of the variable in the output function and the cost of each additional experiment. The methodology is demonstrated using a fretting fatigue example.

  12. Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.

    PubMed

    Mutimbu, Lawrence; Robles-Kelly, Antonio

    2016-08-31

    This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically datadriven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data driven approach. As a result, our method delivers a pixelwise illuminant colour estimate being devoid of libraries or user input. The use of a factor graph also allows for the illuminant estimates to be recovered making use of a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.

  13. A critique of the usefulness of inferential statistics in applied behavior analysis

    PubMed Central

    Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.

    1998-01-01

    Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304

  14. An Experimental Study of the Effect of Judges' Knowledge of Item Data on Two Forms of the Angoff Standard Setting Method.

    ERIC Educational Resources Information Center

    Garrido, Mariquita; Payne, David A.

    Minimum competency cut-off scores on a statistics exam were estimated under four conditions: the Angoff judging method with item data (n=20), and without data available (n=19); and the Modified Angoff method with (n=19), and without (n=19) item data available to judges. The Angoff method required free response percentage estimates (0-100) percent,…

  15. Trans-dimensional and hierarchical Bayesian approaches toward rigorous estimation of seismic sources and structures in the Northeast Asia

    NASA Astrophysics Data System (ADS)

    Kim, Seongryong; Tkalčić, Hrvoje; Mustać, Marija; Rhie, Junkee; Ford, Sean

    2016-04-01

    A framework is presented within which we provide rigorous estimations for seismic sources and structures in the Northeast Asia. We use Bayesian inversion methods, which enable statistical estimations of models and their uncertainties based on data information. Ambiguities in error statistics and model parameterizations are addressed by hierarchical and trans-dimensional (trans-D) techniques, which can be inherently implemented in the Bayesian inversions. Hence reliable estimation of model parameters and their uncertainties is possible, thus avoiding arbitrary regularizations and parameterizations. Hierarchical and trans-D inversions are performed to develop a three-dimensional velocity model using ambient noise data. To further improve the model, we perform joint inversions with receiver function data using a newly developed Bayesian method. For the source estimation, a novel moment tensor inversion method is presented and applied to regional waveform data of the North Korean nuclear explosion tests. By the combination of new Bayesian techniques and the structural model, coupled with meaningful uncertainties related to each of the processes, more quantitative monitoring and discrimination of seismic events is possible.

  16. Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

    PubMed Central

    Guan, Yongtao; Li, Yehua; Sinha, Rajita

    2011-01-01

    In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854

  17. A General Linear Method for Equating with Small Samples

    ERIC Educational Resources Information Center

    Albano, Anthony D.

    2015-01-01

    Research on equating with small samples has shown that methods with stronger assumptions and fewer statistical estimates can lead to decreased error in the estimated equating function. This article introduces a new approach to linear observed-score equating, one which provides flexible control over how form difficulty is assumed versus estimated…

  18. Estimation procedures for the combined 1990s periodic forest inventories of California, Oregon, and Washington.

    Treesearch

    T.M. Barrett

    2004-01-01

    During the 1990s, forest inventories for California, Oregon, and Washington were conducted by different agencies using different methods. The Pacific Northwest Research Station Forest Inventory and Analysis program recently integrated these inventories into a single database. This document briefly describes potential statistical methods for estimating population totals...

  19. Improving stochastic estimates with inference methods: calculating matrix diagonals.

    PubMed

    Selig, Marco; Oppermann, Niels; Ensslin, Torsten A

    2012-02-01

    Estimating the diagonal entries of a matrix, that is not directly accessible but only available as a linear operator in the form of a computer routine, is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or the computational costs of matrix probing methods to estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method. © 2012 American Physical Society

  20. Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model

    NASA Astrophysics Data System (ADS)

    Al Sobhi, Mashail M.

    2015-02-01

    Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.

  1. Precipitation intensity probability distribution modelling for hydrological and construction design purposes

    NASA Astrophysics Data System (ADS)

    Koshinchanov, Georgy; Dimitrov, Dobri

    2008-11-01

    The characteristics of rainfall intensity are important for many purposes, including design of sewage and drainage systems, tuning flood warning procedures, etc. Those estimates are usually statistical estimates of the intensity of precipitation realized for certain period of time (e.g. 5, 10 min., etc) with different return period (e.g. 20, 100 years, etc). The traditional approach in evaluating the mentioned precipitation intensities is to process the pluviometer's records and fit probability distribution to samples of intensities valid for certain locations ore regions. Those estimates further become part of the state regulations to be used for various economic activities. Two problems occur using the mentioned approach: 1. Due to various factors the climate conditions are changed and the precipitation intensity estimates need regular update; 2. As far as the extremes of the probability distribution are of particular importance for the practice, the methodology of the distribution fitting needs specific attention to those parts of the distribution. The aim of this paper is to make review of the existing methodologies for processing the intensive rainfalls and to refresh some of the statistical estimates for the studied areas. The methodologies used in Bulgaria for analyzing the intensive rainfalls and produce relevant statistical estimates: The method of the maximum intensity, used in the National Institute of Meteorology and Hydrology to process and decode the pluviometer's records, followed by distribution fitting for each precipitation duration period; As the above, but with separate modeling of probability distribution for the middle and high probability quantiles. Method is similar to the first one, but with a threshold of 0,36 mm/min of intensity; Another method proposed by the Russian hydrologist G. A. Aleksiev for regionalization of estimates over some territory, improved and adapted by S. Gerasimov for Bulgaria; Next method is considering only the intensive rainfalls (if any) during the day with the maximal annual daily precipitation total for a given year; Conclusions are drown on the relevance and adequacy of the applied methods.

  2. Reentry survivability modeling

    NASA Astrophysics Data System (ADS)

    Fudge, Michael L.; Maher, Robert L.

    1997-10-01

    Statistical methods for expressing the impact risk posed to space systems in general [and the International Space Station (ISS) in particular] by other resident space objects have been examined. One of the findings of this investigation is that there are legitimate physical modeling reasons for the common statistical expression of the collision risk. A combination of statistical methods and physical modeling is also used to express the impact risk posed by re-entering space systems to objects of interest (e.g., people and property) on Earth. One of the largest uncertainties in the expressing of this risk is the estimation of survivable material which survives reentry to impact Earth's surface. This point was recently demonstrated in dramatic fashion by the impact of an intact expendable launch vehicle (ELV) upper stage near a private residence in the continental United States. Since approximately half of the missions supporting ISS will utilize ELVs, it is appropriate to examine the methods used to estimate the amount and physical characteristics of ELV debris surviving reentry to impact Earth's surface. This paper examines reentry survivability estimation methodology, including the specific methodology used by Caiman Sciences' 'Survive' model. Comparison between empirical results (observations of objects which have been recovered on Earth after surviving reentry) and Survive estimates are presented for selected upper stage or spacecraft components and a Delta launch vehicle second stage.

  3. A new method for estimating the usual intake of episodically-consumed foods with application to their distribution

    PubMed Central

    Midthune, Douglas; Dodd, Kevin W.; Freedman, Laurence S.; Krebs-Smith, Susan M.; Subar, Amy F.; Guenther, Patricia M.; Carroll, Raymond J.; Kipnis, Victor

    2007-01-01

    Objective We propose a new statistical method that uses information from two 24-hour recalls (24HRs) to estimate usual intake of episodically-consumed foods. Statistical Analyses Performed The method developed at the National Cancer Institute (NCI) accommodates the large number of non-consumption days that arise with foods by separating the probability of consumption from the consumption-day amount, using a two-part model. Covariates, such as sex, age, race, or information from a food frequency questionnaire (FFQ), may supplement the information from two or more 24HRs using correlated mixed model regression. The model allows for correlation between the probability of consuming a food on a single day and the consumption-day amount. Percentiles of the distribution of usual intake are computed from the estimated model parameters. Results The Eating at America's Table Study (EATS) data are used to illustrate the method to estimate the distribution of usual intake for whole grains and dark green vegetables for men and women and the distribution of usual intakes of whole grains by educational level among men. A simulation study indicates that the NCI method leads to substantial improvement over existing methods for estimating the distribution of usual intake of foods. Applications/Conclusions The NCI method provides distinct advantages over previously proposed methods by accounting for the correlation between probability of consumption and amount consumed and by incorporating covariate information. Researchers interested in estimating the distribution of usual intakes of foods for a population or subpopulation are advised to work with a statistician and incorporate the NCI method in analyses. PMID:17000190

  4. A method for estimating current attendance on sets of campgrounds...a pilot study

    Treesearch

    Richard L. Bury; Ruth Margolies

    1964-01-01

    Statistical models were devised for estimating both daily and seasonal attendance (and corresponding precision of estimates) through correlation-regression and ratio analyses. Total daily attendance for a test set of 23 campgrounds could be estimated from attendance measured in only one of them. The chances were that estimates would be within 10 percent of true...

  5. A review of statistical methods to analyze extreme precipitation and temperature events in the Mediterranean region

    NASA Astrophysics Data System (ADS)

    Lazoglou, Georgia; Anagnostopoulou, Christina; Tolika, Konstantia; Kolyva-Machera, Fotini

    2018-04-01

    The increasing trend of the intensity and frequency of temperature and precipitation extremes during the past decades has substantial environmental and socioeconomic impacts. Thus, the objective of the present study is the comparison of several statistical methods of the extreme value theory (EVT) in order to identify which is the most appropriate to analyze the behavior of the extreme precipitation, and high and low temperature events, in the Mediterranean region. The extremes choice was made using both the block maxima and the peaks over threshold (POT) technique and as a consequence both the generalized extreme value (GEV) and generalized Pareto distributions (GPDs) were used to fit them. The results were compared, in order to select the most appropriate distribution for extremes characterization. Moreover, this study evaluates the maximum likelihood estimation, the L-moments and the Bayesian method, based on both graphical and statistical goodness-of-fit tests. It was revealed that the GPD can characterize accurately both precipitation and temperature extreme events. Additionally, GEV distribution with the Bayesian method is proven to be appropriate especially for the greatest values of extremes. Another important objective of this investigation was the estimation of the precipitation and temperature return levels for three return periods (50, 100, and 150 years) classifying the data into groups with similar characteristics. Finally, the return level values were estimated with both GEV and GPD and with the three different estimation methods, revealing that the selected method can affect the return level values for both the parameter of precipitation and temperature.

  6. SU-F-T-687: Comparison of SPECT/CT-Based Methodologies for Estimating Lung Dose from Y-90 Radioembolization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kost, S; Yu, N; Lin, S

    2016-06-15

    Purpose: To compare mean lung dose (MLD) estimates from 99mTc macroaggregated albumin (MAA) SPECT/CT using two published methodologies for patients treated with {sup 90}Y radioembolization for liver cancer. Methods: MLD was estimated retrospectively using two methodologies for 40 patients from SPECT/CT images of 99mTc-MAA administered prior to radioembolization. In these two methods, lung shunt fractions (LSFs) were calculated as the ratio of scanned lung activity to the activity in the entire scan volume or to the sum of activity in the lung and liver respectively. Misregistration of liver activity into the lungs during SPECT acquisition was overcome by excluding lungmore » counts within either 2 or 1.5 cm of the diaphragm apex respectively. Patient lung density was assumed to be 0.3 g/cm{sup 3} or derived from CT densitovolumetry respectively. Results from both approaches were compared to MLD determined by planar scintigraphy (PS). The effect of patient size on the difference between MLD from PS and SPECT/CT was also investigated. Results: Lung density from CT densitovolumetry is not different from the reference density (p = 0.68). The second method resulted in lung dose of an average 1.5 times larger lung dose compared to the first method; however the difference between the means of the two estimates was not significant (p = 0.07). Lung dose from both methods were statistically different from those estimated from 2D PS (p < 0.001). There was no correlation between patient size and the difference between MLD from PS and both SPECT/CT methods (r < 0.22, p > 0.17). Conclusion: There is no statistically significant difference between MLD estimated from the two techniques. Both methods are statistically different from conventional PS, with PS overestimating dose by a factor of three or larger. The difference between lung doses estimated from 2D planar or 3D SPECT/CT is not dependent on patient size.« less

  7. Artificial Neural Network Based Group Contribution Method for Estimating Cetane and Octane Numbers of Hydrocarbons and Oxygenated Organic Compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kubic, William Louis; Jenkins, Rhodri W.; Moore, Cameron M.

    Chemical pathways for converting biomass into fuels produce compounds for which key physical and chemical property data are unavailable. We developed an artificial neural network based group contribution method for estimating cetane and octane numbers that captures the complex dependence of fuel properties of pure compounds on chemical structure and is statistically superior to current methods.

  8. Artificial Neural Network Based Group Contribution Method for Estimating Cetane and Octane Numbers of Hydrocarbons and Oxygenated Organic Compounds

    DOE PAGES

    Kubic, William Louis; Jenkins, Rhodri W.; Moore, Cameron M.; ...

    2017-09-28

    Chemical pathways for converting biomass into fuels produce compounds for which key physical and chemical property data are unavailable. We developed an artificial neural network based group contribution method for estimating cetane and octane numbers that captures the complex dependence of fuel properties of pure compounds on chemical structure and is statistically superior to current methods.

  9. Application of Statistical Methods of Rain Rate Estimation to Data From The TRMM Precipitation Radar

    NASA Technical Reports Server (NTRS)

    Meneghini, R.; Jones, J. A.; Iguchi, T.; Okamoto, K.; Liao, L.; Busalacchi, Antonio J. (Technical Monitor)

    2000-01-01

    The TRMM Precipitation Radar is well suited to statistical methods in that the measurements over any given region are sparsely sampled in time. Moreover, the instantaneous rain rate estimates are often of limited accuracy at high rain rates because of attenuation effects and at light rain rates because of receiver sensitivity. For the estimation of the time-averaged rain characteristics over an area both errors are relevant. By enlarging the space-time region over which the data are collected, the sampling error can be reduced. However. the bias and distortion of the estimated rain distribution generally will remain if estimates at the high and low rain rates are not corrected. In this paper we use the TRMM PR data to investigate the behavior of 2 statistical methods the purpose of which is to estimate the rain rate over large space-time domains. Examination of large-scale rain characteristics provides a useful starting point. The high correlation between the mean and standard deviation of rain rate implies that the conditional distribution of this quantity can be approximated by a one-parameter distribution. This property is used to explore the behavior of the area-time-integral (ATI) methods where fractional area above a threshold is related to the mean rain rate. In the usual application of the ATI method a correlation is established between these quantities. However, if a particular form of the rain rate distribution is assumed and if the ratio of the mean to standard deviation is known, then not only the mean but the full distribution can be extracted from a measurement of fractional area above a threshold. The second method is an extension of this idea where the distribution is estimated from data over a range of rain rates chosen in an intermediate range where the effects of attenuation and poor sensitivity can be neglected. The advantage of estimating the distribution itself rather than the mean value is that it yields the fraction of rain contributed by the light and heavy rain rates. This is useful in estimating the fraction of rainfall contributed by the rain rates that go undetected by the radar. The results at high rain rates provide a cross-check on the usual attenuation correction methods that are applied at the highest resolution of the instrument.

  10. Combining heuristic and statistical techniques in landslide hazard assessments

    NASA Astrophysics Data System (ADS)

    Cepeda, Jose; Schwendtner, Barbara; Quan, Byron; Nadim, Farrokh; Diaz, Manuel; Molina, Giovanni

    2014-05-01

    As a contribution to the Global Assessment Report 2013 - GAR2013, coordinated by the United Nations International Strategy for Disaster Reduction - UNISDR, a drill-down exercise for landslide hazard assessment was carried out by entering the results of both heuristic and statistical techniques into a new but simple combination rule. The data available for this evaluation included landslide inventories, both historical and event-based. In addition to the application of a heuristic method used in the previous editions of GAR, the availability of inventories motivated the use of statistical methods. The heuristic technique is largely based on the Mora & Vahrson method, which estimates hazard as the product of susceptibility and triggering factors, where classes are weighted based on expert judgment and experience. Two statistical methods were also applied: the landslide index method, which estimates weights of the classes for the susceptibility and triggering factors based on the evidence provided by the density of landslides in each class of the factors; and the weights of evidence method, which extends the previous technique to include both positive and negative evidence of landslide occurrence in the estimation of weights for the classes. One key aspect during the hazard evaluation was the decision on the methodology to be chosen for the final assessment. Instead of opting for a single methodology, it was decided to combine the results of the three implemented techniques using a combination rule based on a normalization of the results of each method. The hazard evaluation was performed for both earthquake- and rainfall-induced landslides. The country chosen for the drill-down exercise was El Salvador. The results indicate that highest hazard levels are concentrated along the central volcanic chain and at the centre of the northern mountains.

  11. Three-dimensional holoscopic image coding scheme using high-efficiency video coding with kernel-based minimum mean-square-error estimation

    NASA Astrophysics Data System (ADS)

    Liu, Deyang; An, Ping; Ma, Ran; Yang, Chao; Shen, Liquan; Li, Kai

    2016-07-01

    Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent the 3-D holoscopic content. Therefore, efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In the proposed scheme, the coding block is predicted by an MMSE estimator under statistical modeling. In order to obtain the signal statistical behavior, kernel density estimation (KDE) is utilized to estimate the probability density function of the statistical modeling. As bandwidth estimation (BE) is a key issue in the KDE problem, we also propose a BE method based on kernel trick. The experimental results demonstrate that the proposed scheme can achieve a better rate-distortion performance and a better visual rendering quality.

  12. Incorporating spatial context into statistical classification of multidimensional image data

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.

    1981-01-01

    Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextural classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextural classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextural uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.

  13. Noise exposure-response relationships established from repeated binary observations: Modeling approaches and applications.

    PubMed

    Schäffer, Beat; Pieren, Reto; Mendolia, Franco; Basner, Mathias; Brink, Mark

    2017-05-01

    Noise exposure-response relationships are used to estimate the effects of noise on individuals or a population. Such relationships may be derived from independent or repeated binary observations, and modeled by different statistical methods. Depending on the method by which they were established, their application in population risk assessment or estimation of individual responses may yield different results, i.e., predict "weaker" or "stronger" effects. As far as the present body of literature on noise effect studies is concerned, however, the underlying statistical methodology to establish exposure-response relationships has not always been paid sufficient attention. This paper gives an overview on two statistical approaches (subject-specific and population-averaged logistic regression analysis) to establish noise exposure-response relationships from repeated binary observations, and their appropriate applications. The considerations are illustrated with data from three noise effect studies, estimating also the magnitude of differences in results when applying exposure-response relationships derived from the two statistical approaches. Depending on the underlying data set and the probability range of the binary variable it covers, the two approaches yield similar to very different results. The adequate choice of a specific statistical approach and its application in subsequent studies, both depending on the research question, are therefore crucial.

  14. Methods for estimating selected low-flow frequency statistics and harmonic mean flows for streams in Iowa

    USGS Publications Warehouse

    Eash, David A.; Barnes, Kimberlee K.

    2017-01-01

    A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.

  15. Unbiased estimation of oceanic mean rainfall from satellite borne radiometer measurements

    NASA Technical Reports Server (NTRS)

    Mittal, M. C.

    1981-01-01

    The statistical properties of the radar derived rainfall obtained during the GARP Atlantic Tropical Experiment (GATE) are used to derive quantitative estimates of the spatial and temporal sampling errors associated with estimating rainfall from brightness temperature measurements such as would be obtained from a satelliteborne microwave radiometer employing a practical size antenna aperture. A basis for a method of correcting the so called beam filling problem, i.e., for the effect of nonuniformity of rainfall over the radiometer beamwidth is provided. The method presented employs the statistical properties of the observations themselves without need for physical assumptions beyond those associated with the radiative transfer model. The simulation results presented offer a validation of the estimated accuracy that can be achieved and the graphs included permit evaluation of the effect of the antenna resolution on both the temporal and spatial sampling errors.

  16. Quantifying and reducing statistical uncertainty in sample-based health program costing studies in low- and middle-income countries

    PubMed Central

    Resch, Stephen

    2018-01-01

    Objectives: In many low- and middle-income countries, the costs of delivering public health programs such as for HIV/AIDS, nutrition, and immunization are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example, via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating the total program costs. This may be due to a lack of awareness/understanding of (1) the technical details regarding uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various potential specific sources that contribute to overall uncertainty. Methods: We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. Results: A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. This method can not only be used when estimating the total cost of delivering established health programs but also to decrease uncertainty when the interest lies in assessing the incremental effect of an intervention. Conclusion: Measures of statistical uncertainty associated with survey-based estimates of program costs, such as standard errors and 95% confidence intervals, provide important contextual information for health policy decision-making and key inputs for the design of future costing studies. Such measures are often not reported, possibly because of technical challenges associated with their calculation and a lack of awareness of appropriate software. Modern statistical analysis methods for survey data, such as calibration, provide a means to exploit additional information that is readily available but was not used in the design of the study to significantly improve the estimation of total cost through the reduction of statistical uncertainty. PMID:29636964

  17. History by history statistical estimators in the BEAM code system.

    PubMed

    Walters, B R B; Kawrakow, I; Rogers, D W O

    2002-12-01

    A history by history method for estimating uncertainties has been implemented in the BEAMnrc and DOSXYznrc codes replacing the method of statistical batches. This method groups scored quantities (e.g., dose) by primary history. When phase-space sources are used, this method groups incident particles according to the primary histories that generated them. This necessitated adding markers (negative energy) to phase-space files to indicate the first particle generated by a new primary history. The new method greatly reduces the uncertainty in the uncertainty estimate. The new method eliminates one dimension (which kept the results for each batch) from all scoring arrays, resulting in memory requirement being decreased by a factor of 2. Correlations between particles in phase-space sources are taken into account. The only correlations with any significant impact on uncertainty are those introduced by particle recycling. Failure to account for these correlations can result in a significant underestimate of the uncertainty. The previous method of accounting for correlations due to recycling by placing all recycled particles in the same batch did work. Neither the new method nor the batch method take into account correlations between incident particles when a phase-space source is restarted so one must avoid restarts.

  18. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach

    PubMed Central

    Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.

    2014-01-01

    The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736

  19. Archaeology Through Computational Linguistics: Inscription Statistics Predict Excavation Sites of Indus Valley Artifacts.

    PubMed

    Recchia, Gabriel L; Louwerse, Max M

    2016-11-01

    Computational techniques comparing co-occurrences of city names in texts allow the relative longitudes and latitudes of cities to be estimated algorithmically. However, these techniques have not been applied to estimate the provenance of artifacts with unknown origins. Here, we estimate the geographic origin of artifacts from the Indus Valley Civilization, applying methods commonly used in cognitive science to the Indus script. We show that these methods can accurately predict the relative locations of archeological sites on the basis of artifacts of known provenance, and we further apply these techniques to determine the most probable excavation sites of four sealings of unknown provenance. These findings suggest that inscription statistics reflect historical interactions among locations in the Indus Valley region, and they illustrate how computational methods can help localize inscribed archeological artifacts of unknown origin. The success of this method offers opportunities for the cognitive sciences in general and for computational anthropology specifically. Copyright © 2015 Cognitive Science Society, Inc.

  20. Statistical parameters of random heterogeneity estimated by analysing coda waves based on finite difference method

    NASA Astrophysics Data System (ADS)

    Emoto, K.; Saito, T.; Shiomi, K.

    2017-12-01

    Short-period (<1 s) seismograms are strongly affected by small-scale (<10 km) heterogeneities in the lithosphere. In general, short-period seismograms are analysed based on the statistical method by considering the interaction between seismic waves and randomly distributed small-scale heterogeneities. Statistical properties of the random heterogeneities have been estimated by analysing short-period seismograms. However, generally, the small-scale random heterogeneity is not taken into account for the modelling of long-period (>2 s) seismograms. We found that the energy of the coda of long-period seismograms shows a spatially flat distribution. This phenomenon is well known in short-period seismograms and results from the scattering by small-scale heterogeneities. We estimate the statistical parameters that characterize the small-scale random heterogeneity by modelling the spatiotemporal energy distribution of long-period seismograms. We analyse three moderate-size earthquakes that occurred in southwest Japan. We calculate the spatial distribution of the energy density recorded by a dense seismograph network in Japan at the period bands of 8-16 s, 4-8 s and 2-4 s and model them by using 3-D finite difference (FD) simulations. Compared to conventional methods based on statistical theories, we can calculate more realistic synthetics by using the FD simulation. It is not necessary to assume a uniform background velocity, body or surface waves and scattering properties considered in general scattering theories. By taking the ratio of the energy of the coda area to that of the entire area, we can separately estimate the scattering and the intrinsic absorption effects. Our result reveals the spectrum of the random inhomogeneity in a wide wavenumber range including the intensity around the corner wavenumber as P(m) = 8πε2a3/(1 + a2m2)2, where ε = 0.05 and a = 3.1 km, even though past studies analysing higher-frequency records could not detect the corner. Finally, we estimate the intrinsic attenuation by modelling the decay rate of the energy. The method proposed in this study is suitable for quantifying the statistical properties of long-wavelength subsurface random inhomogeneity, which leads the way to characterizing a wider wavenumber range of spectra, including the corner wavenumber.

  1. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    PubMed

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  2. A Bayesian approach to the statistical analysis of device preference studies.

    PubMed

    Fu, Haoda; Qu, Yongming; Zhu, Baojin; Huster, William

    2012-01-01

    Drug delivery devices are required to have excellent technical specifications to deliver drugs accurately, and in addition, the devices should provide a satisfactory experience to patients because this can have a direct effect on drug compliance. To compare patients' experience with two devices, cross-over studies with patient-reported outcomes (PRO) as response variables are often used. Because of the strength of cross-over designs, each subject can directly compare the two devices by using the PRO variables, and variables indicating preference (preferring A, preferring B, or no preference) can be easily derived. Traditionally, methods based on frequentist statistics can be used to analyze such preference data, but there are some limitations for the frequentist methods. Recently, Bayesian methods are considered an acceptable method by the US Food and Drug Administration to design and analyze device studies. In this paper, we propose a Bayesian statistical method to analyze the data from preference trials. We demonstrate that the new Bayesian estimator enjoys some optimal properties versus the frequentist estimator. Copyright © 2012 John Wiley & Sons, Ltd.

  3. Conditional maximum-entropy method for selecting prior distributions in Bayesian statistics

    NASA Astrophysics Data System (ADS)

    Abe, Sumiyoshi

    2014-11-01

    The conditional maximum-entropy method (abbreviated here as C-MaxEnt) is formulated for selecting prior probability distributions in Bayesian statistics for parameter estimation. This method is inspired by a statistical-mechanical approach to systems governed by dynamics with largely separated time scales and is based on three key concepts: conjugate pairs of variables, dimensionless integration measures with coarse-graining factors and partial maximization of the joint entropy. The method enables one to calculate a prior purely from a likelihood in a simple way. It is shown, in particular, how it not only yields Jeffreys's rules but also reveals new structures hidden behind them.

  4. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  5. Kappa statistic for clustered dichotomous responses from physicians and patients.

    PubMed

    Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L; Cai, Jianwen

    2013-09-20

    The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation result demonstrates that the proposed bootstrap method produces better estimate of the standard error and better coverage performance compared with the asymptotic standard error estimate that ignores dependence among patients within physicians with at least a moderately large number of clusters. We present an example of an application to a coronary heart disease prevention study. Copyright © 2013 John Wiley & Sons, Ltd.

  6. Estimating population diversity with CatchAll

    PubMed Central

    Bunge, John; Woodard, Linda; Böhning, Dankmar; Foster, James A.; Connolly, Sean; Allen, Heather K.

    2012-01-01

    Motivation: The massive data produced by next-generation sequencing require advanced statistical tools. We address estimating the total diversity or species richness in a population. To date, only relatively simple methods have been implemented in available software. There is a need for software employing modern, computationally intensive statistical analyses including error, goodness-of-fit and robustness assessments. Results: We present CatchAll, a fast, easy-to-use, platform-independent program that computes maximum likelihood estimates for finite-mixture models, weighted linear regression-based analyses and coverage-based non-parametric methods, along with outlier diagnostics. Given sample ‘frequency count’ data, CatchAll computes 12 different diversity estimates and applies a model-selection algorithm. CatchAll also derives discounted diversity estimates to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphics program. Availability: Free executable downloads for Linux, Windows and Mac OS, with manual and source code, at www.northeastern.edu/catchall. Contact: jab18@cornell.edu PMID:22333246

  7. Distribution of the two-sample t-test statistic following blinded sample size re-estimation.

    PubMed

    Lu, Kaifeng

    2016-05-01

    We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  8. Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky

    USGS Publications Warehouse

    Martin, Gary R.; Arihood, Leslie D.

    2010-01-01

    This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 5, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics. Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.

  9. An introduction to Bayesian statistics in health psychology.

    PubMed

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.

  10. A hybrid method in combining treatment effects from matched and unmatched studies.

    PubMed

    Byun, Jinyoung; Lai, Dejian; Luo, Sheng; Risser, Jan; Tung, Betty; Hardy, Robert J

    2013-12-10

    The most common data structures in the biomedical studies have been matched or unmatched designs. Data structures resulting from a hybrid of the two may create challenges for statistical inferences. The question may arise whether to use parametric or nonparametric methods on the hybrid data structure. The Early Treatment for Retinopathy of Prematurity study was a multicenter clinical trial sponsored by the National Eye Institute. The design produced data requiring a statistical method of a hybrid nature. An infant in this multicenter randomized clinical trial had high-risk prethreshold retinopathy of prematurity that was eligible for treatment in one or both eyes at entry into the trial. During follow-up, recognition visual acuity was accessed for both eyes. Data from both eyes (matched) and from only one eye (unmatched) were eligible to be used in the trial. The new hybrid nonparametric method is a meta-analysis based on combining the Hodges-Lehmann estimates of treatment effects from the Wilcoxon signed rank and rank sum tests. To compare the new method, we used the classic meta-analysis with the t-test method to combine estimates of treatment effects from the paired and two sample t-tests. We used simulations to calculate the empirical size and power of the test statistics, as well as the bias, mean square and confidence interval width of the corresponding estimators. The proposed method provides an effective tool to evaluate data from clinical trials and similar comparative studies. Copyright © 2013 John Wiley & Sons, Ltd.

  11. Statistical methods of estimating mining costs

    USGS Publications Warehouse

    Long, K.R.

    2011-01-01

    Until it was defunded in 1995, the U.S. Bureau of Mines maintained a Cost Estimating System (CES) for prefeasibility-type economic evaluations of mineral deposits and estimating costs at producing and non-producing mines. This system had a significant role in mineral resource assessments to estimate costs of developing and operating known mineral deposits and predicted undiscovered deposits. For legal reasons, the U.S. Geological Survey cannot update and maintain CES. Instead, statistical tools are under development to estimate mining costs from basic properties of mineral deposits such as tonnage, grade, mineralogy, depth, strip ratio, distance from infrastructure, rock strength, and work index. The first step was to reestimate "Taylor's Rule" which relates operating rate to available ore tonnage. The second step was to estimate statistical models of capital and operating costs for open pit porphyry copper mines with flotation concentrators. For a sample of 27 proposed porphyry copper projects, capital costs can be estimated from three variables: mineral processing rate, strip ratio, and distance from nearest railroad before mine construction began. Of all the variables tested, operating costs were found to be significantly correlated only with strip ratio.

  12. Using generalized estimating equations and extensions in randomized trials with missing longitudinal patient reported outcome data.

    PubMed

    Bell, Melanie L; Horton, Nicholas J; Dhillon, Haryana M; Bray, Victoria J; Vardy, Janette

    2018-05-26

    Patient reported outcomes (PROs) are important in oncology research; however, missing data can pose a threat to the validity of results. Psycho-oncology researchers should be aware of the statistical options for handling missing data robustly. One rarely used set of methods, which includes extensions for handling missing data, is generalized estimating equations (GEEs). Our objective was to demonstrate use of GEEs to analyze PROs with missing data in randomized trials with assessments at fixed time points. We introduce GEEs and show, with a worked example, how to use GEEs that account for missing data: inverse probability weighted GEEs and multiple imputation with GEE. We use data from an RCT evaluating a web-based brain training for cancer survivors reporting cognitive symptoms after chemotherapy treatment. The primary outcome for this demonstration is the binary outcome of cognitive impairment. Several methods are used, and results are compared. We demonstrate that estimates can vary depending on the choice of analytical approach, with odds ratios for no cognitive impairment ranging from 2.04 to 5.74. While most of these estimates were statistically significant (P < 0.05), a few were not. Researchers using PROs should use statistical methods that handle missing data in a way as to result in unbiased estimates. GEE extensions are analytic options for handling dropouts in longitudinal RCTs, particularly if the outcome is not continuous. Copyright © 2018 John Wiley & Sons, Ltd.

  13. A novel method of estimation of lipophilicity using distance-based topological indices: dominating role of equalized electronegativity.

    PubMed

    Agrawal, Vijay K; Gupta, Madhu; Singh, Jyoti; Khadikar, Padmakar V

    2005-03-15

    Attempt is made to propose yet another method of estimating lipophilicity of a heterogeneous set of 223 compounds. The method is based on the use of equalized electronegativity along with topological indices. It was observed that excellent results are obtained in multiparametric regression upon introduction of indicator parameters. The results are discussed critically on the basis various statistical parameters.

  14. Reconstructing Spectral Scenes Using Statistical Estimation to Enhance Space Situational Awareness

    DTIC Science & Technology

    2006-12-01

    simultane- ously spatially and spectrally deblur the images collected from ASIS. The algorithms are based on proven estimation theories and do not...collected with any system using a filtering technology known as Electronic Tunable Filters (ETFs). Previous methods to deblur spectral images collected...spectrally deblurring then the previously investigated methods. This algorithm expands on a method used for increasing the spectral resolution in gamma-ray

  15. Causes of Death Data in the Global Burden of Disease Estimates for Ischemic and Hemorrhagic Stroke

    PubMed Central

    Truelsen, Thomas; Krarup, Lars-Henrik; Iversen, Helle; Mensah, George A.; Feigin, Valery; Sposato, Luciano; Naghavi, Mohsen

    2015-01-01

    Background Stroke mortality estimates in the Global Burden of Disease (GBD) study are based on routine mortality statistics and redistribution of ill-defined codes that cannot be a cause of death, the so-called “garbage codes”. This study describes the contribution of these codes to stroke mortality estimates. Methods All available mortality data were compiled and non-specific cause codes were redistributed based on literature review and statistical methods. Ill-defined codes were redistributed to their specific cause of disease by age, sex, country, and year. The reassignment was done based on the international classification of diseases and the pathology behind each code by checking multiple causes of death and literature review. Results Unspecified stroke, and primary and secondary hypertension are leading contributing “garbage codes” to stroke mortality estimates for intracranial hemorrhagic stroke and ischemic stroke. There were marked differences in the fraction of death assigned to ischemic stroke and hemorrhagic stroke for unspecified stroke and hypertension between GBD regions and between age groups. Conclusions A large proportion of stroke fatalities is derived from the redistribution of “unspecified stroke” and “hypertension” with marked regional differences. Future advancements in stroke certification, data collections, and statistical analyses may improve the estimation of the global stroke burden. PMID:26505189

  16. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. To introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis and to conduct the implementation in statistical software R to reproduce the results. Additionally, the implementation in R is presented to allow the results to be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighted. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  17. Performance assessment of methods for estimation of fractal dimension from scanning electron microscope images.

    PubMed

    Risović, Dubravko; Pavlović, Zivko

    2013-01-01

    Processing of gray scale images in order to determine the corresponding fractal dimension is very important due to widespread use of imaging technologies and application of fractal analysis in many areas of science, technology, and medicine. To this end, many methods for estimation of fractal dimension from gray scale images have been developed and routinely used. Unfortunately different methods (dimension estimators) often yield significantly different results in a manner that makes interpretation difficult. Here, we report results of comparative assessment of performance of several most frequently used algorithms/methods for estimation of fractal dimension. To that purpose, we have used scanning electron microscope images of aluminum oxide surfaces with different fractal dimensions. The performance of algorithms/methods was evaluated using the statistical Z-score approach. The differences between performances of six various methods are discussed and further compared with results obtained by electrochemical impedance spectroscopy on the same samples. The analysis of results shows that the performance of investigated algorithms varies considerably and that systematically erroneous fractal dimensions could be estimated using certain methods. The differential cube counting, triangulation, and box counting algorithms showed satisfactory performance in the whole investigated range of fractal dimensions. Difference statistic is proved to be less reliable generating 4% of unsatisfactory results. The performances of the Power spectrum, Partitioning and EIS were unsatisfactory in 29%, 38%, and 75% of estimations, respectively. The results of this study should be useful and provide guidelines to researchers using/attempting fractal analysis of images obtained by scanning microscopy or atomic force microscopy. © Wiley Periodicals, Inc.

  18. Sample size and power considerations in network meta-analysis

    PubMed Central

    2012-01-01

    Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327

  19. Magnetic resonance spectra and statistical geometry

    USDA-ARS?s Scientific Manuscript database

    Methods of statistical geometry are introduced which allow one to estimate, on the basis of computable criteria, the conditions under which maximally informative data may be collected. We note the important role of constraints that introduce curvature into parameter space and discuss the appropriate...

  20. Estimating the expected value of partial perfect information in health economic evaluations using integrated nested Laplace approximation.

    PubMed

    Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca

    2016-10-15

    The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the 'cost' of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GP, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  1. Empirical Bayes methods for smoothing data and for simultaneous estimation of many parameters.

    PubMed Central

    Yanagimoto, T; Kashiwagi, N

    1990-01-01

    A recent successful development is found in a series of innovative, new statistical methods for smoothing data that are based on the empirical Bayes method. This paper emphasizes their practical usefulness in medical sciences and their theoretically close relationship with the problem of simultaneous estimation of parameters, depending on strata. The paper also presents two examples of analyzing epidemiological data obtained in Japan using the smoothing methods to illustrate their favorable performance. PMID:2148512

  2. Global, Regional, and National Fossil-Fuel CO2 Emissions, 1751 - 2008 (Version 2011)

    DOE Data Explorer

    Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Marland, G. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory

    2011-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2010), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2010) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  3. Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2010) (V. 2013)

    DOE Data Explorer

    Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory; Marland, G.

    2013-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2013), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2012) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  4. Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2014) (V. 2017)

    DOE Data Explorer

    Boden, T. A. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Andres, R. J. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Marland, G. [Appalachian State University, Boone, NC (USA)

    2017-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2017), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2017) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  5. Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2013) (V. 2016)

    DOE Data Explorer

    Boden, T. A. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Andres, R. J. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Marland, G. [Appalachian State University, Boone, NC (USA)

    2016-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2016), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2016) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  6. Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2011) (V. 2015)

    DOE Data Explorer

    Boden, T. A. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Andres, R. J. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Marland, G. [Appalachian State University Boone, NC (USA)

    2015-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2014), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2014) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  7. Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2009) (V. 2012)

    DOE Data Explorer

    Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [Oak Ridge National Laboratory; Marland, G. [Research Institute for Environment, Energy and Economics, Appalachian State University

    2012-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2012), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2011) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  8. Global, Regional, and National Fossil-Fuel CO2 Emissions, 1751 - 2007 (Version 2010)

    DOE Data Explorer

    Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Marland, G. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory

    2010-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2009), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2009) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  9. Global, Regional, and National Fossil-Fuel CO2 Emissions, 1751 - 2006 (published 2009)

    DOE Data Explorer

    Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Marland, G. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory

    2009-01-01

    Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2008), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2008) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).

  10. Point and interval estimation of pollinator importance: a study using pollination data of Silene caroliniana.

    PubMed

    Reynolds, Richard J; Fenster, Charles B

    2008-05-01

    Pollinator importance, the product of visitation rate and pollinator effectiveness, is a descriptive parameter of the ecology and evolution of plant-pollinator interactions. Naturally, sources of its variation should be investigated, but the SE of pollinator importance has never been properly reported. Here, a Monte Carlo simulation study and a result from mathematical statistics on the variance of the product of two random variables are used to estimate the mean and confidence limits of pollinator importance for three visitor species of the wildflower, Silene caroliniana. Both methods provided similar estimates of mean pollinator importance and its interval if the sample size of the visitation and effectiveness datasets were comparatively large. These approaches allowed us to determine that bumblebee importance was significantly greater than clearwing hawkmoth, which was significantly greater than beefly. The methods could be used to statistically quantify temporal and spatial variation in pollinator importance of particular visitor species. The approaches may be extended for estimating the variance of more than two random variables. However, unless the distribution function of the resulting statistic is known, the simulation approach is preferable for calculating the parameter's confidence limits.

  11. A simple method for processing data with least square method

    NASA Astrophysics Data System (ADS)

    Wang, Chunyan; Qi, Liqun; Chen, Yongxiang; Pang, Guangning

    2017-08-01

    The least square method is widely used in data processing and error estimation. The mathematical method has become an essential technique for parameter estimation, data processing, regression analysis and experimental data fitting, and has become a criterion tool for statistical inference. In measurement data analysis, the distribution of complex rules is usually based on the least square principle, i.e., the use of matrix to solve the final estimate and to improve its accuracy. In this paper, a new method is presented for the solution of the method which is based on algebraic computation and is relatively straightforward and easy to understand. The practicability of this method is described by a concrete example.

  12. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement.

    PubMed

    Austin, Peter C

    2007-11-01

    I conducted a systematic review of the use of propensity score matching in the cardiovascular surgery literature. I examined the adequacy of reporting and whether appropriate statistical methods were used. I examined 60 articles published in the Annals of Thoracic Surgery, European Journal of Cardio-thoracic Surgery, Journal of Cardiovascular Surgery, and the Journal of Thoracic and Cardiovascular Surgery between January 1, 2004, and December 31, 2006. Thirty-one of the 60 studies did not provide adequate information on how the propensity score-matched pairs were formed. Eleven (18%) of studies did not report on whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. No studies used appropriate methods to compare baseline characteristics between treated and untreated subjects in the propensity score-matched sample. Eight (13%) of the 60 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Two studies used appropriate methods for some outcomes, but not for all outcomes. Thirty-nine (65%) studies explicitly used statistical methods that were inappropriate for matched-pairs data when estimating the effect of treatment on outcomes. Eleven studies did not report the statistical tests that were used to assess the statistical significance of the treatment effect. Analysis of propensity score-matched samples tended to be poor in the cardiovascular surgery literature. Most statistical analyses ignored the matched nature of the sample. I provide suggestions for improving the reporting and analysis of studies that use propensity score matching.

  13. A Tool for Estimating Variability in Wood Preservative Treatment Retention

    Treesearch

    Patricia K. Lebow; Adam M. Taylor; Timothy M. Young

    2015-01-01

    Composite sampling is standard practice for evaluation of preservative retention levels in preservative-treated wood. Current protocols provide an average retention value but no estimate of uncertainty. Here we describe a statistical method for calculating uncertainty estimates using the standard sampling regime with minimal additional chemical analysis. This tool can...

  14. Estimation of census and effective population sizes: the increasing usefulness of DNA-based approaches

    Treesearch

    Gordon Luikart; Nils Ryman; David A. Tallmon; Michael K. Schwartz; Fred W. Allendorf

    2010-01-01

    Population census size (NC) and effective population sizes (Ne) are two crucial parameters that influence population viability, wildlife management decisions, and conservation planning. Genetic estimators of both NC and Ne are increasingly widely used because molecular markers are increasingly available, statistical methods are improving rapidly, and genetic estimators...

  15. Adjusting for radiotelemetry error to improve estimates of habitat use.

    Treesearch

    Scott L. Findholt; Bruce K. Johnson; Lyman L. McDonald; John W. Kern; Alan Ager; Rosemary J. Stussy; Larry D. Bryant

    2002-01-01

    Animal locations estimated from radiotelemetry have traditionally been treated as error-free when analyzed in relation to habitat variables. Location error lowers the power of statistical tests of habitat selection. We describe a method that incorporates the error surrounding point estimates into measures of environmental variables determined from a geographic...

  16. Bridging national and reference definitions for harmonizing forest statistics

    Treesearch

    Göran ​Ståhl; Emil Cienciala; Gherardo Chirici; Adrian Lanz; Claude Vidal; Susanne Winter; Ronald E. McRoberts; Jacques Rondeux; Klemens Schadauer; Erkki Tomppo

    2012-01-01

    Harmonization is the process of making information and estimates comparable across administrative borders. The degree to which harmonization succeeds depends on many factors, including the conciseness of the definitions, the availability and quality of data, and the methods used to convert an estimate according to a local definition to an estimate according to the...

  17. Three statistical models for estimating length of stay.

    PubMed Central

    Selvin, S

    1977-01-01

    The probability density functions implied by three methods of collecting data on the length of stay in an institution are derived. The expected values associated with these density functions are used to calculate unbiased estimates of the expected length of stay. Two of the methods require an assumption about the form of the underlying distribution of length of stay; the third method does not. The three methods are illustrated with hypothetical data exhibiting the Poisson distribution, and the third (distribution-independent) method is used to estimate the length of stay in a skilled nursing facility and in an intermediate care facility for patients enrolled in California's MediCal program. PMID:914532

  18. Three statistical models for estimating length of stay.

    PubMed

    Selvin, S

    1977-01-01

    The probability density functions implied by three methods of collecting data on the length of stay in an institution are derived. The expected values associated with these density functions are used to calculate unbiased estimates of the expected length of stay. Two of the methods require an assumption about the form of the underlying distribution of length of stay; the third method does not. The three methods are illustrated with hypothetical data exhibiting the Poisson distribution, and the third (distribution-independent) method is used to estimate the length of stay in a skilled nursing facility and in an intermediate care facility for patients enrolled in California's MediCal program.

  19. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  20. Minimum follow-up time required for the estimation of statistical cure of cancer patients: verification using data from 42 cancer sites in the SEER database

    PubMed Central

    Tai, Patricia; Yu, Edward; Cserni, Gábor; Vlastos, Georges; Royce, Melanie; Kunkler, Ian; Vinh-Hung, Vincent

    2005-01-01

    Background The present commonly used five-year survival rates are not adequate to represent the statistical cure. In the present study, we established the minimum number of years required for follow-up to estimate statistical cure rate, by using a lognormal distribution of the survival time of those who died of their cancer. We introduced the term, threshold year, the follow-up time for patients dying from the specific cancer covers most of the survival data, leaving less than 2.25% uncovered. This is close enough to cure from that specific cancer. Methods Data from the Surveillance, Epidemiology and End Results (SEER) database were tested if the survival times of cancer patients who died of their disease followed the lognormal distribution using a minimum chi-square method. Patients diagnosed from 1973–1992 in the registries of Connecticut and Detroit were chosen so that a maximum of 27 years was allowed for follow-up to 1999. A total of 49 specific organ sites were tested. The parameters of those lognormal distributions were found for each cancer site. The cancer-specific survival rates at the threshold years were compared with the longest available Kaplan-Meier survival estimates. Results The characteristics of the cancer-specific survival times of cancer patients who died of their disease from 42 cancer sites out of 49 sites were verified to follow different lognormal distributions. The threshold years validated for statistical cure varied for different cancer sites, from 2.6 years for pancreas cancer to 25.2 years for cancer of salivary gland. At the threshold year, the statistical cure rates estimated for 40 cancer sites were found to match the actuarial long-term survival rates estimated by the Kaplan-Meier method within six percentage points. For two cancer sites: breast and thyroid, the threshold years were so long that the cancer-specific survival rates could yet not be obtained because the SEER data do not provide sufficiently long follow-up. Conclusion The present study suggests a certain threshold year is required to wait before the statistical cure rate can be estimated for each cancer site. For some cancers, such as breast and thyroid, the 5- or 10-year survival rates inadequately reflect statistical cure rates, and highlight the need for long-term follow-up of these patients. PMID:15904508

  1. Bayesian approach to estimate AUC, partition coefficient and drug targeting index for studies with serial sacrifice design.

    PubMed

    Wang, Tianli; Baron, Kyle; Zhong, Wei; Brundage, Richard; Elmquist, William

    2014-03-01

    The current study presents a Bayesian approach to non-compartmental analysis (NCA), which provides the accurate and precise estimate of AUC 0 (∞) and any AUC 0 (∞) -based NCA parameter or derivation. In order to assess the performance of the proposed method, 1,000 simulated datasets were generated in different scenarios. A Bayesian method was used to estimate the tissue and plasma AUC 0 (∞) s and the tissue-to-plasma AUC 0 (∞) ratio. The posterior medians and the coverage of 95% credible intervals for the true parameter values were examined. The method was applied to laboratory data from a mice brain distribution study with serial sacrifice design for illustration. Bayesian NCA approach is accurate and precise in point estimation of the AUC 0 (∞) and the partition coefficient under a serial sacrifice design. It also provides a consistently good variance estimate, even considering the variability of the data and the physiological structure of the pharmacokinetic model. The application in the case study obtained a physiologically reasonable posterior distribution of AUC, with a posterior median close to the value estimated by classic Bailer-type methods. This Bayesian NCA approach for sparse data analysis provides statistical inference on the variability of AUC 0 (∞) -based parameters such as partition coefficient and drug targeting index, so that the comparison of these parameters following destructive sampling becomes statistically feasible.

  2. A new statistical method for design and analyses of component tolerance

    NASA Astrophysics Data System (ADS)

    Movahedi, Mohammad Mehdi; Khounsiavash, Mohsen; Otadi, Mahmood; Mosleh, Maryam

    2017-03-01

    Tolerancing conducted by design engineers to meet customers' needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. While use of statistical methods for tolerancing is not something new, engineers often use known distributions, including the normal distribution. Yet, if the statistical distribution of the given variable is unknown, a new statistical method will be employed to design tolerance. In this paper, we use generalized lambda distribution for design and analyses component tolerance. We use percentile method (PM) to estimate the distribution parameters. The findings indicated that, when the distribution of the component data is unknown, the proposed method can be used to expedite the design of component tolerance. Moreover, in the case of assembled sets, more extensive tolerance for each component with the same target performance can be utilized.

  3. Statistics of some atmospheric turbulence records relevant to aircraft response calculations

    NASA Technical Reports Server (NTRS)

    Mark, W. D.; Fischer, R. W.

    1981-01-01

    Methods for characterizing atmospheric turbulence are described. The methods illustrated include maximum likelihood estimation of the integral scale and intensity of records obeying the von Karman transverse power spectral form, constrained least-squares estimation of the parameters of a parametric representation of autocorrelation functions, estimation of the power spectra density of the instantaneous variance of a record with temporally fluctuating variance, and estimation of the probability density functions of various turbulence components. Descriptions of the computer programs used in the computations are given, and a full listing of these programs is included.

  4. Dimension from covariance matrices.

    PubMed

    Carroll, T L; Byers, J M

    2017-02-01

    We describe a method to estimate embedding dimension from a time series. This method includes an estimate of the probability that the dimension estimate is valid. Such validity estimates are not common in algorithms for calculating the properties of dynamical systems. The algorithm described here compares the eigenvalues of covariance matrices created from an embedded signal to the eigenvalues for a covariance matrix of a Gaussian random process with the same dimension and number of points. A statistical test gives the probability that the eigenvalues for the embedded signal did not come from the Gaussian random process.

  5. Conducting Simulation Studies in the R Programming Environment.

    PubMed

    Hallgren, Kevin A

    2013-10-12

    Simulation studies allow researchers to answer specific questions about data analysis, statistical power, and best-practices for obtaining accurate results in empirical research. Despite the benefits that simulation research can provide, many researchers are unfamiliar with available tools for conducting their own simulation studies. The use of simulation studies need not be restricted to researchers with advanced skills in statistics and computer programming, and such methods can be implemented by researchers with a variety of abilities and interests. The present paper provides an introduction to methods used for running simulation studies using the R statistical programming environment and is written for individuals with minimal experience running simulation studies or using R. The paper describes the rationale and benefits of using simulations and introduces R functions relevant for many simulation studies. Three examples illustrate different applications for simulation studies, including (a) the use of simulations to answer a novel question about statistical analysis, (b) the use of simulations to estimate statistical power, and (c) the use of simulations to obtain confidence intervals of parameter estimates through bootstrapping. Results and fully annotated syntax from these examples are provided.

  6. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments

    PubMed Central

    2010-01-01

    Background The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Results Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Conclusions Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/. PMID:20482791

  7. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments.

    PubMed

    Ma, Jingming; Dykes, Carrie; Wu, Tao; Huang, Yangxin; Demeter, Lisa; Wu, Hulin

    2010-05-18

    The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/.

  8. Uncertainties of flood frequency estimation approaches based on continuous simulation using data resampling

    NASA Astrophysics Data System (ADS)

    Arnaud, Patrick; Cantet, Philippe; Odry, Jean

    2017-11-01

    Flood frequency analyses (FFAs) are needed for flood risk management. Many methods exist ranging from classical purely statistical approaches to more complex approaches based on process simulation. The results of these methods are associated with uncertainties that are sometimes difficult to estimate due to the complexity of the approaches or the number of parameters, especially for process simulation. This is the case of the simulation-based FFA approach called SHYREG presented in this paper, in which a rainfall generator is coupled with a simple rainfall-runoff model in an attempt to estimate the uncertainties due to the estimation of the seven parameters needed to estimate flood frequencies. The six parameters of the rainfall generator are mean values, so their theoretical distribution is known and can be used to estimate the generator uncertainties. In contrast, the theoretical distribution of the single hydrological model parameter is unknown; consequently, a bootstrap method is applied to estimate the calibration uncertainties. The propagation of uncertainty from the rainfall generator to the hydrological model is also taken into account. This method is applied to 1112 basins throughout France. Uncertainties coming from the SHYREG method and from purely statistical approaches are compared, and the results are discussed according to the length of the recorded observations, basin size and basin location. Uncertainties of the SHYREG method decrease as the basin size increases or as the length of the recorded flow increases. Moreover, the results show that the confidence intervals of the SHYREG method are relatively small despite the complexity of the method and the number of parameters (seven). This is due to the stability of the parameters and takes into account the dependence of uncertainties due to the rainfall model and the hydrological calibration. Indeed, the uncertainties on the flow quantiles are on the same order of magnitude as those associated with the use of a statistical law with two parameters (here generalised extreme value Type I distribution) and clearly lower than those associated with the use of a three-parameter law (here generalised extreme value Type II distribution). For extreme flood quantiles, the uncertainties are mostly due to the rainfall generator because of the progressive saturation of the hydrological model.

  9. A call to improve methods for estimating tree biomass for regional and national assessments

    Treesearch

    Aaron R. Weiskittel; David W. MacFarlane; Philip J. Radtke; David L.R. Affleck; Hailemariam Temesgen; Christopher W. Woodall; James A. Westfall; John W. Coulston

    2015-01-01

    Tree biomass is typically estimated using statistical models. This review highlights five limitations of most tree biomass models, which include the following: (1) biomass data are costly to collect and alternative sampling methods are used; (2) belowground data and models are generally lacking; (3) models are often developed from small and geographically limited data...

  10. A model of distributed phase aberration for deblurring phase estimated from scattering.

    PubMed

    Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C

    2010-01-01

    Correction of aberration in ultrasound imaging uses the response of a point reflector or its equivalent to characterize the aberration. Because a point reflector is usually unavailable, its equivalent is obtained using statistical methods, such as processing reflections from multiple focal regions in a random medium. However, the validity of methods that use reflections from multiple points is limited to isoplanatic patches for which the aberration is essentially the same. In this study, aberration is modeled by an offset phase screen to relax the isoplanatic restriction. Methods are developed to determine the depth and phase of the screen and to use the model for compensation of aberration as the beam is steered. Use of the model to enhance the performance of the noted statistical estimation procedure is also described. Experimental results obtained with tissue-mimicking phantoms that implement different models and produce different amounts of aberration are presented to show the efficacy of these methods. The improvement in b-scan resolution realized with the model is illustrated. The results show that the isoplanatic patch assumption for estimation of aberration can be relaxed and that propagation-path characteristics and aberration estimation are closely related.

  11. Loss Factor Estimation Using the Impulse Response Decay Method on a Stiffened Structure

    NASA Technical Reports Server (NTRS)

    Cabell, Randolph; Schiller, Noah; Allen, Albert; Moeller, Mark

    2009-01-01

    High-frequency vibroacoustic modeling is typically performed using energy-based techniques such as Statistical Energy Analysis (SEA). Energy models require an estimate of the internal damping loss factor. Unfortunately, the loss factor is difficult to estimate analytically, and experimental methods such as the power injection method can require extensive measurements over the structure of interest. This paper discusses the implications of estimating damping loss factors using the impulse response decay method (IRDM) from a limited set of response measurements. An automated procedure for implementing IRDM is described and then evaluated using data from a finite element model of a stiffened, curved panel. Estimated loss factors are compared with loss factors computed using a power injection method and a manual curve fit. The paper discusses the sensitivity of the IRDM loss factor estimates to damping of connected subsystems and the number and location of points in the measurement ensemble.

  12. Testing an automated method to estimate ground-water recharge from streamflow records

    USGS Publications Warehouse

    Rutledge, A.T.; Daniel, C.C.

    1994-01-01

    The computer program, RORA, allows automated analysis of streamflow hydrographs to estimate ground-water recharge. Output from the program, which is based on the recession-curve-displacement method (often referred to as the Rorabaugh method, for whom the program is named), was compared to estimates of recharge obtained from a manual analysis of 156 years of streamflow record from 15 streamflow-gaging stations in the eastern United States. Statistical tests showed that there was no significant difference between paired estimates of annual recharge by the two methods. Tests of results produced by the four workers who performed the manual method showed that results can differ significantly between workers. Twenty-two percent of the variation between manual and automated estimates could be attributed to having different workers perform the manual method. The program RORA will produce estimates of recharge equivalent to estimates produced manually, greatly increase the speed od analysis, and reduce the subjectivity inherent in manual analysis.

  13. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.

  14. StreamStats: a U.S. geological survey web site for stream information

    USGS Publications Warehouse

    Kernell, G. Ries; Gray, John R.; Renard, Kenneth G.; McElroy, Stephen A.; Gburek, William J.; Canfield, H. Evan; Scott, Russell L.

    2003-01-01

    The U.S. Geological Survey has developed a Web application, named StreamStats, for providing streamflow statistics, such as the 100-year flood and the 7-day, 10-year low flow, to the public. Statistics can be obtained for data-collection stations and for ungaged sites. Streamflow statistics are needed for water-resources planning and management; for design of bridges, culverts, and flood-control structures; and for many other purposes. StreamStats users can point and click on data-collection stations shown on a map in their Web browser window to obtain previously determined streamflow statistics and other information for the stations. Users also can point and click on any stream shown on the map to get estimates of streamflow statistics for ungaged sites. StreamStats determines the watershed boundaries and measures physical and climatic characteristics of the watersheds for the ungaged sites by use of a Geographic Information System (GIS), and then it inserts the characteristics into previously determined regression equations to estimate the streamflow statistics. Compared to manual methods, StreamStats reduces the average time needed to estimate streamflow statistics for ungaged sites from several hours to several minutes.

  15. Modeling Ka-band low elevation angle propagation statistics

    NASA Technical Reports Server (NTRS)

    Russell, Thomas A.; Weinfield, John; Pearson, Chris; Ippolito, Louis J.

    1995-01-01

    The statistical variability of the secondary atmospheric propagation effects on satellite communications cannot be ignored at frequencies of 20 GHz or higher, particularly if the propagation margin allocation is such that link availability falls below 99 percent. The secondary effects considered in this paper are gaseous absorption, cloud absorption, and tropospheric scintillation; rain attenuation is the primary effect. Techniques and example results are presented for estimation of the overall combined impact of the atmosphere on satellite communications reliability. Statistical methods are employed throughout and the most widely accepted models for the individual effects are used wherever possible. The degree of correlation between the effects is addressed and some bounds on the expected variability in the combined effects statistics are derived from the expected variability in correlation. Example estimates are presented of combined effects statistics in the Washington D.C. area of 20 GHz and 5 deg elevation angle. The statistics of water vapor are shown to be sufficient for estimation of the statistics of gaseous absorption at 20 GHz. A computer model based on monthly surface weather is described and tested. Significant improvement in prediction of absorption extremes is demonstrated with the use of path weather data instead of surface data.

  16. Estimation of seismic quality factor: Artificial neural networks and current approaches

    NASA Astrophysics Data System (ADS)

    Yıldırım, Eray; Saatçılar, Ruhi; Ergintav, Semih

    2017-01-01

    The aims of this study are to estimate soil attenuation using alternatives to traditional methods, to compare results of using these methods, and to examine soil properties using the estimated results. The performances of all methods, amplitude decay, spectral ratio, Wiener filter, and artificial neural network (ANN) methods, are examined on field and synthetic data with noise and without noise. High-resolution seismic reflection field data from Yeniköy (Arnavutköy, İstanbul) was used as field data, and 424 estimations of Q values were made for each method (1,696 total). While statistical tests on synthetic and field data are quite close to the Q value estimation results of ANN, Wiener filter, and spectral ratio methods, the amplitude decay methods showed a higher estimation error. According to previous geological and geophysical studies in this area, the soil is water-saturated, quite weak, consisting of clay and sandy units, and, because of current and past landslides in the study area and its vicinity, researchers reported heterogeneity in the soil. Under the same physical conditions, Q value calculated on field data can be expected to be 7.9 and 13.6. ANN models with various structures, training algorithm, input, and number of neurons are investigated. A total of 480 ANN models were generated consisting of 60 models for noise-free synthetic data, 360 models for different noise content synthetic data and 60 models to apply to the data collected in the field. The models were tested to determine the most appropriate structure and training algorithm. In the final ANN, the input vectors consisted of the difference of the width, energy, and distance of seismic traces, and the output was Q value. Success rate of both ANN methods with noise-free and noisy synthetic data were higher than the other three methods. Also according to the statistical tests on estimated Q value from field data, the method showed results that are more suitable. The Q value can be estimated practically and quickly by processing the traces with the recommended ANN model. Consequently, the ANN method could be used for estimating Q value from seismic data.

  17. The Hurst Phenomenon in Error Estimates Related to Atmospheric Turbulence

    NASA Astrophysics Data System (ADS)

    Dias, Nelson Luís; Crivellaro, Bianca Luhm; Chamecki, Marcelo

    2018-05-01

    The Hurst phenomenon is a well-known feature of long-range persistence first observed in hydrological and geophysical time series by E. Hurst in the 1950s. It has also been found in several cases in turbulence time series measured in the wind tunnel, the atmosphere, and in rivers. Here, we conduct a systematic investigation of the value of the Hurst coefficient H in atmospheric surface-layer data, and its impact on the estimation of random errors. We show that usually H > 0.5 , which implies the non-existence (in the statistical sense) of the integral time scale. Since the integral time scale is present in the Lumley-Panofsky equation for the estimation of random errors, this has important practical consequences. We estimated H in two principal ways: (1) with an extension of the recently proposed filtering method to estimate the random error (H_p ), and (2) with the classical rescaled range introduced by Hurst (H_R ). Other estimators were tried but were found less able to capture the statistical behaviour of the large scales of turbulence. Using data from three micrometeorological campaigns we found that both first- and second-order turbulence statistics display the Hurst phenomenon. Usually, H_R is larger than H_p for the same dataset, raising the question that one, or even both, of these estimators, may be biased. For the relative error, we found that the errors estimated with the approach adopted by us, that we call the relaxed filtering method, and that takes into account the occurrence of the Hurst phenomenon, are larger than both the filtering method and the classical Lumley-Panofsky estimates. Finally, we found that there is no apparent relationship between H and the Obukhov stability parameter. The relative errors, however, do show stability dependence, particularly in the case of the error of the kinematic momentum flux in unstable conditions, and that of the kinematic sensible heat flux in stable conditions.

  18. The applications of statistical quantification techniques in nanomechanics and nanoelectronics.

    PubMed

    Mai, Wenjie; Deng, Xinwei

    2010-10-08

    Although nanoscience and nanotechnology have been developing for approximately two decades and have achieved numerous breakthroughs, the experimental results from nanomaterials with a higher noise level and poorer repeatability than those from bulk materials still remain as a practical issue, and challenge many techniques of quantification of nanomaterials. This work proposes a physical-statistical modeling approach and a global fitting statistical method to use all the available discrete data or quasi-continuous curves to quantify a few targeted physical parameters, which can provide more accurate, efficient and reliable parameter estimates, and give reasonable physical explanations. In the resonance method for measuring the elastic modulus of ZnO nanowires (Zhou et al 2006 Solid State Commun. 139 222-6), our statistical technique gives E = 128.33 GPa instead of the original E = 108 GPa, and unveils a negative bias adjustment f(0). The causes are suggested by the systematic bias in measuring the length of the nanowires. In the electronic measurement of the resistivity of a Mo nanowire (Zach et al 2000 Science 290 2120-3), the proposed new method automatically identified the importance of accounting for the Ohmic contact resistance in the model of the Ohmic behavior in nanoelectronics experiments. The 95% confidence interval of resistivity in the proposed one-step procedure is determined to be 3.57 +/- 0.0274 x 10( - 5) ohm cm, which should be a more reliable and precise estimate. The statistical quantification technique should find wide applications in obtaining better estimations from various systematic errors and biased effects that become more significant at the nanoscale.

  19. Use of personalized Dynamic Treatment Regimes (DTRs) and Sequential Multiple Assignment Randomized Trials (SMARTs) in mental health studies

    PubMed Central

    Liu, Ying; ZENG, Donglin; WANG, Yuanjia

    2014-01-01

    Summary Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each point where a clinical decision is made based on each patient’s time-varying characteristics and intermediate outcomes observed at earlier points in time. The complexity, patient heterogeneity, and chronicity of mental disorders call for learning optimal DTRs to dynamically adapt treatment to an individual’s response over time. The Sequential Multiple Assignment Randomized Trial (SMARTs) design allows for estimating causal effects of DTRs. Modern statistical tools have been developed to optimize DTRs based on personalized variables and intermediate outcomes using rich data collected from SMARTs; these statistical methods can also be used to recommend tailoring variables for designing future SMART studies. This paper introduces DTRs and SMARTs using two examples in mental health studies, discusses two machine learning methods for estimating optimal DTR from SMARTs data, and demonstrates the performance of the statistical methods using simulated data. PMID:25642116

  20. Estimators of The Magnitude-Squared Spectrum and Methods for Incorporating SNR Uncertainty

    PubMed Central

    Lu, Yang; Loizou, Philipos C.

    2011-01-01

    Statistical estimators of the magnitude-squared spectrum are derived based on the assumption that the magnitude-squared spectrum of the noisy speech signal can be computed as the sum of the (clean) signal and noise magnitude-squared spectra. Maximum a posterior (MAP) and minimum mean square error (MMSE) estimators are derived based on a Gaussian statistical model. The gain function of the MAP estimator was found to be identical to the gain function used in the ideal binary mask (IdBM) that is widely used in computational auditory scene analysis (CASA). As such, it was binary and assumed the value of 1 if the local SNR exceeded 0 dB, and assumed the value of 0 otherwise. By modeling the local instantaneous SNR as an F-distributed random variable, soft masking methods were derived incorporating SNR uncertainty. The soft masking method, in particular, which weighted the noisy magnitude-squared spectrum by the a priori probability that the local SNR exceeds 0 dB was shown to be identical to the Wiener gain function. Results indicated that the proposed estimators yielded significantly better speech quality than the conventional MMSE spectral power estimators, in terms of yielding lower residual noise and lower speech distortion. PMID:21886543

  1. Evaluation of methods to estimate lake herring spawner abundance in Lake Superior

    USGS Publications Warehouse

    Yule, D.L.; Stockwell, J.D.; Cholwek, G.A.; Evrard, L.M.; Schram, S.; Seider, M.; Symbal, M.

    2006-01-01

    Historically, commercial fishers harvested Lake Superior lake herring Coregonus artedi for their flesh, but recently operators have targeted lake herring for roe. Because no surveys have estimated spawning female abundance, direct estimates of fishing mortality are lacking. The primary objective of this study was to determine the feasibility of using acoustic techniques in combination with midwater trawling to estimate spawning female lake herring densities in a Lake Superior statistical grid (i.e., a 10′ latitude × 10′ longitude area over which annual commercial harvest statistics are compiled). Midwater trawling showed that mature female lake herring were largely pelagic during the night in late November, accounting for 94.5% of all fish caught exceeding 250 mm total length. When calculating acoustic estimates of mature female lake herring, we excluded backscattering from smaller pelagic fishes like immature lake herring and rainbow smelt Osmerus mordax by applying an empirically derived threshold of −35.6 dB. We estimated the average density of mature females in statistical grid 1409 at 13.3 fish/ha and the total number of spawning females at 227,600 (95% confidence interval = 172,500–282,700). Using information on mature female densities, size structure, and fecundity, we estimate that females deposited 3.027 billion (109) eggs in grid 1409 (95% confidence interval = 2.356–3.778 billion). The relative estimation error of the mature female density estimate derived using a geostatistical model—based approach was low (12.3%), suggesting that the employed method was robust. Fishing mortality rates of all mature females and their eggs were estimated at 2.3% and 3.8%, respectively. The techniques described for enumerating spawning female lake herring could be used to develop a more accurate stock–recruitment model for Lake Superior lake herring.

  2. A method for estimating the performance of photovoltaic systems

    NASA Astrophysics Data System (ADS)

    Clark, D. R.; Klein, S. A.; Beckman, W. A.

    A method is presented for predicting the long-term average performance of photovoltaic systems having storage batteries and subject to any diurnal load profile. The monthly-average fraction of the load met by the system is estimated from array parameters and monthly-average meteorological data. The method is based on radiation statistics, and utilizability, and can account for variability in the electrical demand as well as for the variability in solar radiation.

  3. Biologically plausible particulate air pollution mortality concentration-response functions.

    PubMed Central

    Roberts, Steven

    2004-01-01

    In this article I introduce an alternative method for estimating particulate air pollution mortality concentration-response functions. This method constrains the particulate air pollution mortality concentration-response function to be biologically plausible--that is, a non-decreasing function of the particulate air pollution concentration. Using time-series data from Cook County, Illinois, the proposed method yields more meaningful particulate air pollution mortality concentration-response function estimates with an increase in statistical accuracy. PMID:14998745

  4. Statistical Estimation of Heterogeneities: A New Frontier in Well Testing

    NASA Astrophysics Data System (ADS)

    Neuman, S. P.; Guadagnini, A.; Illman, W. A.; Riva, M.; Vesselinov, V. V.

    2001-12-01

    Well-testing methods have traditionally relied on analytical solutions of groundwater flow equations in relatively simple domains, consisting of one or at most a few units having uniform hydraulic properties. Recently, attention has been shifting toward methods and solutions that would allow one to characterize subsurface heterogeneities in greater detail. On one hand, geostatistical inverse methods are being used to assess the spatial variability of parameters, such as permeability and porosity, on the basis of multiple cross-hole pressure interference tests. On the other hand, analytical solutions are being developed to describe the mean and variance (first and second statistical moments) of flow to a well in a randomly heterogeneous medium. Geostatistical inverse interpretation of cross-hole tests yields a smoothed but detailed "tomographic" image of how parameters actually vary in three-dimensional space, together with corresponding measures of estimation uncertainty. Moment solutions may soon allow one to interpret well tests in terms of statistical parameters such as the mean and variance of log permeability, its spatial autocorrelation and statistical anisotropy. The idea of geostatistical cross-hole tomography is illustrated through pneumatic injection tests conducted in unsaturated fractured tuff at the Apache Leap Research Site near Superior, Arizona. The idea of using moment equations to interpret well-tests statistically is illustrated through a recently developed three-dimensional solution for steady state flow to a well in a bounded, randomly heterogeneous, statistically anisotropic aquifer.

  5. An analytic technique for statistically modeling random atomic clock errors in estimation

    NASA Technical Reports Server (NTRS)

    Fell, P. J.

    1981-01-01

    Minimum variance estimation requires that the statistics of random observation errors be modeled properly. If measurements are derived through the use of atomic frequency standards, then one source of error affecting the observable is random fluctuation in frequency. This is the case, for example, with range and integrated Doppler measurements from satellites of the Global Positioning and baseline determination for geodynamic applications. An analytic method is presented which approximates the statistics of this random process. The procedure starts with a model of the Allan variance for a particular oscillator and develops the statistics of range and integrated Doppler measurements. A series of five first order Markov processes is used to approximate the power spectral density obtained from the Allan variance.

  6. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    PubMed

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.

  7. Contrasting academic and tobacco industry estimates of illicit cigarette trade: evidence from Warsaw, Poland.

    PubMed

    Stoklosa, Michal; Ross, Hana

    2014-05-01

    To compare two different methods for estimating the size of the illicit cigarette market with each other and to contrast the estimates obtained by these two methods with the results of an industry-commissioned study. We used two observational methods: collection of data from packs in smokers' personal possession, and collection of data from packs discarded on streets. The data were obtained in Warsaw, Poland in September 2011 and October 2011. We used tests of independence to compare the results based on the two methods, and to contrast those with the estimate from the industry-commissioned discarded pack collection conducted in September 2011. We found that the proportions of cigarette packs classified as not intended for the Polish market estimated by our two methods were not statistically different. These estimates were 14.6% (95% CI 10.8% to 19.4%) using the survey data (N=400) and 15.6% (95% CI 13.2% to 18.4%) using the discarded pack data (N=754). The industry estimate (22.9%) was higher by nearly a half compared with our estimates, and this difference is statistically significant. Our findings are consistent with previous evidence of the tobacco industry exaggerating the scope of illicit trade and with the general pattern of the industry manipulating evidence to mislead the debate on tobacco control policy in many countries. Collaboration between governments and the tobacco industry to estimate tobacco tax avoidance and evasion is likely to produce upward-biased estimates of illicit cigarette trade. If governments are presented with industry estimates, they should strictly require a disclosure of all methodological details and data used in generating these estimates, and should seek advice from independent experts. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  8. Combining Neural Networks with Existing Methods to Estimate 1 in 100-Year Flood Event Magnitudes

    NASA Astrophysics Data System (ADS)

    Newson, A.; See, L.

    2005-12-01

    Over the last fifteen years artificial neural networks (ANN) have been shown to be advantageous for the solution of many hydrological modelling problems. The use of ANNs for flood magnitude estimation in ungauged catchments, however, is a relatively new and under researched area. In this paper ANNs are used to make estimates of the magnitude of the 100-year flood event (Q100) for a number of ungauged catchments. The data used in this study were provided by the Centre for Ecology and Hydrology's Flood Estimation Handbook (FEH), which contains information on catchments across the UK. Sixteen catchment descriptors for 719 catchments were used to train an ANN, which was split into a training, validation and test data set. The goodness-of-fit statistics on the test data set indicated good model performance, with an r-squared value of 0.8 and a coefficient of efficiency of 79 percent. Data for twelve ungauged catchments were then put through the trained ANN to produce estimates of Q100. Two other accepted methodologies were also employed: the FEH statistical method and the FSR (Flood Studies Report) design storm technique, both of which are used to produce flood frequency estimates. The advantage of developing an ANN model is that it provides a third figure to aid a hydrologist in making an accurate estimate. For six of the twelve catchments, there was a relatively low spread between estimates. In these instances, an estimate of Q100 could be made with a fair degree of certainty. Of the remaining six catchments, three had areas greater than 1000km2, which means the FSR design storm estimate cannot be used. Armed with the ANN model and the FEH statistical method the hydrologist still has two possible estimates to consider. For these three catchments, the estimates were also fairly similar, providing additional confidence to the estimation. In summary, the findings of this study have shown that an accurate estimation of Q100 can be made using the catchment descriptors of an ungauged catchment as inputs to an ANN. It also demonstrated how the ANN Q100 estimates can be used in conjunction with a number of other estimates in order to provide a more accurate and confident estimate of Q100 at an ungauged catchment. This clearly exploits the strengths of existing methods in combination with the latest soft computing tools.

  9. Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems

    NASA Technical Reports Server (NTRS)

    He, Yuning; Davies, Misty Dawn

    2014-01-01

    The analysis of a safety-critical system often requires detailed knowledge of safe regions and their highdimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.

  10. Adaptation of a Fast Optimal Interpolation Algorithm to the Mapping of Oceangraphic Data

    NASA Technical Reports Server (NTRS)

    Menemenlis, Dimitris; Fieguth, Paul; Wunsch, Carl; Willsky, Alan

    1997-01-01

    A fast, recently developed, multiscale optimal interpolation algorithm has been adapted to the mapping of hydrographic and other oceanographic data. This algorithm produces solution and error estimates which are consistent with those obtained from exact least squares methods, but at a small fraction of the computational cost. Problems whose solution would be completely impractical using exact least squares, that is, problems with tens or hundreds of thousands of measurements and estimation grid points, can easily be solved on a small workstation using the multiscale algorithm. In contrast to methods previously proposed for solving large least squares problems, our approach provides estimation error statistics while permitting long-range correlations, using all measurements, and permitting arbitrary measurement locations. The multiscale algorithm itself, published elsewhere, is not the focus of this paper. However, the algorithm requires statistical models having a very particular multiscale structure; it is the development of a class of multiscale statistical models, appropriate for oceanographic mapping problems, with which we concern ourselves in this paper. The approach is illustrated by mapping temperature in the northeastern Pacific. The number of hydrographic stations is kept deliberately small to show that multiscale and exact least squares results are comparable. A portion of the data were not used in the analysis; these data serve to test the multiscale estimates. A major advantage of the present approach is the ability to repeat the estimation procedure a large number of times for sensitivity studies, parameter estimation, and model testing. We have made available by anonymous Ftp a set of MATLAB-callable routines which implement the multiscale algorithm and the statistical models developed in this paper.

  11. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    PubMed Central

    Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.

    2017-01-01

    Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237

  12. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.

  13. A comparison of Probability Of Detection (POD) data determined using different statistical methods

    NASA Astrophysics Data System (ADS)

    Fahr, A.; Forsyth, D.; Bullock, M.

    1993-12-01

    Different statistical methods have been suggested for determining probability of detection (POD) data for nondestructive inspection (NDI) techniques. A comparative assessment of various methods of determining POD was conducted using results of three NDI methods obtained by inspecting actual aircraft engine compressor disks which contained service induced cracks. The study found that the POD and 95 percent confidence curves as a function of crack size as well as the 90/95 percent crack length vary depending on the statistical method used and the type of data. The distribution function as well as the parameter estimation procedure used for determining POD and the confidence bound must be included when referencing information such as the 90/95 percent crack length. The POD curves and confidence bounds determined using the range interval method are very dependent on information that is not from the inspection data. The maximum likelihood estimators (MLE) method does not require such information and the POD results are more reasonable. The log-logistic function appears to model POD of hit/miss data relatively well and is easy to implement. The log-normal distribution using MLE provides more realistic POD results and is the preferred method. Although it is more complicated and slower to calculate, it can be implemented on a common spreadsheet program.

  14. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach.

    PubMed

    Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D

    2015-02-20

    The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. Copyright © 2014 John Wiley & Sons, Ltd.

  15. Challenges in risk estimation using routinely collected clinical data: The example of estimating cervical cancer risks from electronic health-records.

    PubMed

    Landy, Rebecca; Cheung, Li C; Schiffman, Mark; Gage, Julia C; Hyun, Noorie; Wentzensen, Nicolas; Kinney, Walter K; Castle, Philip E; Fetterman, Barbara; Poitras, Nancy E; Lorey, Thomas; Sasieni, Peter D; Katki, Hormuzd A

    2018-06-01

    Electronic health-records (EHR) are increasingly used by epidemiologists studying disease following surveillance testing to provide evidence for screening intervals and referral guidelines. Although cost-effective, undiagnosed prevalent disease and interval censoring (in which asymptomatic disease is only observed at the time of testing) raise substantial analytic issues when estimating risk that cannot be addressed using Kaplan-Meier methods. Based on our experience analysing EHR from cervical cancer screening, we previously proposed the logistic-Weibull model to address these issues. Here we demonstrate how the choice of statistical method can impact risk estimates. We use observed data on 41,067 women in the cervical cancer screening program at Kaiser Permanente Northern California, 2003-2013, as well as simulations to evaluate the ability of different methods (Kaplan-Meier, Turnbull, Weibull and logistic-Weibull) to accurately estimate risk within a screening program. Cumulative risk estimates from the statistical methods varied considerably, with the largest differences occurring for prevalent disease risk when baseline disease ascertainment was random but incomplete. Kaplan-Meier underestimated risk at earlier times and overestimated risk at later times in the presence of interval censoring or undiagnosed prevalent disease. Turnbull performed well, though was inefficient and not smooth. The logistic-Weibull model performed well, except when event times didn't follow a Weibull distribution. We have demonstrated that methods for right-censored data, such as Kaplan-Meier, result in biased estimates of disease risks when applied to interval-censored data, such as screening programs using EHR data. The logistic-Weibull model is attractive, but the model fit must be checked against Turnbull non-parametric risk estimates. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  16. Generalized Hurst exponent estimates differentiate EEG signals of healthy and epileptic patients

    NASA Astrophysics Data System (ADS)

    Lahmiri, Salim

    2018-01-01

    The aim of our current study is to check whether multifractal patterns of the electroencephalographic (EEG) signals of normal and epileptic patients are statistically similar or different. In this regard, the generalized Hurst exponent (GHE) method is used for robust estimation of the multifractals in each type of EEG signals, and three powerful statistical tests are performed to check existence of differences between estimated GHEs from healthy control subjects and epileptic patients. The obtained results show that multifractals exist in both types of EEG signals. Particularly, it was found that the degree of fractal is more pronounced in short variations of normal EEG signals than in short variations of EEG signals with seizure free intervals. In contrary, it is more pronounced in long variations of EEG signals with seizure free intervals than in normal EEG signals. Importantly, both parametric and nonparametric statistical tests show strong evidence that estimated GHEs of normal EEG signals are statistically and significantly different from those with seizure free intervals. Therefore, GHEs can be efficiently used to distinguish between healthy and patients suffering from epilepsy.

  17. Impact of joint statistical dual-energy CT reconstruction of proton stopping power images: Comparison to image- and sinogram-domain material decomposition approaches.

    PubMed

    Zhang, Shuangyue; Han, Dong; Politte, David G; Williamson, Jeffrey F; O'Sullivan, Joseph A

    2018-05-01

    The purpose of this study was to assess the performance of a novel dual-energy CT (DECT) approach for proton stopping power ratio (SPR) mapping that integrates image reconstruction and material characterization using a joint statistical image reconstruction (JSIR) method based on a linear basis vector model (BVM). A systematic comparison between the JSIR-BVM method and previously described DECT image- and sinogram-domain decomposition approaches is also carried out on synthetic data. The JSIR-BVM method was implemented to estimate the electron densities and mean excitation energies (I-values) required by the Bethe equation for SPR mapping. In addition, image- and sinogram-domain DECT methods based on three available SPR models including BVM were implemented for comparison. The intrinsic SPR modeling accuracy of the three models was first validated. Synthetic DECT transmission sinograms of two 330 mm diameter phantoms each containing 17 soft and bony tissues (for a total of 34) of known composition were then generated with spectra of 90 and 140 kVp. The estimation accuracy of the reconstructed SPR images were evaluated for the seven investigated methods. The impact of phantom size and insert location on SPR estimation accuracy was also investigated. All three selected DECT-SPR models predict the SPR of all tissue types with less than 0.2% RMS errors under idealized conditions with no reconstruction uncertainties. When applied to synthetic sinograms, the JSIR-BVM method achieves the best performance with mean and RMS-average errors of less than 0.05% and 0.3%, respectively, for all noise levels, while the image- and sinogram-domain decomposition methods show increasing mean and RMS-average errors with increasing noise level. The JSIR-BVM method also reduces statistical SPR variation by sixfold compared to other methods. A 25% phantom diameter change causes up to 4% SPR differences for the image-domain decomposition approach, while the JSIR-BVM method and sinogram-domain decomposition methods are insensitive to size change. Among all the investigated methods, the JSIR-BVM method achieves the best performance for SPR estimation in our simulation phantom study. This novel method is robust with respect to sinogram noise and residual beam-hardening effects, yielding SPR estimation errors comparable to intrinsic BVM modeling error. In contrast, the achievable SPR estimation accuracy of the image- and sinogram-domain decomposition methods is dominated by the CT image intensity uncertainties introduced by the reconstruction and decomposition processes. © 2018 American Association of Physicists in Medicine.

  18. Estimating trends in the global mean temperature record

    NASA Astrophysics Data System (ADS)

    Poppick, Andrew; Moyer, Elisabeth J.; Stein, Michael L.

    2017-06-01

    Given uncertainties in physical theory and numerical climate simulations, the historical temperature record is often used as a source of empirical information about climate change. Many historical trend analyses appear to de-emphasize physical and statistical assumptions: examples include regression models that treat time rather than radiative forcing as the relevant covariate, and time series methods that account for internal variability in nonparametric rather than parametric ways. However, given a limited data record and the presence of internal variability, estimating radiatively forced temperature trends in the historical record necessarily requires some assumptions. Ostensibly empirical methods can also involve an inherent conflict in assumptions: they require data records that are short enough for naive trend models to be applicable, but long enough for long-timescale internal variability to be accounted for. In the context of global mean temperatures, empirical methods that appear to de-emphasize assumptions can therefore produce misleading inferences, because the trend over the twentieth century is complex and the scale of temporal correlation is long relative to the length of the data record. We illustrate here how a simple but physically motivated trend model can provide better-fitting and more broadly applicable trend estimates and can allow for a wider array of questions to be addressed. In particular, the model allows one to distinguish, within a single statistical framework, between uncertainties in the shorter-term vs. longer-term response to radiative forcing, with implications not only on historical trends but also on uncertainties in future projections. We also investigate the consequence on inferred uncertainties of the choice of a statistical description of internal variability. While nonparametric methods may seem to avoid making explicit assumptions, we demonstrate how even misspecified parametric statistical methods, if attuned to the important characteristics of internal variability, can result in more accurate uncertainty statements about trends.

  19. Cost estimating methods for advanced space systems

    NASA Technical Reports Server (NTRS)

    Cyr, Kelley

    1988-01-01

    Parametric cost estimating methods for space systems in the conceptual design phase are developed. The approach is to identify variables that drive cost such as weight, quantity, development culture, design inheritance, and time. The relationship between weight and cost is examined in detail. A theoretical model of cost is developed and tested statistically against a historical data base of major research and development programs. It is concluded that the technique presented is sound, but that it must be refined in order to produce acceptable cost estimates.

  20. A Fresh Start for Flood Estimation in Ungauged Basins

    NASA Astrophysics Data System (ADS)

    Woods, R. A.

    2017-12-01

    The two standard methods for flood estimation in ungauged basins, regression-based statistical models and rainfall-runoff models using a design rainfall event, have survived relatively unchanged as the methods of choice for more than 40 years. Their technical implementation has developed greatly, but the models' representation of hydrological processes has not, despite a large volume of hydrological research. I suggest it is time to introduce more hydrology into flood estimation. The reliability of the current methods can be unsatisfactory. For example, despite the UK's relatively straightforward hydrology, regression estimates of the index flood are uncertain by +/- a factor of two (for a 95% confidence interval), an impractically large uncertainty for design. The standard error of rainfall-runoff model estimates is not usually known, but available assessments indicate poorer reliability than statistical methods. There is a practical need for improved reliability in flood estimation. Two promising candidates to supersede the existing methods are (i) continuous simulation by rainfall-runoff modelling and (ii) event-based derived distribution methods. The main challenge with continuous simulation methods in ungauged basins is to specify the model structure and parameter values, when calibration data are not available. This has been an active area of research for more than a decade, and this activity is likely to continue. The major challenges for the derived distribution method in ungauged catchments include not only the correct specification of model structure and parameter values, but also antecedent conditions (e.g. seasonal soil water balance). However, a much smaller community of researchers are active in developing or applying the derived distribution approach, and as a result slower progress is being made. A change in needed: surely we have learned enough about hydrology in the last 40 years that we can make a practical hydrological advance on our methods for flood estimation! A shift to new methods for flood estimation will not be taken lightly by practitioners. However, the standard for change is clear - can we develop new methods which give significant improvements in reliability over those existing methods which are demonstrably unsatisfactory?

  1. Quantile Regression for Recurrent Gap Time Data

    PubMed Central

    Luo, Xianghua; Huang, Chiung-Yu; Wang, Lan

    2014-01-01

    Summary Evaluating covariate effects on gap times between successive recurrent events is of interest in many medical and public health studies. While most existing methods for recurrent gap time analysis focus on modeling the hazard function of gap times, a direct interpretation of the covariate effects on the gap times is not available through these methods. In this article, we consider quantile regression that can provide direct assessment of covariate effects on the quantiles of the gap time distribution. Following the spirit of the weighted risk-set method by Luo and Huang (2011, Statistics in Medicine 30, 301–311), we extend the martingale-based estimating equation method considered by Peng and Huang (2008, Journal of the American Statistical Association 103, 637–649) for univariate survival data to analyze recurrent gap time data. The proposed estimation procedure can be easily implemented in existing software for univariate censored quantile regression. Uniform consistency and weak convergence of the proposed estimators are established. Monte Carlo studies demonstrate the effectiveness of the proposed method. An application to data from the Danish Psychiatric Central Register is presented to illustrate the methods developed in this article. PMID:23489055

  2. Combining statistical inference and decisions in ecology

    USGS Publications Warehouse

    Williams, Perry J.; Hooten, Mevin B.

    2016-01-01

    Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.

  3. Statistics for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater runoff best management practices (BMPs)

    USGS Publications Warehouse

    Granato, Gregory E.

    2014-01-01

    The U.S. Geological Survey (USGS) developed the Stochastic Empirical Loading and Dilution Model (SELDM) in cooperation with the Federal Highway Administration (FHWA) to indicate the risk for stormwater concentrations, flows, and loads to be above user-selected water-quality goals and the potential effectiveness of mitigation measures to reduce such risks. SELDM models the potential effect of mitigation measures by using Monte Carlo methods with statistics that approximate the net effects of structural and nonstructural best management practices (BMPs). In this report, structural BMPs are defined as the components of the drainage pathway between the source of runoff and a stormwater discharge location that affect the volume, timing, or quality of runoff. SELDM uses a simple stochastic statistical model of BMP performance to develop planning-level estimates of runoff-event characteristics. This statistical approach can be used to represent a single BMP or an assemblage of BMPs. The SELDM BMP-treatment module has provisions for stochastic modeling of three stormwater treatments: volume reduction, hydrograph extension, and water-quality treatment. In SELDM, these three treatment variables are modeled by using the trapezoidal distribution and the rank correlation with the associated highway-runoff variables. This report describes methods for calculating the trapezoidal-distribution statistics and rank correlation coefficients for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater BMPs and provides the calculated values for these variables. This report also provides robust methods for estimating the minimum irreducible concentration (MIC), which is the lowest expected effluent concentration from a particular BMP site or a class of BMPs. These statistics are different from the statistics commonly used to characterize or compare BMPs. They are designed to provide a stochastic transfer function to approximate the quantity, duration, and quality of BMP effluent given the associated inflow values for a population of storm events. A database application and several spreadsheet tools are included in the digital media accompanying this report for further documentation of methods and for future use. In this study, analyses were done with data extracted from a modified copy of the January 2012 version of International Stormwater Best Management Practices Database, designated herein as the January 2012a version. Statistics for volume reduction, hydrograph extension, and water-quality treatment were developed with selected data. Sufficient data were available to estimate statistics for 5 to 10 BMP categories by using data from 40 to more than 165 monitoring sites. Water-quality treatment statistics were developed for 13 runoff-quality constituents commonly measured in highway and urban runoff studies including turbidity, sediment and solids; nutrients; total metals; organic carbon; and fecal coliforms. The medians of the best-fit statistics for each category were selected to construct generalized cumulative distribution functions for the three treatment variables. For volume reduction and hydrograph extension, interpretation of available data indicates that selection of a Spearman’s rho value that is the average of the median and maximum values for the BMP category may help generate realistic simulation results in SELDM. The median rho value may be selected to help generate realistic simulation results for water-quality treatment variables. MIC statistics were developed for 12 runoff-quality constituents commonly measured in highway and urban runoff studies by using data from 11 BMP categories and more than 167 monitoring sites. Four statistical techniques were applied for estimating MIC values with monitoring data from each site. These techniques produce a range of lower-bound estimates for each site. Four MIC estimators are proposed as alternatives for selecting a value from among the estimates from multiple sites. Correlation analysis indicates that the MIC estimates from multiple sites were weakly correlated with the geometric mean of inflow values, which indicates that there may be a qualitative or semiquantitative link between the inflow quality and the MIC. Correlations probably are weak because the MIC is influenced by the inflow water quality and the capability of each individual BMP site to reduce inflow concentrations.

  4. Fitting a three-parameter lognormal distribution with applications to hydrogeochemical data from the National Uranium Resource Evaluation Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kane, V.E.

    1979-10-01

    The standard maximum likelihood and moment estimation procedures are shown to have some undesirable characteristics for estimating the parameters in a three-parameter lognormal distribution. A class of goodness-of-fit estimators is found which provides a useful alternative to the standard methods. The class of goodness-of-fit tests considered include the Shapiro-Wilk and Shapiro-Francia tests which reduce to a weighted linear combination of the order statistics that can be maximized in estimation problems. The weighted-order statistic estimators are compared to the standard procedures in Monte Carlo simulations. Bias and robustness of the procedures are examined and example data sets analyzed including geochemical datamore » from the National Uranium Resource Evaluation Program.« less

  5. Evaluation of a segment-based LANDSAT full-frame approach to corp area estimation

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.

    1981-01-01

    As the registration of LANDSAT full frames enters the realm of current technology, sampling methods should be examined which utilize other than the segment data used for LACIE. The effect of separating the functions of sampling for training and sampling for area estimation. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of he full-frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.

  6. Testing variance components by two jackknife methods

    USDA-ARS?s Scientific Manuscript database

    The jacknife method, a resampling technique, has been widely used for statistical tests for years. The pseudo value based jacknife method (defined as pseudo jackknife method) is commonly used to reduce the bias for an estimate; however, sometimes it could result in large variaion for an estmimate a...

  7. Primer on statistical interpretation or methods report card on propensity-score matching in the cardiology literature from 2004 to 2006: a systematic review.

    PubMed

    Austin, Peter C

    2008-09-01

    Propensity-score matching is frequently used in the cardiology literature. Recent systematic reviews have found that this method is, in general, poorly implemented in the medical literature. The study objective was to examine the quality of the implementation of propensity-score matching in the general cardiology literature. A total of 44 articles published in the American Heart Journal, the American Journal of Cardiology, Circulation, the European Heart Journal, Heart, the International Journal of Cardiology, and the Journal of the American College of Cardiology between January 1, 2004, and December 31, 2006, were examined. Twenty of the 44 studies did not provide adequate information on how the propensity-score-matched pairs were formed. Fourteen studies did not report whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. Only 4 studies explicitly used statistical methods appropriate for matched studies to compare baseline characteristics between treated and untreated subjects. Only 11 (25%) of the 44 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Only 2 studies described the matching method used, assessed balance in baseline covariates by appropriate methods, and used appropriate statistical methods to estimate the treatment effect and its significance. Application of propensity-score matching was poor in the cardiology literature. Suggestions for improving the reporting and analysis of studies that use propensity-score matching are provided.

  8. Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.

    PubMed

    Kim, Yuneung; Lim, Johan; Park, DoHwan

    2015-11-01

    In this paper, we study a nonparametric procedure to test independence of bivariate interval censored data; for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do it, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and disconcordant pairs of data. The performance of the modified approach is illustrated by simulation studies and application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equation (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportion of missing data, sample size and missing-data generation mechanism. Each data set was analyzed with three statistical methods. The influence of missing data was greater with higher proportion of missing data and smaller sample size. MMRM tended to show least changes in the statistics. When missing values were generated by 'missing not at random' mechanism, no statistical methods could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Data Analysis Techniques for Physical Scientists

    NASA Astrophysics Data System (ADS)

    Pruneau, Claude A.

    2017-10-01

    Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.

  11. A Comparison of Two Methods for Initiating Air Mass Back Trajectories

    NASA Astrophysics Data System (ADS)

    Putman, A.; Posmentier, E. S.; Faiia, A. M.; Sonder, L. J.; Feng, X.

    2014-12-01

    Lagrangian air mass tracking programs in back cast mode are a powerful tool for estimating the water vapor source of precipitation events. The altitudes above the precipitation site where particle's back trajectories begin influences the source estimation. We assume that precipitation comes from water vapor in condensing regions of the air column, so particles are placed in proportion to an estimated condensation profile. We compare two methods for estimating where condensation occurs and the resulting evaporation sites for 63 events at Barrow, AK. The first method (M1) uses measurements from a 35 GHz vertically resolved cloud radar (MMCR), and algorithms developed by Zhao and Garrett1 to calculate precipitation rate. The second method (M2) uses the Global Data Assimilation System reanalysis data in a lofting model. We assess how accurately M2, developed for global coverage, will perform in absence of direct cloud observations. Results from the two methods are statistically similar. The mean particle height estimated by M2 is, on average, 695 m (s.d. = 1800 m) higher than M1. The corresponding average vapor source estimated by M2 is 1.5⁰ (s.d. = 5.4⁰) south of M1. In addition, vapor sources for M2 relative to M1 have ocean surface temperatures averaging 1.1⁰C (s.d. = 3.5⁰C) warmer, and reported ocean surface relative humidities 0.31% (s.d. = 6.1%) drier. All biases except the latter are statistically significant (p = 0.02 for each). Results were skewed by events where M2 estimated very high altitudes of condensation. When M2 produced an average particle height less than 5000 m (89% of events), M2 estimated mean particle heights 76 m (s.d. = 741 m) higher than M1, corresponding to a vapor source 0.54⁰ (s.d. = 4.2⁰) south of M1. The ocean surface at the vapor source was an average of 0.35⁰C (s.d. = 2.35⁰C) warmer and ocean surface relative humidities were 0.02% (s.d. = 5.5%) wetter. None of the biases was statistically significant. If the vapor source meteorology estimated by M2 is used to determine vapor isotopic properties it would produce results similar to M1 in all cases except the occasional very high cloud. The methods strive to balance a sufficient number of tracked air masses for meaningful vapor source estimation with minimal computational time. Zhao, C and Garrett, T.J. 2008, J. Geophys. Res.

  12. A Review and Comparison of Methods for Recreating Individual Patient Data from Published Kaplan-Meier Survival Curves for Economic Evaluations: A Simulation Study

    PubMed Central

    Wan, Xiaomin; Peng, Liubao; Li, Yuanjian

    2015-01-01

    Background In general, the individual patient-level data (IPD) collected in clinical trials are not available to independent researchers to conduct economic evaluations; researchers only have access to published survival curves and summary statistics. Thus, methods that use published survival curves and summary statistics to reproduce statistics for economic evaluations are essential. Four methods have been identified: two traditional methods 1) least squares method, 2) graphical method; and two recently proposed methods by 3) Hoyle and Henley, 4) Guyot et al. The four methods were first individually reviewed and subsequently assessed regarding their abilities to estimate mean survival through a simulation study. Methods A number of different scenarios were developed that comprised combinations of various sample sizes, censoring rates and parametric survival distributions. One thousand simulated survival datasets were generated for each scenario, and all methods were applied to actual IPD. The uncertainty in the estimate of mean survival time was also captured. Results All methods provided accurate estimates of the mean survival time when the sample size was 500 and a Weibull distribution was used. When the sample size was 100 and the Weibull distribution was used, the Guyot et al. method was almost as accurate as the Hoyle and Henley method; however, more biases were identified in the traditional methods. When a lognormal distribution was used, the Guyot et al. method generated noticeably less bias and a more accurate uncertainty compared with the Hoyle and Henley method. Conclusions The traditional methods should not be preferred because of their remarkable overestimation. When the Weibull distribution was used for a fitted model, the Guyot et al. method was almost as accurate as the Hoyle and Henley method. However, if the lognormal distribution was used, the Guyot et al. method was less biased compared with the Hoyle and Henley method. PMID:25803659

  13. A Monte Carlo simulation to the performance of the R/S and V/S methods—Statistical revisit and real world application

    NASA Astrophysics Data System (ADS)

    He, Ling-Yun; Qian, Wen-Bin

    2012-07-01

    A correct or precise estimation of the Hurst exponent is one of the fundamentally important problems in the financial economics literature. There are three widely used tools to estimate the Hurst exponent, the canonical rescaled range (R/S), the variance rescaled statistic (V/S) and the Modified rescaled range (Modified R/S). To clarify their performance, we compare them by Monte Carlo simulations; we generate many time-series of a fractal Brownian motion, of a Weierstrass-Mandelbrot cosine fractal function and of a fractionally integrated process, whose theoretical Hurst exponents are known, to compare the Hurst exponents estimated by the three methods. To better understand their pragmatic performance, we further apply all of these methods empirically in real-world applications. Our results imply it is not appropriate to conclude simply which method is better as V/S performs better when the analyzed market is anti-persistent while R/S seems to be a reliable tool used in persistent market.

  14. Asymptotic Normality of the Maximum Pseudolikelihood Estimator for Fully Visible Boltzmann Machines.

    PubMed

    Nguyen, Hien D; Wood, Ian A

    2016-04-01

    Boltzmann machines (BMs) are a class of binary neural networks for which there have been numerous proposed methods of estimation. Recently, it has been shown that in the fully visible case of the BM, the method of maximum pseudolikelihood estimation (MPLE) results in parameter estimates, which are consistent in the probabilistic sense. In this brief, we investigate the properties of MPLE for the fully visible BMs further, and prove that MPLE also yields an asymptotically normal parameter estimator. These results can be used to construct confidence intervals and to test statistical hypotheses. These constructions provide a closed-form alternative to the current methods that require Monte Carlo simulation or resampling. We support our theoretical results by showing that the estimator behaves as expected in simulation studies.

  15. Statistical methods for astronomical data with upper limits. II - Correlation and regression

    NASA Technical Reports Server (NTRS)

    Isobe, T.; Feigelson, E. D.; Nelson, P. I.

    1986-01-01

    Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.

  16. Double-survey estimates of bald eagle populations in Oregon

    USGS Publications Warehouse

    Anthony, R.G.; Garrett, Monte G.; Isaacs, F.B.

    1999-01-01

    The literature on abundance of birds of prey is almost devoid of population estimates with statistical rigor. Therefore, we surveyed bald eagle (Haliaeetus leucocephalus) populations on the Crooked and lower Columbia rivers of Oregon and used the double-survey method to estimate populations and sighting probabilities for different survey methods (aerial, boat, vehicle) and bald eagle ages (adults vs. subadults). Sighting probabilities were consistently 20%. The results revealed variable and negative bias (percent relative bias = -9 to -70%) of direct counts and emphasized the importance of estimating populations where some measure of precision and ability to conduct inference tests are available. We recommend use of the double-survey method to estimate abundance of bald eagle populations and other raptors in open habitats.

  17. Methods for estimating private forest ownership statistics: revised methods for the USDA Forest Service's National Woodland Owner Survey

    Treesearch

    Brenton J. ​Dickinson; Brett J. Butler

    2013-01-01

    The USDA Forest Service's National Woodland Owner Survey (NWOS) is conducted to better understand the attitudes and behaviors of private forest ownerships, which control more than half of US forestland. Inferences about the populations of interest should be based on theoretically sound estimation procedures. A recent review of the procedures disclosed an error in...

  18. Properties of the endogenous post-stratified estimator using a random forests model

    Treesearch

    John Tipton; Jean Opsomer; Gretchen G. Moisen

    2012-01-01

    Post-stratification is used in survey statistics as a method to improve variance estimates. In traditional post-stratification methods, the variable on which the data is being stratified must be known at the population level. In many cases this is not possible, but it is possible to use a model to predict values using covariates, and then stratify on these predicted...

  19. Recurrence of attic cholesteatoma: different methods of estimating recurrence rates.

    PubMed

    Stangerup, S E; Drozdziewicz, D; Tos, M; Hougaard-Jensen, A

    2000-09-01

    One problem in cholesteatoma surgery is recurrence of cholesteatoma, which is reported to vary from 5% to 71%. This great variability can be explained by issues such as the type of cholesteatoma, surgical technique, follow-up rate, length of the postoperative observation period, and statistical method applied. The aim of this study was to illustrate the impact of applying different statistical methods to the same material. Thirty-three children underwent single-stage surgery for attic cholesteatoma during a 15-year period. Thirty patients (94%) attended a re-evaluation. During the observation period of 15 years, recurrence of cholesteatoma occurred in 10 ears. The cumulative total recurrence rate varied from 30% to 67%, depending on the statistical method applied. In conclusion, the choice of statistical method should depend on the number of patients, follow-up rates, length of the postoperative observation period and presence of censored data.

  20. A review and comparison of methods for recreating individual patient data from published Kaplan-Meier survival curves for economic evaluations: a simulation study.

    PubMed

    Wan, Xiaomin; Peng, Liubao; Li, Yuanjian

    2015-01-01

    In general, the individual patient-level data (IPD) collected in clinical trials are not available to independent researchers to conduct economic evaluations; researchers only have access to published survival curves and summary statistics. Thus, methods that use published survival curves and summary statistics to reproduce statistics for economic evaluations are essential. Four methods have been identified: two traditional methods 1) least squares method, 2) graphical method; and two recently proposed methods by 3) Hoyle and Henley, 4) Guyot et al. The four methods were first individually reviewed and subsequently assessed regarding their abilities to estimate mean survival through a simulation study. A number of different scenarios were developed that comprised combinations of various sample sizes, censoring rates and parametric survival distributions. One thousand simulated survival datasets were generated for each scenario, and all methods were applied to actual IPD. The uncertainty in the estimate of mean survival time was also captured. All methods provided accurate estimates of the mean survival time when the sample size was 500 and a Weibull distribution was used. When the sample size was 100 and the Weibull distribution was used, the Guyot et al. method was almost as accurate as the Hoyle and Henley method; however, more biases were identified in the traditional methods. When a lognormal distribution was used, the Guyot et al. method generated noticeably less bias and a more accurate uncertainty compared with the Hoyle and Henley method. The traditional methods should not be preferred because of their remarkable overestimation. When the Weibull distribution was used for a fitted model, the Guyot et al. method was almost as accurate as the Hoyle and Henley method. However, if the lognormal distribution was used, the Guyot et al. method was less biased compared with the Hoyle and Henley method.

  1. Wind power error estimation in resource assessments.

    PubMed

    Rodríguez, Osvaldo; Del Río, Jesús A; Jaramillo, Oscar A; Martínez, Manuel

    2015-01-01

    Estimating the power output is one of the elements that determine the techno-economic feasibility of a renewable project. At present, there is a need to develop reliable methods that achieve this goal, thereby contributing to wind power penetration. In this study, we propose a method for wind power error estimation based on the wind speed measurement error, probability density function, and wind turbine power curves. This method uses the actual wind speed data without prior statistical treatment based on 28 wind turbine power curves, which were fitted by Lagrange's method, to calculate the estimate wind power output and the corresponding error propagation. We found that wind speed percentage errors of 10% were propagated into the power output estimates, thereby yielding an error of 5%. The proposed error propagation complements the traditional power resource assessments. The wind power estimation error also allows us to estimate intervals for the power production leveled cost or the investment time return. The implementation of this method increases the reliability of techno-economic resource assessment studies.

  2. Wind Power Error Estimation in Resource Assessments

    PubMed Central

    Rodríguez, Osvaldo; del Río, Jesús A.; Jaramillo, Oscar A.; Martínez, Manuel

    2015-01-01

    Estimating the power output is one of the elements that determine the techno-economic feasibility of a renewable project. At present, there is a need to develop reliable methods that achieve this goal, thereby contributing to wind power penetration. In this study, we propose a method for wind power error estimation based on the wind speed measurement error, probability density function, and wind turbine power curves. This method uses the actual wind speed data without prior statistical treatment based on 28 wind turbine power curves, which were fitted by Lagrange's method, to calculate the estimate wind power output and the corresponding error propagation. We found that wind speed percentage errors of 10% were propagated into the power output estimates, thereby yielding an error of 5%. The proposed error propagation complements the traditional power resource assessments. The wind power estimation error also allows us to estimate intervals for the power production leveled cost or the investment time return. The implementation of this method increases the reliability of techno-economic resource assessment studies. PMID:26000444

  3. Stable Estimation of a Covariance Matrix Guided by Nuclear Norm Penalties

    PubMed Central

    Chi, Eric C.; Lange, Kenneth

    2014-01-01

    Estimation of a covariance matrix or its inverse plays a central role in many statistical methods. For these methods to work reliably, estimated matrices must not only be invertible but also well-conditioned. The current paper introduces a novel prior to ensure a well-conditioned maximum a posteriori (MAP) covariance estimate. The prior shrinks the sample covariance estimator towards a stable target and leads to a MAP estimator that is consistent and asymptotically efficient. Thus, the MAP estimator gracefully transitions towards the sample covariance matrix as the number of samples grows relative to the number of covariates. The utility of the MAP estimator is demonstrated in two standard applications – discriminant analysis and EM clustering – in this sampling regime. PMID:25143662

  4. Review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials

    NASA Technical Reports Server (NTRS)

    Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.

    1990-01-01

    A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semi-empirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produced predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis for fully-dense materials are in good agreement with those calculated from elastic properties.

  5. New method for estimating low-earth-orbit collision probabilities

    NASA Technical Reports Server (NTRS)

    Vedder, John D.; Tabor, Jill L.

    1991-01-01

    An unconventional but general method is described for estimating the probability of collision between an earth-orbiting spacecraft and orbital debris. This method uses a Monte Caralo simulation of the orbital motion of the target spacecraft and each discrete debris object to generate an empirical set of distances, each distance representing the separation between the spacecraft and the nearest debris object at random times. Using concepts from the asymptotic theory of extreme order statistics, an analytical density function is fitted to this set of minimum distances. From this function, it is possible to generate realistic collision estimates for the spacecraft.

  6. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining method which is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939

  7. Estimation of the POD function and the LOD of a qualitative microbiological measurement method.

    PubMed

    Wilrich, Cordula; Wilrich, Peter-Theodor

    2009-01-01

    Qualitative microbiological measurement methods in which the measurement results are either 0 (microorganism not detected) or 1 (microorganism detected) are discussed. The performance of such a measurement method is described by its probability of detection as a function of the contamination (CFU/g or CFU/mL) of the test material, or by the LOD(p), i.e., the contamination that is detected (measurement result 1) with a specified probability p. A complementary log-log model was used to statistically estimate these performance characteristics. An intralaboratory experiment for the detection of Listeria monocytogenes in various food matrixes illustrates the method. The estimate of LOD50% is compared with the Spearman-Kaerber method.

  8. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976

  9. Calibrating random forests for probability estimation.

    PubMed

    Dankowski, Theresa; Ziegler, Andreas

    2016-09-30

    Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  10. Application of Transformations in Parametric Inference

    ERIC Educational Resources Information Center

    Brownstein, Naomi; Pensky, Marianna

    2008-01-01

    The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…

  11. Evidence and Clinical Trials.

    NASA Astrophysics Data System (ADS)

    Goodman, Steven N.

    1989-11-01

    This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis--the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, methods which summarize evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance- -uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.

  12. Parameter estimation and order selection for an empirical model of VO2 on-kinetics.

    PubMed

    Alata, O; Bernard, O

    2007-04-27

    In humans, VO2 on-kinetics are noisy numerical signals that reflect the pulmonary oxygen exchange kinetics at the onset of exercise. They are empirically modelled as a sum of an offset and delayed exponentials. The number of delayed exponentials; i.e. the order of the model, is commonly supposed to be 1 for low-intensity exercises and 2 for high-intensity exercises. As no ground truth has ever been provided to validate these postulates, physiologists still need statistical methods to verify their hypothesis about the number of exponentials of the VO2 on-kinetics especially in the case of high-intensity exercises. Our objectives are first to develop accurate methods for estimating the parameters of the model at a fixed order, and then, to propose statistical tests for selecting the appropriate order. In this paper, we provide, on simulated Data, performances of Simulated Annealing for estimating model parameters and performances of Information Criteria for selecting the order. These simulated Data are generated with both single-exponential and double-exponential models, and noised by white and Gaussian noise. The performances are given at various Signal to Noise Ratio (SNR). Considering parameter estimation, results show that the confidences of estimated parameters are improved by increasing the SNR of the response to be fitted. Considering model selection, results show that Information Criteria are adapted statistical criteria to select the number of exponentials.

  13. A Balanced Approach to Adaptive Probability Density Estimation.

    PubMed

    Kovacs, Julio A; Helmick, Cailee; Wriggers, Willy

    2017-01-01

    Our development of a Fast (Mutual) Information Matching (FIM) of molecular dynamics time series data led us to the general problem of how to accurately estimate the probability density function of a random variable, especially in cases of very uneven samples. Here, we propose a novel Balanced Adaptive Density Estimation (BADE) method that effectively optimizes the amount of smoothing at each point. To do this, BADE relies on an efficient nearest-neighbor search which results in good scaling for large data sizes. Our tests on simulated data show that BADE exhibits equal or better accuracy than existing methods, and visual tests on univariate and bivariate experimental data show that the results are also aesthetically pleasing. This is due in part to the use of a visual criterion for setting the smoothing level of the density estimate. Our results suggest that BADE offers an attractive new take on the fundamental density estimation problem in statistics. We have applied it on molecular dynamics simulations of membrane pore formation. We also expect BADE to be generally useful for low-dimensional applications in other statistical application domains such as bioinformatics, signal processing and econometrics.

  14. An M-estimator for reduced-rank system identification.

    PubMed

    Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S; Vogelstein, Joshua T

    2017-01-15

    High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification ( MR. SID). A combination of low-rank approximations, ℓ 1 and ℓ 2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models.

  15. An M-estimator for reduced-rank system identification

    PubMed Central

    Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S.; Vogelstein, Joshua T.

    2018-01-01

    High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification ( MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models. PMID:29391659

  16. A comparison of approaches for estimating bottom-sediment mass in large reservoirs

    USGS Publications Warehouse

    Juracek, Kyle E.

    2006-01-01

    Estimates of sediment and sediment-associated constituent loads and yields from drainage basins are necessary for the management of reservoir-basin systems to address important issues such as reservoir sedimentation and eutrophication. One method for the estimation of loads and yields requires a determination of the total mass of sediment deposited in a reservoir. This method involves a sediment volume-to-mass conversion using bulk-density information. A comparison of four computational approaches (partition, mean, midpoint, strategic) for using bulk-density information to estimate total bottom-sediment mass in four large reservoirs indicated that the differences among the approaches were not statistically significant. However, the lack of statistical significance may be a result of the small sample size. Compared to the partition approach, which was presumed to provide the most accurate estimates of bottom-sediment mass, the results achieved using the strategic, mean, and midpoint approaches differed by as much as ?4, ?20, and ?44 percent, respectively. It was concluded that the strategic approach may merit further investigation as a less time consuming and less costly alternative to the partition approach.

  17. Detecting changes in ultrasound backscattered statistics by using Nakagami parameters: Comparisons of moment-based and maximum likelihood estimators.

    PubMed

    Lin, Jen-Jen; Cheng, Jung-Yu; Huang, Li-Fei; Lin, Ying-Hsiu; Wan, Yung-Liang; Tsui, Po-Hsiang

    2017-05-01

    The Nakagami distribution is an approximation useful to the statistics of ultrasound backscattered signals for tissue characterization. Various estimators may affect the Nakagami parameter in the detection of changes in backscattered statistics. In particular, the moment-based estimator (MBE) and maximum likelihood estimator (MLE) are two primary methods used to estimate the Nakagami parameters of ultrasound signals. This study explored the effects of the MBE and different MLE approximations on Nakagami parameter estimations. Ultrasound backscattered signals of different scatterer number densities were generated using a simulation model, and phantom experiments and measurements of human liver tissues were also conducted to acquire real backscattered echoes. Envelope signals were employed to estimate the Nakagami parameters by using the MBE, first- and second-order approximations of MLE (MLE 1 and MLE 2 , respectively), and Greenwood approximation (MLE gw ) for comparisons. The simulation results demonstrated that, compared with the MBE and MLE 1 , the MLE 2 and MLE gw enabled more stable parameter estimations with small sample sizes. Notably, the required data length of the envelope signal was 3.6 times the pulse length. The phantom and tissue measurement results also showed that the Nakagami parameters estimated using the MLE 2 and MLE gw could simultaneously differentiate various scatterer concentrations with lower standard deviations and reliably reflect physical meanings associated with the backscattered statistics. Therefore, the MLE 2 and MLE gw are suggested as estimators for the development of Nakagami-based methodologies for ultrasound tissue characterization. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Robust estimation approach for blind denoising.

    PubMed

    Rabie, Tamer

    2005-11-01

    This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.

  19. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

    Synthetic Aperture Radar (SAR) is significantly important for polar remote sensing since it can provide continuous observations in all days and all weather. SAR can be used for extracting the surface roughness information characterized by the variance of dielectric properties and different polarization channels, which make it possible to observe different ice types and surface structure for deformation analysis. In November, 2016, Chinese National Antarctic Research Expedition (CHINARE) 33rd cruise has set sails in sea ice zone in Antarctic. Accurate leads spatial distribution in sea ice zone for routine planning of ship navigation is essential. In this study, the semantic relationship between leads and sea ice categories has been described by the Conditional Random Fields (CRF) model, and leads characteristics have been modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture statistical distribution based CRF is developed by considering the contexture information and the statistical characteristics of sea ice for improving leads detection in Sentinel-1A dual polarization SAR imagery. The unary potential and pairwise potential in CRF model is constructed by integrating the posteriori probability estimated from statistical distributions. For mixture statistical distribution parameter estimation, Method of Logarithmic Cumulants (MoLC) is exploited for single statistical distribution parameters estimation. The iteration based Expectation Maximal (EM) algorithm is investigated to calculate the parameters in mixture statistical distribution based CRF model. In the posteriori probability inference, graph-cut energy minimization method is adopted in the initial leads detection. The post-processing procedures including aspect ratio constrain and spatial smoothing approaches are utilized to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a pixel spacing of 40 meters near Prydz Bay area, East Antarctica. Main work is listed as follows: 1) A mixture statistical distribution based CRF algorithm has been developed for leads detection from Sentinel-1A dual polarization images. 2) The assessment of the proposed mixture statistical distribution based CRF method and single distribution based CRF algorithm has been presented. 3) The preferable parameters sets including statistical distributions, the aspect ratio threshold and spatial smoothing window size have been provided. In the future, the proposed algorithm will be developed for the operational Sentinel series data sets processing due to its less time consuming cost and high accuracy in leads detection.

  20. Characterization, parameter estimation, and aircraft response statistics of atmospheric turbulence

    NASA Technical Reports Server (NTRS)

    Mark, W. D.

    1981-01-01

    A nonGaussian three component model of atmospheric turbulence is postulated that accounts for readily observable features of turbulence velocity records, their autocorrelation functions, and their spectra. Methods for computing probability density functions and mean exceedance rates of a generic aircraft response variable are developed using nonGaussian turbulence characterizations readily extracted from velocity recordings. A maximum likelihood method is developed for optimal estimation of the integral scale and intensity of records possessing von Karman transverse of longitudinal spectra. Formulas for the variances of such parameter estimates are developed. The maximum likelihood and least-square approaches are combined to yield a method for estimating the autocorrelation function parameters of a two component model for turbulence.

  1. A Methodological Approach to Small Area Estimation for the Behavioral Risk Factor Surveillance System

    PubMed Central

    Xu, Fang; Wallace, Robyn C.; Garvin, William; Greenlund, Kurt J.; Bartoli, William; Ford, Derek; Eke, Paul; Town, G. Machell

    2016-01-01

    Public health researchers have used a class of statistical methods to calculate prevalence estimates for small geographic areas with few direct observations. Many researchers have used Behavioral Risk Factor Surveillance System (BRFSS) data as a basis for their models. The aims of this study were to 1) describe a new BRFSS small area estimation (SAE) method and 2) investigate the internal and external validity of the BRFSS SAEs it produced. The BRFSS SAE method uses 4 data sets (the BRFSS, the American Community Survey Public Use Microdata Sample, Nielsen Claritas population totals, and the Missouri Census Geographic Equivalency File) to build a single weighted data set. Our findings indicate that internal and external validity tests were successful across many estimates. The BRFSS SAE method is one of several methods that can be used to produce reliable prevalence estimates in small geographic areas. PMID:27418213

  2. A fast and objective multidimensional kernel density estimation method: fastKDE

    DOE PAGES

    O'Brien, Travis A.; Kashinath, Karthik; Cavanaugh, Nicholas R.; ...

    2016-03-07

    Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchiamore » and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10 5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.« less

  3. KERNELHR: A program for estimating animal home ranges

    USGS Publications Warehouse

    Seaman, D.E.; Griffith, B.; Powell, R.A.

    1998-01-01

    Kernel methods are state of the art for estimating animal home-range area and utilization distribution (UD). The KERNELHR program was developed to provide researchers and managers a tool to implement this extremely flexible set of methods with many variants. KERNELHR runs interactively or from the command line on any personal computer (PC) running DOS. KERNELHR provides output of fixed and adaptive kernel home-range estimates, as well as density values in a format suitable for in-depth statistical and spatial analyses. An additional package of programs creates contour files for plotting in geographic information systems (GIS) and estimates core areas of ranges.

  4. Estimation of true height: a study in population-specific methods among young South African adults.

    PubMed

    Lahner, Christen Renée; Kassier, Susanna Maria; Veldman, Frederick Johannes

    2017-02-01

    To investigate the accuracy of arm-associated height estimation methods in the calculation of true height compared with stretch stature in a sample of young South African adults. A cross-sectional descriptive design was employed. Pietermaritzburg, Westville and Durban, KwaZulu-Natal, South Africa, 2015. Convenience sample (N 900) aged 18-24 years, which included an equal number of participants from both genders (150 per gender) stratified across race (Caucasian, Black African and Indian). Continuous variables that were investigated included: (i) stretch stature; (ii) total armspan; (iii) half-armspan; (iv) half-armspan ×2; (v) demi-span; (vi) demi-span gender-specific equation; (vii) WHO equation; and (viii) WHO-adjusted equations; as well as categorization according to gender and race. Statistical analysis was conducted using IBM SPSS Statistics Version 21.0. Significant correlations were identified between gender and height estimation measurements, with males being anatomically larger than females (P<0·001). Significant differences were documented when study participants were stratified according to race and gender (P<0·001). Anatomical similarities were noted between Indians and Black Africans, whereas Caucasians were anatomically different from the other race groups. Arm-associated height estimation methods were able to estimate true height; however, each method was specific to each gender and race group. Height can be calculated by using arm-associated measurements. Although universal equations for estimating true height exist, for the enhancement of accuracy, the use of equations that are race-, gender- and population-specific should be considered.

  5. Comparing estimates of climate change impacts from process-based and statistical crop models

    NASA Astrophysics Data System (ADS)

    Lobell, David B.; Asseng, Senthold

    2017-01-01

    The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.

  6. Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales

    PubMed Central

    Williams, L. Keoki; Buu, Anne

    2017-01-01

    We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206

  7. Lifetime Prediction for Degradation of Solar Mirrors using Step-Stress Accelerated Testing (Presentation)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, J.; Elmore, R.; Kennedy, C.

    This research is to illustrate the use of statistical inference techniques in order to quantify the uncertainty surrounding reliability estimates in a step-stress accelerated degradation testing (SSADT) scenario. SSADT can be used when a researcher is faced with a resource-constrained environment, e.g., limits on chamber time or on the number of units to test. We apply the SSADT methodology to a degradation experiment involving concentrated solar power (CSP) mirrors and compare the results to a more traditional multiple accelerated testing paradigm. Specifically, our work includes: (1) designing a durability testing plan for solar mirrors (3M's new improved silvered acrylic "Solarmore » Reflector Film (SFM) 1100") through the ultra-accelerated weathering system (UAWS), (2) defining degradation paths of optical performance based on the SSADT model which is accelerated by high UV-radiant exposure, and (3) developing service lifetime prediction models for solar mirrors using advanced statistical inference. We use the method of least squares to estimate the model parameters and this serves as the basis for the statistical inference in SSADT. Several quantities of interest can be estimated from this procedure, e.g., mean-time-to-failure (MTTF) and warranty time. The methods allow for the estimation of quantities that may be of interest to the domain scientists.« less

  8. Identification of dynamic systems, theory and formulation

    NASA Technical Reports Server (NTRS)

    Maine, R. E.; Iliff, K. W.

    1985-01-01

    The problem of estimating parameters of dynamic systems is addressed in order to present the theoretical basis of system identification and parameter estimation in a manner that is complete and rigorous, yet understandable with minimal prerequisites. Maximum likelihood and related estimators are highlighted. The approach used requires familiarity with calculus, linear algebra, and probability, but does not require knowledge of stochastic processes or functional analysis. The treatment emphasizes unification of the various areas in estimation in dynamic systems is treated as a direct outgrowth of the static system theory. Topics covered include basic concepts and definitions; numerical optimization methods; probability; statistical estimators; estimation in static systems; stochastic processes; state estimation in dynamic systems; output error, filter error, and equation error methods of parameter estimation in dynamic systems, and the accuracy of the estimates.

  9. Empirical estimation of a distribution function with truncated and doubly interval-censored data and its application to AIDS studies.

    PubMed

    Sun, J

    1995-09-01

    In this paper we discuss the non-parametric estimation of a distribution function based on incomplete data for which the measurement origin of a survival time or the date of enrollment in a study is known only to belong to an interval. Also the survival time of interest itself is observed from a truncated distribution and is known only to lie in an interval. To estimate the distribution function, a simple self-consistency algorithm, a generalization of Turnbull's (1976, Journal of the Royal Statistical Association, Series B 38, 290-295) self-consistency algorithm, is proposed. This method is then used to analyze two AIDS cohort studies, for which direct use of the EM algorithm (Dempster, Laird and Rubin, 1976, Journal of the Royal Statistical Association, Series B 39, 1-38), which is computationally complicated, has previously been the usual method of the analysis.

  10. Non-invasive body temperature measurement of wild chimpanzees using fecal temperature decline.

    PubMed

    Jensen, Siv Aina; Mundry, Roger; Nunn, Charles L; Boesch, Christophe; Leendertz, Fabian H

    2009-04-01

    New methods are required to increase our understanding of pathologic processes in wild mammals. We developed a noninvasive field method to estimate the body temperature of wild living chimpanzees habituated to humans, based on statistically fitting temperature decline of feces after defecation. The method was established with the use of control measures of human rectal temperature and subsequent changes in fecal temperature over time. The method was then applied to temperature data collected from wild chimpanzee feces. In humans, we found good correspondence between the temperature estimated by the method and the actual rectal temperature that was measured (maximum deviation 0.22 C). The method was successfully applied and the average estimated temperature of the chimpanzees was 37.2 C. This simple-to-use field method reliably estimates the body temperature of wild chimpanzees and probably also other large mammals.

  11. Sources of international migration statistics in Africa.

    PubMed

    1984-01-01

    The sources of international migration data for Africa may be classified into 2 main categories: administrative records and 2) censuses and survey data. Both categories are sources for the direct measurement of migration, but the 2nd category can be used for the indirect estimation of net international migration. The administrative records from which data on international migration may be derived include 1) entry/departure cards or forms completed at international borders, 2) residence/work permits issued to aliens, and 3) general population registers and registers of aliens. The statistics derived from the entry/departure cards may be described as 1) land frontier control statistics and 2) port control statistics. The former refer to data derived from movements across land borders and the latter refer to information collected at international airports and seaports. Other administrative records which are potential sources of statistics on international migration in some African countries include some limited population registers, records of the registration of aliens, and particulars of residence/work permits issued to aliens. Although frontier control data are considered the most important source of international migration statistics, in many African countries these data are too deficient to provide a satisfactory indication of the level of international migration. Thus decennial population censuses and/or sample surveys are the major sources of the available statistics on the stock and characteristics of international migration. Indirect methods can be used to supplement census data with intercensal estimates of net migration using census data on the total population. This indirect method of obtaining information on migration can be used to evaluate estimates derived from frontier control records, and it also offers the means of obtaining alternative information on international migration in African countries which have not directly investigated migration topics in their censuses or surveys.

  12. InSAR Tropospheric Correction Methods: A Statistical Comparison over Different Regions

    NASA Astrophysics Data System (ADS)

    Bekaert, D. P.; Walters, R. J.; Wright, T. J.; Hooper, A. J.; Parker, D. J.

    2015-12-01

    Observing small magnitude surface displacements through InSAR is highly challenging, and requires advanced correction techniques to reduce noise. In fact, one of the largest obstacles facing the InSAR community is related to tropospheric noise correction. Spatial and temporal variations in temperature, pressure, and relative humidity result in a spatially-variable InSAR tropospheric signal, which masks smaller surface displacements due to tectonic or volcanic deformation. Correction methods applied today include those relying on weather model data, GNSS and/or spectrometer data. Unfortunately, these methods are often limited by the spatial and temporal resolution of the auxiliary data. Alternatively a correction can be estimated from the high-resolution interferometric phase by assuming a linear or a power-law relationship between the phase and topography. For these methods, the challenge lies in separating deformation from tropospheric signals. We will present results of a statistical comparison of the state-of-the-art tropospheric corrections estimated from spectrometer products (MERIS and MODIS), a low and high spatial-resolution weather model (ERA-I and WRF), and both the conventional linear and power-law empirical methods. We evaluate the correction capability over Southern Mexico, Italy, and El Hierro, and investigate the impact of increasing cloud cover on the accuracy of the tropospheric delay estimation. We find that each method has its strengths and weaknesses, and suggest that further developments should aim to combine different correction methods. All the presented methods are included into our new open source software package called TRAIN - Toolbox for Reducing Atmospheric InSAR Noise (Bekaert et al., in review), which is available to the community Bekaert, D., R. Walters, T. Wright, A. Hooper, and D. Parker (in review), Statistical comparison of InSAR tropospheric correction techniques, Remote Sensing of Environment

  13. A novel approach for choosing summary statistics in approximate Bayesian computation.

    PubMed

    Aeschbacher, Simon; Beaumont, Mark A; Futschik, Andreas

    2012-11-01

    The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θ(anc) = 4N(e)u) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L(2)-loss performs best. Applying that method to the ibex data, we estimate θ(anc)≈ 1.288 and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10(-4) and 3.5 × 10(-3) per locus per generation. The proportion of males with access to matings is estimated as ω≈ 0.21, which is in good agreement with recent independent estimates.

  14. A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

    PubMed Central

    Aeschbacher, Simon; Beaumont, Mark A.; Futschik, Andreas

    2012-01-01

    The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θanc = 4Neu) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L2-loss performs best. Applying that method to the ibex data, we estimate θ^anc≈1.288 and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10−4 and 3.5 × 10−3 per locus per generation. The proportion of males with access to matings is estimated as ω^≈0.21, which is in good agreement with recent independent estimates. PMID:22960215

  15. Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number.

    PubMed

    Fragkos, Konstantinos C; Tsagris, Michail; Frangos, Christos C

    2014-01-01

    The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.

  16. Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number

    PubMed Central

    Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.

    2014-01-01

    The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470

  17. Simulation methods to estimate design power: an overview for applied research

    PubMed Central

    2011-01-01

    Background Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. Methods We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. Results We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Conclusions Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research. PMID:21689447

  18. Regression without truth with Markov chain Monte-Carlo

    NASA Astrophysics Data System (ADS)

    Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Å piclin, Žiga

    2017-03-01

    Regression without truth (RWT) is a statistical technique for estimating error model parameters of each method in a group of methods used for measurement of a certain quantity. A very attractive aspect of RWT is that it does not rely on a reference method or "gold standard" data, which is otherwise difficult RWT was used for a reference-free performance comparison of several methods for measuring left ventricular ejection fraction (EF), i.e. a percentage of blood leaving the ventricle each time the heart contracts, and has since been applied for various other quantitative imaging biomarkerss (QIBs). Herein, we show how Markov chain Monte-Carlo (MCMC), a computational technique for drawing samples from a statistical distribution with probability density function known only up to a normalizing coefficient, can be used to augment RWT to gain a number of important benefits compared to the original approach based on iterative optimization. For instance, the proposed MCMC-based RWT enables the estimation of joint posterior distribution of the parameters of the error model, straightforward quantification of uncertainty of the estimates, estimation of true value of the measurand and corresponding credible intervals (CIs), does not require a finite support for prior distribution of the measureand generally has a much improved robustness against convergence to non-global maxima. The proposed approach is validated using synthetic data that emulate the EF data for 45 patients measured with 8 different methods. The obtained results show that 90% CI of the corresponding parameter estimates contain the true values of all error model parameters and the measurand. A potential real-world application is to take measurements of a certain QIB several different methods and then use the proposed framework to compute the estimates of the true values and their uncertainty, a vital information for diagnosis based on QIB.

  19. A terrain-based site characterization map of California with implications for the contiguous United States

    USGS Publications Warehouse

    Yong, Alan K.; Hough, Susan E.; Iwahashi, Junko; Braverman, Amy

    2012-01-01

    We present an approach based on geomorphometry to predict material properties and characterize site conditions using the VS30 parameter (time‐averaged shear‐wave velocity to a depth of 30 m). Our framework consists of an automated terrain classification scheme based on taxonomic criteria (slope gradient, local convexity, and surface texture) that systematically identifies 16 terrain types from 1‐km spatial resolution (30 arcsec) Shuttle Radar Topography Mission digital elevation models (SRTM DEMs). Using 853 VS30 values from California, we apply a simulation‐based statistical method to determine the mean VS30 for each terrain type in California. We then compare the VS30 values with models based on individual proxies, such as mapped surface geology and topographic slope, and show that our systematic terrain‐based approach consistently performs better than semiempirical estimates based on individual proxies. To further evaluate our model, we apply our California‐based estimates to terrains of the contiguous United States. Comparisons of our estimates with 325 VS30 measurements outside of California, as well as estimates based on the topographic slope model, indicate our method to be statistically robust and more accurate. Our approach thus provides an objective and robust method for extending estimates of VS30 for regions where in situ measurements are sparse or not readily available.

  20. Modeling, estimation and identification methods for static shape determination of flexible structures. [for large space structure design

    NASA Technical Reports Server (NTRS)

    Rodriguez, G.; Scheid, R. E., Jr.

    1986-01-01

    This paper outlines methods for modeling, identification and estimation for static determination of flexible structures. The shape estimation schemes are based on structural models specified by (possibly interconnected) elliptic partial differential equations. The identification techniques provide approximate knowledge of parameters in elliptic systems. The techniques are based on the method of maximum-likelihood that finds parameter values such that the likelihood functional associated with the system model is maximized. The estimation methods are obtained by means of a function-space approach that seeks to obtain the conditional mean of the state given the data and a white noise characterization of model errors. The solutions are obtained in a batch-processing mode in which all the data is processed simultaneously. After methods for computing the optimal estimates are developed, an analysis of the second-order statistics of the estimates and of the related estimation error is conducted. In addition to outlining the above theoretical results, the paper presents typical flexible structure simulations illustrating performance of the shape determination methods.

  1. Maine StreamStats: a water-resources web application

    USGS Publications Warehouse

    Lombard, Pamela J.

    2015-01-01

    Reports referenced in this fact sheet present the regression equations used to estimate the flow statistics, describe the errors associated with the estimates, and describe the methods used to develop the equations and to measure the basin characteristics used in the equations. Limitations of the methods are also described in the reports; for example, all of the equations are appropriate only for ungaged, unregulated, rural streams in Maine.

  2. Introduction to Geostatistics

    NASA Astrophysics Data System (ADS)

    Kitanidis, P. K.

    1997-05-01

    Introduction to Geostatistics presents practical techniques for engineers and earth scientists who routinely encounter interpolation and estimation problems when analyzing data from field observations. Requiring no background in statistics, and with a unique approach that synthesizes classic and geostatistical methods, this book offers linear estimation methods for practitioners and advanced students. Well illustrated with exercises and worked examples, Introduction to Geostatistics is designed for graduate-level courses in earth sciences and environmental engineering.

  3. Sampling-based approaches to improve estimation of mortality among patient dropouts: experience from a large PEPFAR-funded program in Western Kenya.

    PubMed

    Yiannoutsos, Constantin T; An, Ming-Wen; Frangakis, Constantine E; Musick, Beverly S; Braitstein, Paula; Wools-Kaloustian, Kara; Ochieng, Daniel; Martin, Jeffrey N; Bacon, Melanie C; Ochieng, Vincent; Kimaiyo, Sylvester

    2008-01-01

    Monitoring and evaluation (M&E) of HIV care and treatment programs is impacted by losses to follow-up (LTFU) in the patient population. The severity of this effect is undeniable but its extent unknown. Tracing all lost patients addresses this but census methods are not feasible in programs involving rapid scale-up of HIV treatment in the developing world. Sampling-based approaches and statistical adjustment are the only scaleable methods permitting accurate estimation of M&E indices. In a large antiretroviral therapy (ART) program in western Kenya, we assessed the impact of LTFU on estimating patient mortality among 8,977 adult clients of whom, 3,624 were LTFU. Overall, dropouts were more likely male (36.8% versus 33.7%; p = 0.003), and younger than non-dropouts (35.3 versus 35.7 years old; p = 0.020), with lower median CD4 count at enrollment (160 versus 189 cells/ml; p<0.001) and WHO stage 3-4 disease (47.5% versus 41.1%; p<0.001). Urban clinic clients were 75.0% of non-dropouts but 70.3% of dropouts (p<0.001). Of the 3,624 dropouts, 1,143 were sought and 621 had their vital status ascertained. Statistical techniques were used to adjust mortality estimates based on information obtained from located LTFU patients. Observed mortality estimates one year after enrollment were 1.7% (95% CI 1.3%-2.0%), revised to 2.8% (2.3%-3.1%) when deaths discovered through outreach were added and adjusted to 9.2% (7.8%-10.6%) and 9.9% (8.4%-11.5%) through statistical modeling depending on the method used. The estimates 12 months after ART initiation were 1.7% (1.3%-2.2%), 3.4% (2.9%-4.0%), 10.5% (8.7%-12.3%) and 10.7% (8.9%-12.6%) respectively. CONCLUSIONS/SIGNIFICANCE ABSTRACT: Assessment of the impact of LTFU is critical in program M&E as estimated mortality based on passive monitoring may underestimate true mortality by up to 80%. This bias can be ameliorated by tracing a sample of dropouts and statistically adjust the mortality estimates to properly evaluate and guide large HIV care and treatment programs.

  4. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.

  5. Simplified methods for real-time prediction of storm surge uncertainty: The city of Venice case study

    NASA Astrophysics Data System (ADS)

    Mel, Riccardo; Viero, Daniele Pietro; Carniello, Luca; Defina, Andrea; D'Alpaos, Luigi

    2014-09-01

    Providing reliable and accurate storm surge forecasts is important for a wide range of problems related to coastal environments. In order to adequately support decision-making processes, it also become increasingly important to be able to estimate the uncertainty associated with the storm surge forecast. The procedure commonly adopted to do this uses the results of a hydrodynamic model forced by a set of different meteorological forecasts; however, this approach requires a considerable, if not prohibitive, computational cost for real-time application. In the present paper we present two simplified methods for estimating the uncertainty affecting storm surge prediction with moderate computational effort. In the first approach we use a computationally fast, statistical tidal model instead of a hydrodynamic numerical model to estimate storm surge uncertainty. The second approach is based on the observation that the uncertainty in the sea level forecast mainly stems from the uncertainty affecting the meteorological fields; this has led to the idea to estimate forecast uncertainty via a linear combination of suitable meteorological variances, directly extracted from the meteorological fields. The proposed methods were applied to estimate the uncertainty in the storm surge forecast in the Venice Lagoon. The results clearly show that the uncertainty estimated through a linear combination of suitable meteorological variances nicely matches the one obtained using the deterministic approach and overcomes some intrinsic limitations in the use of a statistical tidal model.

  6. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  7. Penalized Nonlinear Least Squares Estimation of Time-Varying Parameters in Ordinary Differential Equations

    PubMed Central

    Cao, Jiguo; Huang, Jianhua Z.; Wu, Hulin

    2012-01-01

    Ordinary differential equations (ODEs) are widely used in biomedical research and other scientific areas to model complex dynamic systems. It is an important statistical problem to estimate parameters in ODEs from noisy observations. In this article we propose a method for estimating the time-varying coefficients in an ODE. Our method is a variation of the nonlinear least squares where penalized splines are used to model the functional parameters and the ODE solutions are approximated also using splines. We resort to the implicit function theorem to deal with the nonlinear least squares objective function that is only defined implicitly. The proposed penalized nonlinear least squares method is applied to estimate a HIV dynamic model from a real dataset. Monte Carlo simulations show that the new method can provide much more accurate estimates of functional parameters than the existing two-step local polynomial method which relies on estimation of the derivatives of the state function. Supplemental materials for the article are available online. PMID:23155351

  8. Towards efficient multi-scale methods for monitoring sugarcane aphid infestations in sorghum

    USDA-ARS?s Scientific Manuscript database

    We discuss approaches and issues involved with developing optimal monitoring methods for sugarcane aphid infestations (SCA) in grain sorghum. We discuss development of sequential sampling methods that allow for estimation of the number of aphids per sample unit, and statistical decision making rela...

  9. Age estimation by pulp-to-tooth area ratio using cone-beam computed tomography: A preliminary analysis

    PubMed Central

    Rai, Arpita; Acharya, Ashith B.; Naikmasur, Venkatesh G.

    2016-01-01

    Background: Age estimation of living or deceased individuals is an important aspect of forensic sciences. Conventionally, pulp-to-tooth area ratio (PTR) measured from periapical radiographs have been utilized as a nondestructive method of age estimation. Cone-beam computed tomography (CBCT) is a new method to acquire three-dimensional images of the teeth in living individuals. Aims: The present study investigated age estimation based on PTR of the maxillary canines measured in three planes obtained from CBCT image data. Settings and Design: Sixty subjects aged 20–85 years were included in the study. Materials and Methods: For each tooth, mid-sagittal, mid-coronal, and three axial sections—cementoenamel junction (CEJ), one-fourth root level from CEJ, and mid-root—were assessed. PTR was calculated using AutoCAD software after outlining the pulp and tooth. Statistical Analysis Used: All statistical analyses were performed using an SPSS 17.0 software program. Results and Conclusions: Linear regression analysis showed that only PTR in axial plane at CEJ had significant age correlation (r = 0.32; P < 0.05). This is probably because of clearer demarcation of pulp and tooth outline at this level. PMID:28123269

  10. County-level Estimates for Carbon Distribution in U.S. Croplands, 1990-2005

    DOE Data Explorer

    West, Tristram O. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2008-01-01

    Net Primary Productivity (NPP) for croplands can be estimated using a statistical method that includes factors for dry weight, harvest indices, and root:shoot ratios multiplied by yield data from the National Agricultural Statistics Service (NASS). This method has been documented and published by Prince et al. (2001), Hicke and Lobell (2004), and Hicke et al. (2004). We expanded this method by including factors for more crops and by using an estimated carbon content of 0.45 for agricultural crops to estimate (a) total net carbon uptake, (b) carbon in aboveground biomass, (c) carbon in belowground biomass, (d) carbon harvested and transported off site, and (e) the amount of carbon remaining on the surface following harvest. These five variables are included with their respective Federal Information Processing Standards (FIPS) codes for all counties in the contiguous U.S. from 1990-2005. A mean harvest efficiency of 0.95 was assumed across all crops. Total cropland NPP for the U.S. ranges from 378-527 Tg C yr-1 within years 1990-2005, and total carbon harvested and removed ranges from 161-228 Tg C yr-1 within years 1990-2005.

  11. Rapid microbiological assay of serum vitamin B12 by electronic counter

    PubMed Central

    Stuart, J.; Sklaroff, S. A.

    1966-01-01

    A new method of measuring the growth of Lactobacillus leichmannii is reported. Its adoption for the estimation of serum vitamin B12 levels shortens the incubation period required to five hours at 45°C. The method is compared statistically with a standard method of estimation, requiring incubation at 37°C., by duplicate determinations on 106 hospital patients. The significance of the apparently decreased accuracy of the new method at low serum levels is discussed, and a re-appraisal of the optimum growth temperature of Lactobacillus leichmannii suggested. PMID:5904982

  12. Rapid microbiological assay of serum vitamin B 12 by electronic counter.

    PubMed

    Stuart, J; Sklaroff, S A

    1966-01-01

    A new method of measuring the growth of Lactobacillus leichmannii is reported. Its adoption for the estimation of serum vitamin B(12) levels shortens the incubation period required to five hours at 45 degrees C. The method is compared statistically with a standard method of estimation, requiring incubation at 37 degrees C., by duplicate determinations on 106 hospital patients. The significance of the apparently decreased accuracy of the new method at low serum levels is discussed, and a re-appraisal of the optimum growth temperature of Lactobacillus leichmannii suggested.

  13. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values ( r th ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.

  14. Use of an OSSE to Evaluate Background Error Covariances Estimated by the 'NMC Method'

    NASA Technical Reports Server (NTRS)

    Errico, Ronald M.; Prive, Nikki C.; Gu, Wei

    2014-01-01

    The NMC method has proven utility for prescribing approximate background-error covariances required by variational data assimilation systems. Here, untunedNMCmethod estimates are compared with explicitly determined error covariances produced within an OSSE context by exploiting availability of the true simulated states. Such a comparison provides insights into what kind of rescaling is required to render the NMC method estimates usable. It is shown that rescaling of variances and directional correlation lengths depends greatly on both pressure and latitude. In particular, some scaling coefficients appropriate in the Tropics are the reciprocal of those in the Extratropics. Also, the degree of dynamic balance is grossly overestimated by the NMC method. These results agree with previous examinations of the NMC method which used ensembles as an alternative for estimating background-error statistics.

  15. Estimating discharge measurement uncertainty using the interpolated variance estimator

    USGS Publications Warehouse

    Cohn, T.; Kiang, J.; Mason, R.

    2012-01-01

    Methods for quantifying the uncertainty in discharge measurements typically identify various sources of uncertainty and then estimate the uncertainty from each of these sources by applying the results of empirical or laboratory studies. If actual measurement conditions are not consistent with those encountered in the empirical or laboratory studies, these methods may give poor estimates of discharge uncertainty. This paper presents an alternative method for estimating discharge measurement uncertainty that uses statistical techniques and at-site observations. This Interpolated Variance Estimator (IVE) estimates uncertainty based on the data collected during the streamflow measurement and therefore reflects the conditions encountered at the site. The IVE has the additional advantage of capturing all sources of random uncertainty in the velocity and depth measurements. It can be applied to velocity-area discharge measurements that use a velocity meter to measure point velocities at multiple vertical sections in a channel cross section.

  16. Estimated Accuracy of Three Common Trajectory Statistical Methods

    NASA Technical Reports Server (NTRS)

    Kabashnikov, Vitaliy P.; Chaikovsky, Anatoli P.; Kucsera, Tom L.; Metelskaya, Natalia S.

    2011-01-01

    Three well-known trajectory statistical methods (TSMs), namely concentration field (CF), concentration weighted trajectory (CWT), and potential source contribution function (PSCF) methods were tested using known sources and artificially generated data sets to determine the ability of TSMs to reproduce spatial distribution of the sources. In the works by other authors, the accuracy of the trajectory statistical methods was estimated for particular species and at specified receptor locations. We have obtained a more general statistical estimation of the accuracy of source reconstruction and have found optimum conditions to reconstruct source distributions of atmospheric trace substances. Only virtual pollutants of the primary type were considered. In real world experiments, TSMs are intended for application to a priori unknown sources. Therefore, the accuracy of TSMs has to be tested with all possible spatial distributions of sources. An ensemble of geographical distributions of virtual sources was generated. Spearman s rank order correlation coefficient between spatial distributions of the known virtual and the reconstructed sources was taken to be a quantitative measure of the accuracy. Statistical estimates of the mean correlation coefficient and a range of the most probable values of correlation coefficients were obtained. All the TSMs that were considered here showed similar close results. The maximum of the ratio of the mean correlation to the width of the correlation interval containing the most probable correlation values determines the optimum conditions for reconstruction. An optimal geographical domain roughly coincides with the area supplying most of the substance to the receptor. The optimal domain s size is dependent on the substance decay time. Under optimum reconstruction conditions, the mean correlation coefficients can reach 0.70 0.75. The boundaries of the interval with the most probable correlation values are 0.6 0.9 for the decay time of 240 h and 0.5 0.95 for the decay time of 12 h. The best results of source reconstruction can be expected for the trace substances with a decay time on the order of several days. Although the methods considered in this paper do not guarantee high accuracy they are computationally simple and fast. Using the TSMs in optimum conditions and taking into account the range of uncertainties, one can obtain a first hint on potential source areas.

  17. Visual field progression in glaucoma: estimating the overall significance of deterioration with permutation analyses of pointwise linear regression (PoPLR).

    PubMed

    O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H

    2012-10-01

    To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.

  18. Refined shape model fitting methods for detecting various types of phenological information on major U.S. crops

    NASA Astrophysics Data System (ADS)

    Sakamoto, Toshihiro

    2018-04-01

    Crop phenological information is a critical variable in evaluating the influence of environmental stress on the final crop yield in spatio-temporal dimensions. Although the MODIS (Moderate Resolution Imaging Spectroradiometer) Land Cover Dynamics product (MCD12Q2) is widely used in place of crop phenological information, the definitions of MCD12Q2-derived phenological events (e.g. green-up date, dormancy date) were not completely consistent with those of crop development stages used in statistical surveys (e.g. emerged date, harvested date). It has been necessary to devise an alternative method focused on detecting continental-scale crop developmental stages using a different approach. Therefore, this study aimed to refine the Shape Model Fitting (SMF) method to improve its applicability to multiple major U.S. crops. The newly-refined SMF methods could estimate the timing of 36 crop-development stages of major U.S. crops, including corn, soybeans, winter wheat, spring wheat, barley, sorghum, rice, and cotton. The newly-developed calibration process did not require any long-term field observation data, and could calibrate crop-specific phenological parameters, which were used as coefficients in estimated equation, by using only freely accessible public data. The calibration of phenological parameters was conducted in two steps. In the first step, the national common phenological parameters, referred to as X0[base], were calibrated by using the statistical data of 2008. The SMF method coupled using X0[base] was named the rSMF[base] method. The second step was a further calibration to gain regionally-adjusted phenological parameters for each state, referred to as X0[local], by using additional statistical data of 2015 and 2016. The rSMF method using the X0[local] was named the rSMF[local] method. This second calibration process improved the estimation accuracy for all tested crops. When applying the rSMF[base] method to the validation data set (2009-2014), the root mean square error (RMSE) of the rSMF[base]-derived estimates ranged from 7.1 days (corn) to 15.7 days (winter wheat). When using the rSMF[local] method, the RMSE of the rSMF[local]-derived estimates improved and ranged from 5.6 days (corn) to 12.3 days (winter wheat). The results showed that the second calibration step for the rSMF[local] method could correct the region-dependent bias error between the rSMF[base]-derived estimates and the statistical data. A comparison between the performances of the refined SMF methods and the MCD12Q2 products, indicated that both of the rSMF methods were superior to the MCD12Q2 products in estimating all phenological stages, except for the case of the rSMF[base]-derived barley emerged stages. The phenological stages for which the rSMF[local] showed the best estimation accuracy were the corn silking stage (RMSE = 4.3 days); the soybeans dropping leaves stage (RMSE = 4.9 days); the headed stages of winter wheat (RMSE = 11.1 days), barley (RMSE = 6.1 days), and sorghum (RMSE = 9.5 days); the spring-wheat harvested stage (RMSE = 5.5 days); the rice emerged stage (RMSE = 5.5 days), and the cotton squaring stage (RMSE = 6.6 days). These were more accurate than the results achieved by the MCD12Q2 products. In addition, the rSMF[local]-derived estimates were superior in terms of the reproducibility of the annual variation range, particularly of the late reproductive stages, such as the mature and harvested stages. The crop phenology maps derived from the SMF [local] method were also in good agreement with the relevant maps derived from statistics, and could reveal the characteristic spatial pattern of the key phenological stages at the continental scale with fine spatial resolution. For example, the winter-wheat headed stage clearly became later from south to north. The cotton squaring stage became earlier from the central region towards both coastal regions.

  19. Free energy surfaces from nonequilibrium processes without work measurement

    NASA Astrophysics Data System (ADS)

    Adib, Artur B.

    2006-04-01

    Recent developments in statistical mechanics have allowed the estimation of equilibrium free energies from the statistics of work measurements during processes that drive the system out of equilibrium. Here a different class of processes is considered, wherein the system is prepared and released from a nonequilibrium state, and no external work is involved during its observation. For such "clamp-and-release" processes, a simple strategy for the estimation of equilibrium free energies is offered. The method is illustrated with numerical simulations and analyzed in the context of tethered single-molecule experiments.

  20. Seed Dispersal Near and Far: Patterns Across Temperate and Tropical Forests

    Treesearch

    James S. Clark; Miles Silman; Ruth Kern; Eric Macklin; Janneke HilleRisLambers

    1999-01-01

    Dispersal affects community dynamics and vegetation response to global change. Understanding these effects requires descriptions of dispersal at local and regional scales and statistical models that permit estimation. Classical models of dispersal describe local or long-distance dispersal, but not both. The lack of statistical methods means that models have rarely been...

  1. Analysis of First-Term Attrition of Non-Prior Service High-Quality U.S. Army Male Recruits

    DTIC Science & Technology

    1989-12-13

    the estimators. Under broad conditions ( Hanushek , 1977 ), the maximum likelihood estimators are: ( a ) consistent (b) asymptotically efficient, and...Diseases, Vol. 24, 1971, pp. 125 - 158. Hanushek , Eric ., and John E. Jackson, Statistical Methods for Social Scientists, Academic Press, New York, 1977

  2. Statistical control in hydrologic forecasting.

    Treesearch

    H.G. Wilm

    1950-01-01

    With rapidly growing development and uses of water, a correspondingly great demand has developed for advance estimates of the volumes or rates of flow which are supplied by streams. Therefore much attention is being devoted to hydrologic forecasting, and numerous methods have been tested in efforts to make increasingly reliable estimates of future supplies.

  3. Specifying and Refining a Complex Measurement Model.

    ERIC Educational Resources Information Center

    Levy, Roy; Mislevy, Robert J.

    This paper aims to describe a Bayesian approach to modeling and estimating cognitive models both in terms of statistical machinery and actual instrument development. Such a method taps the knowledge of experts to provide initial estimates for the probabilistic relationships among the variables in a multivariate latent variable model and refines…

  4. Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

    NASA Technical Reports Server (NTRS)

    Tripp, John S.; Tcheng, Ping

    1999-01-01

    Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.

  5. Risk analysis in cohort studies with heterogeneous strata. A global chi2-test for dose-response relationship, generalizing the Mantel-Haenszel procedure.

    PubMed

    Ahlborn, W; Tuz, H J; Uberla, K

    1990-03-01

    In cohort studies the Mantel-Haenszel estimator ORMH is computed from sample data and is used as a point estimator of relative risk. Test-based confidence intervals are estimated with the help of the asymptotic chi-squared distributed MH-statistic chi 2MHS. The Mantel-extension-chi-squared is used as a test statistic for a dose-response relationship. Both test statistics--the Mantel-Haenszel-chi as well as the Mantel-extension-chi--assume homogeneity of risk across strata, which is rarely present. Also an extended nonparametric statistic, proposed by Terpstra, which is based on the Mann-Whitney-statistics assumes homogeneity of risk across strata. We have earlier defined four risk measures RRkj (k = 1,2,...,4) in the population and considered their estimates and the corresponding asymptotic distributions. In order to overcome the homogeneity assumption we use the delta-method to get "test-based" confidence intervals. Because the four risk measures RRkj are presented as functions of four weights gik we give, consequently, the asymptotic variances of these risk estimators also as functions of the weights gik in a closed form. Approximations to these variances are given. For testing a dose-response relationship we propose a new class of chi 2(1)-distributed global measures Gk and the corresponding global chi 2-test. In contrast to the Mantel-extension-chi homogeneity of risk across strata must not be assumed. These global test statistics are of the Wald type for composite hypotheses.(ABSTRACT TRUNCATED AT 250 WORDS)

  6. Estimating sunspot number

    NASA Technical Reports Server (NTRS)

    Wilson, R. M.; Reichmann, E. J.; Teuber, D. L.

    1984-01-01

    An empirical method is developed to predict certain parameters of future solar activity cycles. Sunspot cycle statistics are examined, and curve fitting and linear regression analysis techniques are utilized.

  7. Hypothesis testing for band size detection of high-dimensional banded precision matrices.

    PubMed

    An, Baiguo; Guo, Jianhua; Liu, Yufeng

    2014-06-01

    Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.

  8. Combining statistical inference and decisions in ecology.

    PubMed

    Williams, Perry J; Hooten, Mevin B

    2016-09-01

    Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem. © 2016 by the Ecological Society of America.

  9. Battery Calendar Life Estimator Manual Modeling and Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jon P. Christophersen; Ira Bloom; Ed Thomas

    2012-10-01

    The Battery Life Estimator (BLE) Manual has been prepared to assist developers in their efforts to estimate the calendar life of advanced batteries for automotive applications. Testing requirements and procedures are defined by the various manuals previously published under the United States Advanced Battery Consortium (USABC). The purpose of this manual is to describe and standardize a method for estimating calendar life based on statistical models and degradation data acquired from typical USABC battery testing.

  10. Battery Life Estimator Manual Linear Modeling and Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jon P. Christophersen; Ira Bloom; Ed Thomas

    2009-08-01

    The Battery Life Estimator (BLE) Manual has been prepared to assist developers in their efforts to estimate the calendar life of advanced batteries for automotive applications. Testing requirements and procedures are defined by the various manuals previously published under the United States Advanced Battery Consortium (USABC). The purpose of this manual is to describe and standardize a method for estimating calendar life based on statistical models and degradation data acquired from typical USABC battery testing.

  11. A statistical method to estimate low-energy hadronic cross sections

    NASA Astrophysics Data System (ADS)

    Balassa, Gábor; Kovács, Péter; Wolf, György

    2018-02-01

    In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea, when two particles collide a so-called fireball is formed, which after a short time period decays statistically into a specific final state. To calculate the probabilities we use a phase space description extended with quark combinatorial factors and the possibility of more than one fireball formation. In a few simple cases the probability of a specific final state can be calculated analytically, where we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest. In the latter case we used a numerical method to calculate the more complicated final state probabilities. Additionally, we examined the formation of strange and charmed mesons as well, where we used existing data to fit the relevant model parameters.

  12. Inference of median difference based on the Box-Cox model in randomized clinical trials.

    PubMed

    Maruo, K; Isogawa, N; Gosho, M

    2015-05-10

    In randomized clinical trials, many medical and biological measurements are not normally distributed and are often skewed. The Box-Cox transformation is a powerful procedure for comparing two treatment groups for skewed continuous variables in terms of a statistical test. However, it is difficult to directly estimate and interpret the location difference between the two groups on the original scale of the measurement. We propose a helpful method that infers the difference of the treatment effect on the original scale in a more easily interpretable form. We also provide statistical analysis packages that consistently include an estimate of the treatment effect, covariance adjustments, standard errors, and statistical hypothesis tests. The simulation study that focuses on randomized parallel group clinical trials with two treatment groups indicates that the performance of the proposed method is equivalent to or better than that of the existing non-parametric approaches in terms of the type-I error rate and power. We illustrate our method with cluster of differentiation 4 data in an acquired immune deficiency syndrome clinical trial. Copyright © 2015 John Wiley & Sons, Ltd.

  13. A method for automatic feature points extraction of human vertebrae three-dimensional model

    NASA Astrophysics Data System (ADS)

    Wu, Zhen; Wu, Junsheng

    2017-05-01

    A method for automatic extraction of the feature points of the human vertebrae three-dimensional model is presented. Firstly, the statistical model of vertebrae feature points is established based on the results of manual vertebrae feature points extraction. Then anatomical axial analysis of the vertebrae model is performed according to the physiological and morphological characteristics of the vertebrae. Using the axial information obtained from the analysis, a projection relationship between the statistical model and the vertebrae model to be extracted is established. According to the projection relationship, the statistical model is matched with the vertebrae model to get the estimated position of the feature point. Finally, by analyzing the curvature in the spherical neighborhood with the estimated position of feature points, the final position of the feature points is obtained. According to the benchmark result on multiple test models, the mean relative errors of feature point positions are less than 5.98%. At more than half of the positions, the error rate is less than 3% and the minimum mean relative error is 0.19%, which verifies the effectiveness of the method.

  14. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra

    PubMed Central

    Shin, S M; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    Objectives: To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. Methods: The sample included 24 female and 19 male patients with hand–wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Results: Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Conclusions: Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index. PMID:25411713

  15. Ever Enrolled Medicare Population Estimates from the MCBS Access to Care Files

    PubMed Central

    Petroski, Jason; Ferraro, David; Chu, Adam

    2014-01-01

    Objective The Medicare Current Beneficiary Survey’s (MCBS) Access to Care (ATC) file is designed to provide timely access to information on the Medicare population, yet because of the survey’s complex sampling design and expedited processing it is difficult to use the file to make both “always-enrolled” and “ever-enrolled” estimates on the Medicare population. In this study, we describe the ATC file and sample design, and we evaluate and review various alternatives for producing “ever-enrolled” estimates. Methods We created “ever enrolled” estimates for key variables in the MCBS using three separate approaches. We tested differences between the alternative approaches for statistical significance and show the relative magnitude of difference between approaches. Results Even when estimates derived from the different approaches were statistically different, the magnitude of the difference was often sufficiently small so as to result in little practical difference among the alternate approaches. However, when considering more than just the estimation method, there are advantages to using certain approaches over others. Conclusion There are several plausible approaches to achieving “ever-enrolled” estimates in the MCBS ATC file; however, the most straightforward approach appears to be implementation and usage of a new set of “ever-enrolled” weights for this file. PMID:24991484

  16. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods.

    PubMed

    Vizcaíno, Iván P; Carrera, Enrique V; Muñoz-Romero, Sergio; Cumbal, Luis H; Rojo-Álvarez, José Luis

    2017-10-16

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer's kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer's kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem.

  17. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods

    PubMed Central

    Vizcaíno, Iván P.; Muñoz-Romero, Sergio; Cumbal, Luis H.

    2017-01-01

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer’s kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer’s kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem. PMID:29035333

  18. Surveying Europe's Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA.

    PubMed

    Vörös, Judit; Márton, Orsolya; Schmidt, Benedikt R; Gál, Júlia Tünde; Jelić, Dušan

    2017-01-01

    In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence.

  19. Epidemiologic programs for computers and calculators. A microcomputer program for multiple logistic regression by unconditional and conditional maximum likelihood methods.

    PubMed

    Campos-Filho, N; Franco, E L

    1989-02-01

    A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.

  20. Wang-Landau Reaction Ensemble Method: Simulation of Weak Polyelectrolytes and General Acid-Base Reactions.

    PubMed

    Landsgesell, Jonas; Holm, Christian; Smiatek, Jens

    2017-02-14

    We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.

  1. Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    PubMed Central

    Han, Buhm; Kang, Hyun Min; Eskin, Eleazar

    2009-01-01

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255

  2. Statistical estimation of the potential possibilities for panoramic hydro-optic laser sensing

    NASA Astrophysics Data System (ADS)

    Shamanaev, Vitalii S.; Lisenko, Andrey A.

    2017-11-01

    For statistical estimation of the potential possibilities of the lidar with matrix photodetector placed on board an aircraft, the nonstationary equation of laser sensing of a complex multicomponent sea water medium is solved by the Monte Carlo method. The lidar return power is estimated for various optical sea water characteristics in the presence of solar background radiation. For clear waters and brightness of external background illumination of 50, 1, and 10-3 W/(m2ṡμmṡsr), the signal/noise ratio (SNR) exceeds 10 to water depths h = 45-50 m. For coastal waters, SNR >= 10 for h = 17-24 m, whereas for turbid sea waters, SNR >= 10 only to depths h = 8-12 m. Results of statistical simulation have shown that the lidar system with optimal parameters can be used for water sensing to depths of 50 m.

  3. Baseline estimation in flame's spectra by using neural networks and robust statistics

    NASA Astrophysics Data System (ADS)

    Garces, Hugo; Arias, Luis; Rojas, Alejandro

    2014-09-01

    This work presents a baseline estimation method in flame spectra based on artificial intelligence structure as a neural network, combining robust statistics with multivariate analysis to automatically discriminate measured wavelengths belonging to continuous feature for model adaptation, surpassing restriction of measuring target baseline for training. The main contributions of this paper are: to analyze a flame spectra database computing Jolliffe statistics from Principal Components Analysis detecting wavelengths not correlated with most of the measured data corresponding to baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate baseline in full wavelength range sampling measured spectra; and to train an artificial intelligence structure as a Neural Network which allows to generalize the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing to diagnose combustion process state for optimization in early stages.

  4. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445

  5. Estimating Water Supply Arsenic Levels in the New England Bladder Cancer Study

    PubMed Central

    Freeman, Laura E. Beane; Lubin, Jay H.; Airola, Matthew S.; Baris, Dalsu; Ayotte, Joseph D.; Taylor, Anne; Paulu, Chris; Karagas, Margaret R.; Colt, Joanne; Ward, Mary H.; Huang, An-Tsun; Bress, William; Cherala, Sai; Silverman, Debra T.; Cantor, Kenneth P.

    2011-01-01

    Background: Ingestion of inorganic arsenic in drinking water is recognized as a cause of bladder cancer when levels are relatively high (≥ 150 µg/L). The epidemiologic evidence is less clear at the low-to-moderate concentrations typically observed in the United States. Accurate retrospective exposure assessment over a long time period is a major challenge in conducting epidemiologic studies of environmental factors and diseases with long latency, such as cancer. Objective: We estimated arsenic concentrations in the water supplies of 2,611 participants in a population-based case–control study in northern New England. Methods: Estimates covered the lifetimes of most study participants and were based on a combination of arsenic measurements at the homes of the participants and statistical modeling of arsenic concentrations in the water supply of both past and current homes. We assigned a residential water supply arsenic concentration for 165,138 (95%) of the total 173,361 lifetime exposure years (EYs) and a workplace water supply arsenic level for 85,195 EYs (86% of reported occupational years). Results: Three methods accounted for 93% of the residential estimates of arsenic concentration: direct measurement of water samples (27%; median, 0.3 µg/L; range, 0.1–11.5), statistical models of water utility measurement data (49%; median, 0.4 µg/L; range, 0.3–3.3), and statistical models of arsenic concentrations in wells using aquifers in New England (17%; median, 1.6 µg/L; range, 0.6–22.4). Conclusions: We used a different validation procedure for each of the three methods, and found our estimated levels to be comparable with available measured concentrations. This methodology allowed us to calculate potential drinking water exposure over long periods. PMID:21421449

  6. Estimating water supply arsenic levels in the New England bladder cancer study

    USGS Publications Warehouse

    Nuckols, J.R.; Beane, Freeman L.E.; Lubin, J.H.; Airola, M.S.; Baris, D.; Ayotte, J.D.; Taylor, A.; Paulu, C.; Karagas, M.R.; Colt, J.; Ward, M.H.; Huang, A.-T.; Bress, W.; Cherala, S.; Silverman, D.T.; Cantor, K.P.

    2011-01-01

    Background: Ingestion of inorganic arsenic in drinking water is recognized as a cause of bladder cancer when levels are relatively high (??? 150 ??g/L). The epidemiologic evidence is less clear at the low-to-moderate concentrations typically observed in the United States. Accurate retrospective exposure assessment over a long time period is a major challenge in conducting epidemiologic studies of environmental factors and diseases with long latency, such as cancer. Objective: We estimated arsenic concentrations in the water supplies of 2,611 participants in a population-based case-control study in northern New England. Methods: Estimates covered the lifetimes of most study participants and were based on a combination of arsenic measurements at the homes of the participants and statistical modeling of arsenic concentrations in the water supply of both past and current homes. We assigned a residential water supply arsenic concentration for 165,138 (95%) of the total 173,361 lifetime exposure years (EYs) and a workplace water supply arsenic level for 85,195 EYs (86% of reported occupational years). Results: Three methods accounted for 93% of the residential estimates of arsenic concentration: direct measurement of water samples (27%; median, 0.3 ??g/L; range, 0.1-11.5), statistical models of water utility measurement data (49%; median, 0.4 ??g/L; range, 0.3-3.3), and statistical models of arsenic concentrations in wells using aquifers in New England (17%; median, 1.6 ??g/L; range, 0.6-22.4). Conclusions: We used a different validation procedure for each of the three methods, and found our estimated levels to be comparable with available measured concentrations. This methodology allowed us to calculate potential drinking water exposure over long periods.

  7. Feasibility of an estimated method using graduated utensils to estimate food portion size in infants aged 4 to 18 months.

    PubMed

    Bradley, Jennifer; West-Sadler, Sarah; Foster, Emma; Sommerville, Jill; Allen, Rachel; Stephen, Alison M; Adamson, Ashley J

    2018-01-01

    The Diet and Nutrition Survey of Infants and Young Children (DNSIYC) was carried out in 2011 to assess the nutrient intakes of 4 to 18 month old infants in the UK. Prior to the main stage of DNSIYC, pilot work was undertaken to determine the impact of using graduated utensils to estimate portion sizes. The aims were to assess whether the provision of graduated utensils altered either the foods given to infants or the amount consumed by comparing estimated intakes to weighed intakes. Parents completed two 4-day food diaries over a two week period; an estimated diary using graduated utensils and a weighed diary. Two estimated diary formats were tested; half the participants completed estimated diaries in which they recorded the amount of food/drink served and the amount left over, and the other half recorded the amount of food/drink consumed only. Median daily food intake for the estimated and the weighed method were similar; 980g and 928g respectively. There was a small (6.6%) but statistically significant difference in energy intake reported by the estimated and the weighed method; 3189kJ and 2978kJ respectively. There were no statistically significant differences between estimated intakes from the served and left over diaries and weighed intakes (p>0.05). Estimated intakes from the amount consumed diaries were significantly different to weighed intakes (food weight (g) p = 0.02; energy (kJ) p = 0.01). There were no differences in intakes of amorphous (foods which take the shape of the container, e.g. pureed foods, porridge) and discrete food items (individual pieces of food e.g. biscuits, rice cakes) between the two methods. The results suggest that the household measures approach to reporting portion size, with the combined use of the graduated utensils, and recording the amount served and the amount left over in the food diaries, may provide a feasible alternative to weighed intakes.

  8. Feasibility of an estimated method using graduated utensils to estimate food portion size in infants aged 4 to 18 months

    PubMed Central

    West-Sadler, Sarah; Foster, Emma; Sommerville, Jill; Allen, Rachel; Stephen, Alison M.; Adamson, Ashley J.

    2018-01-01

    The Diet and Nutrition Survey of Infants and Young Children (DNSIYC) was carried out in 2011 to assess the nutrient intakes of 4 to 18 month old infants in the UK. Prior to the main stage of DNSIYC, pilot work was undertaken to determine the impact of using graduated utensils to estimate portion sizes. The aims were to assess whether the provision of graduated utensils altered either the foods given to infants or the amount consumed by comparing estimated intakes to weighed intakes. Parents completed two 4-day food diaries over a two week period; an estimated diary using graduated utensils and a weighed diary. Two estimated diary formats were tested; half the participants completed estimated diaries in which they recorded the amount of food/drink served and the amount left over, and the other half recorded the amount of food/drink consumed only. Median daily food intake for the estimated and the weighed method were similar; 980g and 928g respectively. There was a small (6.6%) but statistically significant difference in energy intake reported by the estimated and the weighed method; 3189kJ and 2978kJ respectively. There were no statistically significant differences between estimated intakes from the served and left over diaries and weighed intakes (p>0.05). Estimated intakes from the amount consumed diaries were significantly different to weighed intakes (food weight (g) p = 0.02; energy (kJ) p = 0.01). There were no differences in intakes of amorphous (foods which take the shape of the container, e.g. pureed foods, porridge) and discrete food items (individual pieces of food e.g. biscuits, rice cakes) between the two methods. The results suggest that the household measures approach to reporting portion size, with the combined use of the graduated utensils, and recording the amount served and the amount left over in the food diaries, may provide a feasible alternative to weighed intakes. PMID:29879140

  9. Quantifying and reducing statistical uncertainty in sample-based health program costing studies in low- and middle-income countries.

    PubMed

    Rivera-Rodriguez, Claudia L; Resch, Stephen; Haneuse, Sebastien

    2018-01-01

    In many low- and middle-income countries, the costs of delivering public health programs such as for HIV/AIDS, nutrition, and immunization are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example, via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating the total program costs. This may be due to a lack of awareness/understanding of (1) the technical details regarding uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various potential specific sources that contribute to overall uncertainty. We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. This method can not only be used when estimating the total cost of delivering established health programs but also to decrease uncertainty when the interest lies in assessing the incremental effect of an intervention. Measures of statistical uncertainty associated with survey-based estimates of program costs, such as standard errors and 95% confidence intervals, provide important contextual information for health policy decision-making and key inputs for the design of future costing studies. Such measures are often not reported, possibly because of technical challenges associated with their calculation and a lack of awareness of appropriate software. Modern statistical analysis methods for survey data, such as calibration, provide a means to exploit additional information that is readily available but was not used in the design of the study to significantly improve the estimation of total cost through the reduction of statistical uncertainty.

  10. A new method to address verification bias in studies of clinical screening tests: cervical cancer screening assays as an example.

    PubMed

    Xue, Xiaonan; Kim, Mimi Y; Castle, Philip E; Strickler, Howard D

    2014-03-01

    Studies to evaluate clinical screening tests often face the problem that the "gold standard" diagnostic approach is costly and/or invasive. It is therefore common to verify only a subset of negative screening tests using the gold standard method. However, undersampling the screen negatives can lead to substantial overestimation of the sensitivity and underestimation of the specificity of the diagnostic test. Our objective was to develop a simple and accurate statistical method to address this "verification bias." We developed a weighted generalized estimating equation approach to estimate, in a single model, the accuracy (eg, sensitivity/specificity) of multiple assays and simultaneously compare results between assays while addressing verification bias. This approach can be implemented using standard statistical software. Simulations were conducted to assess the proposed method. An example is provided using a cervical cancer screening trial that compared the accuracy of human papillomavirus and Pap tests, with histologic data as the gold standard. The proposed approach performed well in estimating and comparing the accuracy of multiple assays in the presence of verification bias. The proposed approach is an easy to apply and accurate method for addressing verification bias in studies of multiple screening methods. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Statistical estimation via convex optimization for trending and performance monitoring

    NASA Astrophysics Data System (ADS)

    Samar, Sikandar

    This thesis presents an optimization-based statistical estimation approach to find unknown trends in noisy data. A Bayesian framework is used to explicitly take into account prior information about the trends via trend models and constraints. The main focus is on convex formulation of the Bayesian estimation problem, which allows efficient computation of (globally) optimal estimates. There are two main parts of this thesis. The first part formulates trend estimation in systems described by known detailed models as a convex optimization problem. Statistically optimal estimates are then obtained by maximizing a concave log-likelihood function subject to convex constraints. We consider the problem of increasing problem dimension as more measurements become available, and introduce a moving horizon framework to enable recursive estimation of the unknown trend by solving a fixed size convex optimization problem at each horizon. We also present a distributed estimation framework, based on the dual decomposition method, for a system formed by a network of complex sensors with local (convex) estimation. Two specific applications of the convex optimization-based Bayesian estimation approach are described in the second part of the thesis. Batch estimation for parametric diagnostics in a flight control simulation of a space launch vehicle is shown to detect incipient fault trends despite the natural masking properties of feedback in the guidance and control loops. Moving horizon approach is used to estimate time varying fault parameters in a detailed nonlinear simulation model of an unmanned aerial vehicle. An excellent performance is demonstrated in the presence of winds and turbulence.

  12. Estimation of the Continuous and Discontinuous Leverage Effects

    PubMed Central

    Aït-Sahalia, Yacine; Fan, Jianqing; Laeven, Roger J. A.; Wang, Christina Dan; Yang, Xiye

    2017-01-01

    This paper examines the leverage effect, or the generally negative covariation between asset returns and their changes in volatility, under a general setup that allows the log-price and volatility processes to be Itô semimartingales. We decompose the leverage effect into continuous and discontinuous parts and develop statistical methods to estimate them. We establish the asymptotic properties of these estimators. We also extend our methods and results (for the continuous leverage) to the situation where there is market microstructure noise in the observed returns. We show in Monte Carlo simulations that our estimators have good finite sample performance. When applying our methods to real data, our empirical results provide convincing evidence of the presence of the two leverage effects, especially the discontinuous one. PMID:29606780

  13. Estimation of the Continuous and Discontinuous Leverage Effects.

    PubMed

    Aït-Sahalia, Yacine; Fan, Jianqing; Laeven, Roger J A; Wang, Christina Dan; Yang, Xiye

    2017-01-01

    This paper examines the leverage effect, or the generally negative covariation between asset returns and their changes in volatility, under a general setup that allows the log-price and volatility processes to be Itô semimartingales. We decompose the leverage effect into continuous and discontinuous parts and develop statistical methods to estimate them. We establish the asymptotic properties of these estimators. We also extend our methods and results (for the continuous leverage) to the situation where there is market microstructure noise in the observed returns. We show in Monte Carlo simulations that our estimators have good finite sample performance. When applying our methods to real data, our empirical results provide convincing evidence of the presence of the two leverage effects, especially the discontinuous one.

  14. Multivariate regression methods for estimating velocity of ictal discharges from human microelectrode recordings

    NASA Astrophysics Data System (ADS)

    Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.

    2017-08-01

    Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically-observable EEG data, with a variety of straightforward computation methods available. This opens possibilities for systematic assessments of ictal discharge propagation in clinical and research settings.

  15. Statistical methods to estimate treatment effects from multichannel electroencephalography (EEG) data in clinical trials.

    PubMed

    Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir

    2010-07-15

    With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.

  16. Stochastic differential equation (SDE) model of opening gold share price of bursa saham malaysia

    NASA Astrophysics Data System (ADS)

    Hussin, F. N.; Rahman, H. A.; Bahar, A.

    2017-09-01

    Black and Scholes option pricing model is one of the most recognized stochastic differential equation model in mathematical finance. Two parameter estimation methods have been utilized for the Geometric Brownian model (GBM); historical and discrete method. The historical method is a statistical method which uses the property of independence and normality logarithmic return, giving out the simplest parameter estimation. Meanwhile, discrete method considers the function of density of transition from the process of diffusion normal log which has been derived from maximum likelihood method. These two methods are used to find the parameter estimates samples of Malaysians Gold Share Price data such as: Financial Times and Stock Exchange (FTSE) Bursa Malaysia Emas, and Financial Times and Stock Exchange (FTSE) Bursa Malaysia Emas Shariah. Modelling of gold share price is essential since fluctuation of gold affects worldwide economy nowadays, including Malaysia. It is found that discrete method gives the best parameter estimates than historical method due to the smallest Root Mean Square Error (RMSE) value.

  17. Implications of the methodological choices for hydrologic portrayals of climate change over the contiguous United States: Statistically downscaled forcing data and hydrologic models

    USGS Publications Warehouse

    Mizukami, Naoki; Clark, Martyn P.; Gutmann, Ethan D.; Mendoza, Pablo A.; Newman, Andrew J.; Nijssen, Bart; Livneh, Ben; Hay, Lauren E.; Arnold, Jeffrey R.; Brekke, Levi D.

    2016-01-01

    Continental-domain assessments of climate change impacts on water resources typically rely on statistically downscaled climate model outputs to force hydrologic models at a finer spatial resolution. This study examines the effects of four statistical downscaling methods [bias-corrected constructed analog (BCCA), bias-corrected spatial disaggregation applied at daily (BCSDd) and monthly scales (BCSDm), and asynchronous regression (AR)] on retrospective hydrologic simulations using three hydrologic models with their default parameters (the Community Land Model, version 4.0; the Variable Infiltration Capacity model, version 4.1.2; and the Precipitation–Runoff Modeling System, version 3.0.4) over the contiguous United States (CONUS). Biases of hydrologic simulations forced by statistically downscaled climate data relative to the simulation with observation-based gridded data are presented. Each statistical downscaling method produces different meteorological portrayals including precipitation amount, wet-day frequency, and the energy input (i.e., shortwave radiation), and their interplay affects estimations of precipitation partitioning between evapotranspiration and runoff, extreme runoff, and hydrologic states (i.e., snow and soil moisture). The analyses show that BCCA underestimates annual precipitation by as much as −250 mm, leading to unreasonable hydrologic portrayals over the CONUS for all models. Although the other three statistical downscaling methods produce a comparable precipitation bias ranging from −10 to 8 mm across the CONUS, BCSDd severely overestimates the wet-day fraction by up to 0.25, leading to different precipitation partitioning compared to the simulations with other downscaled data. Overall, the choice of downscaling method contributes to less spread in runoff estimates (by a factor of 1.5–3) than the choice of hydrologic model with use of the default parameters if BCCA is excluded.

  18. Statistical Models for Incorporating Data from Routine HIV Testing of Pregnant Women at Antenatal Clinics into HIV/AIDS Epidemic Estimates

    PubMed Central

    Sheng, Ben; Marsh, Kimberly; Slavkovic, Aleksandra B.; Gregson, Simon; Eaton, Jeffrey W.; Bao, Le

    2017-01-01

    Objective HIV prevalence data collected from routine HIV testing of pregnant women at antenatal clinics (ANC-RT) are potentially available from all facilities that offer testing services to pregnant women, and can be used to improve estimates of national and sub-national HIV prevalence trends. We develop methods to incorporate this new data source into the UNAIDS Estimation and Projection Package (EPP) in Spectrum 2017. Methods We develop a new statistical model for incorporating ANC-RT HIV prevalence data, aggregated either to the health facility level (‘site-level’) or regionally (‘census-level’), to estimate HIV prevalence alongside existing sources of HIV prevalence data from ANC unlinked anonymous testing (ANC-UAT) and household-based national population surveys. Synthetic data are generated to understand how the availability of ANC-RT data affects the accuracy of various parameter estimates. Results We estimate HIV prevalence and additional parameters using both ANC-RT and other existing data. Fitting HIV prevalence using synthetic data generally gives precise estimates of the underlying trend and other parameters. More years of ANC-RT data should improve prevalence estimates. More ANC-RT sites and continuation with existing ANC-UAT sites may improve the estimate of calibration between ANC-UAT and ANC-RT sites. Conclusion We have proposed methods to incorporate ANC-RT data into Spectrum to obtain more precise estimates of prevalence and other measures of the epidemic. Many assumptions about the accuracy, consistency, and representativeness of ANC-RT prevalence underlie the use of these data for monitoring HIV epidemic trends, and should be tested as more data become available from national ANC-RT programs. PMID:28296804

  19. Applications of Principled Search Methods in Climate Influences and Mechanisms

    NASA Technical Reports Server (NTRS)

    Glymour, Clark

    2005-01-01

    Forest and grass fires cause economic losses in the billions of dollars in the U.S. alone. In addition, boreal forests constitute a large carbon store; it has been estimated that, were no burning to occur, an additional 7 gigatons of carbon would be sequestered in boreal soils each century. Effective wildfire suppression requires anticipation of locales and times for which wildfire is most probable, preferably with a two to four week forecast, so that limited resources can be efficiently deployed. The United States Forest Service (USFS), and other experts and agencies have developed several measures of fire risk combining physical principles and expert judgment, and have used them in automated procedures for forecasting fire risk. Forecasting accuracies for some fire risk indices in combination with climate and other variables have been estimated for specific locations, with the value of fire risk index variables assessed by their statistical significance in regressions. In other cases, the MAPSS forecasts [23, 241 for example, forecasting accuracy has been estimated only by simulated data. We describe alternative forecasting methods that predict fire probability by locale and time using statistical or machine learning procedures trained on historical data, and we give comparative assessments of their forecasting accuracy for one fire season year, April- October, 2003, for all U.S. Forest Service lands. Aside from providing an accuracy baseline for other forecasting methods, the results illustrate the interdependence between the statistical significance of prediction variables and the forecasting method used.

  20. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

    PubMed

    Dazard, Jean-Eudes; Rao, J Sunil

    2012-07-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise tests statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that : (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.

  1. Estimation of parameters in rational reaction rates of molecular biological systems via weighted least squares

    NASA Astrophysics Data System (ADS)

    Wu, Fang-Xiang; Mu, Lei; Shi, Zhong-Ke

    2010-01-01

    The models of gene regulatory networks are often derived from statistical thermodynamics principle or Michaelis-Menten kinetics equation. As a result, the models contain rational reaction rates which are nonlinear in both parameters and states. It is challenging to estimate parameters nonlinear in a model although there have been many traditional nonlinear parameter estimation methods such as Gauss-Newton iteration method and its variants. In this article, we develop a two-step method to estimate the parameters in rational reaction rates of gene regulatory networks via weighted linear least squares. This method takes the special structure of rational reaction rates into consideration. That is, in the rational reaction rates, the numerator and the denominator are linear in parameters. By designing a special weight matrix for the linear least squares, parameters in the numerator and the denominator can be estimated by solving two linear least squares problems. The main advantage of the developed method is that it can produce the analytical solutions to the estimation of parameters in rational reaction rates which originally is nonlinear parameter estimation problem. The developed method is applied to a couple of gene regulatory networks. The simulation results show the superior performance over Gauss-Newton method.

  2. Spatial Estimation of Sub-Hour Global Horizontal Irradiance Based on Official Observations and Remote Sensors

    PubMed Central

    Gutierrez-Corea, Federico-Vladimir; Manso-Callejo, Miguel-Angel; Moreno-Regidor, María-Pilar; Velasco-Gómez, Jesús

    2014-01-01

    This study was motivated by the need to improve densification of Global Horizontal Irradiance (GHI) observations, increasing the number of surface weather stations that observe it, using sensors with a sub-hour periodicity and examining the methods of spatial GHI estimation (by interpolation) with that periodicity in other locations. The aim of the present research project is to analyze the goodness of 15-minute GHI spatial estimations for five methods in the territory of Spain (three geo-statistical interpolation methods, one deterministic method and the HelioSat2 method, which is based on satellite images). The research concludes that, when the work area has adequate station density, the best method for estimating GHI every 15 min is Regression Kriging interpolation using GHI estimated from satellite images as one of the input variables. On the contrary, when station density is low, the best method is estimating GHI directly from satellite images. A comparison between the GHI observed by volunteer stations and the estimation model applied concludes that 67% of the volunteer stations analyzed present values within the margin of error (average of ±2 standard deviations). PMID:24732102

  3. Spatial estimation of sub-hour Global Horizontal Irradiance based on official observations and remote sensors.

    PubMed

    Gutierrez-Corea, Federico-Vladimir; Manso-Callejo, Miguel-Angel; Moreno-Regidor, María-Pilar; Velasco-Gómez, Jesús

    2014-04-11

    This study was motivated by the need to improve densification of Global Horizontal Irradiance (GHI) observations, increasing the number of surface weather stations that observe it, using sensors with a sub-hour periodicity and examining the methods of spatial GHI estimation (by interpolation) with that periodicity in other locations. The aim of the present research project is to analyze the goodness of 15-minute GHI spatial estimations for five methods in the territory of Spain (three geo-statistical interpolation methods, one deterministic method and the HelioSat2 method, which is based on satellite images). The research concludes that, when the work area has adequate station density, the best method for estimating GHI every 15 min is Regression Kriging interpolation using GHI estimated from satellite images as one of the input variables. On the contrary, when station density is low, the best method is estimating GHI directly from satellite images. A comparison between the GHI observed by volunteer stations and the estimation model applied concludes that 67% of the volunteer stations analyzed present values within the margin of error (average of ±2 standard deviations).

  4. Estimating uncertainty in respondent-driven sampling using a tree bootstrap method.

    PubMed

    Baraff, Aaron J; McCormick, Tyler H; Raftery, Adrian E

    2016-12-20

    Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variability of these estimates has been shown to be much higher than previously acknowledged, and even methods designed to account for RDS result in misleadingly narrow confidence intervals. In this paper, we introduce a tree bootstrap method for estimating uncertainty in RDS estimates based on resampling recruitment trees. We use simulations from known social networks to show that the tree bootstrap method not only outperforms existing methods but also captures the high variability of RDS, even in extreme cases with high design effects. We also apply the method to data from injecting drug users in Ukraine. Unlike other methods, the tree bootstrap depends only on the structure of the sampled recruitment trees, not on the attributes being measured on the respondents, so correlations between attributes can be estimated as well as variability. Our results suggest that it is possible to accurately assess the high level of uncertainty inherent in RDS.

  5. Evaluation of Cepstrum Algorithm with Impact Seeded Fault Data of Helicopter Oil Cooler Fan Bearings and Machine Fault Simulator Data

    DTIC Science & Technology

    2013-02-01

    of a bearing must be put into practice. There are many potential methods, the most traditional being the use of statistical time-domain features...accelerate degradation to test multiples bearings to gain statistical relevance and extrapolate results to scale for field conditions. Temperature...as time statistics , frequency estimation to improve the fault frequency detection. For future investigations, one can further explore the

  6. Translational Research for Occupational Therapy: Using SPRE in Hippotherapy for Children with Developmental Disabilities.

    PubMed

    Weissman-Miller, Deborah; Miller, Rosalie J; Shotwell, Mary P

    2017-01-01

    Translational research is redefined in this paper using a combination of methods in statistics and data science to enhance the understanding of outcomes and practice in occupational therapy. These new methods are applied, using larger data and smaller single-subject data, to a study in hippotherapy for children with developmental disabilities (DD). The Centers for Disease Control and Prevention estimates DD affects nearly 10 million children, aged 2-19, where diagnoses may be comorbid. Hippotherapy is defined here as a treatment strategy in occupational therapy using equine movement to achieve functional outcomes. Semiparametric ratio estimator (SPRE), a single-subject statistical and small data science model, is used to derive a "change point" indicating where the participant adapts to treatment, from which predictions are made. Data analyzed here is from an institutional review board approved pilot study using the Hippotherapy Evaluation and Assessment Tool measure, where outcomes are given separately for each of four measured domains and the total scores of each participant. Analysis with SPRE, using statistical methods to predict a "change point" and data science graphical interpretations of data, shows the translational comparisons between results from larger mean values and the very different results from smaller values for each HEAT domain in terms of relationships and statistical probabilities.

  7. Translational Research for Occupational Therapy: Using SPRE in Hippotherapy for Children with Developmental Disabilities

    PubMed Central

    Miller, Rosalie J.; Shotwell, Mary P.

    2017-01-01

    Translational research is redefined in this paper using a combination of methods in statistics and data science to enhance the understanding of outcomes and practice in occupational therapy. These new methods are applied, using larger data and smaller single-subject data, to a study in hippotherapy for children with developmental disabilities (DD). The Centers for Disease Control and Prevention estimates DD affects nearly 10 million children, aged 2–19, where diagnoses may be comorbid. Hippotherapy is defined here as a treatment strategy in occupational therapy using equine movement to achieve functional outcomes. Semiparametric ratio estimator (SPRE), a single-subject statistical and small data science model, is used to derive a “change point” indicating where the participant adapts to treatment, from which predictions are made. Data analyzed here is from an institutional review board approved pilot study using the Hippotherapy Evaluation and Assessment Tool measure, where outcomes are given separately for each of four measured domains and the total scores of each participant. Analysis with SPRE, using statistical methods to predict a “change point” and data science graphical interpretations of data, shows the translational comparisons between results from larger mean values and the very different results from smaller values for each HEAT domain in terms of relationships and statistical probabilities. PMID:29097962

  8. Automatic cloud coverage assessment of Formosat-2 image

    NASA Astrophysics Data System (ADS)

    Hsu, Kuo-Hsien

    2011-11-01

    Formosat-2 satellite equips with the high-spatial-resolution (2m ground sampling distance) remote sensing instrument. It has been being operated on the daily-revisiting mission orbit by National Space organization (NSPO) of Taiwan since May 21 2004. NSPO has also serving as one of the ground receiving stations for daily processing the received Formosat- 2 images. The current cloud coverage assessment of Formosat-2 image for NSPO Image Processing System generally consists of two major steps. Firstly, an un-supervised K-means method is used for automatically estimating the cloud statistic of Formosat-2 image. Secondly, manual estimation of cloud coverage from Formosat-2 image is processed by manual examination. Apparently, a more accurate Automatic Cloud Coverage Assessment (ACCA) method certainly increases the efficiency of processing step 2 with a good prediction of cloud statistic. In this paper, mainly based on the research results from Chang et al, Irish, and Gotoh, we propose a modified Formosat-2 ACCA method which considered pre-processing and post-processing analysis. For pre-processing analysis, cloud statistic is determined by using un-supervised K-means classification, Sobel's method, Otsu's method, non-cloudy pixels reexamination, and cross-band filter method. Box-Counting fractal method is considered as a post-processing tool to double check the results of pre-processing analysis for increasing the efficiency of manual examination.

  9. Estimation of confidence limits for descriptive indexes derived from autoregressive analysis of time series: Methods and application to heart rate variability.

    PubMed

    Beda, Alessandro; Simpson, David M; Faes, Luca

    2017-01-01

    The growing interest in personalized medicine requires making inferences from descriptive indexes estimated from individual recordings of physiological signals, with statistical analyses focused on individual differences between/within subjects, rather than comparing supposedly homogeneous cohorts. To this end, methods to compute confidence limits of individual estimates of descriptive indexes are needed. This study introduces numerical methods to compute such confidence limits and perform statistical comparisons between indexes derived from autoregressive (AR) modeling of individual time series. Analytical approaches are generally not viable, because the indexes are usually nonlinear functions of the AR parameters. We exploit Monte Carlo (MC) and Bootstrap (BS) methods to reproduce the sampling distribution of the AR parameters and indexes computed from them. Here, these methods are implemented for spectral and information-theoretic indexes of heart-rate variability (HRV) estimated from AR models of heart-period time series. First, the MS and BC methods are tested in a wide range of synthetic HRV time series, showing good agreement with a gold-standard approach (i.e. multiple realizations of the "true" process driving the simulation). Then, real HRV time series measured from volunteers performing cognitive tasks are considered, documenting (i) the strong variability of confidence limits' width across recordings, (ii) the diversity of individual responses to the same task, and (iii) frequent disagreement between the cohort-average response and that of many individuals. We conclude that MC and BS methods are robust in estimating confidence limits of these AR-based indexes and thus recommended for short-term HRV analysis. Moreover, the strong inter-individual differences in the response to tasks shown by AR-based indexes evidence the need of individual-by-individual assessments of HRV features. Given their generality, MC and BS methods are promising for applications in biomedical signal processing and beyond, providing a powerful new tool for assessing the confidence limits of indexes estimated from individual recordings.

  10. Estimation of confidence limits for descriptive indexes derived from autoregressive analysis of time series: Methods and application to heart rate variability

    PubMed Central

    2017-01-01

    The growing interest in personalized medicine requires making inferences from descriptive indexes estimated from individual recordings of physiological signals, with statistical analyses focused on individual differences between/within subjects, rather than comparing supposedly homogeneous cohorts. To this end, methods to compute confidence limits of individual estimates of descriptive indexes are needed. This study introduces numerical methods to compute such confidence limits and perform statistical comparisons between indexes derived from autoregressive (AR) modeling of individual time series. Analytical approaches are generally not viable, because the indexes are usually nonlinear functions of the AR parameters. We exploit Monte Carlo (MC) and Bootstrap (BS) methods to reproduce the sampling distribution of the AR parameters and indexes computed from them. Here, these methods are implemented for spectral and information-theoretic indexes of heart-rate variability (HRV) estimated from AR models of heart-period time series. First, the MS and BC methods are tested in a wide range of synthetic HRV time series, showing good agreement with a gold-standard approach (i.e. multiple realizations of the "true" process driving the simulation). Then, real HRV time series measured from volunteers performing cognitive tasks are considered, documenting (i) the strong variability of confidence limits' width across recordings, (ii) the diversity of individual responses to the same task, and (iii) frequent disagreement between the cohort-average response and that of many individuals. We conclude that MC and BS methods are robust in estimating confidence limits of these AR-based indexes and thus recommended for short-term HRV analysis. Moreover, the strong inter-individual differences in the response to tasks shown by AR-based indexes evidence the need of individual-by-individual assessments of HRV features. Given their generality, MC and BS methods are promising for applications in biomedical signal processing and beyond, providing a powerful new tool for assessing the confidence limits of indexes estimated from individual recordings. PMID:28968394

  11. Basics of Bayesian methods.

    PubMed

    Ghosh, Sujit K

    2010-01-01

    Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.

  12. Assessing Interval Estimation Methods for Hill Model ...

    EPA Pesticide Factsheets

    The Hill model of concentration-response is ubiquitous in toxicology, perhaps because its parameters directly relate to biologically significant metrics of toxicity such as efficacy and potency. Point estimates of these parameters obtained through least squares regression or maximum likelihood are commonly used in high-throughput risk assessment, but such estimates typically fail to include reliable information concerning confidence in (or precision of) the estimates. To address this issue, we examined methods for assessing uncertainty in Hill model parameter estimates derived from concentration-response data. In particular, using a sample of ToxCast concentration-response data sets, we applied four methods for obtaining interval estimates that are based on asymptotic theory, bootstrapping (two varieties), and Bayesian parameter estimation, and then compared the results. These interval estimation methods generally did not agree, so we devised a simulation study to assess their relative performance. We generated simulated data by constructing four statistical error models capable of producing concentration-response data sets comparable to those observed in ToxCast. We then applied the four interval estimation methods to the simulated data and compared the actual coverage of the interval estimates to the nominal coverage (e.g., 95%) in order to quantify performance of each of the methods in a variety of cases (i.e., different values of the true Hill model paramet

  13. VoroMQA: Assessment of protein structure quality using interatomic contact areas.

    PubMed

    Olechnovič, Kliment; Venclovas, Česlovas

    2017-06-01

    In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  14. Statistical estimates of absenteeism attributable to seasonal and pandemic influenza from the Canadian Labour Force Survey

    PubMed Central

    2011-01-01

    Background As many respiratory viruses are responsible for influenza like symptoms, accurate measures of the disease burden are not available and estimates are generally based on statistical methods. The objective of this study was to estimate absenteeism rates and hours lost due to seasonal influenza and compare these estimates with estimates of absenteeism attributable to the two H1N1 pandemic waves that occurred in 2009. Methods Key absenteeism variables were extracted from Statistics Canada's monthly labour force survey (LFS). Absenteeism and the proportion of hours lost due to own illness or disability were modelled as a function of trend, seasonality and proxy variables for influenza activity from 1998 to 2009. Results Hours lost due to the H1N1/09 pandemic strain were elevated compared to seasonal influenza, accounting for a loss of 0.2% of potential hours worked annually. In comparison, an estimated 0.08% of hours worked annually were lost due to seasonal influenza illnesses. Absenteeism rates due to influenza were estimated at 12% per year for seasonal influenza over the 1997/98 to 2008/09 seasons, and 13% for the two H1N1/09 pandemic waves. Employees who took time off due to a seasonal influenza infection took an average of 14 hours off. For the pandemic strain, the average absence was 25 hours. Conclusions This study confirms that absenteeism due to seasonal influenza has typically ranged from 5% to 20%, with higher rates associated with multiple circulating strains. Absenteeism rates for the 2009 pandemic were similar to those occurring for seasonal influenza. Employees took more time off due to the pandemic strain than was typical for seasonal influenza. PMID:21486453

  15. MRMC analysis of agreement studies

    NASA Astrophysics Data System (ADS)

    Gallas, Brandon D.; Anam, Amrita; Chen, Weijie; Wunderlich, Adam; Zhang, Zhiwei

    2016-03-01

    The purpose of this work is to present and evaluate methods based on U-statistics to compare intra- or inter-reader agreement across different imaging modalities. We apply these methods to multi-reader multi-case (MRMC) studies. We measure reader-averaged agreement and estimate its variance accounting for the variability from readers and cases (an MRMC analysis). In our application, pathologists (readers) evaluate patient tissue mounted on glass slides (cases) in two ways. They evaluate the slides on a microscope (reference modality) and they evaluate digital scans of the slides on a computer display (new modality). In the current work, we consider concordance as the agreement measure, but many of the concepts outlined here apply to other agreement measures. Concordance is the probability that two readers rank two cases in the same order. Concordance can be estimated with a U-statistic and thus it has some nice properties: it is unbiased, asymptotically normal, and its variance is given by an explicit formula. Another property of a U-statistic is that it is symmetric in its inputs; it doesn't matter which reader is listed first or which case is listed first, the result is the same. Using this property and a few tricks while building the U-statistic kernel for concordance, we get a mathematically tractable problem and efficient software. Simulations show that our variance and covariance estimates are unbiased.

  16. Linear theory for filtering nonlinear multiscale systems with model error

    PubMed Central

    Berry, Tyrus; Harlim, John

    2014-01-01

    In this paper, we study filtering of multiscale dynamical systems with model error arising from limitations in resolving the smaller scale processes. In particular, the analysis assumes the availability of continuous-time noisy observations of all components of the slow variables. Mathematically, this paper presents new results on higher order asymptotic expansion of the first two moments of a conditional measure. In particular, we are interested in the application of filtering multiscale problems in which the conditional distribution is defined over the slow variables, given noisy observation of the slow variables alone. From the mathematical analysis, we learn that for a continuous time linear model with Gaussian noise, there exists a unique choice of parameters in a linear reduced model for the slow variables which gives the optimal filtering when only the slow variables are observed. Moreover, these parameters simultaneously give the optimal equilibrium statistical estimates of the underlying system, and as a consequence they can be estimated offline from the equilibrium statistics of the true signal. By examining a nonlinear test model, we show that the linear theory extends in this non-Gaussian, nonlinear configuration as long as we know the optimal stochastic parametrization and the correct observation model. However, when the stochastic parametrization model is inappropriate, parameters chosen for good filter performance may give poor equilibrium statistical estimates and vice versa; this finding is based on analytical and numerical results on our nonlinear test model and the two-layer Lorenz-96 model. Finally, even when the correct stochastic ansatz is given, it is imperative to estimate the parameters simultaneously and to account for the nonlinear feedback of the stochastic parameters into the reduced filter estimates. In numerical experiments on the two-layer Lorenz-96 model, we find that the parameters estimated online, as part of a filtering procedure, simultaneously produce accurate filtering and equilibrium statistical prediction. In contrast, an offline estimation technique based on a linear regression, which fits the parameters to a training dataset without using the filter, yields filter estimates which are worse than the observations or even divergent when the slow variables are not fully observed. This finding does not imply that all offline methods are inherently inferior to the online method for nonlinear estimation problems, it only suggests that an ideal estimation technique should estimate all parameters simultaneously whether it is online or offline. PMID:25002829

  17. Chasing the peak: optimal statistics for weak shear analyses

    NASA Astrophysics Data System (ADS)

    Smit, Merijn; Kuijken, Konrad

    2018-01-01

    Context. Weak gravitational lensing analyses are fundamentally limited by the intrinsic distribution of galaxy shapes. It is well known that this distribution of galaxy ellipticity is non-Gaussian, and the traditional estimation methods, explicitly or implicitly assuming Gaussianity, are not necessarily optimal. Aims: We aim to explore alternative statistics for samples of ellipticity measurements. An optimal estimator needs to be asymptotically unbiased, efficient, and robust in retaining these properties for various possible sample distributions. We take the non-linear mapping of gravitational shear and the effect of noise into account. We then discuss how the distribution of individual galaxy shapes in the observed field of view can be modeled by fitting Fourier modes to the shear pattern directly. This allows scientific analyses using statistical information of the whole field of view, instead of locally sparse and poorly constrained estimates. Methods: We simulated samples of galaxy ellipticities, using both theoretical distributions and data for ellipticities and noise. We determined the possible bias Δe, the efficiency η and the robustness of the least absolute deviations, the biweight, and the convex hull peeling (CHP) estimators, compared to the canonical weighted mean. Using these statistics for regression, we have shown the applicability of direct Fourier mode fitting. Results: We find an improved performance of all estimators, when iteratively reducing the residuals after de-shearing the ellipticity samples by the estimated shear, which removes the asymmetry in the ellipticity distributions. We show that these estimators are then unbiased in the absence of noise, and decrease noise bias by more than 30%. Our results show that the CHP estimator distribution is skewed, but still centered around the underlying shear, and its bias least affected by noise. We find the least absolute deviations estimator to be the most efficient estimator in almost all cases, except in the Gaussian case, where it's still competitive (0.83 < η < 5.1) and therefore robust. These results hold when fitting Fourier modes, where amplitudes of variation in ellipticity are determined to the order of 10-3. Conclusions: The peak of the ellipticity distribution is a direct tracer of the underlying shear and unaffected by noise, and we have shown that estimators that are sensitive to a central cusp perform more efficiently, potentially reducing uncertainties by more 0% and significantly decreasing noise bias. These results become increasingly important, as survey sizes increase and systematic issues in shape measurements decrease.

  18. Simulation methods to estimate design power: an overview for applied research.

    PubMed

    Arnold, Benjamin F; Hogan, Daniel R; Colford, John M; Hubbard, Alan E

    2011-06-20

    Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research.

  19. Rigorous Approach in Investigation of Seismic Structure and Source Characteristicsin Northeast Asia: Hierarchical and Trans-dimensional Bayesian Inversion

    NASA Astrophysics Data System (ADS)

    Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.

    2015-12-01

    Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for involved error statistics and model parameterizations, and, in turn, allow more rigorous estimations of the same. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia including east China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. As for the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partition. The 3-D heterogeneity model is further improved by joint inversions of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for the source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structure and source characteristics with rigorously estimated uncertainties from the novel Bayesian methods provide enhanced monitoring and discrimination of seismic events in northeast Asia.

  20. A theory of fine structure image models with an application to detection and classification of dementia.

    PubMed

    O'Neill, William; Penn, Richard; Werner, Michael; Thomas, Justin

    2015-06-01

    Estimation of stochastic process models from data is a common application of time series analysis methods. Such system identification processes are often cast as hypothesis testing exercises whose intent is to estimate model parameters and test them for statistical significance. Ordinary least squares (OLS) regression and the Levenberg-Marquardt algorithm (LMA) have proven invaluable computational tools for models being described by non-homogeneous, linear, stationary, ordinary differential equations. In this paper we extend stochastic model identification to linear, stationary, partial differential equations in two independent variables (2D) and show that OLS and LMA apply equally well to these systems. The method employs an original nonparametric statistic as a test for the significance of estimated parameters. We show gray scale and color images are special cases of 2D systems satisfying a particular autoregressive partial difference equation which estimates an analogous partial differential equation. Several applications to medical image modeling and classification illustrate the method by correctly classifying demented and normal OLS models of axial magnetic resonance brain scans according to subject Mini Mental State Exam (MMSE) scores. Comparison with 13 image classifiers from the literature indicates our classifier is at least 14 times faster than any of them and has a classification accuracy better than all but one. Our modeling method applies to any linear, stationary, partial differential equation and the method is readily extended to 3D whole-organ systems. Further, in addition to being a robust image classifier, estimated image models offer insights into which parameters carry the most diagnostic image information and thereby suggest finer divisions could be made within a class. Image models can be estimated in milliseconds which translate to whole-organ models in seconds; such runtimes could make real-time medicine and surgery modeling possible.

  1. Estimating fractional vegetation cover and the vegetation index of bare soil and highly dense vegetation with a physically based method

    NASA Astrophysics Data System (ADS)

    Song, Wanjuan; Mu, Xihan; Ruan, Gaiyan; Gao, Zhan; Li, Linyuan; Yan, Guangjian

    2017-06-01

    Normalized difference vegetation index (NDVI) of highly dense vegetation (NDVIv) and bare soil (NDVIs), identified as the key parameters for Fractional Vegetation Cover (FVC) estimation, are usually obtained with empirical statistical methods However, it is often difficult to obtain reasonable values of NDVIv and NDVIs at a coarse resolution (e.g., 1 km), or in arid, semiarid, and evergreen areas. The uncertainty of estimated NDVIs and NDVIv can cause substantial errors in FVC estimations when a simple linear mixture model is used. To address this problem, this paper proposes a physically based method. The leaf area index (LAI) and directional NDVI are introduced in a gap fraction model and a linear mixture model for FVC estimation to calculate NDVIv and NDVIs. The model incorporates the Moderate Resolution Imaging Spectroradiometer (MODIS) Bidirectional Reflectance Distribution Function (BRDF) model parameters product (MCD43B1) and LAI product, which are convenient to acquire. Two types of evaluation experiments are designed 1) with data simulated by a canopy radiative transfer model and 2) with satellite observations. The root-mean-square deviation (RMSD) for simulated data is less than 0.117, depending on the type of noise added on the data. In the real data experiment, the RMSD for cropland is 0.127, for grassland is 0.075, and for forest is 0.107. The experimental areas respectively lack fully vegetated and non-vegetated pixels at 1 km resolution. Consequently, a relatively large uncertainty is found while using the statistical methods and the RMSD ranges from 0.110 to 0.363 based on the real data. The proposed method is convenient to produce NDVIv and NDVIs maps for FVC estimation on regional and global scales.

  2. An empirical Bayes approach to analyzing recurring animal surveys

    USGS Publications Warehouse

    Johnson, D.H.

    1989-01-01

    Recurring estimates of the size of animal populations are often required by biologists or wildlife managers. Because of cost or other constraints, estimates frequently lack the accuracy desired but cannot readily be improved by additional sampling. This report proposes a statistical method employing empirical Bayes (EB) estimators as alternatives to those customarily used to estimate population size, and evaluates them by a subsampling experiment on waterfowl surveys. EB estimates, especially a simple limited-translation version, were more accurate and provided shorter confidence intervals with greater coverage probabilities than customary estimates.

  3. An HMM-based algorithm for evaluating rates of receptor–ligand binding kinetics from thermal fluctuation data

    PubMed Central

    Ju, Lining; Wang, Yijie Dylan; Hung, Ying; Wu, Chien-Fu Jeff; Zhu, Cheng

    2013-01-01

    Motivation: Abrupt reduction/resumption of thermal fluctuations of a force probe has been used to identify association/dissociation events of protein–ligand bonds. We show that off-rate of molecular dissociation can be estimated by the analysis of the bond lifetime, while the on-rate of molecular association can be estimated by the analysis of the waiting time between two neighboring bond events. However, the analysis relies heavily on subjective judgments and is time-consuming. To automate the process of mapping out bond events from thermal fluctuation data, we develop a hidden Markov model (HMM)-based method. Results: The HMM method represents the bond state by a hidden variable with two values: bound and unbound. The bond association/dissociation is visualized and pinpointed. We apply the method to analyze a key receptor–ligand interaction in the early stage of hemostasis and thrombosis: the von Willebrand factor (VWF) binding to platelet glycoprotein Ibα (GPIbα). The numbers of bond lifetime and waiting time events estimated by the HMM are much more than those estimated by a descriptive statistical method from the same set of raw data. The kinetic parameters estimated by the HMM are in excellent agreement with those by a descriptive statistical analysis, but have much smaller errors for both wild-type and two mutant VWF-A1 domains. Thus, the computerized analysis allows us to speed up the analysis and improve the quality of estimates of receptor–ligand binding kinetics. Contact: jeffwu@isye.gatech.edu or cheng.zhu@bme.gatech.edu PMID:23599504

  4. Linear models: permutation methods

    USGS Publications Warehouse

    Cade, B.S.; Everitt, B.S.; Howell, D.C.

    2005-01-01

    Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well-known that estimates of the mean in linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution or responses, providing a more complete statistical picture that has relevance to many biological questions [6]...

  5. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    PubMed

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.

  6. Load estimator (LOADEST): a FORTRAN program for estimating constituent loads in streams and rivers

    USGS Publications Warehouse

    Runkel, Robert L.; Crawford, Charles G.; Cohn, Timothy A.

    2004-01-01

    LOAD ESTimator (LOADEST) is a FORTRAN program for estimating constituent loads in streams and rivers. Given a time series of streamflow, additional data variables, and constituent concentration, LOADEST assists the user in developing a regression model for the estimation of constituent load (calibration). Explanatory variables within the regression model include various functions of streamflow, decimal time, and additional user-specified data variables. The formulated regression model then is used to estimate loads over a user-specified time interval (estimation). Mean load estimates, standard errors, and 95 percent confidence intervals are developed on a monthly and(or) seasonal basis. The calibration and estimation procedures within LOADEST are based on three statistical estimation methods. The first two methods, Adjusted Maximum Likelihood Estimation (AMLE) and Maximum Likelihood Estimation (MLE), are appropriate when the calibration model errors (residuals) are normally distributed. Of the two, AMLE is the method of choice when the calibration data set (time series of streamflow, additional data variables, and concentration) contains censored data. The third method, Least Absolute Deviation (LAD), is an alternative to maximum likelihood estimation when the residuals are not normally distributed. LOADEST output includes diagnostic tests and warnings to assist the user in determining the appropriate estimation method and in interpreting the estimated loads. This report describes the development and application of LOADEST. Sections of the report describe estimation theory, input/output specifications, sample applications, and installation instructions.

  7. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  8. Remote sensing estimation of the total phosphorus concentration in a large lake using band combinations and regional multivariate statistical modeling techniques.

    PubMed

    Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi

    2015-03-15

    Remote sensing has been widely used for ater quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate statistical modeling techniques, demonstrated advantages for estimating the TP concentration in a large lake and had a strong potential for universal application for the TP concentration estimation in large lake waters worldwide. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Unbiased estimators for spatial distribution functions of classical fluids

    NASA Astrophysics Data System (ADS)

    Adib, Artur B.; Jarzynski, Christopher

    2005-01-01

    We use a statistical-mechanical identity closely related to the familiar virial theorem, to derive unbiased estimators for spatial distribution functions of classical fluids. In particular, we obtain estimators for both the fluid density ρ(r) in the vicinity of a fixed solute and the pair correlation g(r) of a homogeneous classical fluid. We illustrate the utility of our estimators with numerical examples, which reveal advantages over traditional histogram-based methods of computing such distributions.

  10. Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors

    NASA Astrophysics Data System (ADS)

    Pernot, Pascal; Savin, Andreas

    2018-06-01

    Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.

  11. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition.

    PubMed

    Falgreen, Steffen; Laursen, Maria Bach; Bødker, Julie Støve; Kjeldsen, Malene Krag; Schmitz, Alexander; Nyegaard, Mette; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2014-06-05

    In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves' dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significant diverse range of responses ensuring it is useful for biological interpretations. Time independent summary statistics may aid the understanding of drugs' action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies.

  12. Exposure time independent summary statistics for assessment of drug dependent cell line growth inhibition

    PubMed Central

    2014-01-01

    Background In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves’ dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. Results First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significant diverse range of responses ensuring it is useful for biological interpretations. Conclusion Time independent summary statistics may aid the understanding of drugs’ action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies. PMID:24902483

  13. Reliable clarity automatic-evaluation method for optical remote sensing images

    NASA Astrophysics Data System (ADS)

    Qin, Bangyong; Shang, Ren; Li, Shengyang; Hei, Baoqin; Liu, Zhiwen

    2015-10-01

    Image clarity, which reflects the sharpness degree at the edge of objects in images, is an important quality evaluate index for optical remote sensing images. Scholars at home and abroad have done a lot of work on estimation of image clarity. At present, common clarity-estimation methods for digital images mainly include frequency-domain function methods, statistical parametric methods, gradient function methods and edge acutance methods. Frequency-domain function method is an accurate clarity-measure approach. However, its calculation process is complicate and cannot be carried out automatically. Statistical parametric methods and gradient function methods are both sensitive to clarity of images, while their results are easy to be affected by the complex degree of images. Edge acutance method is an effective approach for clarity estimate, while it needs picking out the edges manually. Due to the limits in accuracy, consistent or automation, these existing methods are not applicable to quality evaluation of optical remote sensing images. In this article, a new clarity-evaluation method, which is based on the principle of edge acutance algorithm, is proposed. In the new method, edge detection algorithm and gradient search algorithm are adopted to automatically search the object edges in images. Moreover, The calculation algorithm for edge sharpness has been improved. The new method has been tested with several groups of optical remote sensing images. Compared with the existing automatic evaluation methods, the new method perform better both in accuracy and consistency. Thus, the new method is an effective clarity evaluation method for optical remote sensing images.

  14. A variation of the Davis-Smith method for in-flight determination of spacecraft magnetic fields.

    NASA Technical Reports Server (NTRS)

    Belcher, J. W.

    1973-01-01

    A variation of a procedure developed by Davis and Smith (1968) is presented for the in-flight determination of spacecraft magnetic fields. Both methods take statistical advantage of the observation that fluctuations in the interplanetary magnetic field over short periods of time are primarily changes in direction rather than in magnitude. During typical solar wind conditions between 0.8 and 1.0 AU, a statistical analysis of 2-3 days of continuous interplanetary field measurements yields an estimate of a constant spacecraft field with an uncertainty of plus or minus 0.25 gamma in the direction radial to the sun and plus or minus 15 gammas in the directions transverse to the radial. The method is also of use in estimating variable spacecraft fields with gradients of the order of 0.1 gamma/day and less and in other special circumstances.

  15. Statistical techniques for sampling and monitoring natural resources

    Treesearch

    Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado

    2004-01-01

    We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....

  16. Length and Rate of Individual Participation in Various Activities on Recreation Sites and Areas

    Treesearch

    Gary L. Tyre; George A. James

    1971-01-01

    While statistically reliable methods exist for estimating recreation use on large areas, they often prove prohibitively expensive. Inexpensive alternatives involving the length and rate of individual participation in specific activites are presented, together with data and statistics on the recreational use of three large areas on the National Forests. This...

  17. Practical Bias Correction in Aerial Surveys of Large Mammals: Validation of Hybrid Double-Observer with Sightability Method against Known Abundance of Feral Horse (Equus caballus) Populations

    PubMed Central

    2016-01-01

    Reliably estimating wildlife abundance is fundamental to effective management. Aerial surveys are one of the only spatially robust tools for estimating large mammal populations, but statistical sampling methods are required to address detection biases that affect accuracy and precision of the estimates. Although various methods for correcting aerial survey bias are employed on large mammal species around the world, these have rarely been rigorously validated. Several populations of feral horses (Equus caballus) in the western United States have been intensively studied, resulting in identification of all unique individuals. This provided a rare opportunity to test aerial survey bias correction on populations of known abundance. We hypothesized that a hybrid method combining simultaneous double-observer and sightability bias correction techniques would accurately estimate abundance. We validated this integrated technique on populations of known size and also on a pair of surveys before and after a known number was removed. Our analysis identified several covariates across the surveys that explained and corrected biases in the estimates. All six tests on known populations produced estimates with deviations from the known value ranging from -8.5% to +13.7% and <0.7 standard errors. Precision varied widely, from 6.1% CV to 25.0% CV. In contrast, the pair of surveys conducted around a known management removal produced an estimated change in population between the surveys that was significantly larger than the known reduction. Although the deviation between was only 9.1%, the precision estimate (CV = 1.6%) may have been artificially low. It was apparent that use of a helicopter in those surveys perturbed the horses, introducing detection error and heterogeneity in a manner that could not be corrected by our statistical models. Our results validate the hybrid method, highlight its potentially broad applicability, identify some limitations, and provide insight and guidance for improving survey designs. PMID:27139732

  18. Practical Bias Correction in Aerial Surveys of Large Mammals: Validation of Hybrid Double-Observer with Sightability Method against Known Abundance of Feral Horse (Equus caballus) Populations.

    PubMed

    Lubow, Bruce C; Ransom, Jason I

    2016-01-01

    Reliably estimating wildlife abundance is fundamental to effective management. Aerial surveys are one of the only spatially robust tools for estimating large mammal populations, but statistical sampling methods are required to address detection biases that affect accuracy and precision of the estimates. Although various methods for correcting aerial survey bias are employed on large mammal species around the world, these have rarely been rigorously validated. Several populations of feral horses (Equus caballus) in the western United States have been intensively studied, resulting in identification of all unique individuals. This provided a rare opportunity to test aerial survey bias correction on populations of known abundance. We hypothesized that a hybrid method combining simultaneous double-observer and sightability bias correction techniques would accurately estimate abundance. We validated this integrated technique on populations of known size and also on a pair of surveys before and after a known number was removed. Our analysis identified several covariates across the surveys that explained and corrected biases in the estimates. All six tests on known populations produced estimates with deviations from the known value ranging from -8.5% to +13.7% and <0.7 standard errors. Precision varied widely, from 6.1% CV to 25.0% CV. In contrast, the pair of surveys conducted around a known management removal produced an estimated change in population between the surveys that was significantly larger than the known reduction. Although the deviation between was only 9.1%, the precision estimate (CV = 1.6%) may have been artificially low. It was apparent that use of a helicopter in those surveys perturbed the horses, introducing detection error and heterogeneity in a manner that could not be corrected by our statistical models. Our results validate the hybrid method, highlight its potentially broad applicability, identify some limitations, and provide insight and guidance for improving survey designs.

  19. Quantitative Susceptibility Mapping by Inversion of a Perturbation Field Model: Correlation with Brain Iron in Normal Aging

    PubMed Central

    Poynton, Clare; Jenkinson, Mark; Adalsteinsson, Elfar; Sullivan, Edith V.; Pfefferbaum, Adolf; Wells, William

    2015-01-01

    There is increasing evidence that iron deposition occurs in specific regions of the brain in normal aging and neurodegenerative disorders such as Parkinson's, Huntington's, and Alzheimer's disease. Iron deposition changes the magnetic susceptibility of tissue, which alters the MR signal phase, and allows estimation of susceptibility differences using quantitative susceptibility mapping (QSM). We present a method for quantifying susceptibility by inversion of a perturbation model, or ‘QSIP’. The perturbation model relates phase to susceptibility using a kernel calculated in the spatial domain, in contrast to previous Fourier-based techniques. A tissue/air susceptibility atlas is used to estimate B0 inhomogeneity. QSIP estimates in young and elderly subjects are compared to postmortem iron estimates, maps of the Field-Dependent Relaxation Rate Increase (FDRI), and the L1-QSM method. Results for both groups showed excellent agreement with published postmortem data and in-vivo FDRI: statistically significant Spearman correlations ranging from Rho = 0.905 to Rho = 1.00 were obtained. QSIP also showed improvement over FDRI and L1-QSM: reduced variance in susceptibility estimates and statistically significant group differences were detected in striatal and brainstem nuclei, consistent with age-dependent iron accumulation in these regions. PMID:25248179

  20. Confidence intervals for expected moments algorithm flood quantile estimates

    USGS Publications Warehouse

    Cohn, Timothy A.; Lane, William L.; Stedinger, Jery R.

    2001-01-01

    Historical and paleoflood information can substantially improve flood frequency estimates if appropriate statistical procedures are properly applied. However, the Federal guidelines for flood frequency analysis, set forth in Bulletin 17B, rely on an inefficient “weighting” procedure that fails to take advantage of historical and paleoflood information. This has led researchers to propose several more efficient alternatives including the Expected Moments Algorithm (EMA), which is attractive because it retains Bulletin 17B's statistical structure (method of moments with the Log Pearson Type 3 distribution) and thus can be easily integrated into flood analyses employing the rest of the Bulletin 17B approach. The practical utility of EMA, however, has been limited because no closed‐form method has been available for quantifying the uncertainty of EMA‐based flood quantile estimates. This paper addresses that concern by providing analytical expressions for the asymptotic variance of EMA flood‐quantile estimators and confidence intervals for flood quantile estimates. Monte Carlo simulations demonstrate the properties of such confidence intervals for sites where a 25‐ to 100‐year streamgage record is augmented by 50 to 150 years of historical information. The experiments show that the confidence intervals, though not exact, should be acceptable for most purposes.

Top