Sample records for robust variance estimates

  1. Robust variance estimation with dependent effect sizes: practical considerations including a software tutorial in Stata and SPSS.

    PubMed

    Tanner-Smith, Emily E; Tipton, Elizabeth

    2014-03-01

    Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding the practical application and implementation of those macros. This paper provides a brief tutorial on the implementation of the Stata and SPSS macros and discusses practical issues meta-analysts should consider when estimating meta-regression models with robust variance estimates. Two example databases are used in the tutorial to illustrate the use of meta-analysis with robust variance estimates. Copyright © 2013 John Wiley & Sons, Ltd.
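
The abstract above refers to cluster-robust (sandwich) variance estimation for dependent effect sizes. As a rough illustration of that idea (not the Stata/SPSS macros themselves), the sketch below uses a simple correlated-effects weighting and the cluster-robust covariance of the meta-regression coefficients; the data layout, weighting scheme, and function name are illustrative assumptions.

```python
# Minimal sketch of cluster-robust (sandwich) variance estimation for
# dependent effect sizes in a meta-regression. Illustrative only.
import numpy as np

def rve_meta_regression(y, X, v, study):
    """y: effect sizes, X: moderator design matrix (with intercept),
    v: sampling variances, study: cluster (study) labels."""
    studies = np.unique(study)
    # simple correlated-effects weights: 1 / (k_j * mean variance in study j)
    w = np.empty_like(y, dtype=float)
    for s in studies:
        idx = study == s
        w[idx] = 1.0 / (idx.sum() * v[idx].mean())
    W = np.diag(w)
    B = X.T @ W @ X                        # "bread" of the sandwich
    beta = np.linalg.solve(B, X.T @ W @ y)
    resid = y - X @ beta
    meat = np.zeros_like(B)
    for s in studies:                      # sum over studies of X_j' W_j e_j e_j' W_j X_j
        idx = study == s
        Xj, Wj, ej = X[idx], np.diag(w[idx]), resid[idx]
        meat += Xj.T @ Wj @ np.outer(ej, ej) @ Wj @ Xj
    Binv = np.linalg.inv(B)
    V_robust = Binv @ meat @ Binv          # cluster-robust covariance of beta
    return beta, V_robust
```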

  2. A note on variance estimation in random effects meta-regression.

    PubMed

    Sidik, Kurex; Jonkman, Jeffrey N

    2005-01-01

    For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
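
Since the abstract compares a robust (sandwich) estimator with the Knapp and Hartung (2003) adjustment, a minimal sketch of the latter may help. It assumes the between-study variance tau2 has already been estimated (e.g., by DerSimonian-Laird or REML); variable names are illustrative.

```python
# Hypothetical worked sketch of the Knapp-Hartung variance adjustment for
# random-effects meta-regression.
import numpy as np

def knapp_hartung_variance(y, X, v, tau2):
    w = 1.0 / (v + tau2)                       # inverse-variance weights
    W = np.diag(w)
    B = X.T @ W @ X
    beta = np.linalg.solve(B, X.T @ W @ y)
    resid = y - X @ beta
    k, p = X.shape
    q = (w * resid**2).sum() / (k - p)         # weighted residual scale factor
    V_kh = q * np.linalg.inv(B)                # Knapp-Hartung covariance
    return beta, V_kh                          # tests then use t with k - p df
```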

  3. Robust Variance Estimation with Dependent Effect Sizes: Practical Considerations Including a Software Tutorial in Stata and SPSS

    ERIC Educational Resources Information Center

    Tanner-Smith, Emily E.; Tipton, Elizabeth

    2014-01-01

    Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding…

  4. Robust versus consistent variance estimators in marginal structural Cox models.

    PubMed

    Enders, Dirk; Engel, Susanne; Linder, Roland; Pigeot, Iris

    2018-06-11

    In survival analyses, inverse-probability-of-treatment (IPT) and inverse-probability-of-censoring (IPC) weighted estimators of parameters in marginal structural Cox models are often used to estimate treatment effects in the presence of time-dependent confounding and censoring. In most applications, a robust variance estimator of the IPT and IPC weighted estimator is calculated leading to conservative confidence intervals. This estimator assumes that the weights are known rather than estimated from the data. Although a consistent estimator of the asymptotic variance of the IPT and IPC weighted estimator is generally available, applications and thus information on the performance of the consistent estimator are lacking. Reasons might be a cumbersome implementation in statistical software, which is further complicated by missing details on the variance formula. In this paper, we therefore provide a detailed derivation of the variance of the asymptotic distribution of the IPT and IPC weighted estimator and explicitly state the necessary terms to calculate a consistent estimator of this variance. We compare the performance of the robust and consistent variance estimators in an application based on routine health care data and in a simulation study. The simulation reveals no substantial differences between the 2 estimators in medium and large data sets with no unmeasured confounding, but the consistent variance estimator performs poorly in small samples or under unmeasured confounding, if the number of confounders is large. We thus conclude that the robust estimator is more appropriate for all practical purposes. Copyright © 2018 John Wiley & Sons, Ltd.
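
As a hedged illustration of the common practice described above (a robust sandwich variance for a weighted Cox model, treating the weights as known), the following sketch uses Python's lifelines package, assuming its CoxPHFitter supports the weights_col and robust arguments as in recent releases; column names and weights are made up.

```python
# Sketch: IPT-weighted Cox model with robust (sandwich) standard errors,
# which treat the estimated weights as if they were known.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":       [5.0, 8.2, 3.1, 9.7, 2.4, 7.5],
    "event":      [1, 0, 1, 0, 1, 1],
    "treatment":  [1, 1, 0, 0, 1, 0],
    "ipt_weight": [1.8, 0.9, 1.2, 1.1, 2.3, 0.7],  # previously estimated IPT weights
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event",
        weights_col="ipt_weight", robust=True)      # robust sandwich variance
print(cph.params_["treatment"], cph.standard_errors_["treatment"])
```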

  5. Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan

    2013-01-01

    The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…

  6. An improved method for bivariate meta-analysis when within-study correlations are unknown.

    PubMed

    Hong, Chuan; Riley, Richard D; Chen, Yong

    2018-03-01

    Multivariate meta-analysis, which jointly analyzes multiple and possibly correlated outcomes in a single analysis, is becoming increasingly popular in recent years. An attractive feature of the multivariate meta-analysis is its ability to account for the dependence between multiple estimates from the same study. However, standard inference procedures for multivariate meta-analysis require the knowledge of within-study correlations, which are usually unavailable. This limits standard inference approaches in practice. Riley et al proposed a working model and an overall synthesis correlation parameter to account for the marginal correlation between outcomes, where the only data needed are those required for a separate univariate random-effects meta-analysis. As within-study correlations are not required, the Riley method is applicable to a wide variety of evidence synthesis situations. However, the standard variance estimator of the Riley method is not entirely correct under many important settings. As a consequence, the coverage of a function of pooled estimates may not reach the nominal level even when the number of studies in the multivariate meta-analysis is large. In this paper, we improve the Riley method by proposing a robust variance estimator, which is asymptotically correct even when the model is misspecified (ie, when the likelihood function is incorrect). Simulation studies of a bivariate meta-analysis, in a variety of settings, show a function of pooled estimates has improved performance when using the proposed robust variance estimator. In terms of individual pooled estimates themselves, the standard variance estimator and robust variance estimator give similar results to the original method, with appropriate coverage. The proposed robust variance estimator performs well when the number of studies is relatively large. Therefore, we recommend the use of the robust method for meta-analyses with a relatively large number of studies (eg, m≥50). When the sample size is relatively small, we recommend the use of the robust method under the working independence assumption. We illustrate the proposed method through 2 meta-analyses. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Survival estimation and the effects of dependency among animals

    USGS Publications Warehouse

    Schmutz, Joel A.; Ward, David H.; Sedinger, James S.; Rexstad, Eric A.

    1995-01-01

    Survival models assume that fates of individuals are independent, yet the robustness of this assumption has been poorly quantified. We examine how empirically derived estimates of the variance of survival rates are affected by dependency in survival probability among individuals. We used Monte Carlo simulations to generate known amounts of dependency among pairs of individuals and analyzed these data with Kaplan-Meier and Cormack-Jolly-Seber models. Dependency significantly increased these empirical variances as compared to theoretically derived estimates of variance from the same populations. Using resighting data from 168 pairs of black brant, we used a resampling procedure and program RELEASE to estimate empirical and mean theoretical variances. We estimated that the relationship between paired individuals caused the empirical variance of the survival rate to be 155% larger than the empirical variance for unpaired individuals. Monte Carlo simulations and use of this resampling strategy can provide investigators with information on how robust their data are to this common assumption of independent survival probabilities.

  8. Variance-Stable R-Estimators.

    DTIC Science & Technology

    1984-05-01

    By means of the concept of the change-of-variance function we investigate the stability properties of the asymptotic variance of R-estimators. This allows us to construct the optimal V-robust R-estimator that minimizes the asymptotic variance at the model, under the side condition of a bounded change-of-variance function. Finally, we discuss the connection between this function and an influence function for two-sample rank tests introduced by Eplett (1980). (Author)

  9. Estimating integrated variance in the presence of microstructure noise using linear regression

    NASA Astrophysics Data System (ADS)

    Holý, Vladimír

    2017-07-01

    Using financial high-frequency data to estimate the integrated variance of asset prices is beneficial, but with an increasing number of observations so-called microstructure noise occurs. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as for testing for the presence of the noise. Our method utilizes a linear regression in which realized variances estimated from different data subsamples act as the dependent variable while the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
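
The regression idea in this abstract can be sketched as follows: under i.i.d. microstructure noise the expected realized variance grows roughly linearly with the number of observations, so the intercept of a regression of subsampled realized variances on sample size estimates the integrated variance. The simulation below is a toy illustration, not the authors' exact procedure.

```python
# Toy sketch: realized variances at several sampling frequencies regressed on
# the number of observations; intercept ~ integrated variance (IV),
# slope / 2 ~ variance of the microstructure noise.
import numpy as np

rng = np.random.default_rng(0)
n_full = 23400                                    # one trading day at 1-second ticks
true_iv, noise_sd = 1e-4, 5e-4
returns = rng.normal(0, np.sqrt(true_iv / n_full), n_full)
prices = np.cumsum(returns)
noisy = prices + rng.normal(0, noise_sd, n_full)  # add microstructure noise

ns, rvs = [], []
for step in (1, 2, 5, 10, 20, 50, 100):           # subsample at coarser grids
    r = np.diff(noisy[::step])
    ns.append(len(r))
    rvs.append(np.sum(r**2))                      # realized variance

slope, intercept = np.polyfit(ns, rvs, 1)
print("IV estimate (intercept):     ", intercept)   # close to true_iv
print("noise variance estimate:     ", slope / 2.0) # close to noise_sd**2
```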

  10. Bias and robustness of uncertainty components estimates in transient climate projections

    NASA Astrophysics Data System (ADS)

    Hingray, Benoit; Blanchet, Juliette; Vidal, Jean-Philippe

    2016-04-01

    A critical issue in climate change studies is the estimation of uncertainties in projections, along with the contribution of the different uncertainty sources, including scenario uncertainty, the different components of model uncertainty and internal variability. Quantifying the different uncertainty sources actually faces different problems. For instance, and for the sake of simplicity, an estimate of model uncertainty is classically obtained from the empirical variance of the climate responses obtained for the different modeling chains. These estimates are however biased. Another difficulty arises from the limited number of members that are classically available for most modeling chains. In this case, the climate response of one given chain and the effect of its internal variability may actually be difficult, if not impossible, to separate. The estimates of the scenario uncertainty, model uncertainty and internal variability components are thus likely not to be very robust. We explore the importance of the bias and the robustness of the estimates for two classical Analysis of Variance (ANOVA) approaches: a Single Time approach (STANOVA), based on the only data available for the considered projection lead time, and a time-series-based approach (QEANOVA), which assumes quasi-ergodicity of climate outputs over the whole available climate simulation period (Hingray and Saïd, 2014). We explore both issues for a simple but classical configuration where uncertainties in projections are composed of two single sources: model uncertainty and internal climate variability. The bias in model uncertainty estimates is explored from theoretical expressions of unbiased estimators developed for both ANOVA approaches. The robustness of uncertainty estimates is explored for multiple synthetic ensembles of time series projections generated with Monte Carlo simulations. For both ANOVA approaches, when the empirical variance of climate responses is used to estimate model uncertainty, the bias is always positive. It can be especially high with STANOVA. In the most critical configurations, when the number of members available for each modeling chain is small (< 3) and when internal variability explains most of the total uncertainty variance (75% or more), the overestimation is higher than 100% of the true model uncertainty variance. The bias can be considerably reduced with a time series ANOVA approach, owing to the multiple time steps accounted for. The longer the transient time period used for the analysis, the larger the reduction. When a quasi-ergodic ANOVA approach is applied to decadal data for the whole 1980-2100 period, the bias is reduced by a factor of 2.5 to 20, depending on the projection lead time. In all cases, the bias is likely to be non-negligible for a large number of climate impact studies, resulting in a likely large overestimation of the contribution of model uncertainty to total variance. For both approaches, the robustness of all uncertainty estimates is higher when more members are available, when internal variability is smaller and/or when the response-to-uncertainty ratio is higher. QEANOVA estimates are much more robust than STANOVA ones: QEANOVA simulated confidence intervals are roughly 3 to 5 times smaller than STANOVA ones. Except for STANOVA when fewer than 3 members are available, the robustness is rather high for total uncertainty and moderate for internal variability estimates.
    For model uncertainty or response-to-uncertainty ratio estimates, the robustness is conversely low for QEANOVA and very low for STANOVA. In the most critical configurations (small number of members, large internal variability), large over- or underestimation of uncertainty components is thus very likely. To propose relevant uncertainty analyses and avoid misleading interpretations, estimates of uncertainty components should therefore be bias-corrected and ideally come with estimates of their robustness. This work is part of the COMPLEX Project (European Collaborative Project FP7-ENV-2012 number: 308601; http://www.complex.ac.uk/). Hingray, B., Saïd, M., 2014. Partitioning internal variability and model uncertainty components in a multimodel multireplicate ensemble of climate projections. J. Climate. doi:10.1175/JCLI-D-13-00629.1. Hingray, B., Blanchet, J. (revision) Unbiased estimators for uncertainty components in transient climate projections. J. Climate. Hingray, B., Blanchet, J., Vidal, J.P. (revision) Robustness of uncertainty components estimates in climate projections. J. Climate.
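
A toy numerical sketch of the bias discussed above: with few members per modeling chain, the raw variance of chain means overstates model uncertainty, and the classical one-way ANOVA correction (subtracting the internal-variability term) removes most of the bias. The numbers and setup are synthetic and only illustrate the single-time (STANOVA-like) case.

```python
# Toy sketch of the bias in model-uncertainty estimates with few members per chain.
import numpy as np

rng = np.random.default_rng(1)
n_chains, n_members = 8, 3
model_sd, internal_sd = 0.5, 1.0
chain_effects = rng.normal(0, model_sd, n_chains)
runs = chain_effects[:, None] + rng.normal(0, internal_sd, (n_chains, n_members))

chain_means = runs.mean(axis=1)
naive = chain_means.var(ddof=1)                    # biased upwards
within = runs.var(axis=1, ddof=1).mean()           # internal variability
unbiased = naive - within / n_members              # one-way ANOVA correction

print("naive model-uncertainty estimate:", naive)
print("bias-corrected estimate:         ", unbiased)
print("true model-uncertainty variance: ", model_sd**2)
```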

  11. Mixed model approaches for diallel analysis based on a bio-model.

    PubMed

    Zhu, J; Weir, B S

    1996-12-01

    A MINQUE(1) procedure, which is the minimum norm quadratic unbiased estimation (MINQUE) method with 1 for all the prior values, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML) and MINQUE(θ), which has parameter values for the prior values. MINQUE(1) is almost as efficient as MINQUE(θ) for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jack-knife procedure is suggested for estimation of sampling variances of estimated variance and covariance components and of predicted genetic effects. Worked examples are given for estimation of variance and covariance components and for prediction of genetic merits.

  12. Pixel-level multisensor image fusion based on matrix completion and robust principal component analysis

    NASA Astrophysics Data System (ADS)

    Wang, Zhuozheng; Deller, J. R.; Fleet, Blair D.

    2016-01-01

    Acquired digital images are often corrupted by a lack of camera focus, faulty illumination, or missing data. An algorithm is presented for fusion of multiple corrupted images of a scene using the lifting wavelet transform. The method employs adaptive fusion arithmetic based on matrix completion and self-adaptive regional variance estimation. Characteristics of the wavelet coefficients are used to adaptively select fusion rules. Robust principal component analysis is applied to low-frequency image components, and regional variance estimation is applied to high-frequency components. Experiments reveal that the method is effective for multifocus, visible-light, and infrared image fusion. Compared with traditional algorithms, the new algorithm not only increases the amount of preserved information and clarity but also improves robustness.

  13. Robust guaranteed-cost adaptive quantum phase estimation

    NASA Astrophysics Data System (ADS)

    Roy, Shibdas; Berry, Dominic W.; Petersen, Ian R.; Huntington, Elanor H.

    2017-05-01

    Quantum parameter estimation plays a key role in many fields like quantum computation, communication, and metrology. Optimal estimation allows one to achieve the most precise parameter estimates, but requires accurate knowledge of the model. Any inevitable uncertainty in the model parameters may heavily degrade the quality of the estimate. It is therefore desired to make the estimation process robust to such uncertainties. Robust estimation was previously studied for a varying phase, where the goal was to estimate the phase at some time in the past, using the measurement results from both before and after that time within a fixed time interval up to current time. Here, we consider a robust guaranteed-cost filter yielding robust estimates of a varying phase in real time, where the current phase is estimated using only past measurements. Our filter minimizes the largest (worst-case) variance in the allowable range of the uncertain model parameter(s) and this determines its guaranteed cost. It outperforms in the worst case the optimal Kalman filter designed for the model with no uncertainty, which corresponds to the center of the possible range of the uncertain parameter(s). Moreover, unlike the Kalman filter, our filter in the worst case always performs better than the best achievable variance for heterodyne measurements, which we consider as the tolerable threshold for our system. Furthermore, we consider effective quantum efficiency and effective noise power, and show that our filter provides the best results by these measures in the worst case.

  14. Robust geostatistical analysis of spatial data

    NASA Astrophysics Data System (ADS)

    Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.

    2013-04-01

    Most of the geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are rather the rule than the exception, in particular in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.

  15. Nonparametric estimation of plant density by the distance method

    USGS Publications Warehouse

    Patil, S.A.; Burnham, K.P.; Kovner, J.L.

    1979-01-01

    A relation between the plant density and the probability density function of the nearest neighbor distance (squared) from a random point is established under fairly broad conditions. Based upon this relationship, a nonparametric estimator for the plant density is developed and presented in terms of order statistics. Consistency and asymptotic normality of the estimator are discussed. An interval estimator for the density is obtained. The modifications of this estimator and its variance are given when the distribution is truncated. Simulation results are presented for regular, random and aggregated populations to illustrate the nonparametric estimator and its variance. A numerical example from field data is given. Merits and deficiencies of the estimator are discussed with regard to its robustness and variance.

  16. On the Computation of the RMSEA and CFI from the Mean-And-Variance Corrected Test Statistic with Nonnormal Data in SEM.

    PubMed

    Savalei, Victoria

    2018-01-01

    A new type of nonnormality correction to the RMSEA has recently been developed, which has several advantages over existing corrections. In particular, the new correction adjusts the sample estimate of the RMSEA for the inflation due to nonnormality, while leaving its population value unchanged, so that established cutoff criteria can still be used to judge the degree of approximate fit. A confidence interval (CI) for the new robust RMSEA based on the mean-corrected ("Satorra-Bentler") test statistic has also been proposed. Follow up work has provided the same type of nonnormality correction for the CFI (Brosseau-Liard & Savalei, 2014). These developments have recently been implemented in lavaan. This note has three goals: a) to show how to compute the new robust RMSEA and CFI from the mean-and-variance corrected test statistic; b) to offer a new CI for the robust RMSEA based on the mean-and-variance corrected test statistic; and c) to caution that the logic of the new nonnormality corrections to RMSEA and CFI is most appropriate for the maximum likelihood (ML) estimator, and cannot easily be generalized to the most commonly used categorical data estimators.

  17. Robust Portfolio Optimization Using Pseudodistances.

    PubMed

    Toma, Aida; Leoni-Aubin, Samuela

    2015-01-01

    The presence of outliers in financial asset returns is a frequently occurring phenomenon which may lead to unreliable mean-variance optimized portfolios. This fact is due to the unbounded influence that outliers can have on the mean returns and covariance estimators that are inputs in the optimization procedure. In this paper we present robust estimators of mean and covariance matrix obtained by minimizing an empirical version of a pseudodistance between the assumed model and the true model underlying the data. We prove and discuss theoretical properties of these estimators, such as affine equivariance, B-robustness, asymptotic normality and asymptotic relative efficiency. These estimators can be easily used in place of the classical estimators, thereby providing robust optimized portfolios. A Monte Carlo simulation study and applications to real data show the advantages of the proposed approach. We study both in-sample and out-of-sample performance of the proposed robust portfolios comparing them with some other portfolios known in literature.

  18. Robust Portfolio Optimization Using Pseudodistances

    PubMed Central

    2015-01-01

    The presence of outliers in financial asset returns is a frequently occurring phenomenon which may lead to unreliable mean-variance optimized portfolios. This fact is due to the unbounded influence that outliers can have on the mean returns and covariance estimators that are inputs in the optimization procedure. In this paper we present robust estimators of mean and covariance matrix obtained by minimizing an empirical version of a pseudodistance between the assumed model and the true model underlying the data. We prove and discuss theoretical properties of these estimators, such as affine equivariance, B-robustness, asymptotic normality and asymptotic relative efficiency. These estimators can be easily used in place of the classical estimators, thereby providing robust optimized portfolios. A Monte Carlo simulation study and applications to real data show the advantages of the proposed approach. We study both in-sample and out-of-sample performance of the proposed robust portfolios comparing them with some other portfolios known in literature. PMID:26468948

  19. Robust geostatistical analysis of spatial data

    NASA Astrophysics Data System (ADS)

    Papritz, A.; Künsch, H. R.; Schwierz, C.; Stahel, W. A.

    2012-04-01

    Most of the geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are rather the rule than the exception, in particular in environmental data sets. Outlying observations may result from errors (e.g. in data transcription) or from local perturbations in the processes that are responsible for a given pattern of spatial variation. As an example, the spatial distribution of some trace metal in the soils of a region may be distorted by emissions of local anthropogenic sources. Outliers affect the modelling of the large-scale spatial variation, the so-called external drift or trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) [2] proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) [1] for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation. Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and unsampled locations and kriging variances. The method has been implemented in an R package. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of the Tarrawarra soil moisture data set [3].

  20. Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis.

    PubMed

    Austin, Peter C

    2016-12-30

    Propensity score methods are used to reduce the effects of observed confounding when using observational data to estimate the effects of treatments or exposures. A popular method of using the propensity score is inverse probability of treatment weighting (IPTW). When using this method, a weight is calculated for each subject that is equal to the inverse of the probability of receiving the treatment that was actually received. These weights are then incorporated into the analyses to minimize the effects of observed confounding. Previous research has found that these methods result in unbiased estimation when estimating the effect of treatment on survival outcomes. However, conventional methods of variance estimation were shown to result in biased estimates of standard error. In this study, we conducted an extensive set of Monte Carlo simulations to examine different methods of variance estimation when using a weighted Cox proportional hazards model to estimate the effect of treatment. We considered three variance estimation methods: (i) a naïve model-based variance estimator; (ii) a robust sandwich-type variance estimator; and (iii) a bootstrap variance estimator. We considered estimation of both the average treatment effect and the average treatment effect in the treated. We found that the use of a bootstrap estimator resulted in approximately correct estimates of standard errors and confidence intervals with the correct coverage rates. The other estimators resulted in biased estimates of standard errors and confidence intervals with incorrect coverage rates. Our simulations were informed by a case study examining the effect of statin prescribing on mortality. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
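
A rough sketch of the bootstrap variance estimator recommended above: resample subjects, re-estimate the propensity score and IPT weights, refit the weighted Cox model, and take the empirical standard deviation of the treatment coefficient. The lifelines and scikit-learn calls, column names, and simulated data are assumptions for illustration.

```python
# Bootstrap SE for an IPTW-weighted Cox treatment effect (sketch).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression

def iptw_cox_coef(df):
    # estimate the propensity score, build ATE-type IPT weights, fit a weighted Cox model
    ps = LogisticRegression().fit(df[["x"]], df["treat"]).predict_proba(df[["x"]])[:, 1]
    w = np.where(df["treat"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    d = df[["time", "event", "treat"]].assign(w=w)
    cph = CoxPHFitter()
    cph.fit(d, duration_col="time", event_col="event", weights_col="w", robust=True)
    return cph.params_["treat"]

def bootstrap_se(df, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    coefs = [iptw_cox_coef(df.sample(len(df), replace=True,
                                     random_state=int(rng.integers(0, 2**31 - 1)))
                           .reset_index(drop=True))
             for _ in range(n_boot)]
    return np.std(coefs, ddof=1)

# toy data: confounder x drives both treatment assignment and (weakly) the outcome
rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
treat = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
time = rng.exponential(scale=1.0 / np.exp(0.3 * treat + 0.2 * x))
event = (rng.uniform(size=n) < 0.8).astype(int)
df = pd.DataFrame({"x": x, "treat": treat, "time": time, "event": event})
print("bootstrap SE of the treatment coefficient:", bootstrap_se(df, n_boot=50))
```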

  1. Influence function based variance estimation and missing data issues in case-cohort studies.

    PubMed

    Mark, S D; Katki, H

    2001-12-01

    Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.

  2. A Generally Robust Approach for Testing Hypotheses and Setting Confidence Intervals for Effect Sizes

    ERIC Educational Resources Information Center

    Keselman, H. J.; Algina, James; Lix, Lisa M.; Wilcox, Rand R.; Deering, Kathleen N.

    2008-01-01

    Standard least squares analysis of variance methods suffer from poor power under arbitrarily small departures from normality and fail to control the probability of a Type I error when standard assumptions are violated. This article describes a framework for robust estimation and testing that uses trimmed means with an approximate degrees of…

  3. Estimation of bias and variance of measurements made from tomography scans

    NASA Astrophysics Data System (ADS)

    Bradley, Robert S.

    2016-09-01

    Tomographic imaging modalities are being increasingly used to quantify internal characteristics of objects for a wide range of applications, from medical imaging to materials science research. However, such measurements are typically presented without an assessment being made of their associated variance or confidence interval. In particular, noise in raw scan data places a fundamental lower limit on the variance and bias of measurements made on the reconstructed 3D volumes. In this paper, the simulation-extrapolation technique, which was originally developed for statistical regression, is adapted to estimate the bias and variance for measurements made from a single scan. The application to x-ray tomography is considered in detail and it is demonstrated that the technique can also allow the robustness of automatic segmentation strategies to be compared.
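
A generic sketch of the simulation-extrapolation idea the abstract adapts: re-apply the measurement after adding extra noise at several known levels, model the trend in the added-noise level, and extrapolate back to the no-added-noise point to assess bias (the spread over replicates at each level likewise traces the variance). The measure() callable and the Gaussian noise model are placeholders, not the paper's implementation.

```python
# Generic SIMEX-style bias assessment for a measurement on noisy data (sketch).
import numpy as np

def simex(measure, noisy_data, noise_sd, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=50, seed=0):
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for lam in lambdas:
        vals = [measure(noisy_data + rng.normal(0, np.sqrt(lam) * noise_sd,
                                                noisy_data.shape))
                for _ in range(n_rep)]
        means.append(np.mean(vals))
        variances.append(np.var(vals, ddof=1))
    # quadratic extrapolation of the mean measurement back to lambda = -1 (no noise)
    coef = np.polyfit(lambdas, means, 2)
    bias_corrected = np.polyval(coef, -1.0)
    return bias_corrected, np.array(variances)

# usage idea: measure = lambda volume: (volume > 0.5).mean()   # e.g. a porosity measurement
```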

  4. On the robustness of a Bayes estimate. [in reliability theory]

    NASA Technical Reports Server (NTRS)

    Canavos, G. C.

    1974-01-01

    This paper examines the robustness of a Bayes estimator with respect to the assigned prior distribution. A Bayesian analysis for a stochastic scale parameter of a Weibull failure model is summarized in which the natural conjugate is assigned as the prior distribution of the random parameter. The sensitivity analysis is carried out by the Monte Carlo method in which, although an inverted gamma is the assigned prior, realizations are generated using distribution functions of varying shape. For several distributional forms and even for some fixed values of the parameter, simulated mean squared errors of Bayes and minimum variance unbiased estimators are determined and compared. Results indicate that the Bayes estimator remains squared-error superior and appears to be largely robust to the form of the assigned prior distribution.

  5. On the validity of within-nuclear-family genetic association analysis in samples of extended families.

    PubMed

    Bureau, Alexandre; Duchesne, Thierry

    2015-12-01

    Splitting extended families into their component nuclear families to apply a genetic association method designed for nuclear families is a widespread practice in familial genetic studies. Dependence among genotypes and phenotypes of nuclear families from the same extended family arises because of genetic linkage of the tested marker with a risk variant or because of familial specificity of genetic effects due to gene-environment interaction. This raises concerns about the validity of inference conducted under the assumption of independence of the nuclear families. We indeed prove theoretically that, in a conditional logistic regression analysis applicable to disease cases and their genotyped parents, the naive model-based estimator of the variance of the coefficient estimates underestimates the true variance. However, simulations with realistic effect sizes of risk variants and variation of this effect from family to family reveal that the underestimation is negligible. The simulations also show the greater efficiency of the model-based variance estimator compared to a robust empirical estimator. Our recommendation is therefore, to use the model-based estimator of variance for inference on effects of genetic variants.

  6. Estimation of the mixing layer height over a high altitude site in Central Himalayan region by using Doppler lidar

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shukla, K. K.; Phanikumar, D. V.; Newsom, Rob K.

    2014-03-01

    A Doppler lidar was installed at Manora Peak, Nainital (29.4° N, 79.2° E; 1958 m amsl) to estimate the mixing layer height for the first time, using the vertical velocity variance as the basic measurement parameter, for the period September-November 2011. The mixing layer height is found to be located at ~0.57 +/- 0.1 and 0.45 +/- 0.05 km AGL during daytime and nighttime, respectively. The estimation of mixing layer height shows good correlation (R > 0.8) between different instruments and with different methods. Our results show that the wavelet covariance transform is a robust method for mixing layer height estimation.
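
A minimal sketch of the wavelet covariance transform mentioned in the abstract, here simplified to a Haar-type step comparison: the transform peaks where the profile drops sharply, and that height is taken as the mixing layer height. The synthetic profile and dilation are illustrative, not the lidar data.

```python
# Simplified Haar-style wavelet covariance transform for mixing-layer detection (sketch).
import numpy as np

def wavelet_covariance_transform(z, profile, dilation):
    wct = np.full_like(profile, np.nan, dtype=float)
    half = dilation / 2.0
    for i, b in enumerate(z):
        lower = (z >= b - half) & (z < b)     # below the candidate height
        upper = (z >= b) & (z <= b + half)    # above the candidate height
        if lower.any() and upper.any():
            wct[i] = profile[lower].mean() - profile[upper].mean()
    return wct

z = np.arange(0.0, 2.0, 0.01)                        # height, km AGL
profile = 1.0 / (1.0 + np.exp((z - 0.55) / 0.05))    # synthetic profile dropping near 0.55 km
wct = wavelet_covariance_transform(z, profile, dilation=0.2)
print("estimated mixing layer height (km AGL):", z[np.nanargmax(wct)])
```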

  7. Effect of non-normality on test statistics for one-way independent groups designs.

    PubMed

    Cribbie, Robert A; Fiksenbaum, Lisa; Keselman, H J; Wilcox, Rand R

    2012-02-01

    The data obtained from one-way independent groups designs is typically non-normal in form and rarely equally variable across treatment populations (i.e., population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e., the analysis of variance F test) typically provides invalid results (e.g., too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e., trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal. © 2011 The British Psychological Society.
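
For the two-group case, the "Welch test with robust estimators" mentioned above reduces to Yuen's test: trimmed means compared with a Welch-type statistic whose squared standard errors come from Winsorized variances. A small sketch, with 20% trimming assumed, is given below; it is illustrative rather than the exact procedure studied in the paper.

```python
# Two-group Yuen-Welch test with trimmed means and Winsorized variances (sketch).
import numpy as np
from scipy import stats

def yuen_welch(x, y, trim=0.2):
    def pieces(a):
        a = np.sort(np.asarray(a, dtype=float))
        n = len(a)
        g = int(np.floor(trim * n))
        h = n - 2 * g                              # effective (trimmed) sample size
        w = a.copy()
        w[:g], w[n - g:] = a[g], a[n - g - 1]      # Winsorize the tails
        d = (n - 1) * np.var(w, ddof=1) / (h * (h - 1))
        return stats.trim_mean(a, trim), d, h
    m1, d1, h1 = pieces(x)
    m2, d2, h2 = pieces(y)
    t = (m1 - m2) / np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1**2 / (h1 - 1) + d2**2 / (h2 - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p
```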

  8. Working covariance model selection for generalized estimating equations.

    PubMed

    Carey, Vincent J; Wang, You-Gan

    2011-11-20

    We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice. Copyright © 2011 John Wiley & Sons, Ltd.
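
A short sketch of the practice the abstract studies: fit the same marginal model under different working covariance structures and compare the model-robust (sandwich) standard errors. It assumes the statsmodels GEE API (sm.GEE, sm.cov_struct); the data and column names are simulated.

```python
# Comparing working covariance models in a GEE fit (sketch).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_groups, n_per = 40, 5
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
u = np.repeat(rng.normal(0, 0.8, n_groups), n_per)      # cluster-level effect
y = 1.0 + 0.5 * x + u + rng.normal(size=len(x))
df = pd.DataFrame({"y": y, "x": x, "group": g})

for name, cs in [("independence", sm.cov_struct.Independence()),
                 ("exchangeable", sm.cov_struct.Exchangeable())]:
    model = sm.GEE.from_formula("y ~ x", groups="group", data=df,
                                cov_struct=cs, family=sm.families.Gaussian())
    res = model.fit()
    print(name, "robust SE for x:", res.bse["x"])
```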

  9. Inverse Optimization: A New Perspective on the Black-Litterman Model.

    PubMed

    Bertsimas, Dimitris; Gupta, Vishal; Paschalidis, Ioannis Ch

    2012-12-11

    The Black-Litterman (BL) model is a widely used asset allocation model in the financial industry. In this paper, we provide a new perspective. The key insight is to replace the statistical framework in the original approach with ideas from inverse optimization. This insight allows us to significantly expand the scope and applicability of the BL model. We provide a richer formulation that, unlike the original model, is flexible enough to incorporate investor information on volatility and market dynamics. Equally importantly, our approach allows us to move beyond the traditional mean-variance paradigm of the original model and construct "BL"-type estimators for more general notions of risk such as coherent risk measures. Computationally, we introduce and study two new "BL"-type estimators and their corresponding portfolios: a Mean Variance Inverse Optimization (MV-IO) portfolio and a Robust Mean Variance Inverse Optimization (RMV-IO) portfolio. These two approaches are motivated by ideas from arbitrage pricing theory and volatility uncertainty. Using numerical simulation and historical backtesting, we show that both methods often demonstrate a better risk-reward tradeoff than their BL counterparts and are more robust to incorrect investor views.

  10. Comparison of Efficiency of Jackknife and Variance Component Estimators of Standard Errors. Program Statistics Research. Technical Report.

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    Large scale surveys usually employ a complex sampling design and as a consequence, no standard methods for estimation of the standard errors associated with the estimates of population means are available. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…

  11. Comparison of mode estimation methods and application in molecular clock analysis

    NASA Technical Reports Server (NTRS)

    Hedges, S. Blair; Shah, Prachi

    2003-01-01

    BACKGROUND: Distributions of time estimates in molecular clock studies are sometimes skewed or contain outliers. In those cases, the mode is a better estimator of the overall time of divergence than the mean or median. However, different methods are available for estimating the mode. We compared these methods in simulations to determine their strengths and weaknesses and further assessed their performance when applied to real data sets from a molecular clock study. RESULTS: We found that the half-range mode and robust parametric mode methods have a lower bias than other mode methods under a diversity of conditions. However, the half-range mode suffers from a relatively high variance and the robust parametric mode is more susceptible to bias by outliers. We determined that bootstrapping reduces the variance of both mode estimators. Application of the different methods to real data sets yielded results that were concordant with the simulations. CONCLUSION: Because the half-range mode is a simple and fast method, and produced less bias overall in our simulations, we recommend the bootstrapped version of it as a general-purpose mode estimator and suggest a bootstrap method for obtaining the standard error and 95% confidence interval of the mode.

  12. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    PubMed Central

    Huang, Lei

    2015-01-01

    To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409

  13. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which were reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique was employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers were simultaneously present in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators: M, MM, RIDGE, and robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, robust ridge regression is the best alternative compared to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.

  14. Inverse Optimization: A New Perspective on the Black-Litterman Model

    PubMed Central

    Bertsimas, Dimitris; Gupta, Vishal; Paschalidis, Ioannis Ch.

    2014-01-01

    The Black-Litterman (BL) model is a widely used asset allocation model in the financial industry. In this paper, we provide a new perspective. The key insight is to replace the statistical framework in the original approach with ideas from inverse optimization. This insight allows us to significantly expand the scope and applicability of the BL model. We provide a richer formulation that, unlike the original model, is flexible enough to incorporate investor information on volatility and market dynamics. Equally importantly, our approach allows us to move beyond the traditional mean-variance paradigm of the original model and construct “BL”-type estimators for more general notions of risk such as coherent risk measures. Computationally, we introduce and study two new “BL”-type estimators and their corresponding portfolios: a Mean Variance Inverse Optimization (MV-IO) portfolio and a Robust Mean Variance Inverse Optimization (RMV-IO) portfolio. These two approaches are motivated by ideas from arbitrage pricing theory and volatility uncertainty. Using numerical simulation and historical backtesting, we show that both methods often demonstrate a better risk-reward tradeoff than their BL counterparts and are more robust to incorrect investor views. PMID:25382873

  15. Multi-Sensor Optimal Data Fusion Based on the Adaptive Fading Unscented Kalman Filter

    PubMed Central

    Gao, Bingbing; Hu, Gaoge; Gao, Shesheng; Gu, Chengfan

    2018-01-01

    This paper presents a new optimal data fusion methodology based on the adaptive fading unscented Kalman filter for multi-sensor nonlinear stochastic systems. This methodology has a two-level fusion structure: at the bottom level, an adaptive fading unscented Kalman filter based on the Mahalanobis distance is developed and serves as local filters to improve the adaptability and robustness of local state estimations against process-modeling error; at the top level, an unscented transformation-based multi-sensor optimal data fusion for the case of N local filters is established according to the principle of linear minimum variance to calculate globally optimal state estimation by fusion of local estimations. The proposed methodology effectively refrains from the influence of process-modeling error on the fusion solution, leading to improved adaptability and robustness of data fusion for multi-sensor nonlinear stochastic systems. It also achieves globally optimal fusion results based on the principle of linear minimum variance. Simulation and experimental results demonstrate the efficacy of the proposed methodology for INS/GNSS/CNS (inertial navigation system/global navigation satellite system/celestial navigation system) integrated navigation. PMID:29415509

  16. Multi-Sensor Optimal Data Fusion Based on the Adaptive Fading Unscented Kalman Filter.

    PubMed

    Gao, Bingbing; Hu, Gaoge; Gao, Shesheng; Zhong, Yongmin; Gu, Chengfan

    2018-02-06

    This paper presents a new optimal data fusion methodology based on the adaptive fading unscented Kalman filter for multi-sensor nonlinear stochastic systems. This methodology has a two-level fusion structure: at the bottom level, an adaptive fading unscented Kalman filter based on the Mahalanobis distance is developed and serves as local filters to improve the adaptability and robustness of local state estimations against process-modeling error; at the top level, an unscented transformation-based multi-sensor optimal data fusion for the case of N local filters is established according to the principle of linear minimum variance to calculate globally optimal state estimation by fusion of local estimations. The proposed methodology effectively refrains from the influence of process-modeling error on the fusion solution, leading to improved adaptability and robustness of data fusion for multi-sensor nonlinear stochastic systems. It also achieves globally optimal fusion results based on the principle of linear minimum variance. Simulation and experimental results demonstrate the efficacy of the proposed methodology for INS/GNSS/CNS (inertial navigation system/global navigation satellite system/celestial navigation system) integrated navigation.

  17. A mixture model for robust registration in Kinect sensor

    NASA Astrophysics Data System (ADS)

    Peng, Li; Zhou, Huabing; Zhu, Shengguo

    2018-03-01

    The Microsoft Kinect sensor has been widely used in many applications, but it suffers from the drawback of low registration precision between the color image and the depth image. In this paper, we present a robust method to improve the registration precision by a mixture model that can handle multiple images with a nonparametric model. We impose non-parametric geometrical constraints on the correspondence, as a prior distribution, in a reproducing kernel Hilbert space (RKHS). The estimation is performed by the EM algorithm, which, by also estimating the variance of the prior model, is able to obtain good estimates. We illustrate the proposed method on a publicly available dataset. The experimental results show that our approach outperforms the baseline methods.

  18. Superresolution SAR Imaging Algorithm Based on MVM and Weighted Norm Extrapolation

    NASA Astrophysics Data System (ADS)

    Zhang, P.; Chen, Q.; Li, Z.; Tang, Z.; Liu, J.; Zhao, L.

    2013-08-01

    In this paper, we present an extrapolation approach, which uses a minimum weighted norm constraint and minimum variance spectrum estimation, for improving synthetic aperture radar (SAR) resolution. The minimum variance method is a robust high-resolution method for spectrum estimation. Based on the theory of SAR imaging, the signal model of SAR imagery is analyzed and shown to be amenable to data extrapolation methods for improving the resolution of SAR images. The method is used to extrapolate the efficient bandwidth in the phase history domain, and better results are obtained compared with the adaptive weighted norm extrapolation (AWNE) method and the traditional imaging method, using simulated data and actual measured data.

  19. A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?

    PubMed Central

    Mumford, Jeanette A.

    2017-01-01

    Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates has been shown to perform well, with better Type I error control, but with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degree of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with outlier de-weighting algorithm and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed. PMID:28030782
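
The review contrasts robust slope estimators with robust (heteroscedasticity-consistent) variance estimators. A compact sketch of both flavours, on simulated group-level data with a continuous covariate and a few outlying subjects, is given below using statsmodels; it is illustrative only and not the specific models compared in the paper (e.g., FSL's Flame 1).

```python
# Robust slope (Huber RLM) versus OLS slope with a sandwich (HC3) variance (sketch).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 30
covariate = rng.uniform(20, 60, n)                 # continuous group-level covariate
y = 0.02 * covariate + rng.normal(0, 1, n)
y[:3] += rng.normal(0, 8, 3)                       # a few outlying subjects
X = sm.add_constant(covariate)

ols_hc3 = sm.OLS(y, X).fit(cov_type="HC3")         # robust (sandwich) variance only
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # robust slope estimate

print("OLS slope, HC3 SE:", ols_hc3.params[1], ols_hc3.bse[1])
print("Huber slope, SE:  ", rlm.params[1], rlm.bse[1])
```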

  20. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    PubMed

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.

  1. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a commonly used parametric method to test for differences in means across more than two groups when the populations are normally distributed. ANOVA is highly inefficient under non-normal and heteroscedastic settings. When the assumptions are violated, researchers look for alternatives such as the nonparametric Kruskal-Wallis test or robust methods. This study focused on a flexible method, the S1 statistic, for comparing groups using the median as the location estimator. The S1 statistic was modified by substituting the median with the Hodges-Lehmann estimator and the default scale estimator with either the variance of the Hodges-Lehmann estimator or MADn, producing two different test statistics for comparing groups. A bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistics in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The proposed procedures show improvement over the original statistic, especially under extremely skewed distributions.
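
    For readers unfamiliar with the estimators named above, the following short Python sketch computes the one-sample Hodges-Lehmann location estimate (the median of all pairwise Walsh averages) and the MADn scale estimate; the full modified S1 test statistic and its bootstrap null distribution are not reproduced here, and the sample values are illustrative.

        # Hodges-Lehmann location and MADn scale (sketch).
        import numpy as np

        def hodges_lehmann(x):
            x = np.asarray(x, dtype=float)
            i, j = np.triu_indices(len(x))           # all pairs with i <= j
            return np.median((x[i] + x[j]) / 2.0)    # median of Walsh averages

        def madn(x):
            x = np.asarray(x, dtype=float)
            return 1.4826 * np.median(np.abs(x - np.median(x)))

        sample = np.array([2.1, 2.4, 2.2, 9.5, 2.3, 2.0])   # one extreme value
        print(hodges_lehmann(sample), madn(sample))         # both resist the outlier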

  2. Directional variance adjustment: bias reduction in covariance matrices based on factor analysis with an application to portfolio optimization.

    PubMed

    Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W; Müller, Klaus-Robert; Lemm, Steven

    2013-01-01

    Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation.

  3. Directional Variance Adjustment: Bias Reduction in Covariance Matrices Based on Factor Analysis with an Application to Portfolio Optimization

    PubMed Central

    Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W.; Müller, Klaus-Robert; Lemm, Steven

    2013-01-01

    Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation. PMID:23844016

  4. M-estimator for the 3D symmetric Helmert coordinate transformation

    NASA Astrophysics Data System (ADS)

    Chang, Guobin; Xu, Tianhe; Wang, Qianxin

    2018-01-01

    The M-estimator for the 3D symmetric Helmert coordinate transformation problem is developed. The small-angle rotation assumption is abandoned. The direction cosine matrix or the quaternion is used to represent the rotation. A 3 × 1 multiplicative error vector is defined to represent the rotation estimation error. An analytical solution can be employed to provide the initial approximation for the iteration, if the outliers are not large. The iteration is carried out using the iteratively reweighted least-squares scheme. In each iteration after the first one, the measurement equation is linearized using the available parameter estimates, the reweighting matrix is constructed using the residuals obtained in the previous iteration, and then the parameter estimates with their variance-covariance matrix are calculated. The influence functions of a single pseudo-measurement on the least-squares estimator and on the M-estimator are derived to theoretically show the robustness. In the solution process, the parameter is rescaled in order to improve the numerical stability. Monte Carlo experiments are conducted to check the developed method. Different cases are considered to investigate whether the assumed stochastic model is correct. The results with the simulated data slightly deviating from the true model are used to show the developed method's statistical efficiency at the assumed stochastic model, its robustness against deviations from the assumed stochastic model, and the validity of the estimated variance-covariance matrix whether or not the assumed stochastic model is correct.
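
    The iteratively reweighted least-squares (IRLS) scheme described above can be sketched generically. The Python example below applies Huber weights to a linear model and is only an illustration of the reweighting loop, not of the paper's quaternion-based relinearization of the Helmert transformation; the tuning constant c = 1.345 is a conventional choice assumed here.

        # Generic IRLS with Huber weights (sketch).
        import numpy as np

        def irls_huber(A, y, c=1.345, n_iter=20):
            x = np.linalg.lstsq(A, y, rcond=None)[0]              # ordinary LS start
            for _ in range(n_iter):
                r = y - A @ x
                s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust residual scale
                u = np.abs(r) / max(s, 1e-12)
                w = np.where(u <= c, 1.0, c / u)                  # Huber weights
                W = np.diag(w)
                x = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)     # reweighted normal equations
            cov = np.linalg.inv(A.T @ W @ A) * s**2               # rough covariance estimate
            return x, cov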

  5. Jackknife variance of the partial area under the empirical receiver operating characteristic curve.

    PubMed

    Bandos, Andriy I; Guo, Ben; Gur, David

    2017-04-01

    Receiver operating characteristic analysis provides an important methodology for assessing traditional (e.g., imaging technologies and clinical practices) and new (e.g., genomic studies, biomarker development) diagnostic problems. The area under the clinically/practically relevant part of the receiver operating characteristic curve (partial area or partial area under the receiver operating characteristic curve) is an important performance index summarizing diagnostic accuracy at multiple operating points (decision thresholds) that are relevant to actual clinical practice. A robust estimate of the partial area under the receiver operating characteristic curve is provided by the area under the corresponding part of the empirical receiver operating characteristic curve. We derive a closed-form expression for the jackknife variance of the partial area under the empirical receiver operating characteristic curve. Using the derived analytical expression, we investigate the differences between the jackknife variance and a conventional variance estimator. The relative properties in finite samples are demonstrated in a simulation study. The developed formula enables an easy way to estimate the variance of the empirical partial area under the receiver operating characteristic curve, thereby substantially reducing the computation burden, and provides important insight into the structure of the variability. We demonstrate that when compared with the conventional approach, the jackknife variance has substantially smaller bias, and leads to a more appropriate type I error rate of the Wald-type test. The use of the jackknife variance is illustrated in the analysis of a data set from a diagnostic imaging study.
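
    The construction behind the closed-form result can be illustrated with the generic leave-one-out jackknife variance, sketched below in Python; the statistic here is the sample mean purely for illustration (for which the jackknife variance reproduces s²/n), whereas the paper evaluates the empirical partial AUC without brute-force recomputation.

        # Leave-one-out jackknife variance of a statistic (sketch).
        import numpy as np

        def jackknife_variance(data, statistic):
            data = np.asarray(data)
            n = len(data)
            theta = np.array([statistic(np.delete(data, i)) for i in range(n)])
            return (n - 1) / n * np.sum((theta - theta.mean()) ** 2)

        x = np.random.default_rng(1).normal(size=30)
        print(jackknife_variance(x, np.mean), x.var(ddof=1) / len(x))   # the two agree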

  6. ROBUST ESTIMATION OF MEAN AND VARIANCE USING ENVIRONMENTAL DATA SETS WITH BELOW DETECTION LIMIT OBSERVATIONS

    EPA Science Inventory

    Scientists, especially environmental scientists often encounter trace level concentrations that are typically reported as less than a certain limit of detection, L. Type 1, left-censored data arise when certain low values lying below L are ignored or unknown as they cannot be mea...

  7. Robustness of survival estimates for radio-marked animals

    USGS Publications Warehouse

    Bunck, C.M.; Chen, C.-L.

    1992-01-01

    Telemetry techniques are often used to study the survival of birds and mammals, particularly when mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have been developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and of an estimator based on the assumption of constant hazard, and to evaluate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-location probability of less than one are described and evaluated. Generally, the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The power of the two-sample tests was similar.

  8. Robust transceiver design for reciprocal M × N interference channel based on statistical linearization approximation

    NASA Astrophysics Data System (ADS)

    Mayvan, Ali D.; Aghaeinia, Hassan; Kazemi, Mohammad

    2017-12-01

    This paper focuses on robust transceiver design for throughput enhancement on the interference channel (IC), under imperfect channel state information (CSI). In this paper, two algorithms are proposed to improve the throughput of the multi-input multi-output (MIMO) IC. Each transmitter and receiver has, respectively, M and N antennas and IC operates in a time division duplex mode. In the first proposed algorithm, each transceiver adjusts its filter to maximize the expected value of signal-to-interference-plus-noise ratio (SINR). On the other hand, the second algorithm tries to minimize the variances of the SINRs to hedge against the variability due to CSI error. Taylor expansion is exploited to approximate the effect of CSI imperfection on mean and variance. The proposed robust algorithms utilize the reciprocity of wireless networks to optimize the estimated statistical properties in two different working modes. Monte Carlo simulations are employed to investigate sum rate performance of the proposed algorithms and the advantage of incorporating variation minimization into the transceiver design.

  9. Improved variance estimation of classification performance via reduction of bias caused by small sample size.

    PubMed

    Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders

    2006-03-13

    Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, different methods for small sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RSS), are also expected to result in heavily biased estimates, which in turn translates into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed, indicating that the method in its present form cannot be directly applied to small data sets.

  10. Gap-filling methods to impute eddy covariance flux data by preserving variance.

    NASA Astrophysics Data System (ADS)

    Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.

    2015-12-01

    To represent carbon dynamics, in terms of the exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data have been collected using eddy flux towers at various sites across the globe for more than two decades. However, measurements from EC data are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to obtain estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structure embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60 days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center, using the Michaelis-Menten and Van't Hoff functions as our base. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions, which provides an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC), into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by a lower root mean square error (RMSE) of the predicted values of NEE). Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables of the parameters traditionally used. Comparisons of the predicted values from these methods and 'traditional' gap-filling methods (using 12 fixed monthly windows) will then be assessed to show the extent to which variance is preserved. Further, this method will be applied to impute artificially created gaps to analyze whether variance is preserved.
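
    The parameter-estimation step of such a gap-filling scheme can be sketched as a non-linear least-squares fit of a Michaelis-Menten light-response curve within a moving window. The Python example below uses scipy and simulated data; the particular functional form, parameter names, and values are assumptions for illustration rather than the study's fitted model.

        # Fit a Michaelis-Menten light-response curve and impute missing NEE (sketch).
        import numpy as np
        from scipy.optimize import curve_fit

        def michaelis_menten(par, alpha, p_max, r_eco):
            # NEE as a function of photosynthetically active radiation (PAR)
            return -(alpha * par * p_max) / (alpha * par + p_max) + r_eco

        rng = np.random.default_rng(0)
        par = rng.uniform(0, 2000, 300)                                      # simulated PAR
        nee = michaelis_menten(par, 0.03, 20.0, 5.0) + rng.normal(0, 1.5, par.size)

        popt, pcov = curve_fit(michaelis_menten, par, nee, p0=[0.02, 15.0, 3.0])
        nee_filled = michaelis_menten(np.array([500.0, 1200.0]), *popt)      # imputed values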

  11. Robust linear discriminant analysis with distance based estimators

    NASA Astrophysics Data System (ADS)

    Lim, Yai-Fung; Yahaya, Sharipah Soaad Syed; Ali, Hazlina

    2017-11-01

    Linear discriminant analysis (LDA) is a supervised classification technique concerning the relationship between a categorical variable and a set of continuous variables. The main objective of LDA is to create a function to distinguish between populations and to allocate future observations to previously defined populations. Under the assumptions of normality and homoscedasticity, LDA yields the optimal linear discriminant rule (LDR) between two or more groups. However, the optimality of LDA relies heavily on the sample mean and pooled sample covariance matrix, which are known to be sensitive to outliers. To alleviate these problems, a new robust LDA using distance-based estimators known as the minimum variance vector (MVV) is proposed in this study. The MVV estimators were used to substitute the classical sample mean and classical sample covariance to form a robust linear discriminant rule (RLDR). A simulation and a real-data study were conducted to examine the performance of the proposed RLDR measured in terms of misclassification error rates. The computational results showed that the proposed RLDR is better than the classical LDR and comparable with existing robust LDRs.

  12. Variance reduction through robust design of boundary conditions for stochastic hyperbolic systems of equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nordström, Jan, E-mail: jan.nordstrom@liu.se; Wahlsten, Markus, E-mail: markus.wahlsten@liu.se

    We consider a hyperbolic system with uncertainty in the boundary and initial data. Our aim is to show that different boundary conditions give different convergence rates of the variance of the solution. This means that we can with the same knowledge of data get a more or less accurate description of the uncertainty in the solution. A variety of boundary conditions are compared and both analytical and numerical estimates of the variance of the solution are presented. As an application, we study the effect of this technique on Maxwell's equations as well as on a subsonic outflow boundary for the Euler equations.

  13. Social Media and Language Processing: How Facebook and Twitter Provide the Best Frequency Estimates for Studying Word Recognition.

    PubMed

    Herdağdelen, Amaç; Marelli, Marco

    2017-05-01

    Corpus-based word frequencies are one of the most important predictors in language processing tasks. Frequencies based on conversational corpora (such as movie subtitles) are shown to better capture the variance in lexical decision tasks compared to traditional corpora. In this study, we show that frequencies computed from social media are currently the best frequency-based estimators of lexical decision reaction times (up to 3.6% increase in explained variance). The results are robust (observed for Twitter- and Facebook-based frequencies on American English and British English datasets) and are still substantial when we control for corpus size. © 2016 The Authors. Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.

  14. A Robust Statistics Approach to Minimum Variance Portfolio Optimization

    NASA Astrophysics Data System (ADS)

    Yang, Liusha; Couillet, Romain; McKay, Matthew R.

    2015-12-01

    We study the design of portfolios under a minimum risk criterion. The performance of the optimized portfolio relies on the accuracy of the estimated covariance matrix of the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of assets, so that the sample covariance matrix performs poorly as a covariance estimator. Additionally, financial market data often contain outliers which, if not correctly handled, may further corrupt the covariance estimation. We address these shortcomings by studying the performance of a hybrid covariance matrix estimator based on Tyler's robust M-estimator and on Ledoit-Wolf's shrinkage estimator while assuming samples with heavy-tailed distribution. Employing recent results from random matrix theory, we develop a consistent estimator of (a scaled version of) the realized portfolio risk, which is minimized by optimizing the shrinkage intensity online. Our portfolio optimization method is shown via simulations to outperform existing methods on both synthetic and real market data.
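
    As a simplified illustration of the shrinkage idea, the Python sketch below builds minimum-variance weights from the Ledoit-Wolf covariance estimate available in scikit-learn; the hybrid Tyler/shrinkage estimator and the online tuning of the shrinkage intensity developed in the paper are not reproduced, and the simulated heavy-tailed returns are purely illustrative.

        # Minimum-variance portfolio from a shrinkage covariance estimate (sketch).
        import numpy as np
        from sklearn.covariance import LedoitWolf

        rng = np.random.default_rng(0)
        returns = rng.standard_t(df=4, size=(60, 50))    # 60 periods, 50 assets, heavy tails

        sigma = LedoitWolf().fit(returns).covariance_    # well-conditioned covariance estimate
        ones = np.ones(sigma.shape[0])
        w = np.linalg.solve(sigma, ones)
        w /= ones @ w                                    # weights sum to one
        print(w[:5])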

  15. Statistical guides to estimating the number of undiscovered mineral deposits: an example with porphyry copper deposits

    USGS Publications Warehouse

    Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.

    2005-01-01

    Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes. 

  16. Robust analysis of semiparametric renewal process models

    PubMed Central

    Lin, Feng-Chang; Truong, Young K.; Fine, Jason P.

    2013-01-01

    Summary A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated owing to the unknown gap times dependence structure. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. PMID:24550568

  17. Vehicle detection and orientation estimation using the radon transform

    NASA Astrophysics Data System (ADS)

    Pelapur, Rengarajan; Bunyak, Filiz; Palaniappan, Kannappan; Seetharaman, Gunasekaran

    2013-05-01

    Determining the location and orientation of vehicles in satellite and airborne imagery is a challenging task given the density of cars and other vehicles and the complexity of the environment in urban scenes almost anywhere in the world. We have developed a robust and accurate method for detecting vehicles using template-based directional chamfer matching, combined with vehicle orientation estimation based on a refined segmentation, followed by a Radon transform based profile variance peak analysis approach. The same algorithm was applied to both high resolution satellite imagery and wide area aerial imagery, and initial results show robustness to illumination changes and geometric appearance distortions. Nearly 80% of the orientation angle estimates for 1585 vehicles across both satellite and aerial imagery were accurate to within 15° of the ground truth. In the case of satellite imagery alone, nearly 90% of the objects have an estimated error within ±1.0° of the ground truth.

  18. Robust, Adaptive Radar Detection and Estimation

    DTIC Science & Technology

    2015-07-21

    The cost function is not a convex function in R, so we apply a transformation of variables, i.e., let X = σ²R⁻¹ and S′ = (1/σ²)S. Then, the revised cost function in... We apply this inverse covariance matrix in computing the SINR as well as the estimator variance. • Rank Constrained Maximum Likelihood: Our... even as almost all available training samples are corrupted. Probability of Detection vs. SNR: We apply three test statistics, the normalized matched...

  19. Robust Estimation Based on Walsh Averages for the General Linear Model.

    DTIC Science & Technology

    1983-11-01

    The estimate of β minimizing Σρ(Z) has an influence function proportional to ψ(y), and its asymptotic variance-covariance matrix is E(ψ²)/(E... in particular, on the influence function h(y) and quantities appearing in the asymptotic variance. Some comments are made on the one- and two... for signed rank estimates. The function ρ₂(t) of (1.4) has derivative ψ₂(t) = −1 if t < −c, 0 if |t| < c, +1 if t > c. Then the influence function is h(t...

  20. Adjusting for overdispersion in piecewise exponential regression models to estimate excess mortality rate in population-based research.

    PubMed

    Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard

    2016-10-01

    In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may fail to account for overdispersion, i.e., variability of the rate parameter such that the variance exceeds the mean. Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including quasi-likelihood, robust standard error estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, using flexible piecewise regression modelling, with either quasi-likelihood or robust standard errors, was the best approach as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.
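
    Two of the corrections discussed (negative binomial regression and robust, sandwich-type standard errors) can be sketched with statsmodels as below; the excess-mortality offset, life-table expected rates, and spline terms of the full relative-survival model are omitted, and the simulated counts and the dispersion value alpha=0.5 are assumptions for illustration only.

        # Overdispersed counts: Poisson with robust SEs versus negative binomial (sketch).
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 500
        x = rng.normal(size=n)
        mu = np.exp(0.3 + 0.5 * x)
        y = rng.negative_binomial(n=2, p=2.0 / (2.0 + mu))   # overdispersed outcome

        X = sm.add_constant(x)
        poisson_robust = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
        negbin = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()

        print(poisson_robust.bse)   # sandwich SEs under the (misspecified) Poisson model
        print(negbin.bse)           # SEs under the negative binomial model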

  1. Smoking and Cancers: Case-Robust Analysis of a Classic Data Set

    ERIC Educational Resources Information Center

    Bentler, Peter M.; Satorra, Albert; Yuan, Ke-Hai

    2009-01-01

    A typical structural equation model is intended to reproduce the means, variances, and correlations or covariances among a set of variables based on parameter estimates of a highly restricted model. It is not widely appreciated that the sample statistics being modeled can be quite sensitive to outliers and influential observations, leading to bias…

  2. Synthesizing Results from Replication Studies Using Robust Variance Estimation: Corrections When the Number of Studies Is Small

    ERIC Educational Resources Information Center

    Tipton, Elizabeth

    2014-01-01

    Replication studies allow for making comparisons and generalizations regarding the effectiveness of an intervention across different populations, versions of a treatment, settings and contexts, and outcomes. One method for making these comparisons across many replication studies is through the use of meta-analysis. A recent innovation in…

  3. Shutterless solution for simultaneous focal plane array temperature estimation and nonuniformity correction in uncooled long-wave infrared camera.

    PubMed

    Cao, Yanpeng; Tisse, Christel-Loic

    2013-09-01

    In uncooled long-wave infrared (LWIR) microbolometer imaging systems, temperature fluctuations of the focal plane array (FPA) result in thermal drift and spatial nonuniformity. In this paper, we present a novel approach based on single-image processing to simultaneously estimate temperature variances of FPAs and compensate the resulting temperature-dependent nonuniformity. Through well-controlled thermal calibrations, empirical behavioral models are derived to characterize the relationship between the responses of microbolometer and FPA temperature variations. Then, under the assumption that strong dependency exists between spatially adjacent pixels, we estimate the optimal FPA temperature so as to minimize the global intensity variance across the entire thermal infrared image. We make use of the estimated FPA temperature to infer an appropriate nonuniformity correction (NUC) profile. The performance and robustness of the proposed temperature-adaptive NUC method are evaluated on realistic IR images obtained by a 640 × 512 pixels uncooled LWIR microbolometer imaging system operating in a significantly changed temperature environment.

  4. Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data.

    PubMed

    Li, Johnson Ching-Hong

    2016-12-01

    In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs - the unscaled robust d (d_r*; Hogarty & Kromrey, 2001), scaled robust d (d_r; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (r_pb; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and the nonparametric estimator for CL (A_w; Ruscio, Psychological Methods, 13, 19-30, 2008) - may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that A_w and d_r were generally robust to these violations, and A_w slightly outperformed d_r. Implications for the use of A_w and d_r in real-world research are discussed.
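
    Two of the six effect sizes can be computed in a few lines, as sketched below in Python: Cohen's d with a pooled standard deviation, and a nonparametric common-language effect size estimated as the probability that an observation from group 1 exceeds one from group 2 (ties counted as one half). The simulation conditions of the study are not reproduced, and the two small samples are illustrative.

        # Cohen's d and a nonparametric common-language effect size (sketch).
        import numpy as np

        def cohens_d(x, y):
            nx, ny = len(x), len(y)
            pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
            return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

        def prob_superiority(x, y):
            diffs = np.subtract.outer(x, y)              # all pairwise comparisons
            return (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / (len(x) * len(y))

        g1 = np.array([5.1, 6.3, 7.2, 5.8, 9.0])
        g2 = np.array([4.2, 5.0, 4.8, 6.1, 4.5])
        print(cohens_d(g1, g2), prob_superiority(g1, g2))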

  5. Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models.

    PubMed

    Garcia, Tanya P; Ma, Yanyuan

    2017-10-01

    We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root-n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.

  6. Robust Means Modeling: An Alternative for Hypothesis Testing of Independent Means under Variance Heterogeneity and Nonnormality

    ERIC Educational Resources Information Center

    Fan, Weihua; Hancock, Gregory R.

    2012-01-01

    This study proposes robust means modeling (RMM) approaches for hypothesis testing of mean differences for between-subjects designs in order to control the biasing effects of nonnormality and variance inequality. Drawing from structural equation modeling (SEM), the RMM approaches make no assumption of variance homogeneity and employ robust…

  7. Adaptive estimation of the log fluctuating conductivity from tracer data at the Cape Cod Site

    USGS Publications Warehouse

    Deng, F.W.; Cushman, J.H.; Delleur, J.W.

    1993-01-01

    An adaptive estimation scheme is used to obtain the integral scale and variance of the log-fluctuating conductivity at the Cape Cod site based on the fast Fourier transform/stochastic model of Deng et al. (1993) and a Kalman-like filter. The filter incorporates prior estimates of the unknown parameters with tracer moment data to adaptively obtain improved estimates as the tracer evolves. The results show that significant improvement in the prior estimates of the conductivity can lead to substantial improvement in the ability to predict plume movement. The structure of the covariance function of the log-fluctuating conductivity can be identified from the robustness of the estimation. Both the longitudinal and transverse spatial moment data are important to the estimation.

  8. Small-Sample Adjustments for Tests of Moderators and Model Fit in Robust Variance Estimation in Meta-Regression

    ERIC Educational Resources Information Center

    Tipton, Elizabeth; Pustejovsky, James E.

    2015-01-01

    Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…

  9. Prediction-error variance in Bayesian model updating: a comparative study

    NASA Astrophysics Data System (ADS)

    Asadollahi, Parisa; Li, Jian; Huang, Yong

    2017-04-01

    In Bayesian model updating, the likelihood function is commonly formulated by stochastic embedding, in which the maximum information entropy probability model of the prediction error variances plays an important role; it is a Gaussian distribution subject to the first two moments as constraints. The selection of prediction error variances can be formulated as a model class selection problem, which automatically involves a trade-off between the average data-fit of the model class and the information it extracts from the data. Therefore, it is critical for robustness in the updating of the structural model, especially in the presence of modeling errors. To date, three ways of considering prediction error variances have been seen in the literature: 1) setting constant values empirically, 2) estimating them based on the goodness-of-fit of the measured data, and 3) updating them as uncertain parameters by applying Bayes' Theorem at the model class level. In this paper, the effect of different strategies to deal with the prediction error variances on the model updating performance is investigated explicitly. A six-story shear building model with six uncertain stiffness parameters is employed as an illustrative example. Transitional Markov Chain Monte Carlo is used to draw samples of the posterior probability density function of the structural model parameters as well as the uncertain prediction variances. Different levels of modeling uncertainty and complexity are modeled through three FE models, including a true model, a model with more complexity, and a model with modeling error. Bayesian updating is performed for the three FE models considering the three aforementioned treatments of the prediction error variances. The effect of the number of measurements on the model updating performance is also examined in the study. The results are compared based on model class assessment and indicate that updating the prediction error variances as uncertain parameters at the model class level produces more robust results, especially when the number of measurements is small.

  10. Novel health monitoring method using an RGB camera.

    PubMed

    Hassan, M A; Malik, A S; Fofi, D; Saad, N; Meriaudeau, F

    2017-11-01

    In this paper we present a novel health monitoring method that estimates the heart rate and respiratory rate using an RGB camera. The heart rate and the respiratory rate are estimated from the photoplethysmography (PPG) signal and the respiratory motion. The method mainly operates by using the green spectrum of the RGB camera to generate a multivariate PPG signal and performing multivariate de-noising on the video signal to extract the resultant PPG signal. A periodicity based voting scheme (PVS) was used to measure the heart rate and respiratory rate from the estimated PPG signal. We evaluated our proposed method against a state of the art heart rate measuring method for two scenarios using the MAHNOB-HCI database and a self-collected naturalistic environment database. The methods were furthermore evaluated for various scenarios in naturalistic environments, such as a motion variance session and a skin tone variance session. Our proposed method operated robustly during the experiments and outperformed the state of the art heart rate measuring methods by compensating for the effects of the naturalistic environment.

  11. Data assimilation method based on the constraints of confidence region

    NASA Astrophysics Data System (ADS)

    Li, Yong; Li, Siming; Sheng, Yao; Wang, Luheng

    2018-03-01

    The ensemble Kalman filter (EnKF) is a distinguished data assimilation method that is widely used and studied in various fields, including meteorology and oceanography. However, due to the limited sample size or an imprecise dynamics model, the forecast error variance is often underestimated, which further leads to the phenomenon of filter divergence. Additionally, the assimilation results of the initial stage are poor if the initial condition settings differ greatly from the true initial state. To address these problems, a variance inflation procedure is usually adopted. In this paper, we propose a new method based on the constraints of a confidence region constructed from the observations, called EnCR, to estimate the inflation parameter of the forecast error variance in the EnKF method. In the new method, the state estimate is more robust to both inaccurate forecast models and initial condition settings. The new method is compared with other adaptive data assimilation methods in the Lorenz-63 and Lorenz-96 models under various model parameter settings. The simulation results show that the new method performs better than the competing methods.
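
    The quantity being estimated adaptively, a multiplicative inflation factor for the forecast error variance, can be sketched in a few lines: ensemble perturbations about the mean are scaled by sqrt(lambda) so the sample variance grows by a factor lambda. The confidence-region machinery of EnCR itself is not shown; the ensemble and the inflation value are illustrative.

        # Multiplicative covariance inflation of an ensemble (sketch).
        import numpy as np

        def inflate_ensemble(ensemble, lam):
            """ensemble: (n_members, n_state); lam: inflation factor >= 1."""
            mean = ensemble.mean(axis=0)
            return mean + np.sqrt(lam) * (ensemble - mean)

        rng = np.random.default_rng(0)
        ens = rng.normal(size=(20, 3))
        inflated = inflate_ensemble(ens, lam=1.2)
        print(ens.var(axis=0, ddof=1))        # original forecast variances
        print(inflated.var(axis=0, ddof=1))   # roughly 1.2 times larger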

  12. A new variance stabilizing transformation for gene expression data analysis.

    PubMed

    Kelmansky, Diana M; Martínez, Elena J; Leiva, Víctor

    2013-12-01

    In this paper, we introduce a new family of power transformations, which has the generalized logarithm as one of its members, in the same manner as the usual logarithm belongs to the family of Box-Cox power transformations. Although the new family has been developed for analyzing gene expression data, it accommodates a wider range of data with mean-variance relationships. We study the analytical properties of the new family of transformations, as well as the mean-variance relationships that are stabilized by using its members. We propose a methodology based on this new family, which includes a simple strategy for selecting the family member adequate for a data set. We evaluate the finite sample behavior of different classical and robust estimators based on this strategy by Monte Carlo simulations. We analyze real genomic data by using the proposed transformation to empirically show how the new methodology allows the variance of these data to be stabilized.
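
    The generalized logarithm singled out above can be written down explicitly; the Python sketch below uses the common form glog(x) = log2((x + sqrt(x² + λ))/2), which behaves like log2(x) for large intensities but remains defined near zero. The offset λ is a tuning parameter and the specific values are illustrative; the paper's wider transformation family and member-selection strategy are not reproduced.

        # Generalized logarithm (glog) transform (sketch).
        import numpy as np

        def glog(x, lam=1.0):
            x = np.asarray(x, dtype=float)
            return np.log2((x + np.sqrt(x**2 + lam)) / 2.0)

        intensities = np.array([0.0, 0.5, 10.0, 1000.0])
        print(glog(intensities, lam=4.0))    # finite at zero, ~log2(x) for large x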

  13. Stratospheric Assimilation of Chemical Tracer Observations Using a Kalman Filter. Pt. 2; Chi-Square Validated Results and Analysis of Variance and Correlation Dynamics

    NASA Technical Reports Server (NTRS)

    Menard, Richard; Chang, Lang-Ping

    1998-01-01

    A Kalman filter system designed for the assimilation of limb-sounding observations of stratospheric chemical tracers, which has four tunable covariance parameters, was developed in Part I (Menard et al. 1998). The assimilation results of CH4 observations from the Cryogenic Limb Array Etalon Spectrometer instrument (CLAES) and the Halogen Occultation Experiment instrument (HALOE) on board the Upper Atmosphere Research Satellite are described in this paper. A robust χ² criterion, which provides a statistical validation of the forecast and observational error covariances, was used to estimate the tunable variance parameters of the system. In particular, an estimate of the model error variance was obtained. The effect of model error on the forecast error variance became critical after only three days of assimilation of CLAES observations, although it took 14 days of forecast to double the initial error variance. We further found that the model error due to numerical discretization, as arising in the standard Kalman filter algorithm, is comparable in size to the physical model error due to wind and transport modeling errors together. Separate assimilations of CLAES and HALOE observations were compared to validate the state estimate away from the observed locations. A wave-breaking event that took place several thousands of kilometers away from the HALOE observation locations was well captured by the Kalman filter, owing to highly anisotropic forecast error correlations. The forecast error correlation in the assimilation of the CLAES observations was found to have a structure similar to that in pure forecast mode except for smaller length scales. Finally, we have conducted an analysis of the variance and correlation dynamics to determine their relative importance in chemical tracer assimilation problems. Results show that the optimality of a tracer assimilation system depends, for the most part, on having flow-dependent error correlations rather than on evolving the error variance.

  14. Robust sensor fusion of unobtrusively measured heart rate.

    PubMed

    Wartzek, Tobias; Brüser, Christoph; Walter, Marian; Leonhardt, Steffen

    2014-03-01

    Contactless vital sign measurement technologies often have the drawback of severe motion artifacts and periods in which no signal is available. However, using several identical or physically different sensors, redundancy can be exploited to decrease the error in noncontact heart rate estimation, while increasing the time period during which reliable data are available. In this paper, we show for the first time two major results for contactless heart rate measurements deduced from a capacitive ECG and optical pulse signals. First, artifact detection is an essential preprocessing step to allow a reliable fusion. Second, the robust but computationally efficient median already provides good results; however, using a Bayesian approach with a short-time estimate of the variance, the best results in terms of difference from the reference heart rate and temporal coverage can be achieved. In this paper, six sensor signals were used and coverage increased from 0-90% to 80-94%, while the difference between the estimated heart rate and the gold standard was less than ±2 BPM.
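
    The spirit of the variance-aware fusion can be sketched as inverse-variance weighting: each sensor's short-time heart-rate estimate is weighted by the reciprocal of its estimated variance, so artifact-laden channels contribute little. The Python example below is a minimal illustration with made-up numbers, not the paper's full Bayesian fusion pipeline.

        # Inverse-variance weighted fusion of heart-rate estimates (sketch).
        import numpy as np

        def fuse(estimates, variances):
            estimates = np.asarray(estimates, dtype=float)
            weights = 1.0 / np.asarray(variances, dtype=float)
            return np.sum(weights * estimates) / np.sum(weights)

        hr_estimates = np.array([72.0, 75.0, 110.0, 73.0])   # BPM from four channels
        hr_variances = np.array([1.0, 2.0, 50.0, 1.5])       # third channel is unreliable
        print(fuse(hr_estimates, hr_variances), np.median(hr_estimates))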

  15. A full-spectral Bayesian reconstruction approach based on the material decomposition model applied in dual-energy computed tomography.

    PubMed

    Cai, C; Rodet, T; Legoupil, S; Mohammad-Djafari, A

    2013-11-01

    Dual-energy computed tomography (DECT) makes it possible to get two fractions of basis materials without segmentation. One is the soft-tissue equivalent water fraction and the other is the hard-matter equivalent bone fraction. Practical DECT measurements are usually obtained with polychromatic x-ray beams. Existing reconstruction approaches based on linear forward models that do not account for the beam polychromaticity fail to estimate the correct decomposition fractions and result in beam-hardening artifacts (BHA). The existing BHA correction approaches either need to refer to calibration measurements or suffer from the noise amplification caused by the negative-log preprocessing and the ill-conditioned water and bone separation problem. To overcome these problems, statistical DECT reconstruction approaches based on nonlinear forward models accounting for the beam polychromaticity show great potential for giving accurate fraction images. This work proposes a full-spectral Bayesian reconstruction approach which allows the reconstruction of high quality fraction images from ordinary polychromatic measurements. This approach is based on a Gaussian noise model with unknown variance assigned directly to the projections without taking the negative-log. Referring to Bayesian inference, the decomposition fractions and observation variance are estimated by using the joint maximum a posteriori (MAP) estimation method. Subject to an adaptive prior model assigned to the variance, the joint estimation problem is then simplified into a single estimation problem. It transforms the joint MAP estimation problem into a minimization problem with a nonquadratic cost function. To solve it, the use of a monotone conjugate gradient algorithm with suboptimal descent steps is proposed. The performance of the proposed approach is analyzed with both simulated and experimental data. The results show that the proposed Bayesian approach is robust to noise and materials, although accurate spectrum information about the source-detector system is required. When dealing with experimental data, the spectrum can be predicted by a Monte Carlo simulator. For the materials between water and bone, less than 5% separation errors are observed on the estimated decomposition fractions. The proposed approach is a statistical reconstruction approach based on a nonlinear forward model accounting for the full beam polychromaticity and applied directly to the projections without taking the negative-log. Compared to the approaches based on linear forward models and the BHA correction approaches, it has advantages in noise robustness and reconstruction accuracy.

  16. Effects of sample size on estimates of population growth rates calculated with matrix models.

    PubMed

    Fiske, Ian J; Bruna, Emilio M; Bolker, Benjamin M

    2008-08-28

    Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda: Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated whether sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
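
    The quantity under study, the asymptotic growth rate lambda, is the dominant eigenvalue of the projection matrix, and the effect of sampling variance can be mimicked by perturbing the vital rates and recomputing lambda; the Python sketch below uses a hypothetical three-stage matrix and a crude binomial resampling of the survival entries purely for illustration.

        # Lambda as the dominant eigenvalue, plus a crude sampling-variance experiment (sketch).
        import numpy as np

        def growth_rate(matrix):
            return np.max(np.real(np.linalg.eigvals(matrix)))   # dominant eigenvalue

        A = np.array([[0.0, 1.5, 2.0],     # fecundities
                      [0.5, 0.0, 0.0],     # survival/transition probabilities
                      [0.0, 0.4, 0.8]])
        print(growth_rate(A))

        rng = np.random.default_rng(0)
        n = 20                              # individuals per transition estimate
        lams = []
        for _ in range(1000):
            B = A.copy()
            for idx in [(1, 0), (2, 1), (2, 2)]:
                B[idx] = rng.binomial(n, A[idx]) / n   # resampled survival estimate
            lams.append(growth_rate(B))
        print(np.mean(lams), np.var(lams, ddof=1))     # bias and spread induced by sampling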

  17. Critical bounds on noise and SNR for robust estimation of real-time brain activity from functional near infra-red spectroscopy.

    PubMed

    Aqil, Muhammad; Jeong, Myung Yung

    2018-04-24

    The robust characterization of real-time brain activity carries potential for many applications. However, the contamination of measured signals by various instrumental, environmental, and physiological sources of noise introduces a substantial amount of signal variance and, consequently, challenges real-time estimation of contributions from underlying neuronal sources. Functional near infra-red spectroscopy (fNIRS) is an emerging imaging modality whose real-time potential is yet to be fully explored. The objectives of the current study are to (i) validate a time-dependent linear model of hemodynamic responses in fNIRS, and (ii) test the robustness of this approach against measurement noise (instrumental and physiological) and mis-specification of the hemodynamic response basis functions (amplitude, latency, and duration). We propose a linear hemodynamic model with time-varying parameters, which are estimated (adapted and tracked) using a dynamic recursive least squares algorithm. Owing to the linear nature of the activation model, the problem of achieving robust convergence to an accurate estimate of the model parameters is recast as a problem of parameter error stability around the origin. We show that robust convergence of the proposed method is guaranteed in the presence of an acceptable degree of model misspecification, and we derive an upper bound on the noise under which reliable parameters can still be inferred, as well as a lower bound on the signal-to-noise ratio above which reliable parameters can still be inferred from a channel/voxel. Whilst here applied to fNIRS, the proposed methodology is applicable to other hemodynamic-based imaging technologies such as functional magnetic resonance imaging. Copyright © 2018 Elsevier Inc. All rights reserved.
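
    A dynamic recursive least-squares update of the kind analyzed above can be sketched in a few lines; the Python example below tracks time-varying coefficients with a forgetting factor, using hypothetical regressors rather than the fNIRS hemodynamic basis functions, and the forgetting factor 0.98 is an assumed tuning value.

        # Recursive least squares with a forgetting factor (sketch).
        import numpy as np

        def rls_update(theta, P, x, y, lam=0.98):
            """One step: regressor x (p,), scalar observation y."""
            Px = P @ x
            k = Px / (lam + x @ Px)              # gain vector
            err = y - x @ theta                  # prediction error
            theta = theta + k * err              # parameter update
            P = (P - np.outer(k, x) @ P) / lam   # covariance update
            return theta, P

        p = 2
        theta, P = np.zeros(p), 1e3 * np.eye(p)
        rng = np.random.default_rng(0)
        for t in range(500):
            x = np.array([1.0, np.sin(0.05 * t)])       # hypothetical regressors
            y = 0.5 + 2.0 * x[1] + rng.normal(0, 0.1)   # simulated response
            theta, P = rls_update(theta, P, x, y)
        print(theta)    # approaches [0.5, 2.0]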

  18. The Pattern Across the Continental United States of Evapotranspiration Variability Associated with Water Availability

    NASA Technical Reports Server (NTRS)

    Koster, Randal D.; Salvucci, Guido D.; Rigden, Angela J.; Jung, Martin; Collatz, G. James; Schubert, Siegfried D.

    2015-01-01

    The spatial pattern across the continental United States of the interannual variance of warm season water-dependent evapotranspiration, a pattern of relevance to land-atmosphere feedback, cannot be measured directly. Alternative and indirect approaches to estimating the pattern, however, do exist, and given the uncertainty of each, we use several such approaches here. We first quantify the water-dependent evapotranspiration variance pattern inherent in two derived evapotranspiration datasets available from the literature. We then search for the pattern in proxy geophysical variables (air temperature, stream flow, and NDVI) known to have strong ties to evapotranspiration. The variances inherent in all of the different (and mostly independent) data sources show some differences but are generally strongly consistent: they all show a large variance signal down the center of the U.S., with lower variances toward the east and (for the most part) toward the west. The robustness of the pattern across the datasets suggests that it indeed represents the pattern operating in nature. Using Budyko's hydroclimatic framework, we show that the pattern can largely be explained by the relative strength of water and energy controls on evapotranspiration across the continent.

  19. Replication of a gene-environment interaction Via Multimodel inference: additive-genetic variance in adolescents' general cognitive ability increases with family-of-origin socioeconomic status.

    PubMed

    Kirkpatrick, Robert M; McGue, Matt; Iacono, William G

    2015-03-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES-an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research.

  20. Replication of a Gene-Environment Interaction via Multimodel Inference: Additive-Genetic Variance in Adolescents’ General Cognitive Ability Increases with Family-of-Origin Socioeconomic Status

    PubMed Central

    Kirkpatrick, Robert M.; McGue, Matt; Iacono, William G.

    2015-01-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES—an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research. PMID:25539975

  1. Evaluating alternate models to estimate genetic parameters of calving traits in United Kingdom Holstein-Friesian dairy cattle.

    PubMed

    Eaglen, Sophie A E; Coffey, Mike P; Woolliams, John A; Wall, Eileen

    2012-07-28

    The focus in dairy cattle breeding is gradually shifting from production to functional traits and genetic parameters of calving traits are estimated more frequently. However, across countries, various statistical models are used to estimate these parameters. This study evaluates different models for calving ease and stillbirth in United Kingdom Holstein-Friesian cattle. Data from first and later parity records were used. Genetic parameters for calving ease, stillbirth and gestation length were estimated using the restricted maximum likelihood method, considering different models i.e. sire (-maternal grandsire), animal, univariate and bivariate models. Gestation length was fitted as a correlated indicator trait and, for all three traits, genetic correlations between first and later parities were estimated. Potential bias in estimates was avoided by acknowledging a possible environmental direct-maternal covariance. The total heritable variance was estimated for each trait to discuss its theoretical importance and practical value. Prediction error variances and accuracies were calculated to compare the models. On average, direct and maternal heritabilities for calving traits were low, except for direct gestation length. Calving ease in first parity had a significant and negative direct-maternal genetic correlation. Gestation length was maternally correlated to stillbirth in first parity and directly correlated to calving ease in later parities. Multi-trait models had a slightly greater predictive ability than univariate models, especially for the lowly heritable traits. The computation time needed for sire (-maternal grandsire) models was much smaller than for animal models with only small differences in accuracy. The sire (-maternal grandsire) model was robust when additional genetic components were estimated, while the equivalent animal model had difficulties reaching convergence. For the evaluation of calving traits, multi-trait models show a slight advantage over univariate models. Extended sire models (-maternal grandsire) are more practical and robust than animal models. Estimated genetic parameters for calving traits of UK Holstein cattle are consistent with literature. Calculating an aggregate estimated breeding value including direct and maternal values should encourage breeders to consider both direct and maternal effects in selection decisions.

  2. Evaluating alternate models to estimate genetic parameters of calving traits in United Kingdom Holstein-Friesian dairy cattle

    PubMed Central

    2012-01-01

    Background The focus in dairy cattle breeding is gradually shifting from production to functional traits and genetic parameters of calving traits are estimated more frequently. However, across countries, various statistical models are used to estimate these parameters. This study evaluates different models for calving ease and stillbirth in United Kingdom Holstein-Friesian cattle. Methods Data from first and later parity records were used. Genetic parameters for calving ease, stillbirth and gestation length were estimated using the restricted maximum likelihood method, considering different models i.e. sire (−maternal grandsire), animal, univariate and bivariate models. Gestation length was fitted as a correlated indicator trait and, for all three traits, genetic correlations between first and later parities were estimated. Potential bias in estimates was avoided by acknowledging a possible environmental direct-maternal covariance. The total heritable variance was estimated for each trait to discuss its theoretical importance and practical value. Prediction error variances and accuracies were calculated to compare the models. Results and discussion On average, direct and maternal heritabilities for calving traits were low, except for direct gestation length. Calving ease in first parity had a significant and negative direct-maternal genetic correlation. Gestation length was maternally correlated to stillbirth in first parity and directly correlated to calving ease in later parities. Multi-trait models had a slightly greater predictive ability than univariate models, especially for the lowly heritable traits. The computation time needed for sire (−maternal grandsire) models was much smaller than for animal models with only small differences in accuracy. The sire (−maternal grandsire) model was robust when additional genetic components were estimated, while the equivalent animal model had difficulties reaching convergence. Conclusions For the evaluation of calving traits, multi-trait models show a slight advantage over univariate models. Extended sire models (−maternal grandsire) are more practical and robust than animal models. Estimated genetic parameters for calving traits of UK Holstein cattle are consistent with literature. Calculating an aggregate estimated breeding value including direct and maternal values should encourage breeders to consider both direct and maternal effects in selection decisions. PMID:22839757

  3. Sample size calculation in cost-effectiveness cluster randomized trials: optimal and maximin approaches.

    PubMed

    Manju, Md Abu; Candel, Math J J M; Berger, Martijn P F

    2014-07-10

    In this paper, the optimal sample sizes at the cluster and person levels for each of two treatment arms are obtained for cluster randomized trials where the cost-effectiveness of treatments on a continuous scale is studied. The optimal sample sizes maximize the efficiency or power for a given budget or minimize the budget for a given efficiency or power. Optimal sample sizes require information on the intra-cluster correlations (ICCs) for effects and costs, the correlations between costs and effects at individual and cluster levels, the ratio of the variance of effects translated into costs to the variance of the costs (the variance ratio), sampling and measuring costs, and the budget. When planning a study, information on the model parameters is usually not available. To overcome this local optimality problem, the current paper also presents maximin sample sizes. The maximin sample sizes turn out to be rather robust against misspecifying the correlation between costs and effects at the cluster and individual levels but may lose much efficiency when misspecifying the variance ratio. The robustness of the maximin sample sizes against misspecifying the ICCs depends on the variance ratio. The maximin sample sizes are robust under misspecification of the ICC for costs for realistic values of the variance ratio greater than one but not robust under misspecification of the ICC for effects. Finally, we show how to calculate optimal or maximin sample sizes that yield sufficient power for a test on the cost-effectiveness of an intervention.
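
    The optimal-design logic can be illustrated with the classical effects-only special case, ignoring the cost-effectiveness extensions studied in this paper: for a two-level cluster randomized trial with intra-cluster correlation rho, cluster-level cost c and person-level cost s, the cluster size that minimizes the variance of the treatment effect estimator for a fixed budget is sqrt((c/s)*(1-rho)/rho). A minimal Python sketch of that special case follows; all planning values are hypothetical.

```python
import math

def optimal_cluster_size(cluster_cost, person_cost, icc):
    """Classical effects-only optimal cluster size under a fixed budget;
    ignores the cost-effectiveness correlations treated in the paper."""
    return math.sqrt((cluster_cost / person_cost) * (1.0 - icc) / icc)

def clusters_per_arm(budget_per_arm, cluster_cost, person_cost, n_per_cluster):
    """Number of clusters per arm that exhausts the per-arm budget."""
    return budget_per_arm / (cluster_cost + person_cost * n_per_cluster)

# hypothetical planning values
icc, c_cluster, c_person, budget = 0.05, 500.0, 25.0, 50_000.0
n_star = optimal_cluster_size(c_cluster, c_person, icc)
k_star = clusters_per_arm(budget, c_cluster, c_person, n_star)
print(f"optimal persons per cluster ~ {n_star:.1f}, clusters per arm ~ {k_star:.1f}")
```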

  4. Genetic variance in micro-environmental sensitivity for milk and milk quality in Walloon Holstein cattle.

    PubMed

    Vandenplas, J; Bastin, C; Gengler, N; Mulder, H A

    2013-09-01

    Animals that are robust to environmental changes are desirable in the current dairy industry. Genetic differences in micro-environmental sensitivity can be studied through heterogeneity of residual variance between animals. However, residual variance between animals is usually assumed to be homogeneous in traditional genetic evaluations. The aim of this study was to investigate genetic heterogeneity of residual variance by estimating variance components in residual variance for milk yield, somatic cell score, contents in milk (g/dL) of 2 groups of milk fatty acids (i.e., saturated and unsaturated fatty acids), and the content in milk of one individual fatty acid (i.e., oleic acid, C18:1 cis-9), for first-parity Holstein cows in the Walloon Region of Belgium. A total of 146,027 test-day records from 26,887 cows in 747 herds were available. All cows had at least 3 records and a known sire. These sires had at least 10 cows with records and each herd × test-day had at least 5 cows. The 5 traits were analyzed separately based on fixed lactation curve and random regression test-day models for the mean. Estimation of variance components was performed by iteratively running an expectation maximization-REML algorithm within the implementation of double hierarchical generalized linear models. Based on fixed lactation curve test-day mean models, heritability for residual variances ranged between 1.01×10⁻³ and 4.17×10⁻³ for all traits. The genetic standard deviation in residual variance (i.e., approximately the genetic coefficient of variation of residual variance) ranged between 0.12 and 0.17. Therefore, some genetic variance in micro-environmental sensitivity existed in the Walloon Holstein dairy cattle for the 5 studied traits. The standard deviations due to herd × test-day and permanent environment in residual variance ranged between 0.36 and 0.45 for herd × test-day effect and between 0.55 and 0.97 for permanent environmental effect. Therefore, nongenetic effects also contributed substantially to micro-environmental sensitivity. Addition of random regressions to the mean model did not reduce heterogeneity in residual variance, indicating that genetic heterogeneity of residual variance was not simply an effect of an incomplete mean model. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  5. Fourier Spot Volatility Estimator: Asymptotic Normality and Efficiency with Liquid and Illiquid High-Frequency Data

    PubMed Central

    2015-01-01

    The recent availability of high frequency data has permitted more efficient ways of computing volatility. However, estimation of volatility from asset price observations is challenging because observed high frequency data are generally affected by microstructure noise effects. We address this issue by using the Fourier estimator of instantaneous volatility introduced by Malliavin and Mancino (2002). We prove a central limit theorem for this estimator with optimal rate and asymptotic variance. An extensive simulation study shows the accuracy of the spot volatility estimates obtained using the Fourier estimator and its robustness even in the presence of different microstructure noise specifications. An empirical analysis on high frequency data (U.S. S&P500 and FIB 30 indices) illustrates how the Fourier spot volatility estimates can be successfully used to study intraday variations of volatility and to predict intraday Value at Risk. PMID:26421617

  6. Statistics based sampling for controller and estimator design

    NASA Astrophysics Data System (ADS)

    Tenne, Dirk

    The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics: nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controllers that include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance of the plant over the domain of uncertainty consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.
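
    The unscented transformation referred to above propagates a mean and covariance through a nonlinear map using deterministically chosen sigma points. Below is a minimal sketch of the standard second-order transform, not the higher-order extensions developed in the dissertation; the scaling parameters and the polar-to-Cartesian example nonlinearity are illustrative choices.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=0.1, beta=2.0, kappa=0.0):
    """Standard unscented transform: propagate (mean, cov) through f."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)            # matrix square root
    sigma = np.vstack([mean, mean + S.T, mean - S.T])  # 2n+1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    y = np.array([f(x) for x in sigma])                # push points through f
    mean_y = wm @ y
    diff = y - mean_y
    cov_y = (wc[:, None] * diff).T @ diff
    return mean_y, cov_y

# illustrative nonlinearity: polar-to-Cartesian conversion
f = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
m = np.array([1.0, np.pi / 4])
P = np.diag([0.01, 0.05])
print(unscented_transform(m, P, f))
```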

  7. A Two-Stage Kalman Filter Approach for Robust and Real-Time Power System State Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jinghe; Welch, Greg; Bishop, Gary

    2014-04-01

    As electricity demand continues to grow and renewable energy increases its penetration in the power grid, real-time state estimation becomes essential for system monitoring and control. Recent developments in phasor technology make it possible with high-speed time-synchronized data provided by Phasor Measurement Units (PMU). In this paper we present a two-stage Kalman filter approach to estimate the static state of voltage magnitudes and phase angles, as well as the dynamic state of generator rotor angles and speeds. Kalman filters achieve optimal performance only when the system noise characteristics have known statistical properties (zero-mean, Gaussian, and spectrally white). However, in practice the process and measurement noise models are usually difficult to obtain. Thus we have developed the Adaptive Kalman Filter with Inflatable Noise Variances (AKF with InNoVa), an algorithm that can efficiently identify and reduce the impact of incorrect system modeling and/or erroneous measurements. In stage one, we estimate the static state from raw PMU measurements using the AKF with InNoVa; then in stage two, the estimated static state is fed into an extended Kalman filter to estimate the dynamic state. Simulations demonstrate its robustness to sudden changes of system dynamics and erroneous measurements.
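
    For orientation, the building block of such schemes is the linear Kalman filter predict/update cycle sketched below; this is a generic textbook filter, not the AKF with InNoVa algorithm itself, and all system matrices and noise levels are placeholders.

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# toy 1-D constant-velocity example with placeholder noise levels
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-3 * np.eye(2)
R = np.array([[0.25]])
x, P = np.zeros(2), np.eye(2)
for z in [1.1, 2.0, 2.9, 4.2]:
    x, P = kf_step(x, P, np.array([z]), F, H, Q, R)
print(x)
```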

  8. Spatially constrained incoherent motion method improves diffusion-weighted MRI signal decay analysis in the liver and spleen

    PubMed Central

    Taimouri, Vahid; Afacan, Onur; Perez-Rossello, Jeannette M.; Callahan, Michael J.; Mulkern, Robert V.; Warfield, Simon K.; Freiman, Moti

    2015-01-01

    Purpose: To evaluate the effect of the spatially constrained incoherent motion (SCIM) method on improving the precision and robustness of fast and slow diffusion parameter estimates from diffusion-weighted MRI in liver and spleen in comparison to the independent voxel-wise intravoxel incoherent motion (IVIM) model. Methods: We collected diffusion-weighted MRI (DW-MRI) data of 29 subjects (5 healthy subjects and 24 patients with Crohn’s disease in the ileum). We evaluated parameter estimates’ robustness against different combinations of b-values (i.e., 4 b-values and 7 b-values) by comparing the variance of the estimates obtained with the SCIM and the independent voxel-wise IVIM model. We also evaluated the improvement in the precision of parameter estimates by comparing the coefficient of variation (CV) of the SCIM parameter estimates to that of the IVIM. Results: The SCIM method was more robust compared to IVIM (up to 70% in liver and spleen) for different combinations of b-values. Also, the CV values of the parameter estimations using the SCIM method were significantly lower compared to repeated acquisition and signal averaging estimated using IVIM, especially for the fast diffusion parameter in liver (CV_IVIM = 46.61 ± 11.22, CV_SCIM = 16.85 ± 2.160, p < 0.001) and spleen (CV_IVIM = 95.15 ± 19.82, CV_SCIM = 52.55 ± 1.91, p < 0.001). Conclusions: The SCIM method characterizes fast and slow diffusion more precisely compared to the independent voxel-wise IVIM model fitting in the liver and spleen. PMID:25832079

  9. Inference on periodicity of circadian time series.

    PubMed

    Costa, Maria J; Finkenstädt, Bärbel; Roche, Véronique; Lévi, Francis; Gould, Peter D; Foreman, Julia; Halliday, Karen; Hall, Anthony; Rand, David A

    2013-09-01

    Estimation of the period length of time-course data from cyclical biological processes, such as those driven by the circadian pacemaker, is crucial for inferring the properties of the biological clock found in many living organisms. We propose a methodology for period estimation based on spectrum resampling (SR) techniques. Simulation studies show that SR is superior and more robust to non-sinusoidal and noisy cycles than a currently used routine based on Fourier approximations. In addition, a simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice. The proposed methods are motivated by and applied to various data examples from chronobiology.

  10. On long-only information-based portfolio diversification framework

    NASA Astrophysics Data System (ADS)

    Santos, Raphael A.; Takada, Hellinton H.

    2014-12-01

    Using the concepts from information theory, it is possible to improve the traditional frameworks for long-only asset allocation. In modern portfolio theory, the investor has two basic procedures: the choice of a portfolio that maximizes its risk-adjusted excess return or the mixed allocation between the maximum Sharpe portfolio and the risk-free asset. In the literature, the first procedure was already addressed using information theory. One contribution of this paper is the consideration of the second procedure in the information theory context. The performance of these approaches was compared with three traditional asset allocation methodologies: Markowitz's mean-variance, the resampled mean-variance and the equally weighted portfolio. Using simulated and real data, the information theory-based methodologies were verified to be more robust when dealing with estimation errors.

  11. Epidemiological characteristics of reported sporadic and outbreak cases of E. coli O157 in people from Alberta, Canada (2000-2002): methodological challenges of comparing clustered to unclustered data.

    PubMed

    Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A

    2008-04-01

    Using multivariable models, we assessed whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks including weighted logistic regression, random effects models, generalized estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means for modelling these data. Some analytical techniques, designed to account for clustering, had difficulty converging or producing realistic measures of association.
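
    The weighting approach singled out above can be sketched with statsmodels: fit a binomial GLM with each case weighted by the inverse of its outbreak size. The data frame, covariates and use of var_weights below are illustrative assumptions, not the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# hypothetical case data: 1 = outbreak-associated case, 0 = sporadic case
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "outbreak_case": rng.integers(0, 2, n),
    "age": rng.integers(1, 90, n),
    "foodborne": rng.integers(0, 2, n),
    "outbreak_size": rng.integers(1, 30, n),
})
# sporadic cases form their own "cluster" of size 1
df.loc[df.outbreak_case == 0, "outbreak_size"] = 1
df["w"] = 1.0 / df["outbreak_size"]          # inverse-outbreak-size weights

X = sm.add_constant(df[["age", "foodborne"]])
model = sm.GLM(df["outbreak_case"], X,
               family=sm.families.Binomial(), var_weights=df["w"])
res = model.fit()
print(res.summary())
```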

  12. New spatial upscaling methods for multi-point measurements: From normal to p-normal

    NASA Astrophysics Data System (ADS)

    Liu, Feng; Li, Xin

    2017-12-01

    Careful attention must be given to determining whether the geophysical variables of interest are normally distributed, since the assumption of a normal distribution may not accurately reflect the probability distribution of some variables. As a generalization of the normal distribution, the p-normal distribution and its corresponding maximum likelihood estimation (the least power estimation, LPE) were introduced in upscaling methods for multi-point measurements. Six methods, including three normal-based methods, i.e., arithmetic average, least square estimation, block kriging, and three p-normal-based methods, i.e., LPE, geostatistics LPE and inverse distance weighted LPE are compared in two types of experiments: a synthetic experiment to evaluate the performance of the upscaling methods in terms of accuracy, stability and robustness, and a real-world experiment to produce real-world upscaling estimates using soil moisture data obtained from multi-scale observations. The results show that the p-normal-based methods produced lower mean absolute errors and outperformed the other techniques due to their universality and robustness. We conclude that introducing appropriate statistical parameters into an upscaling strategy can substantially improve the estimation, especially if the raw measurements are disorganized; however, further investigation is required to determine which parameter is the most effective among variance, spatial correlation information and parameter p.
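
    The p-normal maximum-likelihood step (least power estimation, LPE) replaces the squared-error criterion with a power-p criterion. A minimal sketch for a single block estimate from point measurements follows, with hypothetical soil-moisture values; p = 2 recovers the ordinary least-squares (arithmetic-average) solution.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lpe_location(y, p):
    """Least power estimate: argmin_mu sum(|y - mu|**p)."""
    obj = lambda mu: np.sum(np.abs(y - mu) ** p)
    res = minimize_scalar(obj, bounds=(y.min(), y.max()), method="bounded")
    return res.x

rng = np.random.default_rng(0)
y = rng.normal(0.25, 0.05, size=50)         # hypothetical soil-moisture points
y[:3] += 0.3                                # a few disorganized measurements
for p in (1.0, 1.5, 2.0):                   # p = 2 reduces to least squares
    print(p, round(lpe_location(y, p), 4))
```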

  13. Efficient computation of parameter sensitivities of discrete stochastic chemical reaction networks.

    PubMed

    Rathinam, Muruhan; Sheppard, Patrick W; Khammash, Mustafa

    2010-01-21

    Parametric sensitivity of biochemical networks is an indispensable tool for studying system robustness properties, estimating network parameters, and identifying targets for drug therapy. For discrete stochastic representations of biochemical networks where Monte Carlo methods are commonly used, sensitivity analysis can be particularly challenging, as accurate finite difference computations of sensitivity require a large number of simulations for both nominal and perturbed values of the parameters. In this paper we introduce the common random number (CRN) method in conjunction with Gillespie's stochastic simulation algorithm, which exploits positive correlations obtained by using CRNs for nominal and perturbed parameters. We also propose a new method called the common reaction path (CRP) method, which uses CRNs together with the random time change representation of discrete state Markov processes due to Kurtz to estimate the sensitivity via a finite difference approximation applied to coupled reaction paths that emerge naturally in this representation. While both methods reduce the variance of the estimator significantly compared to independent random number finite difference implementations, numerical evidence suggests that the CRP method achieves a greater variance reduction. We also provide some theoretical basis for the superior performance of CRP. The improved accuracy of these methods allows for much more efficient sensitivity estimation. In two example systems reported in this work, speedup factors greater than 300 and 10,000 are demonstrated.
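
    The CRN idea can be illustrated on a toy birth-death network simulated with Gillespie's direct method: the nominal and perturbed runs consume the same random number stream, so their difference, and hence the finite-difference sensitivity estimate, has far lower variance than with independent streams. The network, rates and time horizon below are illustrative and are not the examples analyzed in the paper (and this is plain CRN, not the common reaction path variant).

```python
import numpy as np

def gillespie_birth_death(k_birth, k_death, x0, t_end, rng):
    """Gillespie direct method for 0 -> X (rate k_birth) and X -> 0 (rate k_death*X)."""
    t, x = 0.0, x0
    while True:
        rates = np.array([k_birth, k_death * x])
        total = rates.sum()
        if total == 0:
            return x
        t += rng.exponential(1.0 / total)
        if t > t_end:
            return x
        if rng.random() * total < rates[0]:
            x += 1
        else:
            x -= 1

def fd_sensitivity(k_birth, dk, n_runs, common=True):
    """Finite-difference estimate of d E[X(T)] / d k_birth, with or without CRN."""
    diffs = []
    for i in range(n_runs):
        seed_a = i
        seed_b = i if common else n_runs + i       # CRN: reuse the same seed
        xa = gillespie_birth_death(k_birth, 0.1, 0, 10.0, np.random.default_rng(seed_a))
        xb = gillespie_birth_death(k_birth + dk, 0.1, 0, 10.0, np.random.default_rng(seed_b))
        diffs.append((xb - xa) / dk)
    diffs = np.array(diffs)
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(n_runs)

print("CRN        :", fd_sensitivity(1.0, 0.05, 2000, common=True))
print("independent:", fd_sensitivity(1.0, 0.05, 2000, common=False))
```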

  14. Glomerular structural-functional relationship models of diabetic nephropathy are robust in type 1 diabetic patients.

    PubMed

    Mauer, Michael; Caramori, Maria Luiza; Fioretto, Paola; Najafian, Behzad

    2015-06-01

    Studies of structural-functional relationships have improved understanding of the natural history of diabetic nephropathy (DN). However, in order to consider structural end points for clinical trials, the robustness of the resultant models needs to be verified. This study examined whether structural-functional relationship models derived from a large cohort of type 1 diabetic (T1D) patients with a wide range of renal function are robust. The predictability of models derived from multiple regression analysis and piecewise linear regression analysis was also compared. T1D patients (n = 161) with research renal biopsies were divided into two equal groups matched for albumin excretion rate (AER). Models to explain AER and glomerular filtration rate (GFR) by classical DN lesions in one group (T1D-model, or T1D-M) were applied to the other group (T1D-test, or T1D-T) and regression analyses were performed. T1D-M-derived models explained 70 and 63% of AER variance and 32 and 21% of GFR variance in T1D-M and T1D-T, respectively, supporting the substantial robustness of the models. Piecewise linear regression analyses substantially improved predictability of the models with 83% of AER variance and 66% of GFR variance explained by classical DN glomerular lesions alone. These studies demonstrate that DN structural-functional relationship models are robust, and if appropriate models are used, glomerular lesions alone explain a major proportion of AER and GFR variance in T1D patients. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  15. Efficient design of cluster randomized trials with treatment-dependent costs and treatment-dependent unknown variances.

    PubMed

    van Breukelen, Gerard J P; Candel, Math J J M

    2018-06-10

    Cluster randomized trials evaluate the effect of a treatment on persons nested within clusters, where treatment is randomly assigned to clusters. Current equations for the optimal sample size at the cluster and person level assume that the outcome variances and/or the study costs are known and homogeneous between treatment arms. This paper presents efficient yet robust designs for cluster randomized trials with treatment-dependent costs and treatment-dependent unknown variances, and compares these with 2 practical designs. First, the maximin design (MMD) is derived, which maximizes the minimum efficiency (minimizes the maximum sampling variance) of the treatment effect estimator over a range of treatment-to-control variance ratios. The MMD is then compared with the optimal design for homogeneous variances and costs (balanced design), and with that for homogeneous variances and treatment-dependent costs (cost-considered design). The results show that the balanced design is the MMD if the treatment-to control cost ratio is the same at both design levels (cluster, person) and within the range for the treatment-to-control variance ratio. It still is highly efficient and better than the cost-considered design if the cost ratio is within the range for the squared variance ratio. Outside that range, the cost-considered design is better and highly efficient, but it is not the MMD. An example shows sample size calculation for the MMD, and the computer code (SPSS and R) is provided as supplementary material. The MMD is recommended for trial planning if the study costs are treatment-dependent and homogeneity of variances cannot be assumed. © 2018 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

  16. An Overdetermined System for Improved Autocorrelation Based Spectral Moment Estimator Performance

    NASA Technical Reports Server (NTRS)

    Keel, Byron M.

    1996-01-01

    Autocorrelation based spectral moment estimators are typically derived using the Fourier transform relationship between the power spectrum and the autocorrelation function along with using either an assumed form of the autocorrelation function, e.g., Gaussian, or a generic complex form and applying properties of the characteristic function. Passarelli has used a series expansion of the general complex autocorrelation function and has expressed the coefficients in terms of central moments of the power spectrum. A truncation of this series will produce a closed system of equations which can be solved for the central moments of interest. The autocorrelation function at various lags is estimated from samples of the random process under observation. These estimates themselves are random variables and exhibit a bias and variance that is a function of the number of samples used in the estimates and the operational signal-to-noise ratio. This contributes to a degradation in performance of the moment estimators. This dissertation investigates the use of autocorrelation function estimates at higher order lags to reduce the bias and standard deviation in spectral moment estimates. In particular, Passarelli's series expansion is cast in terms of an overdetermined system to form a framework under which the application of additional autocorrelation function estimates at higher order lags can be defined and assessed. The solution of the overdetermined system is the least squares solution. Furthermore, an overdetermined system can be solved for any moment or moments of interest and is not tied to a particular form of the power spectrum or corresponding autocorrelation function. As an application of this approach, autocorrelation based variance estimators are defined by a truncation of Passarelli's series expansion and applied to simulated Doppler weather radar returns which are characterized by a Gaussian shaped power spectrum. The performance of the variance estimators determined from a closed system is shown to improve through the application of additional autocorrelation lags in an overdetermined system. This improvement is greater in the narrowband spectrum region where the information is spread over more lags of the autocorrelation function. The number of lags needed in the overdetermined system is a function of the spectral width, the number of terms in the series expansion, the number of samples used in estimating the autocorrelation function, and the signal-to-noise ratio. The overdetermined system provides a robustness to the chosen variance estimator by expanding the region of spectral widths and signal-to-noise ratios over which the estimator can perform as compared to the closed system.
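
    The overdetermined-system idea can be illustrated with the Gaussian-spectrum case used in the simulations. For a Gaussian power spectrum the autocorrelation magnitude obeys ln|R(mT_s)| = ln|R(0)| - 2*pi^2*sigma_f^2*(mT_s)^2, which is linear in the unknowns, so autocorrelation estimates at several lags give an overdetermined linear system solvable by least squares. The sketch below is a simplified stand-in for the Passarelli-expansion estimators studied in the dissertation; the simulated signal and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
Ts, N = 1e-3, 512                     # sample spacing (s), samples per dwell
f0, sigma_f = 40.0, 60.0              # mean Doppler and spectrum width (Hz), arbitrary

# simulate a Gaussian-spectrum complex signal as a sum of random scatterers
K = 400
freqs = rng.normal(f0, sigma_f, K)
phases = rng.uniform(0, 2 * np.pi, K)
t = np.arange(N) * Ts
x = np.exp(1j * (2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0) / np.sqrt(K)
x += 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))   # additive noise

def acf(x, m):
    """Sample autocorrelation at lag m (white noise only affects lag 0)."""
    return np.mean(x[m:] * np.conj(x[:-m]))

# overdetermined system over lags m = 1..M, solved by least squares
M = 5
lags = np.arange(1, M + 1)
y = np.log(np.abs([acf(x, m) for m in lags]))
A = np.column_stack([np.ones(M), -2 * np.pi**2 * (lags * Ts) ** 2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated spectrum width (Hz):", np.sqrt(coef[1]))   # expect roughly 60
```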

  17. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and an increase in the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set, hence making a comparison of the two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.

  18. A robust pseudo-inverse spectral filter applied to the Earth Radiation Budget Experiment (ERBE) scanning channels

    NASA Technical Reports Server (NTRS)

    Avis, L. M.; Green, R. N.; Suttles, J. T.; Gupta, S. K.

    1984-01-01

    Computer simulations of a least squares estimator operating on the ERBE scanning channels are discussed. The estimator is designed to minimize the errors produced by nonideal spectral response to spectrally varying and uncertain radiant input. The three ERBE scanning channels cover a shortwave band, a longwave band, and a "total" band, from which the pseudo-inverse spectral filter estimates the radiance components in the shortwave and longwave bands. The radiance estimator draws on instantaneous field of view (IFOV) scene type information supplied by another algorithm of the ERBE software, and on a priori probabilistic models of the responses of the scanning channels to the IFOV scene types for given Sun-scene-spacecraft geometry. It is found that the pseudo-inverse spectral filter is stable, tolerant of errors in scene identification and in channel response modeling, and, in the absence of such errors, yields minimum variance and essentially unbiased radiance estimates.

  19. Seabed mapping and characterization of sediment variability using the usSEABED data base

    USGS Publications Warehouse

    Goff, J.A.; Jenkins, C.J.; Jeffress, Williams S.

    2008-01-01

    We present a methodology for statistical analysis of randomly located marine sediment point data, and apply it to the US continental shelf portions of usSEABED mean grain size records. The usSEABED database, like many modern, large environmental datasets, is heterogeneous and interdisciplinary. We statistically test the database as a source of mean grain size data, and from it provide a first examination of regional seafloor sediment variability across the entire US continental shelf. Data derived from laboratory analyses ("extracted") and from word-based descriptions ("parsed") are treated separately, and they are compared statistically and deterministically. Data records are selected for spatial analysis by their location within sample regions: polygonal areas defined in ArcGIS chosen by geography, water depth, and data sufficiency. We derive isotropic, binned semivariograms from the data, and invert these for estimates of noise variance, field variance, and decorrelation distance. The highly erratic nature of the semivariograms is a result both of the random locations of the data and of the high level of data uncertainty (noise). This decorrelates the data covariance matrix for the inversion, and largely prevents robust estimation of the fractal dimension. Our comparison of the extracted and parsed mean grain size data demonstrates important differences between the two. In particular, extracted measurements generally produce finer mean grain sizes, lower noise variance, and lower field variance than parsed values. Such relationships can be used to derive a regionally dependent conversion factor between the two. Our analysis of sample regions on the US continental shelf revealed considerable geographic variability in the estimated statistical parameters of field variance and decorrelation distance. Some regional relationships are evident, and overall there is a tendency for field variance to be higher where the average mean grain size is finer grained. Surprisingly, parsed and extracted noise magnitudes correlate with each other, which may indicate that some portion of the data variability that we identify as "noise" is caused by real grain size variability at very short scales. Our analyses demonstrate that by applying a bias-correction proxy, usSEABED data can be used to generate reliable interpolated maps of regional mean grain size and sediment character. 
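
    The semivariogram inversion step described here can be sketched as follows: bin squared differences of point values by separation distance and fit a nugget-plus-exponential model, where the nugget plays the role of the noise variance, the partial sill that of the field variance, and the range that of the decorrelation distance. The point data, the exponential model form (one common choice, not necessarily the form used by the authors) and the starting values below are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def empirical_semivariogram(coords, values, bin_edges):
    """Isotropic binned semivariogram from randomly located point data."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    g = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)
    d, g = d[iu], g[iu]
    centers, gamma = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        sel = (d >= lo) & (d < hi)
        if sel.sum() > 10:
            centers.append(d[sel].mean())
            gamma.append(g[sel].mean())
    return np.array(centers), np.array(gamma)

def exp_model(h, nugget, psill, corr_len):
    return nugget + psill * (1.0 - np.exp(-h / corr_len))

# hypothetical mean-grain-size point data over a shelf-sized region
rng = np.random.default_rng(7)
coords = rng.uniform(0, 50, size=(300, 2))                    # km
field = np.sin(coords[:, 0] / 8.0) + 0.3 * np.cos(coords[:, 1] / 5.0)
values = field + rng.normal(0, 0.4, 300)                      # add "noise" variance
h, gamma = empirical_semivariogram(coords, values, np.linspace(0, 25, 14))
(nugget, psill, corr_len), _ = curve_fit(exp_model, h, gamma, p0=[0.1, 0.5, 5.0],
                                         bounds=(0, np.inf))
print(f"noise variance ~ {nugget:.2f}, field variance ~ {psill:.2f}, "
      f"decorrelation distance ~ {corr_len:.1f} km")
```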

  20. Robust design of a 2-DOF GMV controller: a direct self-tuning and fuzzy scheduling approach.

    PubMed

    Silveira, Antonio S; Rodríguez, Jaime E N; Coelho, Antonio A R

    2012-01-01

    This paper presents a study on self-tuning control strategies with generalized minimum variance control in a fixed two-degree-of-freedom structure (or simply GMV2DOF) within two adaptive perspectives. One, from the process model point of view, uses a recursive least squares estimator algorithm for direct self-tuning design; the other uses a Mamdani fuzzy GMV2DOF parameter scheduling technique based on analytical and physical interpretations from a robustness analysis of the system. Both strategies are assessed in simulation and in real-plant experimental environments composed of a damped pendulum and a wind tunnel under development at the Department of Automation and Systems of the Federal University of Santa Catarina. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.

  1. A proxy for variance in dense matching over homogeneous terrain

    NASA Astrophysics Data System (ADS)

    Altena, Bas; Cockx, Liesbet; Goedemé, Toon

    2014-05-01

    Automation in photogrammetry and avionics has brought highly autonomous UAV mapping solutions to the market. These systems have great potential for geophysical research, due to their mobility and simplicity of work. Flight planning can be done on site and orientation parameters are estimated automatically. However, one major drawback is still present: if contrast is lacking, stereoscopy fails. Consequently, topographic information cannot be obtained precisely through photogrammetry for areas with low contrast. Even though more robustness is added in the estimation through multi-view geometry, a precise product is still lacking. For the greater part, interpolation is applied over these regions, where the estimation is constrained by uniqueness, its epipolar line and smoothness. Consequently, digital surface models are generated with an estimate of the topography, without holes but also without an indication of its variance. Every dense matching algorithm is based on a similarity measure. Our methodology uses this property to support the idea that if only noise is present, no correspondence can be detected. Therefore, the noise level is estimated with respect to the intensity signal of the topography (SNR) and this ratio serves as a quality indicator for the automatically generated product. To demonstrate this variance indicator, two different case studies were elaborated. The first study is situated at an open sand mine near the village of Kiezegem, Belgium. Two different UAV systems flew over the site. One system had automatic intensity regulation, and resulted in low contrast over the sandy interior of the mine. That dataset was used to identify the weak estimates of the topography and was compared with the data from the other UAV flight. In the second study a flight campaign with the X100 system was conducted along the coast near Wenduine, Belgium. The obtained images were processed through structure-from-motion software. Although the beach had a very low variance in intensity, the topography was reconstructed entirely. This indicates that to a large extent interpolation was applied. To assess this amount of interpolation, processing is done with imagery that is gradually downgraded. Linking these products with the variance indicator (SNR) results in a quantitative relation between the influence of interpolation on the topography estimate and image contrast. Our proposed method is capable of providing a clear indication of variance in reconstructions from UAV photogrammetry. This indicator has a practical advantage, as it can be implemented before the computationally intensive matching phase. As such, an acquired dataset can be tested in the field. If an area with too little contrast is identified, camera settings can be adjusted for a new flight, or additional measurements can be done through traditional means.

  2. Momentum Flux Determination Using the Multi-beam Poker Flat Incoherent Scatter Radar

    NASA Technical Reports Server (NTRS)

    Nicolls, M. J.; Fritts, D. C.; Janches, Diego; Heinselman, C. J.

    2012-01-01

    In this paper, we develop an estimator for the vertical flux of horizontal momentum with arbitrary beam pointing, applicable to the case of arbitrary but fixed beam pointing with systems such as the Poker Flat Incoherent Scatter Radar (PFISR). This method uses information from all available beams to resolve the variances of the wind field in addition to the vertical flux of both meridional and zonal momentum, targeted for high-frequency wave motions. The estimator utilises the full covariance of the distributed measurements, which provides a significant reduction in errors over the direct extension of previously developed techniques and allows for the calculation of an error covariance matrix of the estimated quantities. We find that for the PFISR experiment, we can construct an unbiased and robust estimator of the momentum flux if sufficient and proper beam orientations are chosen, which can in the future be optimized for the expected frequency distribution of momentum-containing scales. However, there is a potential trade-off between biases and standard errors introduced with the new approach, which must be taken into account when assessing the momentum fluxes. We apply the estimator to PFISR measurements on 23 April 2008 and 21 December 2007, from 60-85 km altitude, and show expected results as compared to mean winds and in relation to the measured vertical velocity variances.

  3. Reference tissue modeling with parameter coupling: application to a study of SERT binding in HIV

    NASA Astrophysics Data System (ADS)

    Endres, Christopher J.; Hammoud, Dima A.; Pomper, Martin G.

    2011-04-01

    When applicable, it is generally preferred to evaluate positron emission tomography (PET) studies using a reference tissue-based approach as that avoids the need for invasive arterial blood sampling. However, most reference tissue methods have been shown to have a bias that is dependent on the level of tracer binding, and the variability of parameter estimates may be substantially affected by noise level. In a study of serotonin transporter (SERT) binding in HIV dementia, it was determined that applying parameter coupling to the simplified reference tissue model (SRTM) reduced the variability of parameter estimates and yielded the strongest between-group significant differences in SERT binding. The use of parameter coupling makes the application of SRTM more consistent with conventional blood input models and reduces the total number of fitted parameters, thus should yield more robust parameter estimates. Here, we provide a detailed evaluation of the application of parameter constraint and parameter coupling to [11C]DASB PET studies. Five quantitative methods, including three methods that constrain the reference tissue clearance (kr2) to a common value across regions were applied to the clinical and simulated data to compare measurement of the tracer binding potential (BPND). Compared with standard SRTM, either coupling of kr2 across regions or constraining kr2 to a first-pass estimate improved the sensitivity of SRTM to measuring a significant difference in BPND between patients and controls. Parameter coupling was particularly effective in reducing the variance of parameter estimates, which was less than 50% of the variance obtained with standard SRTM. A linear approach was also improved when constraining kr2 to a first-pass estimate, although the SRTM-based methods yielded stronger significant differences when applied to the clinical study. This work shows that parameter coupling reduces the variance of parameter estimates and may better discriminate between-group differences in specific binding.

  4. Estimation of genetic parameters and response to selection for a continuous trait subject to culling before testing.

    PubMed

    Arnason, T; Albertsdóttir, E; Fikse, W F; Eriksson, S; Sigurdsson, A

    2012-02-01

    The consequences of assuming a zero environmental covariance between a binary trait 'test-status' and a continuous trait on the estimates of genetic parameters by restricted maximum likelihood and Gibbs sampling and on response from genetic selection when the true environmental covariance deviates from zero were studied. Data were simulated for two traits (one that culling was based on and a continuous trait) using the following true parameters, on the underlying scale: h² = 0.4; r(A) = 0.5; r(E) = 0.5, 0.0 or -0.5. The selection on the continuous trait was applied to five subsequent generations where 25 sires and 500 dams produced 1500 offspring per generation. Mass selection was applied in the analysis of the effect on estimation of genetic parameters. Estimated breeding values were used in the study of the effect of genetic selection on response and accuracy. The culling frequency was either 0.5 or 0.8 within each generation. Each of 10 replicates included 7500 records on 'test-status' and 9600 animals in the pedigree file. Results from bivariate analysis showed unbiased estimates of variance components and genetic parameters when true r(E) = 0.0. For r(E) = 0.5, variance components (13-19% bias) and especially (50-80%) were underestimated for the continuous trait, while heritability estimates were unbiased. For r(E) = -0.5, heritability estimates of test-status were unbiased, while genetic variance and heritability of the continuous trait together with were overestimated (25-50%). The bias was larger for the higher culling frequency. Culling always reduced genetic progress from selection, but the genetic progress was found to be robust to the use of wrong parameter values of the true environmental correlation between test-status and the continuous trait. Use of a bivariate linear-linear model reduced bias in genetic evaluations, when data were subject to culling. © 2011 Blackwell Verlag GmbH.

  5. Radar modulation classification using time-frequency representation and nonlinear regression

    NASA Astrophysics Data System (ADS)

    De Luigi, Christophe; Arques, Pierre-Yves; Lopez, Jean-Marc; Moreau, Eric

    1999-09-01

    In a naval electronic environment, pulses emitted by radars are collected by ESM receivers. For most of them, the intrapulse signal is modulated by a particular law. To help the classical identification process, classification and estimation of this modulation law are applied to the intrapulse signal measurements. To estimate the time-varying frequency of a signal corrupted by additive noise with good accuracy, one method has been chosen. This method consists of computing the Wigner distribution; the instantaneous frequency is then estimated from the peak location of the distribution. The bias and variance of the estimator are assessed by computer simulations. In an estimated sequence of frequencies, we assume the presence of both falsely and correctly estimated ones; the hypothesis of a Gaussian distribution is made on the errors. A robust nonlinear regression method, based on the Levenberg-Marquardt algorithm, is then applied to these estimated frequencies using a Maximum Likelihood Estimator. The performance of the method is tested using varied modulation laws and different signal-to-noise ratios.
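
    The regression stage can be sketched with SciPy: fit a parametric frequency-modulation law (here a hypothetical quadratic chirp) to peak-location frequency estimates, using a robust loss to down-weight falsely estimated frequencies. Note that SciPy exposes robust losses only through its trust-region solver, which is used below in place of a plain Levenberg-Marquardt iteration; the signal parameters are invented for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 200)                       # normalized time within the pulse
true = (2.0, 5.0, -3.0)                              # hypothetical quadratic FM law
f_true = true[0] + true[1] * t + true[2] * t**2
f_meas = f_true + rng.normal(0, 0.05, t.size)        # good estimates (small Gaussian error)
bad = rng.choice(t.size, 20, replace=False)
f_meas[bad] = rng.uniform(0, 6, 20)                  # falsely estimated frequencies (outliers)

def residuals(p):
    return p[0] + p[1] * t + p[2] * t**2 - f_meas

fit = least_squares(residuals, x0=[0.0, 0.0, 0.0], loss="soft_l1", f_scale=0.1)
print("estimated FM law coefficients:", np.round(fit.x, 3))
```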

  6. Variance component and breeding value estimation for genetic heterogeneity of residual variance in Swedish Holstein dairy cattle.

    PubMed

    Rönnegård, L; Felleki, M; Fikse, W F; Mulder, H A; Strandberg, E

    2013-04-01

    Trait uniformity, or micro-environmental sensitivity, may be studied through individual differences in residual variance. These differences appear to be heritable, and the need exists, therefore, to fit models to predict breeding values explaining differences in residual variance. The aim of this paper is to estimate breeding values for micro-environmental sensitivity (vEBV) in milk yield and somatic cell score, and their associated variance components, on a large dairy cattle data set having more than 1.6 million records. Estimation of variance components, ordinary breeding values, and vEBV was performed using standard variance component estimation software (ASReml), applying the methodology for double hierarchical generalized linear models. Estimation using ASReml took less than 7 d on a Linux server. The genetic standard deviations for residual variance were 0.21 and 0.22 for somatic cell score and milk yield, respectively, which indicate moderate genetic variance for residual variance and imply that a standard deviation change in vEBV for one of these traits would alter the residual variance by 20%. This study shows that estimation of variance components, estimated breeding values and vEBV, is feasible for large dairy cattle data sets using standard variance component estimation software. The possibility to select for uniformity in Holstein dairy cattle based on these estimates is discussed. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  7. A Comparative Analysis of the Cost Estimating Error Risk Associated with Flyaway Costs Versus Individual Components of Aircraft

    DTIC Science & Technology

    2003-03-01

    test returns a p-value greater than 0.05. Similarly, the assumption of constant variance can be confirmed using the Breusch-Pagan test ... megaphone effect. To test this visual observation, the Breusch-Pagan test is applied. ... The p-value returned from this ... The data points have a relatively even spread, but a potential megaphone pattern is present. An application of the more robust Breusch-Pagan test
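
    For reference, the constant-variance check these excerpts refer to is available in statsmodels as the Breusch-Pagan test; the cost-versus-weight data below are fabricated purely to show the call and are not taken from the study.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# hypothetical cost-estimating relationship: flyaway cost vs. empty weight
rng = np.random.default_rng(42)
weight = rng.uniform(5, 40, 60)                              # thousands of lbs
cost = 2.0 + 0.8 * weight + rng.normal(0, 0.2 * weight)      # error spread grows with weight
X = sm.add_constant(weight)
ols = sm.OLS(cost, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, ols.model.exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")   # small p-value -> heteroscedasticity
```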

  8. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    PubMed

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  9. Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials.

    PubMed

    Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B

    2017-04-01

    Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo and logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models in that (1) the population-average parameters have an important interpretation for public health applications and (2) it avoids untestable assumptions on latent variable distributions and parametric assumptions about error distributions, therefore providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
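
    The marginal-model fit itself can be sketched with statsmodels GEE; the paper's focus, the small-sample variance corrections, is only partly reflected here through the bias-reduced covariance option, which may not be available in older statsmodels versions. The stepped wedge layout, effect sizes and cluster counts below are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical stepped wedge layout: 12 clusters, 5 periods, clusters cross over in 4 steps
rng = np.random.default_rng(11)
rows = []
for cluster in range(12):
    step = 1 + cluster // 3                       # period in which this cluster starts treatment
    u = rng.normal(0, 0.3)                        # cluster effect
    for period in range(5):
        treat = int(period >= step)
        for _ in range(25):                       # 25 subjects per cluster-period
            logit = -0.5 + 0.2 * period - 0.4 * treat + u
            y = rng.random() < 1 / (1 + np.exp(-logit))
            rows.append((cluster, period, treat, int(y)))
df = pd.DataFrame(rows, columns=["cluster", "period", "treat", "y"])

gee = smf.gee("y ~ treat + C(period)", groups="cluster", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable())
res = gee.fit()                                   # sandwich ("robust") covariance
res_bc = gee.fit(cov_type="bias_reduced")         # one small-sample correction, if supported
print(res.params["treat"], res.bse["treat"], res_bc.bse["treat"])
```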

  10. Maximum likelihood estimation of correction for dilution bias in simple linear regression using replicates from subjects with extreme first measurements.

    PubMed

    Berglund, Lars; Garmo, Hans; Lindbäck, Johan; Svärdsudd, Kurt; Zethelius, Björn

    2008-09-30

    The least-squares estimator of the slope in a simple linear regression model is biased towards zero when the predictor is measured with random error. A corrected slope may be estimated by adding data from a reliability study, which comprises a subset of subjects from the main study. The precision of this corrected slope depends on the design of the reliability study and estimator choice. Previous work has assumed that the reliability study constitutes a random sample from the main study. A more efficient design is to use subjects with extreme values on their first measurement. Previously, we published a variance formula for the corrected slope, when the correction factor is the slope in the regression of the second measurement on the first. In this paper we show that both designs are improved by maximum likelihood estimation (MLE). The precision gain is explained by the inclusion of data from all subjects for estimation of the predictor's variance and by the use of the second measurement for estimation of the covariance between response and predictor. The gain from MLE increases with a stronger true relationship between response and predictor and with lower precision in the predictor measurements. We present a real data example on the relationship between fasting insulin, a surrogate marker, and true insulin sensitivity measured by a gold-standard euglycaemic insulin clamp, and simulations, where the behavior of profile-likelihood-based confidence intervals is examined. MLE was shown to be a robust estimator for non-normal distributions and efficient for small sample situations. Copyright (c) 2008 John Wiley & Sons, Ltd.

  11. Beyond Roughness: Maximum-Likelihood Estimation of Topographic "Structure" on Venus and Elsewhere in the Solar System

    NASA Astrophysics Data System (ADS)

    Simons, F. J.; Eggers, G. L.; Lewis, K. W.; Olhede, S. C.

    2015-12-01

    What numbers "capture" topography? If stationary, white, and Gaussian: mean and variance. But "whiteness" is strong; we are led to a "baseline" over which to compute means and variances. We then have subscribed to topography as a correlated process, and to the estimation (noisy, afftected by edge effects) of the parameters of a spatial or spectral covariance function. What if the covariance function or the point process itself aren't Gaussian? What if the region under study isn't regularly shaped or sampled? How can results from differently sized patches be compared robustly? We present a spectral-domain "Whittle" maximum-likelihood procedure that circumvents these difficulties and answers the above questions. The key is the Matern form, whose parameters (variance, range, differentiability) define the shape of the covariance function (Gaussian, exponential, ..., are all special cases). We treat edge effects in simulation and in estimation. Data tapering allows for the irregular regions. We determine the estimation variance of all parameters. And the "best" estimate may not be "good enough": we test whether the "model" itself warrants rejection. We illustrate our methodology on geologically mapped patches of Venus. Surprisingly few numbers capture planetary topography. We derive them, with uncertainty bounds, we simulate "new" realizations of patches that look to the geologists exactly as if they were derived from similar processes. Our approach holds in 1, 2, and 3 spatial dimensions, and generalizes to multiple variables, e.g. when topography and gravity are being considered jointly (perhaps linked by flexural rigidity, erosion, or other surface and sub-surface modifying processes). Our results have widespread implications for the study of planetary topography in the Solar System, and are interpreted in the light of trying to derive "process" from "parameters", the end goal to assign likely formation histories for the patches under consideration. Our results should also be relevant for whomever needed to perform spatial interpolation or out-of-sample extension (e.g. kriging), machine learning and feature detection, on geological data. We present procedural details but focus on high-level results that have real-world implications for the study of Venus, Earth, other planets, and moons.

  12. Network Structure and Biased Variance Estimation in Respondent Driven Sampling

    PubMed Central

    Verdery, Ashton M.; Mouw, Ted; Bauldry, Shawn; Mucha, Peter J.

    2015-01-01

    This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network. PMID:26679927

  13. 2dFLenS and KiDS: determining source redshift distributions with cross-correlations

    NASA Astrophysics Data System (ADS)

    Johnson, Andrew; Blake, Chris; Amon, Alexandra; Erben, Thomas; Glazebrook, Karl; Harnois-Deraps, Joachim; Heymans, Catherine; Hildebrandt, Hendrik; Joudaki, Shahab; Klaes, Dominik; Kuijken, Konrad; Lidman, Chris; Marin, Felipe A.; McFarland, John; Morrison, Christopher B.; Parkinson, David; Poole, Gregory B.; Radovich, Mario; Wolf, Christian

    2017-03-01

    We develop a statistical estimator to infer the redshift probability distribution of a photometric sample of galaxies from its angular cross-correlation in redshift bins with an overlapping spectroscopic sample. This estimator is a minimum-variance weighted quadratic function of the data: a quadratic estimator. This extends and modifies the methodology presented by McQuinn & White. The derived source redshift distribution is degenerate with the source galaxy bias, which must be constrained via additional assumptions. We apply this estimator to constrain source galaxy redshift distributions in the Kilo-Degree imaging survey through cross-correlation with the spectroscopic 2-degree Field Lensing Survey, presenting results first as a binned step-wise distribution in the range z < 0.8, and then building a continuous distribution using a Gaussian process model. We demonstrate the robustness of our methodology using mock catalogues constructed from N-body simulations, and comparisons with other techniques for inferring the redshift distribution.

  14. The developmental genetics of biological robustness

    PubMed Central

    Mestek Boukhibar, Lamia; Barkoulas, Michalis

    2016-01-01

    Background Living organisms are continuously confronted with perturbations, such as environmental changes that include fluctuations in temperature and nutrient availability, or genetic changes such as mutations. While some developmental systems are affected by such challenges and display variation in phenotypic traits, others continue consistently to produce invariable phenotypes despite perturbation. This ability of a living system to maintain an invariable phenotype in the face of perturbations is termed developmental robustness. Biological robustness is a phenomenon observed across phyla, and studying its mechanisms is central to deciphering the genotype–phenotype relationship. Recent work in yeast, animals and plants has shown that robustness is genetically controlled and has started to reveal the underlying mechanisms behind it. Scope and Conclusions Studying biological robustness involves focusing on an important property of developmental traits, which is the phenotypic distribution within a population. This is often neglected because the vast majority of developmental biology studies instead focus on population aggregates, such as trait averages. By drawing on findings in animals and yeast, this Viewpoint considers how studies on plant developmental robustness may benefit from strict definitions of what is the developmental system of choice and what is the relevant perturbation, and also from clear distinctions between gene effects on the trait mean and the trait variance. Recent advances in quantitative developmental biology and high-throughput phenotyping now allow the design of targeted genetic screens to identify genes that amplify or restrict developmental trait variance and to study how variation propagates across different phenotypic levels in biological systems. The molecular characterization of more quantitative trait loci affecting trait variance will provide further insights into the evolution of genes modulating developmental robustness. The study of robustness mechanisms in closely related species will address whether mechanisms of robustness are evolutionarily conserved. PMID:26292993

  15. Space Object Maneuver Detection Algorithms Using TLE Data

    NASA Astrophysics Data System (ADS)

    Pittelkau, M.

    2016-09-01

    An important aspect of Space Situational Awareness (SSA) is detection of deliberate and accidental orbit changes of space objects. Although space surveillance systems detect orbit maneuvers within their tracking algorithms, maneuver data are not readily disseminated for general use. However, two-line element (TLE) data is available and can be used to detect maneuvers of space objects. This work is an attempt to improve upon existing TLE-based maneuver detection algorithms. Three adaptive maneuver detection algorithms are developed and evaluated: The first is a fading-memory Kalman filter, which is equivalent to the sliding-window least-squares polynomial fit, but computationally more efficient and adaptive to the noise in the TLE data. The second algorithm is based on a sample cumulative distribution function (CDF) computed from a histogram of the magnitude-squared |ΔV|² of change-in-velocity vectors (ΔV), which is computed from the TLE data. A maneuver detection threshold is computed from the median estimated from the CDF, or from the CDF and a specified probability of false alarm. The third algorithm is a median filter. The median filter is the simplest of a class of nonlinear filters called order statistics filters, which is within the theory of robust statistics. The output of the median filter is practically insensitive to outliers, or large maneuvers. The median of the |ΔV|² data is proportional to the variance of the ΔV, so the variance is estimated from the output of the median filter. A maneuver is detected when the input data exceeds a constant times the estimated variance.
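
    A minimal sketch of the median-filter idea behind the third algorithm is given below (Python; the window length and threshold multiplier are illustrative choices, not values from the paper): a running median of the |ΔV|² series supplies a robust scale estimate, and epochs exceeding a multiple of that scale are flagged as candidate maneuvers.

```python
import numpy as np
from scipy.signal import medfilt


def flag_maneuvers(dv_sq, window=11, k=10.0):
    """Flag candidate maneuvers in a series of |delta-V|^2 values derived from TLEs.

    A running median gives a robust scale for the ambient noise; epochs whose
    |delta-V|^2 exceeds k times that scale are flagged. Both the window length
    and the multiplier k are illustrative, not the paper's tuning.
    """
    dv_sq = np.asarray(dv_sq, dtype=float)
    scale = medfilt(dv_sq, kernel_size=window)     # robust, insensitive to outliers
    return dv_sq > k * np.maximum(scale, np.finfo(float).tiny)
```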

  16. Evaluation and design of a rain gauge network using a statistical optimization method in a severe hydro-geological hazard prone area

    NASA Astrophysics Data System (ADS)

    Fattoruso, Grazia; Longobardi, Antonia; Pizzuti, Alfredo; Molinara, Mario; Marocco, Claudio; De Vito, Saverio; Tortorella, Francesco; Di Francia, Girolamo

    2017-06-01

    Rainfall data gathered continuously by a distributed rain gauge network are instrumental to more effective hydro-geological risk forecasting and management services, though the estimated rainfall fields used as input suffer from prediction uncertainty. Optimal rain gauge networks can generate accurate estimated rainfall fields. In this research work, a methodology has been investigated for evaluating an optimal rain gauge network aimed at robust hydrogeological hazard investigations. The rain gauge network of the Sarno River basin (Southern Italy) has been evaluated by optimizing a two-objective function that maximizes the estimation accuracy and minimizes the total metering cost through the variance reduction algorithm along with the climatological (time-invariant) variogram. This problem has been solved by using an enumerative search algorithm, evaluating the exact Pareto front in an efficient computational time.

  17. Estimating the encounter rate variance in distance sampling

    USGS Publications Warehouse

    Fewster, R.M.; Buckland, S.T.; Burnham, K.P.; Borchers, D.L.; Jupp, P.E.; Laake, J.L.; Thomas, L.

    2009-01-01

    The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias. © 2008, The International Biometric Society.
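
    For reference, one common form of the default estimator, which treats the K transects as a simple random sample of lines, is sketched below (Python). This is only the baseline estimator that the abstract says is typically used; it is not the poststratified or design-based variants the paper examines, and the exact published forms may weight terms differently.

```python
import numpy as np


def encounter_rate_var(counts, lengths):
    """Design-based variance of the encounter rate n/L, treating the K transects
    as a simple random sample of lines (the usual default estimator).

    counts:  detections per transect.
    lengths: transect lengths (same order as counts).
    """
    n = np.asarray(counts, dtype=float)
    l = np.asarray(lengths, dtype=float)
    K, L = len(l), l.sum()
    er = n.sum() / L                                   # pooled encounter rate
    return K / (L ** 2 * (K - 1)) * np.sum(l ** 2 * (n / l - er) ** 2)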

  18. Just Noticeable Distortion Model and Its Application in Color Image Watermarking

    NASA Astrophysics Data System (ADS)

    Liu, Kuo-Cheng

    In this paper, a perceptually adaptive watermarking scheme for color images is proposed in order to achieve robustness and transparency. A new just noticeable distortion (JND) estimator for color images is first designed in the wavelet domain. The key issue of the JND model is to effectively integrate visual masking effects. The estimator is an extension of the perceptual model that is used in image coding for grayscale images. In addition to the visual masking effects computed coefficient by coefficient from the luminance content and the texture of grayscale images, the crossed masking effect given by the interaction between luminance and chrominance components and the effect given by the variance within the local region of the target coefficient are investigated, so that the visibility threshold for the human visual system (HVS) can be evaluated. In a locally adaptive fashion based on the wavelet decomposition, the estimator applies to all subbands of luminance and chrominance components of color images and is used to measure the visibility of wavelet quantization errors. The subband JND profiles are then incorporated into the proposed color image watermarking scheme. Performance in terms of robustness and transparency of the watermarking scheme is obtained by means of the proposed approach to embed the maximum strength watermark while maintaining the perceptually lossless quality of the watermarked color image. Simulation results show that the proposed scheme, which inserts watermarks into both luminance and chrominance components, is more robust than the existing scheme while retaining watermark transparency.

  19. Sampling design considerations for demographic studies: a case of colonial seabirds

    USGS Publications Warehouse

    Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

    2009-01-01

    For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture-recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture-recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.

  20. Robust Programming Problems Based on the Mean-Variance Model Including Uncertainty Factors

    NASA Astrophysics Data System (ADS)

    Hasuike, Takashi; Ishii, Hiroaki

    2009-01-01

    This paper considers robust programming problems based on the mean-variance model including uncertainty sets and fuzzy factors. Since these problems are not well-defined problems due to fuzzy factors, it is hard to solve them directly. Therefore, introducing chance constraints, fuzzy goals and possibility measures, the proposed models are transformed into the deterministic equivalent problems. Furthermore, in order to solve these equivalent problems efficiently, the solution method is constructed introducing the mean-absolute deviation and doing the equivalent transformations.

  1. Efficiently estimating salmon escapement uncertainty using systematically sampled data

    USGS Publications Warehouse

    Reynolds, Joel H.; Woody, Carol Ann; Gove, Nancy E.; Fair, Lowell F.

    2007-01-01

    Fish escapement is generally monitored using nonreplicated systematic sampling designs (e.g., via visual counts from towers or hydroacoustic counts). These sampling designs support a variety of methods for estimating the variance of the total escapement. Unfortunately, all the methods give biased results, with the magnitude of the bias being determined by the underlying process patterns. Fish escapement commonly exhibits positive autocorrelation and nonlinear patterns, such as diurnal and seasonal patterns. For these patterns, poor choice of variance estimator can needlessly increase the uncertainty managers have to deal with in sustaining fish populations. We illustrate the effect of sampling design and variance estimator choice on variance estimates of total escapement for anadromous salmonids from systematic samples of fish passage. Using simulated tower counts of sockeye salmon Oncorhynchus nerka escapement on the Kvichak River, Alaska, five variance estimators for nonreplicated systematic samples were compared to determine the least biased. Using the least biased variance estimator, four confidence interval estimators were compared for expected coverage and mean interval width. Finally, five systematic sampling designs were compared to determine the design giving the smallest average variance estimate for total annual escapement. For nonreplicated systematic samples of fish escapement, all variance estimators were positively biased. Compared to the other estimators, the least biased estimator reduced bias by, on average, from 12% to 98%. All confidence intervals gave effectively identical results. Replicated systematic sampling designs consistently provided the smallest average estimated variance among those compared.
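
    As an illustration of the kind of estimator compared in such studies, the sketch below implements a successive-difference variance estimator for the expanded total from a single systematic sample (a Wolter-type form). It is an assumption of this sketch, not a claim about the paper, that this form resembles the least biased estimator the authors identify.

```python
import numpy as np


def systematic_total_variance(sample_counts, sampling_fraction):
    """Successive-difference variance estimator for the expanded total from a single
    (nonreplicated) systematic sample of fish passage counts.

    sample_counts:     counts at the sampled time points, in design (time) order.
    sampling_fraction: n / N, the fraction of time units actually counted.
    """
    y = np.asarray(sample_counts, dtype=float)
    n = len(y)
    f = sampling_fraction
    N = n / f                                            # total number of time units
    var_mean = (1.0 - f) / n * np.sum(np.diff(y) ** 2) / (2.0 * (n - 1))
    return N ** 2 * var_mean                             # variance of the expanded total N * ybar
```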

  2. Application of Nitrogen and Carbon Stable Isotopes (δ15N and δ13C) to Quantify Food Chain Length and Trophic Structure

    PubMed Central

    Perkins, Matthew J.; McDonald, Robbie A.; van Veen, F. J. Frank; Kelly, Simon D.; Rees, Gareth; Bearhop, Stuart

    2014-01-01

    Increasingly, stable isotope ratios of nitrogen (δ15N) and carbon (δ13C) are used to quantify trophic structure, though relatively few studies have tested the accuracy of isotopic structural measures. For laboratory-raised and wild-collected plant-invertebrate food chains spanning four trophic levels we estimated nitrogen range (NR) using δ15N, and carbon range (CR) using δ13C, which are used to quantify food chain length and breadth of trophic resources respectively. Across a range of known food chain lengths we examined how NR and CR changed within and between food chains. Our isotopic estimates of structure are robust because they were calculated using resampling procedures that propagate variance in sample means through to quantified uncertainty in final estimates. To identify origins of uncertainty in estimates of NR and CR, we additionally examined variation in discrimination (which is change in δ15N or δ13C from source to consumer) between trophic levels and among food chains. δ15N discrimination showed significant enrichment, while variation in enrichment was species and system specific, ranged broadly (1.4‰ to 3.3‰), and importantly, propagated variation to subsequent estimates of NR. However, NR proved robust to such variation and distinguished food chain length well, though some overlap between longer food chains implies a need for awareness of such limitations. δ13C discrimination was inconsistent; generally no change or small significant enrichment was observed. Consequently, estimates of CR changed little with increasing food chain length, showing the potential utility of δ13C as a tracer of energy pathways. This study serves as a robust test of isotopic quantification of food chain structure, and given that global estimates of aquatic food chains approximate four trophic levels while many food chains include invertebrates, our use of four trophic level plant-invertebrate food chains makes our findings relevant for a majority of ecological systems. PMID:24676331

  3. On the Interplay between the Evolvability and Network Robustness in an Evolutionary Biological Network: A Systems Biology Approach

    PubMed Central

    Chen, Bor-Sen; Lin, Ying-Po

    2011-01-01

    In the evolutionary process, the random transmission and mutation of genes provide biological diversities for natural selection. In order to preserve functional phenotypes between generations, gene networks need to evolve robustly under the influence of random perturbations. Therefore, the robustness of the phenotype, in the evolutionary process, exerts a selection force on gene networks to keep network functions. However, gene networks need to adjust, by variations in genetic content, to generate phenotypes for new challenges in the network’s evolution, i.e., the evolvability. Hence, there should be some interplay between the evolvability and network robustness in evolutionary gene networks. In this study, the interplay between the evolvability and network robustness of a gene network and a biochemical network is discussed from a nonlinear stochastic system point of view. It was found that if the genetic robustness plus environmental robustness is less than the network robustness, the phenotype of the biological network is robust in evolution. The tradeoff between the genetic robustness and environmental robustness in evolution is discussed from the stochastic stability robustness and sensitivity of the nonlinear stochastic biological network, which may be relevant to the statistical tradeoff between bias and variance, the so-called bias/variance dilemma. Further, the tradeoff could be considered as an antagonistic pleiotropic action of a gene network and discussed from the systems biology perspective. PMID:22084563

  4. Using average cost methods to estimate encounter-level costs for medical-surgical stays in the VA.

    PubMed

    Wagner, Todd H; Chen, Shuo; Barnett, Paul G

    2003-09-01

    The U.S. Department of Veterans Affairs (VA) maintains discharge abstracts, but these do not include cost information. This article describes the methods the authors used to estimate the costs of VA medical-surgical hospitalizations in fiscal years 1998 to 2000. They estimated a cost regression with 1996 Medicare data restricted to veterans receiving VA care in an earlier year. The regression accounted for approximately 74 percent of the variance in cost-adjusted charges, and it proved to be robust to outliers and the year of input data. The beta coefficients from the cost regression were used to impute costs of VA medical-surgical hospital discharges. The estimated aggregate costs were reconciled with VA budget allocations. In addition to the direct medical costs, their cost estimates include indirect costs and physician services; both of these were allocated in proportion to direct costs. They discuss the method's limitations and application in other health care systems.

  5. The Form, and Some Robustness Properties of Integrated Distance Estimators for Linear Models, Applied to Some Published Data Sets.

    DTIC Science & Technology

    1982-06-01

    observation in our framework is the pair (y, x) with x considered given. The influence function for σ̂² at the Gaussian distribution with mean xβ and variance ... This influence function is bounded in the residual y - xβ, and redescends to an asymptote greater than ... A version of the influence function for β̂ at the Gaussian distribution, given the xj and x, is defined as the normalized difference ... (see Barnett and

  6. A Variance Distribution Model of Surface EMG Signals Based on Inverse Gamma Distribution.

    PubMed

    Hayashi, Hideaki; Furui, Akira; Kurita, Yuichi; Tsuji, Toshio

    2017-11-01

    Objective: This paper describes the formulation of a surface electromyogram (EMG) model capable of representing the variance distribution of EMG signals. Methods: In the model, EMG signals are handled based on a Gaussian white noise process with a mean of zero for each variance value. EMG signal variance is taken as a random variable that follows inverse gamma distribution, allowing the representation of noise superimposed onto this variance. Variance distribution estimation based on marginal likelihood maximization is also outlined in this paper. The procedure can be approximated using rectified and smoothed EMG signals, thereby allowing the determination of distribution parameters in real time at low computational cost. Results: A simulation experiment was performed to evaluate the accuracy of distribution estimation using artificially generated EMG signals, with results demonstrating that the proposed model's accuracy is higher than that of maximum-likelihood-based estimation. Analysis of variance distribution using real EMG data also suggested a relationship between variance distribution and signal-dependent noise. Conclusion: The study reported here was conducted to examine the performance of a proposed surface EMG model capable of representing variance distribution and a related distribution parameter estimation method. Experiments using artificial and real EMG data demonstrated the validity of the model. Significance: Variance distribution estimated using the proposed model exhibits potential in the estimation of muscle force.
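
    A rough stand-in for fitting the variance distribution is moment matching of an inverse-gamma distribution to per-window variance estimates. The sketch below (Python) is illustrative only and does not reproduce the marginal-likelihood procedure described in the abstract; the simulated "EMG-like" signal is purely synthetic.

```python
import numpy as np


def fit_inverse_gamma(variances):
    """Moment-matching fit of an inverse-gamma distribution to per-window variance
    estimates (a simple stand-in for marginal-likelihood maximization).
    """
    v = np.asarray(variances, dtype=float)
    m, s2 = v.mean(), v.var(ddof=1)
    alpha = m ** 2 / s2 + 2.0            # shape
    beta = m * (alpha - 1.0)             # scale; E[V] = beta / (alpha - 1)
    return alpha, beta


# Example: windowed variances of a synthetic, amplitude-modulated noise signal
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, size=20000) * (1.0 + 0.5 * np.sin(np.linspace(0, 20, 20000)))
window_vars = signal.reshape(-1, 200).var(axis=1, ddof=1)
print(fit_inverse_gamma(window_vars))
```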

  7. The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences.

    PubMed

    Hedge, Craig; Powell, Georgina; Sumner, Petroc

    2018-06-01

    Individual differences in cognitive paradigms are increasingly employed to relate cognition to brain structure, chemistry, and function. However, such efforts are often unfruitful, even with the most well established tasks. Here we offer an explanation for failures in the application of robust cognitive paradigms to the study of individual differences. Experimental effects become well established - and thus those tasks become popular - when between-subject variability is low. However, low between-subject variability causes low reliability for individual differences, destroying replicable correlations with other factors and potentially undermining published conclusions drawn from correlational relationships. Though these statistical issues have a long history in psychology, they are widely overlooked in cognitive psychology and neuroscience today. In three studies, we assessed test-retest reliability of seven classic tasks: Eriksen Flanker, Stroop, stop-signal, go/no-go, Posner cueing, Navon, and Spatial-Numerical Association of Response Code (SNARC). Reliabilities ranged from 0 to .82, being surprisingly low for most tasks given their common use. As we predicted, this emerged from low variance between individuals rather than high measurement variance. In other words, the very reason such tasks produce robust and easily replicable experimental effects - low between-participant variability - makes their use as correlational tools problematic. We demonstrate that taking such reliability estimates into account has the potential to qualitatively change theoretical conclusions. The implications of our findings are that well-established approaches in experimental psychology and neuropsychology may not directly translate to the study of individual differences in brain structure, chemistry, and function, and alternative metrics may be required.
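
    The mechanism described here can be made concrete with an intraclass-correlation-style index: reliability is the share of total variance that is between participants, so shrinking between-participant variance drives reliability toward zero even when measurement error is unchanged. The sketch below (Python, two sessions per participant) is a simple ICC(1)-type calculation and is not claimed to be the exact formulation used in the study.

```python
import numpy as np


def retest_reliability(session1, session2):
    """Test-retest reliability as the share of total variance that is between-participant
    (an ICC(1)-style index for two sessions per participant)."""
    x = np.column_stack([session1, session2]).astype(float)
    n = len(x)
    person_means = x.mean(axis=1)
    grand = x.mean()
    msb = 2.0 * np.sum((person_means - grand) ** 2) / (n - 1)      # between-participant mean square
    msw = np.sum((x - person_means[:, None]) ** 2) / n             # within-participant (error) mean square
    return (msb - msw) / (msb + msw)
```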

  8. Modelling heterogeneity variances in multiple treatment comparison meta-analysis--are informative priors the better solution?

    PubMed

    Thorlund, Kristian; Thabane, Lehana; Mills, Edward J

    2013-01-11

    Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variance for all involved treatment comparisons is equal (i.e., the 'common variance' assumption). This approach 'borrows strength' for heterogeneity estimation across treatment comparisons, and thus, adds valuable precision when data is sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary. In this paper we describe four novel approaches to modeling heterogeneity variance - two novel model structures, and two approaches for use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. We particularly compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities. In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances. Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach. MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice.

  9. Analytic variance estimates of Swank and Fano factors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gutierrez, Benjamin; Badano, Aldo; Samuelson, Frank, E-mail: frank.samuelson@fda.hhs.gov

    Purpose: Variance estimates for detector energy resolution metrics can be used as stopping criteria in Monte Carlo simulations for the purpose of ensuring a small uncertainty of those metrics and for the design of variance reduction techniques. Methods: The authors derive an estimate for the variance of two energy resolution metrics, the Swank factor and the Fano factor, in terms of statistical moments that can be accumulated without significant computational overhead. The authors examine the accuracy of these two estimators and demonstrate how the estimates of the coefficient of variation of the Swank and Fano factors behave with data from a Monte Carlo simulation of an indirect x-ray imaging detector. Results: The authors' analyses suggest that the accuracy of their variance estimators is appropriate for estimating the actual variances of the Swank and Fano factors for a variety of distributions of detector outputs. Conclusions: The variance estimators derived in this work provide a computationally convenient way to estimate the error or coefficient of variation of the Swank and Fano factors during Monte Carlo simulations of radiation imaging systems.
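
    For context, both metrics have simple moment-based definitions: the Swank factor is I = m1^2 / (m0 m2) over the raw moments of the per-event detector output (m0 = 1 for sample data), and the Fano factor is the variance-to-mean ratio. The sketch below computes both from samples and adds a bootstrap coefficient of variation as a cross-check on analytic variance estimates; it is an illustration, not the authors' derivation.

```python
import numpy as np


def swank_factor(outputs):
    """Swank factor I = m1^2 / m2 from per-event detector outputs (m0 = 1 for samples)."""
    x = np.asarray(outputs, dtype=float)
    return x.mean() ** 2 / (x ** 2).mean()


def fano_factor(outputs):
    """Fano factor F = variance / mean of the per-event output."""
    x = np.asarray(outputs, dtype=float)
    return x.var(ddof=1) / x.mean()


def bootstrap_cv(stat, outputs, n_boot=500, seed=0):
    """Bootstrap coefficient of variation of a statistic, usable as a cross-check
    on analytic variance estimates of the kind derived in the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(outputs, dtype=float)
    reps = [stat(rng.choice(x, size=len(x), replace=True)) for _ in range(n_boot)]
    return np.std(reps, ddof=1) / np.mean(reps)
```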

  10. Consequences of Assumption Violations Revisited: A Quantitative Review of Alternatives to the One-Way Analysis of Variance "F" Test.

    ERIC Educational Resources Information Center

    Lix, Lisa M.; And Others

    1996-01-01

    Meta-analytic techniques were used to summarize the statistical robustness literature on Type I error properties of alternatives to the one-way analysis of variance "F" test. The James (1951) and Welch (1951) tests performed best under violations of the variance homogeneity assumption, although their use is not always appropriate. (SLD)

  11. Multifactorial inheritance with cultural transmission and assortative mating. II. a general model of combined polygenic and cultural inheritance.

    PubMed Central

    Cloninger, C R; Rice, J; Reich, T

    1979-01-01

    A general linear model of combined polygenic-cultural inheritance is described. The model allows for phenotypic assortative mating, common environment, maternal and paternal effects, and genic-cultural correlation. General formulae for phenotypic correlation between family members in extended pedigrees are given for both primary and secondary assortative mating. A FORTRAN program BETA, available upon request, is used to provide maximum likelihood estimates of the parameters from reported correlations. American data about IQ and Burks' culture index are analyzed. Both cultural and genetic components of phenotypic variance are observed to make significant and substantial contributions to familial resemblance in IQ. The correlation between the environments of DZ twins is found to equal that of singleton sibs, not that of MZ twins. Burks' culture index is found to be an imperfect measure of midparent IQ rather than an index of home environment as previously assumed. Conditions under which the parameters of the model may be uniquely and precisely estimated are discussed. Interpretation of variance components in the presence of assortative mating and genic-cultural covariance is reviewed. A conservative, but robust, approach to the use of environmental indices is described. PMID:453202

  12. A new framework for comprehensive, robust, and efficient global sensitivity analysis: 2. Application

    NASA Astrophysics Data System (ADS)

    Razavi, Saman; Gupta, Hoshin V.

    2016-01-01

    Based on the theoretical framework for sensitivity analysis called "Variogram Analysis of Response Surfaces" (VARS), developed in the companion paper, we develop and implement a practical "star-based" sampling strategy (called STAR-VARS), for the application of VARS to real-world problems. We also develop a bootstrap approach to provide confidence level estimates for the VARS sensitivity metrics and to evaluate the reliability of inferred factor rankings. The effectiveness, efficiency, and robustness of STAR-VARS are demonstrated via two real-data hydrological case studies (a 5-parameter conceptual rainfall-runoff model and a 45-parameter land surface scheme hydrology model), and a comparison with the "derivative-based" Morris and "variance-based" Sobol approaches are provided. Our results show that STAR-VARS provides reliable and stable assessments of "global" sensitivity across the full range of scales in the factor space, while being 1-2 orders of magnitude more efficient than the Morris or Sobol approaches.

  13. Comparing Mapped Plot Estimators

    Treesearch

    Paul C. Van Deusen

    2006-01-01

    Two alternative derivations of estimators for mean and variance from mapped plots are compared by considering the models that support the estimators and by simulation. It turns out that both models lead to the same estimator for the mean but lead to very different variance estimators. The variance estimators based on the least valid model assumptions are shown to...

  14. Robust Multi-Frame Adaptive Optics Image Restoration Algorithm Using Maximum Likelihood Estimation with Poisson Statistics.

    PubMed

    Li, Dongming; Sun, Changming; Yang, Jinhua; Liu, Huan; Peng, Jiaqi; Zhang, Lijuan

    2017-04-06

    An adaptive optics (AO) system provides real-time compensation for atmospheric turbulence. However, an AO image is usually of poor contrast because of the nature of the imaging process, meaning that the image contains information coming from both out-of-focus and in-focus planes of the object, which also brings about a loss in quality. In this paper, we present a robust multi-frame adaptive optics image restoration algorithm via maximum likelihood estimation. Our proposed algorithm uses a maximum likelihood method with image regularization as the basic principle, and constructs the joint log likelihood function for multi-frame AO images based on a Poisson distribution model. To begin with, a frame selection method based on image variance is applied to the observed multi-frame AO images to select images with better quality to improve the convergence of a blind deconvolution algorithm. Then, by combining the imaging conditions and the AO system properties, a point spread function estimation model is built. Finally, we develop our iterative solutions for AO image restoration addressing the joint deconvolution issue. We conduct a number of experiments to evaluate the performances of our proposed algorithm. Experimental results show that our algorithm produces accurate AO image restoration results and outperforms the current state-of-the-art blind deconvolution methods.
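
    The Poisson maximum-likelihood deconvolution step has a standard multiplicative form (the Richardson-Lucy update). The sketch below applies that update jointly to several frames with known PSFs; it is a simplified, non-blind, unregularized stand-in for the algorithm described in the abstract, not the authors' method.

```python
import numpy as np
from scipy.signal import fftconvolve


def multiframe_rl(frames, psfs, n_iter=30):
    """Multi-frame Richardson-Lucy deconvolution under a Poisson likelihood.

    frames, psfs: lists of equally sized 2-D arrays; each PSF should sum to 1.
    The per-frame multiplicative corrections are averaged at each iteration.
    """
    est = np.full_like(frames[0], frames[0].mean(), dtype=float)
    eps = 1e-12
    for _ in range(n_iter):
        correction = np.zeros_like(est)
        for y, h in zip(frames, psfs):
            blurred = fftconvolve(est, h, mode="same") + eps
            correction += fftconvolve(y / blurred, h[::-1, ::-1], mode="same")
        est *= correction / len(frames)
    return est
```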

  15. Robust Multi-Frame Adaptive Optics Image Restoration Algorithm Using Maximum Likelihood Estimation with Poisson Statistics

    PubMed Central

    Li, Dongming; Sun, Changming; Yang, Jinhua; Liu, Huan; Peng, Jiaqi; Zhang, Lijuan

    2017-01-01

    An adaptive optics (AO) system provides real-time compensation for atmospheric turbulence. However, an AO image is usually of poor contrast because of the nature of the imaging process, meaning that the image contains information coming from both out-of-focus and in-focus planes of the object, which also brings about a loss in quality. In this paper, we present a robust multi-frame adaptive optics image restoration algorithm via maximum likelihood estimation. Our proposed algorithm uses a maximum likelihood method with image regularization as the basic principle, and constructs the joint log likelihood function for multi-frame AO images based on a Poisson distribution model. To begin with, a frame selection method based on image variance is applied to the observed multi-frame AO images to select images with better quality to improve the convergence of a blind deconvolution algorithm. Then, by combining the imaging conditions and the AO system properties, a point spread function estimation model is built. Finally, we develop our iterative solutions for AO image restoration addressing the joint deconvolution issue. We conduct a number of experiments to evaluate the performances of our proposed algorithm. Experimental results show that our algorithm produces accurate AO image restoration results and outperforms the current state-of-the-art blind deconvolution methods. PMID:28383503

  16. Robust fuzzy control subject to state variance and passivity constraints for perturbed nonlinear systems with multiplicative noises.

    PubMed

    Chang, Wen-Jer; Huang, Bo-Jyun

    2014-11-01

    The multi-constrained robust fuzzy control problem is investigated in this paper for perturbed continuous-time nonlinear stochastic systems. The nonlinear system considered in this paper is represented by a Takagi-Sugeno fuzzy model with perturbations and state multiplicative noises. The multiple performance constraints considered in this paper include stability, passivity and individual state variance constraints. The Lyapunov stability theory is employed to derive sufficient conditions to achieve the above performance constraints. By solving these sufficient conditions, the contribution of this paper is to develop a parallel distributed compensation based robust fuzzy control approach to satisfy multiple performance constraints for perturbed nonlinear systems with multiplicative noises. At last, a numerical example for the control of perturbed inverted pendulum system is provided to illustrate the applicability and effectiveness of the proposed multi-constrained robust fuzzy control method. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  17. No brain expansion in Australopithecus boisei.

    PubMed

    Hawks, John

    2011-10-01

    The endocranial volumes of robust australopithecine fossils appear to have increased in size over time. Most evidence with temporal resolution is concentrated in East African Australopithecus boisei. Including the KNM-WT 17000 cranium, this sample comprises 11 endocranial volume estimates ranging in date from 2.5 million to 1.4 million years ago. But the sample presents several difficulties to a test of trend, including substantial estimation error for some specimens and an unusually low variance. This study reevaluates the evidence, using randomization methods and a related test using an explicit model of variability. None of these tests applied to the A. boisei endocranial volume sample produces significant evidence for a trend in that species, whether or not the early KNM-WT 17000 specimen is included. Copyright © 2011 Wiley-Liss, Inc.
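
    The randomization logic used in such tests can be sketched simply: compare the observed age-volume correlation with its distribution under random relabelling of volumes. The code below is a generic permutation test for trend and does not reproduce the paper's handling of estimation error in individual specimens.

```python
import numpy as np


def trend_randomization_test(ages, volumes, n_perm=10000, seed=0):
    """Two-sided permutation test for a temporal trend in endocranial volume."""
    rng = np.random.default_rng(seed)
    ages = np.asarray(ages, dtype=float)
    vols = np.asarray(volumes, dtype=float)
    obs = np.corrcoef(ages, vols)[0, 1]
    perm = np.array([np.corrcoef(ages, rng.permutation(vols))[0, 1] for _ in range(n_perm)])
    p_value = np.mean(np.abs(perm) >= abs(obs))
    return obs, p_value
```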

  18. Robust point matching via vector field consensus.

    PubMed

    Jiayi Ma; Ji Zhao; Jinwen Tian; Yuille, Alan L; Zhuowen Tu

    2014-04-01

    In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points. Our algorithm starts by creating a set of putative correspondences which can contain a very large number of false correspondences, or outliers, in addition to a limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector field between the two point sets, which involves estimating a consensus of inlier points whose matching follows a nonparametric geometrical constraint. We formulate this as a maximum a posteriori (MAP) estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM algorithm, which, by also estimating the variance of the prior model (initialized to a large value), is able to obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of outliers (even up to 90%). We also show that in the special case where there is an underlying parametric geometrical model (e.g., the epipolar line constraint), we obtain better results than standard alternatives like RANSAC if a large number of outliers are present. This suggests a two-stage strategy, where we use our nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for others to use it. In addition, our approach is general and can be applied to other problems, such as learning with a badly corrupted training data set.

  19. Comparing the Performance of Approaches for Testing the Homogeneity of Variance Assumption in One-Factor ANOVA Models

    ERIC Educational Resources Information Center

    Wang, Yan; Rodríguez de Gil, Patricia; Chen, Yi-Hsin; Kromrey, Jeffrey D.; Kim, Eun Sook; Pham, Thanh; Nguyen, Diep; Romano, Jeanine L.

    2017-01-01

    Various tests to check the homogeneity of variance assumption have been proposed in the literature, yet there is no consensus as to their robustness when the assumption of normality does not hold. This simulation study evaluated the performance of 14 tests for the homogeneity of variance assumption in one-way ANOVA models in terms of Type I error…

  20. The "universal" behavior of the Breakthrough Curve in 3D aquifer transport and the validity of the First-Order solution

    NASA Astrophysics Data System (ADS)

    Jankovic, Igor; Maghrebi, Mahdi; Fiori, Aldo; Zarlenga, Antonio; Dagan, Gedeon

    2017-04-01

    We examine the impact of permeability structures on the Breakthrough Curve (BTC) of solute, at a distance x from the injection plane, under mean uniform flow of mean velocity U. The study is carried out through accurate 3D numerical simulations, rather than the 2D models adopted in most previous works. All structures share the same univariate distribution of the logconductivity Y = lnK and autocorrelation function ρY, but differ in higher order statistics. The main finding is that the BTC of ergodic plumes for the different examined structures is quite robust, displaying a seemingly "universal" behavior. The result is at variance with similar analyses carried out in the past for 2D permeability structures. The basic parameters (i.e. the geometric mean, the logconductivity variance σY² and the horizontal integral scale I) have to be identified from field data (e.g. core analysis, pumping test or other methods). However, prediction requires the knowledge of U, and the results suggest that improvement of the BTC prediction in applications can be achieved by independent estimates of the mean velocity U, e.g. by pumping tests, rather than attempting to characterize the permeability structure beyond its second-order characterization. The BTC prediction made by the Inverse Gaussian (IG) distribution, adopting the macrodispersion coefficient estimated by the First Order approximation αL = σY²I, is also quite robust, providing a simple and effective solution to be employed in applications. The consequences of the latter result are further explored by modeling the mass distribution that occurred at the MADE-1 natural gradient experiment, for which we show that most of the plume features are adequately captured by the simple First Order approach.
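
    The Inverse Gaussian BTC mentioned above has a closed form once the first-order macrodispersivity αL = σY²I is adopted. The sketch below assumes the longitudinal dispersion coefficient D = αL·U (a standard parameterization, stated here as an assumption rather than taken from the paper) and returns the travel-time density at distance x.

```python
import numpy as np


def ig_btc(t, x, U, sigma_y2, I):
    """Breakthrough curve (travel-time pdf) at distance x from the injection plane,
    using the Inverse Gaussian shape with alpha_L = sigma_Y^2 * I and D = alpha_L * U.

    t must be positive; x is the travel distance and U the mean velocity.
    """
    t = np.asarray(t, dtype=float)
    D = sigma_y2 * I * U               # longitudinal macrodispersion coefficient
    mu = x / U                         # mean arrival time
    lam = x ** 2 / (2.0 * D)           # Inverse Gaussian shape parameter
    return np.sqrt(lam / (2.0 * np.pi * t ** 3)) * np.exp(-lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t))
```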

  1. Fusion of P300 and eye-tracker data for spelling using BCI2000

    NASA Astrophysics Data System (ADS)

    Kalika, Dmitry; Collins, Leslie; Caves, Kevin; Throckmorton, Chandra

    2017-10-01

    Objective. Various augmentative and alternative communication (AAC) devices have been developed in order to aid communication for individuals with communication disorders. Recently, there has been interest in combining EEG data and eye-gaze data with the goal of developing a hybrid (or ‘fused’) BCI (hBCI) AAC system. This work explores the effectiveness of a speller that fuses data from an eye-tracker and the P300 speller in order to create a hybrid P300 speller. Approach. This hybrid speller collects both eye-tracking and EEG data in parallel, and the user spells characters on the screen in the same way that they would if they were only using the P300 speller. Online and offline experiments were performed. The online experiments measured the performance of the speller for sixteen non-disabled participants, while the offline simulations were used to assess the robustness of the hybrid system. Main results. Online results showed that for fifteen non-disabled participants, using eye-gaze in a Bayesian framework with EEG data from the P300 speller improved accuracy (0.0163 ± 2.72, 0.085 ± 0.111, 0.080 ± 0.106 for the estimated, medium, and high variance configurations) and reduced the average number of flashes required to spell a character compared to the standard P300 speller that relies solely on EEG data (-53.27 ± 25.87, -36.15 ± 19.3, -18.85 ± 12.43 for the estimated, medium, and high variance configurations). Offline simulations indicate that the system provides more robust performance than a standalone eye gaze system. Significance. The results of this work on non-disabled participants show the potential efficacy of a hybrid P300 and eye-tracker speller. Further validation on the amyotrophic lateral sclerosis population is needed to assess the benefit of this hybrid system.

  2. Robust LOD scores for variance component-based linkage analysis.

    PubMed

    Blangero, J; Williams, J T; Almasy, L

    2000-01-01

    The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.

  3. Modelling heterogeneity variances in multiple treatment comparison meta-analysis – Are informative priors the better solution?

    PubMed Central

    2013-01-01

    Background Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variance for all involved treatment comparisons is equal (i.e., the ‘common variance’ assumption). This approach ‘borrows strength’ for heterogeneity estimation across treatment comparisons, and thus, adds valuable precision when data is sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary. Methods In this paper we describe four novel approaches to modeling heterogeneity variance - two novel model structures, and two approaches for use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. We particularly compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities. Results In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances. Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach. Conclusions MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice. PMID:23311298

  4. Planning spatial sampling of the soil from an uncertain reconnaissance variogram

    NASA Astrophysics Data System (ADS)

    Lark, R. Murray; Hamilton, Elliott M.; Kaninga, Belinda; Maseka, Kakoma K.; Mutondo, Moola; Sakala, Godfrey M.; Watts, Michael J.

    2017-12-01

    An estimated variogram of a soil property can be used to support a rational choice of sampling intensity for geostatistical mapping. However, it is known that estimated variograms are subject to uncertainty. In this paper we address two practical questions. First, how can we make a robust decision on sampling intensity, given the uncertainty in the variogram? Second, what are the costs incurred in terms of oversampling because of uncertainty in the variogram model used to plan sampling? To achieve this we show how samples of the posterior distribution of variogram parameters, from a computational Bayesian analysis, can be used to characterize the effects of variogram parameter uncertainty on sampling decisions. We show how one can select a sample intensity so that a target value of the kriging variance is not exceeded with some specified probability. This will lead to oversampling, relative to the sampling intensity that would be specified if there were no uncertainty in the variogram parameters. One can estimate the magnitude of this oversampling by treating the tolerable grid spacing for the final sample as a random variable, given the target kriging variance and the posterior sample values. We illustrate these concepts with some data on total uranium content in a relatively sparse sample of soil from agricultural land near mine tailings in the Copperbelt Province of Zambia.
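
    One way to operationalize this decision rule is sketched below: for a candidate grid spacing, compute the ordinary-kriging variance at the centre of a square cell (a worst-case prediction point) from the four cell corners, repeat the calculation over posterior draws of the variogram parameters, and accept the spacing if the target kriging variance is met with the required probability. The exponential variogram and the four-corner configuration are simplifying assumptions for illustration, not the paper's exact setup.

```python
import numpy as np


def exp_variogram(h, nugget, psill, vrange):
    """Exponential variogram; gamma(0) = 0, sill = nugget + psill."""
    g = nugget + psill * (1.0 - np.exp(-h / vrange))
    return np.where(h > 0, g, 0.0)


def ok_variance_cell_centre(spacing, nugget, psill, vrange):
    """Ordinary-kriging variance at the centre of a square grid cell, predicted
    from the four cell corners (a worst-case point for a square sampling grid)."""
    pts = np.array([[0, 0], [spacing, 0], [0, spacing], [spacing, spacing]], dtype=float)
    x0 = np.array([spacing / 2.0, spacing / 2.0])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    n = len(pts)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, psill, vrange)
    A[n, n] = 0.0                                             # Lagrange-multiplier row/column
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(pts - x0, axis=1), nugget, psill, vrange)
    lam = np.linalg.solve(A, b)
    return float(lam @ b)                                     # sum(lambda_i * gamma_i0) + multiplier


def spacing_ok(spacing, target_kv, posterior_draws, prob=0.8):
    """True if the kriging-variance target is met with probability >= prob over
    posterior draws (nugget, partial sill, range) of the variogram parameters."""
    kv = np.array([ok_variance_cell_centre(spacing, *theta) for theta in posterior_draws])
    return np.mean(kv <= target_kv) >= prob
```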

  5. Minimum mean squared error (MSE) adjustment and the optimal Tykhonov-Phillips regularization parameter via reproducing best invariant quadratic uniformly unbiased estimates (repro-BIQUUE)

    NASA Astrophysics Data System (ADS)

    Schaffrin, Burkhard

    2008-02-01

    In a linear Gauss-Markov model, the parameter estimates from BLUUE (Best Linear Uniformly Unbiased Estimate) are not robust against possible outliers in the observations. Moreover, by giving up the unbiasedness constraint, the mean squared error (MSE) risk may be further reduced, in particular when the problem is ill-posed. In this paper, the α-weighted S-homBLE (Best homogeneously Linear Estimate) is derived via formulas originally used for variance component estimation on the basis of the repro-BIQUUE (reproducing Best Invariant Quadratic Uniformly Unbiased Estimate) principle in a model with stochastic prior information. In the present model, however, such prior information is not included, which allows the comparison of the stochastic approach (α-weighted S-homBLE) with the well-established algebraic approach of Tykhonov-Phillips regularization, also known as R-HAPS (Hybrid APproximation Solution), whenever the inverse of the “substitute matrix” S exists and is chosen as the R matrix that defines the relative impact of the regularizing term on the final result.

  6. Radiance and atmosphere propagation-based method for the target range estimation

    NASA Astrophysics Data System (ADS)

    Cho, Hoonkyung; Chun, Joohwan

    2012-06-01

    Target range estimation is traditionally based on radar and active sonar systems in modern combat systems. However, the performance of such active sensor devices is degraded tremendously by jamming signals from the enemy. This paper proposes a simple range estimation method between the target and the sensor. Passive IR sensors measure infrared (IR) radiance radiating from objects at different wavelengths, and this method shows robustness against electromagnetic jamming. The measured target radiance at each wavelength at the IR sensor depends on the emissive properties of the target material and is attenuated by various factors, in particular the distance between the sensor and the target and the atmospheric environment. MODTRAN is a tool that models atmospheric propagation of electromagnetic radiation. Based on the result from MODTRAN and the measured radiance, the target range is estimated. To statistically analyze the performance of the proposed method, we use maximum likelihood estimation (MLE) and evaluate the Cramer-Rao Lower Bound (CRLB) via the probability density function of the measured radiance. We also compare the CRLB and the variance of the ML estimate using Monte Carlo simulation.

  7. Variable selection for confounder control, flexible modeling and Collaborative Targeted Minimum Loss-based Estimation in causal inference

    PubMed Central

    Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan

    2015-01-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129

  8. Variable Selection for Confounder Control, Flexible Modeling and Collaborative Targeted Minimum Loss-Based Estimation in Causal Inference.

    PubMed

    Schnitzer, Mireille E; Lok, Judith J; Gruber, Susan

    2016-05-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010 [27]) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios.

  9. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic

    PubMed Central

    Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K.; Strug, Lisa J.

    2014-01-01

    Motivation: Sufficiently powered case–control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. Results: We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the ‘gold standard’ analysis with the true underlying genotypes for both common and rare variants. Availability and implementation: An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Contact: lisa.strug@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24733292
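
    A minimal sketch of the core substitution described above: replacing hard genotype calls with their conditional expectation given the observed reads. The Hardy-Weinberg prior, the variable names and the toy genotype likelihoods are illustrative assumptions; the RVS script linked above is the authoritative implementation (Python).

        import numpy as np

        def expected_genotype_dosage(genotype_likelihoods, maf):
            """Posterior mean genotype E[G | reads], the quantity an RVS-style score
            test can use in place of hard genotype calls.

            genotype_likelihoods: array (n, 3) of P(read data | G = 0, 1, 2 alt alleles)
            maf: assumed population minor-allele frequency (Hardy-Weinberg prior)
            """
            p = maf
            prior = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
            post = genotype_likelihoods * prior           # unnormalised posterior, shape (n, 3)
            post /= post.sum(axis=1, keepdims=True)
            return post @ np.array([0.0, 1.0, 2.0])       # E[G | data] per individual

        # Toy example: one deeply sequenced individual (confident heterozygote) and one
        # shallowly sequenced individual (ambiguous between hom-ref and het).
        gl = np.array([[1e-6, 0.999, 1e-3],
                       [0.40, 0.55, 0.05]])
        print(expected_genotype_dosage(gl, maf=0.2))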

  10. Secure Fusion Estimation for Bandwidth Constrained Cyber-Physical Systems Under Replay Attacks.

    PubMed

    Chen, Bo; Ho, Daniel W C; Hu, Guoqiang; Yu, Li

    2018-06-01

    State estimation plays an essential role in the monitoring and supervision of cyber-physical systems (CPSs), and its importance has made security and estimation performance a major concern. In this case, multisensor information fusion estimation (MIFE) provides an attractive alternative to study secure estimation problems because MIFE can potentially improve estimation accuracy and enhance reliability and robustness against attacks. From the perspective of the defender, the secure distributed Kalman fusion estimation problem is investigated in this paper for a class of CPSs under replay attacks, where each local estimate obtained by the sink node is transmitted to a remote fusion center through bandwidth-constrained communication channels. A new mathematical model with a compensation strategy is proposed to characterize the replay attacks and bandwidth constraints, and then a recursive distributed Kalman fusion estimator (DKFE) is designed in the linear minimum variance sense. According to different communication frameworks, two classes of data compression and compensation algorithms are developed such that the DKFEs can achieve the desired performance. Several attack-dependent and bandwidth-dependent conditions are derived such that the DKFEs are secure under replay attacks. An illustrative example is given to demonstrate the effectiveness of the proposed methods.

  11. Direct and indirect genetic and fine-scale location effects on breeding date in song sparrows.

    PubMed

    Germain, Ryan R; Wolak, Matthew E; Arcese, Peter; Losdat, Sylvain; Reid, Jane M

    2016-11-01

    Quantifying direct and indirect genetic effects of interacting females and males on variation in jointly expressed life-history traits is central to predicting microevolutionary dynamics. However, accurately estimating sex-specific additive genetic variances in such traits remains difficult in wild populations, especially if related individuals inhabit similar fine-scale environments. Breeding date is a key life-history trait that responds to environmental phenology and mediates individual and population responses to environmental change. However, no studies have estimated female (direct) and male (indirect) additive genetic and inbreeding effects on breeding date, and estimated the cross-sex genetic correlation, while simultaneously accounting for fine-scale environmental effects of breeding locations, impeding prediction of microevolutionary dynamics. We fitted animal models to 38 years of song sparrow (Melospiza melodia) phenology and pedigree data to estimate sex-specific additive genetic variances in breeding date, and the cross-sex genetic correlation, thereby estimating the total additive genetic variance while simultaneously estimating sex-specific inbreeding depression. We further fitted three forms of spatial animal model to explicitly estimate variance in breeding date attributable to breeding location, overlap among breeding locations and spatial autocorrelation. We thereby quantified fine-scale location variances in breeding date and quantified the degree to which estimating such variances affected the estimated additive genetic variances. The non-spatial animal model estimated nonzero female and male additive genetic variances in breeding date (sex-specific heritabilities: 0·07 and 0·02, respectively) and a strong, positive cross-sex genetic correlation (0·99), creating substantial total additive genetic variance (0·18). Breeding date varied with female, but not male inbreeding coefficient, revealing direct, but not indirect, inbreeding depression. All three spatial animal models estimated small location variance in breeding date, but because relatedness and breeding location were virtually uncorrelated, modelling location variance did not alter the estimated additive genetic variances. Our results show that sex-specific additive genetic effects on breeding date can be strongly positively correlated, which would affect any predicted rates of microevolutionary change in response to sexually antagonistic or congruent selection. Further, we show that inbreeding effects on breeding date can also be sex specific and that genetic effects can exceed phenotypic variation stemming from fine-scale location-based variation within a wild population. © 2016 The Authors. Journal of Animal Ecology © 2016 British Ecological Society.

  12. Comparing estimates of genetic variance across different relationship models.

    PubMed

    Legarra, Andres

    2016-02-01

    Use of relationships between individuals to estimate genetic variances and heritabilities via mixed models is standard practice in human, plant and livestock genetics. Different models or information for relationships may give different estimates of genetic variances. However, comparing these estimates across different relationship models is not straightforward as the implied base populations differ between relationship models. In this work, I present a method to compare estimates of variance components across different relationship models. I suggest referring genetic variances obtained using different relationship models to the same reference population, usually a set of individuals in the population. Expected genetic variance of this population is the estimated variance component from the mixed model times a statistic, Dk, which is the average self-relationship minus the average (self- and across-) relationship. For most typical models of relationships, Dk is close to 1. However, this is not true for very deep pedigrees, for identity-by-state relationships, or for non-parametric kernels, which tend to overestimate the genetic variance and the heritability. Using mice data, I show that heritabilities from identity-by-state and kernel-based relationships are overestimated. Weighting these estimates by Dk scales them to a base comparable to genomic or pedigree relationships, avoiding wrong comparisons, for instance, "missing heritabilities". Copyright © 2015 Elsevier Inc. All rights reserved.
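
    The rescaling described above is straightforward to express in code. The sketch below computes Dk (average self-relationship minus average relationship) from a relationship matrix and multiplies an estimated variance component by it; the toy kernel values are illustrative (Python).

        import numpy as np

        def dk_statistic(K):
            """Dk = mean self-relationship minus mean (self- and across-) relationship."""
            return np.mean(np.diag(K)) - np.mean(K)

        def variance_in_reference_population(sigma2_hat, K):
            """Scale a variance component estimated under relationship matrix K to the
            expected genetic variance of the reference individuals themselves."""
            return sigma2_hat * dk_statistic(K)

        # Toy example: an identity-by-state-like kernel with inflated relationships gives
        # Dk well below 1, shrinking the naive variance estimate accordingly.
        K = np.array([[1.10, 0.30, 0.25],
                      [0.30, 1.05, 0.20],
                      [0.25, 0.20, 1.15]])
        print(dk_statistic(K), variance_in_reference_population(0.40, K))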

  13. Intercomparison of methods for the estimation of displacement height and roughness length from single-level eddy covariance data

    NASA Astrophysics Data System (ADS)

    Graf, Alexander; van de Boer, Anneke; Schüttemeyer, Dirk; Moene, Arnold; Vereecken, Harry

    2013-04-01

    The displacement height d and roughness length z0 are parameters of the logarithmic wind profile and as such are characteristics of the surface that are required in a multitude of meteorological modeling applications. Classically, both parameters are estimated from multi-level measurements of wind speed over a terrain sufficiently homogeneous to avoid footprint-induced differences between the levels. As a rule of thumb, d of a dense, uniform crop or forest canopy is 2/3 to 3/4 of the canopy height h, and z0 is about 10% of the canopy height in the absence of any displacement. However, the uncertainty of this rule of thumb becomes larger if the surface of interest is not "dense and uniform", in which case a site-specific determination is required again. By means of the eddy covariance method, alternative possibilities to determine z0 and d have become available. Various authors report robust results if either several levels of sonic anemometer measurements, or one such level combined with a classic wind profile, are used to introduce direct knowledge of the friction velocity into the estimation procedure. At the same time, however, the eddy covariance method of measuring various fluxes has superseded the profile method, leaving many current stations without a wind speed profile with enough levels sufficiently far above the canopy to enable the classic estimation of z0 and d. From single-level eddy covariance measurements at one point in time, only one parameter can be estimated, usually z0, while d is assumed to be known. Even so, results tend to scatter considerably. However, it has been pointed out that the use of multiple points in time providing different stability conditions can enable the estimation of both parameters, if they are assumed constant over the time period regarded. These methods rely either on flux-variance similarity (Weaver 1990 and others following) or on the integrated universal function for momentum (Martano 2000 and others following). In both cases, iterations over the range of possible d values are necessary. We extended this set of methods by a non-iterative, regression-based approach. Only a stability range of data is used in which the universal function is known to be approximately linear. Then, various types of multiple linear regression can be used to relate the terms of the logarithmic wind profile equation to each other, and to derive z0 and d from the regression parameters. Two examples each of the two existing iterative approaches and the new non-iterative one are compared to each other and to plausibility limits in three different agricultural crops. The study contains periods of growth as well as of constant crop height, also allowing for an examination of the relations between z0, d and canopy height. Results indicate that estimated z0 values, even in the absence of prescribed d values, are fairly robust, plausible and consistent across all methods. The largest deviations are produced by the two flux-variance similarity-based methods. Estimates of d, in contrast, can be subject to implausible deviations with all methods, even after quality-filtering of input data. Again, the largest deviations occur with the flux-variance similarity-based methods. Ensemble averaging between all methods can reduce this problem, offering a potentially useful way of estimating d at more complex sites where the rule of thumb cannot be applied easily.
    References: Martano P (2000): Estimation of surface roughness length and displacement height from single-level sonic anemometer data. Journal of Applied Meteorology 39:708-715. Weaver HL (1990): Temperature and humidity flux-variance relations determined by one-dimensional eddy correlation. Boundary-Layer Meteorology 53:77-91.
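
    As an illustration of the single-level case mentioned above (estimating z0 with d assumed known), the following sketch applies the neutral logarithmic wind profile to half-hourly eddy covariance output. Restricting the data to near-neutral periods, the value of the von Karman constant and the toy numbers are assumptions for illustration; the iterative and regression-based procedures compared in the study are more general (Python).

        import numpy as np

        KAPPA = 0.4  # von Karman constant

        def roughness_length_neutral(u, u_star, z, d):
            """Estimate z0 from single-level EC data for near-neutral periods, with the
            displacement height d assumed known:
                u/u* = (1/kappa) * ln((z - d)/z0)  =>  z0 = (z - d) * exp(-kappa * u/u*)
            u, u_star: mean wind speed and friction velocity per averaging period
            z: measurement height, d: assumed displacement height (same units as z)
            """
            z0 = (z - d) * np.exp(-KAPPA * u / u_star)
            return np.median(z0)   # median is more robust to residual stability effects

        # Toy example for a ~0.9 m crop measured at z = 3 m, assuming d = 0.6 m.
        u = np.array([3.1, 2.8, 3.6, 4.0])
        u_star = np.array([0.34, 0.30, 0.40, 0.45])
        print(roughness_length_neutral(u, u_star, z=3.0, d=0.6))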

  14. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  15. Magnetic resonance image restoration via dictionary learning under spatially adaptive constraints.

    PubMed

    Wang, Shanshan; Xia, Yong; Dong, Pei; Feng, David Dagan; Luo, Jianhua; Huang, Qiu

    2013-01-01

    This paper proposes a spatially adaptive constrained dictionary learning (SAC-DL) algorithm for Rician noise removal in magnitude magnetic resonance (MR) images. This algorithm explores both the strength of dictionary learning to preserve image structures and the robustness of local variance estimation to remove signal-dependent Rician noise. The magnitude image is first separated into a number of partly overlapping image patches. The statistics of each patch are collected and analyzed to obtain a local noise variance. To better adapt to Rician noise, a correction factor is formulated with the local signal-to-noise ratio (SNR). Finally, the trained dictionary is used to denoise each image patch under spatially adaptive constraints. The proposed algorithm has been compared to the popular nonlocal means (NLM) filtering and unbiased NLM (UNLM) algorithms on simulated T1-weighted, T2-weighted and PD-weighted MR images. Our results suggest that the SAC-DL algorithm preserves more image structures than NLM while effectively removing the noise, and it is also superior to UNLM at low noise levels.

  16. Robust Likelihoods for Inflationary Gravitational Waves from Maps of Cosmic Microwave Background Polarization

    NASA Technical Reports Server (NTRS)

    Switzer, Eric Ryan; Watts, Duncan J.

    2016-01-01

    The B-mode polarization of the cosmic microwave background provides a unique window into tensor perturbations from inflationary gravitational waves. Survey effects complicate the estimation and description of the power spectrum on the largest angular scales. The pixel-space likelihood yields parameter distributions without the power spectrum as an intermediate step, but it does not have the large suite of tests available to power spectral methods. Searches for primordial B-modes must rigorously reject and rule out contamination. Many forms of contamination vary or are uncorrelated across epochs, frequencies, surveys, or other data treatment subsets. The cross power and the power spectrum of the difference of subset maps provide approaches to reject and isolate excess variance. We develop an analogous joint pixel-space likelihood. Contamination not modeled in the likelihood produces parameter-dependent bias and complicates the interpretation of the difference map. We describe a null test that consistently weights the difference map. Excess variance should either be explicitly modeled in the covariance or be removed through reprocessing the data.

  17. Jackknife Estimation of Sampling Variance of Ratio Estimators in Complex Samples: Bias and the Coefficient of Variation. Research Report. ETS RR-06-19

    ERIC Educational Resources Information Center

    Oranje, Andreas

    2006-01-01

    A multitude of methods has been proposed to estimate the sampling variance of ratio estimates in complex samples (Wolter, 1985). Hansen and Tepping (1985) studied some of those variance estimators and found that a high coefficient of variation (CV) of the denominator of a ratio estimate is indicative of a biased estimate of the standard error of a…
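
    A minimal sketch of a delete-one jackknife variance for a ratio estimate, the quantity studied above. It treats units as independent and ignores the stratification and replicate weights of an operational complex-sample design, so it only illustrates the mechanics (Python).

        import numpy as np

        def jackknife_ratio_variance(y, x):
            """Delete-one jackknife variance of the ratio estimate R = sum(y) / sum(x)."""
            y, x = np.asarray(y, float), np.asarray(x, float)
            n = len(y)
            r_full = y.sum() / x.sum()
            # Leave-one-out ratio estimates
            r_loo = np.array([(y.sum() - y[i]) / (x.sum() - x[i]) for i in range(n)])
            var_jk = (n - 1) / n * np.sum((r_loo - r_loo.mean()) ** 2)
            return r_full, var_jk

        y = np.array([12.0, 9.5, 14.2, 8.8, 11.7])
        x = np.array([10.0, 8.0, 12.5, 7.5, 10.2])
        print(jackknife_ratio_variance(y, x))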

  18. Estimation of genetic parameters and their sampling variances of quantitative traits in the type 2 modified augmented design

    USDA-ARS?s Scientific Manuscript database

    We proposed a method to estimate the error variance among non-replicated genotypes, thus to estimate the genetic parameters by using replicated controls. We derived formulas to estimate sampling variances of the genetic parameters. Computer simulation indicated that the proposed methods of estimatin...

  19. Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST)

    PubMed Central

    Xu, Chonggang; Gertner, George

    2013-01-01

    Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements. PMID:24143037
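
    A minimal sketch of the search-curve-based FAST estimate of first-order sensitivity indices discussed above, assuming independent parameters uniform on [0, 1] and interference-free integer frequencies. The frequency choices, harmonic count and toy model are illustrative (Python).

        import numpy as np

        def fast_first_order_indices(model, omegas, n_samples=1024, n_harmonics=4):
            """Search-curve FAST sketch: assign each parameter an integer frequency,
            sample one full period of the curve, and read each parameter's partial
            variance off the Fourier spectrum at its frequency and harmonics."""
            s = 2.0 * np.pi * np.arange(n_samples) / n_samples              # one full period
            x = 0.5 + np.arcsin(np.sin(np.outer(omegas, s))) / np.pi        # shape (k, N), values in [0, 1]
            y = model(x)
            c = np.fft.rfft(y) / n_samples                                  # Fourier coefficients
            power = 2.0 * np.abs(c[1:]) ** 2                                # spectrum at frequencies 1..N/2
            total_variance = power.sum()
            indices = []
            for w in omegas:
                harmonics = [p * w - 1 for p in range(1, n_harmonics + 1)]  # -1 because power[0] is frequency 1
                indices.append(power[harmonics].sum() / total_variance)
            return np.array(indices)

        # Toy model y = x1 + 2*x2: the analytic first-order indices are 0.2 and 0.8.
        model = lambda x: x[0] + 2.0 * x[1]
        print(fast_first_order_indices(model, omegas=[11, 35]))

    With these settings the two estimated indices land close to 0.2 and 0.8; estimation errors of the kind analyzed in the paper appear when frequencies interfere or too few harmonics are summed.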

  20. Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST).

    PubMed

    Xu, Chonggang; Gertner, George

    2011-01-01

    Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements.

  1. Variable-period surface-wave magnitudes: A rapid and robust estimator of seismic moments

    USGS Publications Warehouse

    Bonner, J.; Herrmann, R.; Benz, H.

    2010-01-01

    We demonstrate that surface-wave magnitudes (Ms), measured at local, regional, and teleseismic distances, can be used as a rapid and robust estimator of seismic moment magnitude (Mw). We used the Russell (2006) variable-period surface-wave magnitude formula, henceforth called Ms(VMAX), to estimate the Ms for 165 North American events with 3.2

  2. Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks

    PubMed Central

    Lam, William H. K.; Li, Qingquan

    2017-01-01

    Travel times in congested urban road networks are highly stochastic. Provision of travel time distribution information, including both mean and variance, can be very useful for travelers to make reliable path choice decisions to ensure higher probability of on-time arrival. To this end, a heterogeneous data fusion method is proposed to estimate travel time distributions by fusing heterogeneous data from point and interval detectors. In the proposed method, link travel time distributions are first estimated from point detector observations. The travel time distributions of links without point detectors are imputed based on their spatial correlations with links that have point detectors. The estimated link travel time distributions are then fused with path travel time distributions obtained from the interval detectors using Dempster-Shafer evidence theory. Based on fused path travel time distribution, an optimization technique is further introduced to update link travel time distributions and their spatial correlations. A case study was performed using real-world data from Hong Kong and showed that the proposed method obtained accurate and robust estimations of link and path travel time distributions in congested road networks. PMID:29210978

  3. Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks.

    PubMed

    Shi, Chaoyang; Chen, Bi Yu; Lam, William H K; Li, Qingquan

    2017-12-06

    Travel times in congested urban road networks are highly stochastic. Provision of travel time distribution information, including both mean and variance, can be very useful for travelers to make reliable path choice decisions to ensure higher probability of on-time arrival. To this end, a heterogeneous data fusion method is proposed to estimate travel time distributions by fusing heterogeneous data from point and interval detectors. In the proposed method, link travel time distributions are first estimated from point detector observations. The travel time distributions of links without point detectors are imputed based on their spatial correlations with links that have point detectors. The estimated link travel time distributions are then fused with path travel time distributions obtained from the interval detectors using Dempster-Shafer evidence theory. Based on fused path travel time distribution, an optimization technique is further introduced to update link travel time distributions and their spatial correlations. A case study was performed using real-world data from Hong Kong and showed that the proposed method obtained accurate and robust estimations of link and path travel time distributions in congested road networks.

  4. Range estimation of passive infrared targets through the atmosphere

    NASA Astrophysics Data System (ADS)

    Cho, Hoonkyung; Chun, Joohwan; Seo, Doochun; Choi, Seokweon

    2013-04-01

    Target range estimation is traditionally based on radar and active sonar systems in modern combat systems. However, jamming signals tremendously degrade the performance of such active sensor devices. We introduce a simple target range estimation method and the fundamental limits of the proposed method based on the atmosphere propagation model. Since passive infrared (IR) sensors measure IR signals radiating from objects in different wavelengths, this method has robustness against electromagnetic jamming. The measured target radiance of each wavelength at the IR sensor depends on the emissive properties of target material and various attenuation factors (i.e., the distance between sensor and target and atmosphere environment parameters). MODTRAN is a tool that models atmospheric propagation of electromagnetic radiation. Based on the results from MODTRAN and atmosphere propagation-based modeling, the target range can be estimated. To analyze the proposed method's performance statistically, we use maximum likelihood estimation (MLE) and evaluate the Cramer-Rao lower bound (CRLB) via the probability density function of measured radiance. We also compare CRLB and the variance of MLE using Monte-Carlo simulation.

  5. Diallel analysis for sex-linked and maternal effects.

    PubMed

    Zhu, J; Weir, B S

    1996-01-01

    Genetic models including sex-linked and maternal effects as well as autosomal gene effects are described. Monte Carlo simulations were conducted to compare efficiencies of estimation by minimum norm quadratic unbiased estimation (MINQUE) and restricted maximum likelihood (REML) methods. MINQUE(1), which has 1 for all prior values, has a similar efficiency to MINQUE(θ), which requires prior estimates of parameter values. MINQUE(1) has the advantage over REML of unbiased estimation and convenient computation. An adjusted unbiased prediction (AUP) method is developed for predicting random genetic effects. AUP is desirable for its easy computation and unbiasedness of both mean and variance of predictors. The jackknife procedure is appropriate for estimating the sampling variances of estimated variances (or covariances) and of predicted genetic effects. A t-test based on jackknife variances is applicable for detecting significance of variation. Worked examples from mice and silkworm data are given in order to demonstrate variance and covariance estimation and genetic effect prediction.

  6. A Generalized DIF Effect Variance Estimator for Measuring Unsigned Differential Test Functioning in Mixed Format Tests

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Algina, James

    2006-01-01

    One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…

  7. Methods to estimate the between‐study variance and its uncertainty in meta‐analysis†

    PubMed Central

    Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian PT; Langan, Dean; Salanti, Georgia

    2015-01-01

    Meta‐analyses are typically used to estimate the overall/mean of an outcome of interest. However, inference about between‐study variability, which is typically modelled using a between‐study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between‐study variance, has been long challenged. Our aim is to identify known methods for estimation of the between‐study variance and its corresponding uncertainty, and to summarise the simulation and empirical evidence that compares them. We identified 16 estimators for the between‐study variance, seven methods to calculate confidence intervals, and several comparative studies. Simulation studies suggest that for both dichotomous and continuous data the estimator proposed by Paule and Mandel and for continuous data the restricted maximum likelihood estimator are better alternatives to estimate the between‐study variance. Based on the scenarios and results presented in the published studies, we recommend the Q‐profile method and the alternative approach based on a ‘generalised Cochran between‐study variance statistic’ to compute corresponding confidence intervals around the resulting estimates. Our recommendations are based on a qualitative evaluation of the existing literature and expert consensus. Evidence‐based recommendations require an extensive simulation study where all methods would be compared under the same scenarios. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd. PMID:26332144
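
    As a concrete reference point for the long-challenged default mentioned above, the following is a sketch of the DerSimonian-Laird moment estimator of the between-study variance; the toy effect sizes and within-study variances are illustrative (Python).

        import numpy as np

        def dersimonian_laird_tau2(effects, variances):
            """DerSimonian-Laird moment estimator of the between-study variance tau^2.
            effects: study effect estimates; variances: their within-study variances."""
            y, v = np.asarray(effects, float), np.asarray(variances, float)
            w = 1.0 / v                                    # fixed-effect (inverse-variance) weights
            mu_fe = np.sum(w * y) / np.sum(w)
            q = np.sum(w * (y - mu_fe) ** 2)               # Cochran's Q statistic
            c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
            return max(0.0, (q - (len(y) - 1)) / c)        # truncate negative values at zero

        effects = np.array([0.30, 0.10, 0.55, -0.05, 0.40])
        variances = np.array([0.04, 0.06, 0.05, 0.08, 0.03])
        print(dersimonian_laird_tau2(effects, variances))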

  8. Blinded sample size re-estimation in three-arm trials with 'gold standard' design.

    PubMed

    Mütze, Tobias; Friede, Tim

    2017-10-15

    In this article, we study blinded sample size re-estimation in the 'gold standard' design with an internal pilot study for normally distributed outcomes. The 'gold standard' design is a three-arm clinical trial design that includes an active and a placebo control in addition to an experimental treatment. We focus on the absolute margin approach to hypothesis testing in three-arm trials, in which the non-inferiority of the experimental treatment and the assay sensitivity are assessed by pairwise comparisons. We compare several blinded sample size re-estimation procedures in a simulation study assessing operating characteristics including power and type I error. We find that sample size re-estimation based on the popular one-sample variance estimator results in overpowered trials. Moreover, sample size re-estimation based on unbiased variance estimators such as the Xing-Ganju variance estimator results in underpowered trials, as is expected, because an overestimation of the variance, and thus of the sample size, is in general required for the re-estimation procedure to eventually meet the target power. To overcome this problem, we propose an inflation factor for the sample size re-estimation with the Xing-Ganju variance estimator and show that this approach results in adequately powered trials. Because of favorable features of the Xing-Ganju variance estimator, such as unbiasedness and a distribution independent of the group means, the inflation factor does not depend on the nuisance parameter and, therefore, can be calculated prior to a trial. Moreover, we prove that the sample size re-estimation based on the Xing-Ganju variance estimator does not bias the effect estimate. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Planning additional drilling campaign using two-space genetic algorithm: A game theoretical approach

    NASA Astrophysics Data System (ADS)

    Kumral, Mustafa; Ozer, Umit

    2013-03-01

    Grade and tonnage are the most important technical uncertainties in mining ventures because of the use of estimations/simulations, which are mostly generated from drill data. Open pit mines are planned and designed on the basis of the blocks representing the entire orebody. Each block has a different estimation/simulation variance, reflecting uncertainty to some extent. The estimation/simulation realizations are submitted to the mine production scheduling process. However, the use of a block model with varying estimation/simulation variances will lead to serious risk in the scheduling. With multiple simulations, the dispersion variances of blocks can be thought of as reflecting technical uncertainties. However, the dispersion variance cannot handle the uncertainty associated with varying estimation/simulation variances of blocks. This paper proposes an approach that generates the configuration of the best additional drilling campaign so as to produce more homogeneous estimation/simulation variances of blocks. In other words, the objective is to find the best drilling configuration in such a way as to minimize grade uncertainty under a budget constraint. The uncertainty measure of the optimization process in this paper is the interpolation variance, which considers data locations and grades. The problem is expressed as a minmax problem, which focuses on finding the best worst-case performance, i.e., minimizing the interpolation variance of the block that generates the maximum interpolation variance. Since the optimization model requires computing the interpolation variances of the blocks being simulated/estimated in each iteration, the problem cannot be solved by standard optimization tools. This motivates the use of a two-space genetic algorithm (GA) approach to solve the problem. The technique has two spaces: feasible drill-hole configurations, in which the interpolation variance is minimized, and drill-hole simulations, in which it is maximized. The two spaces interact to find a minmax solution iteratively. A case study was conducted to demonstrate the performance of the approach. The findings showed that the approach could be used to plan a new drilling campaign.

  10. A spatially explicit capture-recapture estimator for single-catch traps.

    PubMed

    Distiller, Greg; Borchers, David L

    2015-11-01

    Single-catch traps are frequently used in live-trapping studies of small mammals. Thus far, a likelihood for single-catch traps has proven elusive and usually the likelihood for multicatch traps is used for spatially explicit capture-recapture (SECR) analyses of such data. Previous work found the multicatch likelihood to provide a robust estimator of average density. We build on a recently developed continuous-time model for SECR to derive a likelihood for single-catch traps. We use this to develop an estimator based on observed capture times and compare its performance by simulation to that of the multicatch estimator for various scenarios with nonconstant density surfaces. While the multicatch estimator is found to be a surprisingly robust estimator of average density, its performance deteriorates with high trap saturation and increasing density gradients. Moreover, it is found to be a poor estimator of the height of the detection function. By contrast, the single-catch estimators of density, distribution, and detection function parameters are found to be unbiased or nearly unbiased in all scenarios considered. This gain comes at the cost of higher variance. If there is no interest in interpreting the detection function parameters themselves, and if density is expected to be fairly constant over the survey region, then the multicatch estimator performs well with single-catch traps. However if accurate estimation of the detection function is of interest, or if density is expected to vary substantially in space, then there is merit in using the single-catch estimator when trap saturation is above about 60%. The estimator's performance is improved if care is taken to place traps so as to span the range of variables that affect animal distribution. As a single-catch likelihood with unknown capture times remains intractable for now, researchers using single-catch traps should aim to incorporate timing devices with their traps.

  11. Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

    PubMed Central

    Li, Peng; Redden, David T.

    2014-01-01

    SUMMARY The sandwich estimator in generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z test should be avoided in the analyses of CRTs with few clusters even when bias-corrected sandwich estimators are used. With t-distribution approximation, the Kauermann and Carroll (KC)-correction can keep the test size to nominal levels even when the number of clusters is as low as 10, and is robust to the moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)-correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t test and KC-correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that with appropriate control of type I error rates under small sample sizes, we recommend the use of GEE approach in CRTs with binary outcomes due to fewer assumptions and robustness to the misspecification of the covariance structure. PMID:25345738

  12. Evaluation of surface renewal and flux-variance methods above agricultural and forest surfaces

    NASA Astrophysics Data System (ADS)

    Fischer, M.; Katul, G. G.; Noormets, A.; Poznikova, G.; Domec, J. C.; Trnka, M.; King, J. S.

    2016-12-01

    Measurements of turbulent surface energy fluxes are of high interest in agricultural and forest research. During recent decades, eddy covariance (EC) has been adopted as the most commonly used micrometeorological method for measuring fluxes of greenhouse gases, energy and other scalars at the surface-atmosphere interface. Despite its robustness and accuracy, the cost of EC hinders its deployment in some research experiments and in practice, e.g., for irrigation scheduling. Therefore, the testing and development of other cost-effective methods is of high interest. In our study, we tested the performance of the surface renewal (SR) and flux-variance (FV) methods for estimating sensible heat flux density. The surface renewal method is based on the concept of non-random transport of scalars via so-called coherent structures which, if accurately identified, can be used to compute the associated flux. The flux-variance method predicts the flux from the scalar variance following surface-layer similarity theory. We tested SR and FV against EC in three types of ecosystem with very distinct aerodynamic properties. The first site was an agricultural wheat field in the Czech Republic. The second site was a 20-m tall mixed deciduous wetland forest on the coast of North Carolina, USA. The third site was a pine-switchgrass intercropping agro-forestry system located in the coastal plain of North Carolina, USA. Apart from resolving the coherent structures in the SR framework from structure functions (the most common approach), we applied a ramp wavelet detection scheme to test the hypothesis that the durations and amplitudes of the coherent structures are normally distributed within particular 30-minute intervals, so that estimates of their averages suffice for accurate flux determination. Further, we tested whether orthonormal wavelet thresholding can be used to isolate the coherent-structure scales associated with flux transport. Finally, we tested whether low-pass filtering in the Fourier domain based on the integral length scale can improve the estimates of both SR and FV, as it supposedly removes the low-frequency portion of the signal not related to the investigated fluxes.

  13. Methods to Estimate the Variance of Some Indices of the Signal Detection Theory: A Simulation Study

    ERIC Educational Resources Information Center

    Suero, Manuel; Privado, Jesús; Botella, Juan

    2017-01-01

    A simulation study is presented to evaluate and compare three methods to estimate the variance of the estimates of the parameters d and "C" of the signal detection theory (SDT). Several methods have been proposed to calculate the variance of their estimators, "d'" and "c." Those methods have been mostly assessed by…

  14. Variance computations for functional of absolute risk estimates.

    PubMed

    Pfeiffer, R M; Petracci, E

    2011-07-01

    We present a simple influence function based approach to compute the variances of estimates of absolute risk and functions of absolute risk. We apply this approach to criteria that assess the impact of changes in the risk factor distribution on absolute risk for an individual and at the population level. As an illustration we use an absolute risk prediction model for breast cancer that includes modifiable risk factors in addition to standard breast cancer risk factors. Influence function based variance estimates for absolute risk and the criteria are compared to bootstrap variance estimates.

  15. Variance computations for functional of absolute risk estimates

    PubMed Central

    Pfeiffer, R.M.; Petracci, E.

    2011-01-01

    We present a simple influence function based approach to compute the variances of estimates of absolute risk and functions of absolute risk. We apply this approach to criteria that assess the impact of changes in the risk factor distribution on absolute risk for an individual and at the population level. As an illustration we use an absolute risk prediction model for breast cancer that includes modifiable risk factors in addition to standard breast cancer risk factors. Influence function based variance estimates for absolute risk and the criteria are compared to bootstrap variance estimates. PMID:21643476

  16. Precision of systematic and random sampling in clustered populations: habitat patches and aggregating organisms.

    PubMed

    McGarvey, Richard; Burch, Paul; Matthews, Janet M

    2016-01-01

    Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, together and in combination. The standard one-start aligned systematic survey design, a uniform 10 x 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (v) that corrected for inter-transect correlation (ν₈ and ν(W)) were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (ν₂ and ν₃) that are now more commonly applied in systematic surveys. Similar variance estimator performance rankings were found with a second differently generated set of spatial point populations, ν₈ and ν(W) again being the best performers in the longer-range autocorrelated populations. However, no systematic variance estimators tested were free from bias. On balance, systematic designs bring more narrow confidence intervals in clustered populations, while random designs permit unbiased estimates of (often wider) confidence interval. The search continues for better estimators of sampling variance for the systematic survey mean.

  17. Stratum variance estimation for sample allocation in crop surveys. [Great Plains Corridor

    NASA Technical Reports Server (NTRS)

    Perry, C. R., Jr.; Chhikara, R. S. (Principal Investigator)

    1980-01-01

    The problem of determining the stratum variances needed to achieve an optimum sample allocation for crop surveys by remote sensing is investigated by considering an approach based on the concept of stratum variance as a function of the sampling unit size. A methodology using existing and easily available historical crop statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to estimate stratum variances for wheat in the U.S. Great Plains and is evaluated on the basis of the numerical results thus obtained. It is shown that the proposed technique is viable and performs satisfactorily when a conservative value for the field size and crop statistics from the small political subdivision level are used, based on a comparison of the estimated stratum variances with those obtained using the LANDSAT data.

  18. Multi-population Genomic Relationships for Estimating Current Genetic Variances Within and Genetic Correlations Between Populations.

    PubMed

    Wientjes, Yvonne C J; Bijma, Piter; Vandenplas, Jérémie; Calus, Mario P L

    2017-10-01

    Different methods are available to calculate multi-population genomic relationship matrices. Since those matrices differ in base population, it is anticipated that the method used to calculate genomic relationships affects the estimate of genetic variances, covariances, and correlations. The aim of this article is to define the multi-population genomic relationship matrix to estimate current genetic variances within and genetic correlations between populations. The genomic relationship matrix containing two populations consists of four blocks, one block for population 1, one block for population 2, and two blocks for relationships between the populations. It is known, based on literature, that by using current allele frequencies to calculate genomic relationships within a population, current genetic variances are estimated. In this article, we theoretically derived the properties of the genomic relationship matrix to estimate genetic correlations between populations and validated it using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated unbiasedly even though estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix which directly estimates current genetic variances as well as genetic correlations between populations. Copyright © 2017 by the Genetics Society of America.
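
    A sketch of a two-population genomic relationship matrix with the scaling property highlighted above: the across-population scaling factor equals the product of the square roots of the within-population scaling factors. The VanRaden-style centring and scaling used here is an assumed construction for illustration, not necessarily the exact matrices of the article (Python).

        import numpy as np

        def multi_population_grm(M1, M2, p1, p2):
            """Two-population genomic relationship matrix with population-specific
            (current) allele frequencies and a geometric-mean across-population scaling.
            M1, M2: genotype matrices (individuals x markers, coded 0/1/2)
            p1, p2: current allele frequencies in each population."""
            Z1 = M1 - 2 * p1                     # centre with current, population-specific frequencies
            Z2 = M2 - 2 * p2
            s1 = 2 * np.sum(p1 * (1 - p1))       # within-population scaling factors
            s2 = 2 * np.sum(p2 * (1 - p2))
            G11 = Z1 @ Z1.T / s1
            G22 = Z2 @ Z2.T / s2
            G12 = Z1 @ Z2.T / np.sqrt(s1 * s2)   # across-population block
            return np.block([[G11, G12], [G12.T, G22]])

        rng = np.random.default_rng(1)
        M1 = rng.integers(0, 3, size=(4, 50)).astype(float)
        M2 = rng.integers(0, 3, size=(3, 50)).astype(float)
        print(multi_population_grm(M1, M2, M1.mean(axis=0) / 2, M2.mean(axis=0) / 2).shape)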

  19. Parameter uncertainty and variability in evaluative fate and exposure models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hertwich, E.G.; McKone, T.E.; Pease, W.S.

    The human toxicity potential, a weighting scheme used to evaluate toxic emissions for life cycle assessment and toxics release inventories, is based on potential dose calculations and toxicity factors. This paper evaluates the variance in potential dose calculations that can be attributed to the uncertainty in chemical-specific input parameters as well as the variability in exposure factors and landscape parameters. A knowledge of the uncertainty allows us to assess the robustness of a decision based on the toxicity potential; a knowledge of the sources of uncertainty allows one to focus resources if the uncertainty is to be reduced. The potential dose of 236 chemicals was assessed. The chemicals were grouped by dominant exposure route, and a Monte Carlo analysis was conducted for one representative chemical in each group. The variance is typically one to two orders of magnitude. For comparison, the point estimates in potential dose for 236 chemicals span ten orders of magnitude. Most of the variance in the potential dose is due to chemical-specific input parameters, especially half-lives, although exposure factors such as fish intake and the source of drinking water can be important for chemicals whose dominant exposure is through indirect routes. Landscape characteristics are generally of minor importance.
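
    A minimal Monte Carlo sketch of the kind of uncertainty propagation described above, with a toy potential-dose expression and illustrative lognormal distributions for a half-life and an intake factor; none of the parameter values are taken from the study (Python).

        import numpy as np

        rng = np.random.default_rng(2)
        n = 100_000

        # Hypothetical, simplified potential-dose expression: dose scales with the emission,
        # an environmental half-life and an intake (exposure) factor.
        emission = 1.0                                                     # fixed unit emission
        half_life = rng.lognormal(mean=np.log(30.0), sigma=1.0, size=n)    # e.g. days
        intake = rng.lognormal(mean=np.log(2.0), sigma=0.5, size=n)        # e.g. L/day
        dose = emission * half_life * intake                               # toy proportionality

        log10_dose = np.log10(dose)
        print("variance of log10(dose):", log10_dose.var())
        print("95% spread in orders of magnitude:",
              np.percentile(log10_dose, 97.5) - np.percentile(log10_dose, 2.5))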

  20. Variance Difference between Maximum Likelihood Estimation Method and Expected A Posteriori Estimation Method Viewed from Number of Test Items

    ERIC Educational Resources Information Center

    Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.

    2016-01-01

    The aim of this study is to determine variance difference between maximum likelihood and expected A posteriori estimation methods viewed from number of test items of aptitude test. The variance presents an accuracy generated by both maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…

  1. Robust Bayesian Analysis of Heavy-tailed Stochastic Volatility Models using Scale Mixtures of Normal Distributions

    PubMed Central

    Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.

    2009-01-01

    A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, Student-t, slash and variance-gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock return data on the S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction of the S&P500 index data over the usual normal model. PMID:20730043

  2. A Fast and Robust Beamspace Adaptive Beamformer for Medical Ultrasound Imaging.

    PubMed

    Mohades Deylami, Ali; Mohammadzadeh Asl, Babak

    2017-06-01

    The minimum variance beamformer (MVB) increases the resolution and contrast of medical ultrasound imaging compared with nonadaptive beamformers. These advantages come at the expense of high computational complexity, which prevents this adaptive beamformer from being applied in a real-time imaging system. A new beamspace (BS) based on the discrete cosine transform is proposed in which medical ultrasound signals can be represented with fewer dimensions compared with the standard BS. This is because of the symmetric beampatterns of the beams in the proposed BS, compared with the asymmetric ones in the standard BS. This lets us reduce the data dimension to two, so a highly complex algorithm, such as the MVB, can be applied faster in this BS. The results indicated that, by keeping only two beams, the MVB in the proposed BS provides very similar resolution and better contrast compared with the standard MVB (SMVB), using only 0.44% of the required flops. This beamformer is also more robust against sound speed estimation errors than the SMVB.
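
    For context, the following is a sketch of the standard minimum-variance (Capon) weight computation that such beamformers apply, whether in element space or in a reduced beamspace. The diagonal loading and the toy interference scenario are illustrative assumptions, not the paper's recipe (Python).

        import numpy as np

        def minimum_variance_weights(R, a, loading=1e-3):
            """Minimum-variance (Capon) beamformer weights: w = R^{-1} a / (a^H R^{-1} a).
            R: sample covariance of the data snapshots (element space or beamspace)
            a: steering vector toward the focal point; `loading` adds diagonal loading
            for robustness (a common practice assumed here)."""
            Rl = R + loading * np.trace(R).real / R.shape[0] * np.eye(R.shape[0])
            Ri_a = np.linalg.solve(Rl, a)
            return Ri_a / (a.conj() @ Ri_a)

        # Toy example: 8 channels, white noise plus one off-axis interferer.
        rng = np.random.default_rng(3)
        n_ch, n_snap = 8, 200
        interferer = np.exp(1j * np.pi * np.arange(n_ch) * np.sin(0.4))    # plane-wave signature
        X = (rng.standard_normal((n_ch, n_snap))
             + 1j * rng.standard_normal((n_ch, n_snap))) / np.sqrt(2)
        X += 3.0 * interferer[:, None] * rng.standard_normal(n_snap)
        R = X @ X.conj().T / n_snap
        a = np.ones(n_ch, dtype=complex)                                   # broadside steering vector
        w = minimum_variance_weights(R, a)
        print(abs(w.conj() @ a), abs(w.conj() @ interferer))               # unit gain vs. suppression

    The distortionless-response constraint keeps unit gain toward the steering direction while the adaptive weights suppress the interference; a beamspace variant applies the same computation to a lower-dimensional transform of the data.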

  3. Ex Post Facto Monte Carlo Variance Reduction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Booth, Thomas E.

    The variance in Monte Carlo particle transport calculations is often dominated by a few particles whose importance increases manyfold on a single transport step. This paper describes a novel variance reduction method that uses a large importance change as a trigger to resample the offending transport step. That is, the method is employed only after (ex post facto) a random walk attempts a transport step that would otherwise introduce a large variance in the calculation. Improvements in two Monte Carlo transport calculations are demonstrated empirically using an ex post facto method. First, the method is shown to reduce the variance in a penetration problem with a cross-section window. Second, the method empirically appears to modify a point detector estimator from an infinite variance estimator to a finite variance estimator.

  4. Life-history traits and effective population size in species with overlapping generations revisited: the importance of adult mortality.

    PubMed

    Waples, R S

    2016-10-01

    The relationship between life-history traits and the key eco-evolutionary parameters effective population size (Ne) and Ne/N is revisited for iteroparous species with overlapping generations, with a focus on the annual rate of adult mortality (d). Analytical methods based on populations with arbitrarily long adult lifespans are used to evaluate the influence of d on Ne, Ne/N and the factors that determine these parameters: adult abundance (N), generation length (T), age at maturity (α), the ratio of variance to mean reproductive success in one season by individuals of the same age (φ) and lifetime variance in reproductive success of individuals in a cohort (Vk•). Although the resulting estimators of N, T and Vk• are upwardly biased for species with short adult lifespans, the estimate of Ne/N is largely unbiased because biases in T are compensated for by biases in Vk• and N. For the first time, the contrasting effects of T and Vk• on Ne and Ne/N are jointly considered with respect to d and φ. A simple function of d and α based on the assumption of constant vital rates is shown to be a robust predictor (R(2)=0.78) of Ne/N in an empirical data set of life tables for 63 animal and plant species with diverse life histories. Results presented here should provide important context for interpreting the surge of genetically based estimates of Ne that has been fueled by the genomics revolution.

  5. Mixed-Poisson Point Process with Partially-Observed Covariates: Ecological Momentary Assessment of Smoking.

    PubMed

    Neustifter, Benjamin; Rathbun, Stephen L; Shiffman, Saul

    2012-01-01

    Ecological Momentary Assessment is an emerging method of data collection in behavioral research that may be used to capture the times of repeated behavioral events on electronic devices, as well as information on subjects' psychological states through the electronic administration of questionnaires at times selected from a probability-based design and at the event times. A method is proposed for fitting a mixed Poisson point process model for the impact of partially observed, time-varying covariates on the timing of repeated behavioral events. A random frailty is included in the point-process intensity to describe variation among subjects in baseline rates of event occurrence. Covariate coefficients are estimated using estimating equations constructed by replacing the integrated intensity in the Poisson score equations with a design-unbiased estimator. An estimator is also proposed for the variance of the random frailties. Our estimators are robust in the sense that no model assumptions are made regarding the distribution of the time-varying covariates or the distribution of the random effects. However, subject effects are estimated under gamma frailties using an approximate hierarchical likelihood. The proposed approach is illustrated using smoking data.

  6. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    PubMed

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

    Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
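
    A brief sketch, using statsmodels, of modified Poisson regression for clustered binary outcomes: a log-link Poisson GEE with an exchangeable working correlation and robust (sandwich) standard errors, which are the statsmodels default for GEE. The variable names and simulated data are illustrative only and do not come from this record.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)

# simulated clustered prospective data: binary outcome, binary treatment, 50 clusters
n_clusters, cluster_size = 50, 20
cluster = np.repeat(np.arange(n_clusters), cluster_size)
treat = rng.integers(0, 2, size=cluster.size)
cluster_effect = rng.normal(0.0, 0.2, size=n_clusters)[cluster]
p = np.clip(0.2 * np.exp(0.4 * treat + cluster_effect), 0, 1)   # true RR is approx. exp(0.4)
y = rng.binomial(1, p)

df = pd.DataFrame({"y": y, "treat": treat, "cluster": cluster})

# modified Poisson regression: Poisson GEE with log link and exchangeable correlation;
# GEE reports robust (sandwich) standard errors by default
model = sm.GEE.from_formula(
    "y ~ treat", groups="cluster", data=df,
    family=sm.families.Poisson(), cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(np.exp(result.params["treat"]))   # estimated relative risk
print(result.summary())
```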

  7. Toward a better understanding of what makes positive psychology interventions work: predicting happiness and depression from the person × intervention fit in a follow-up after 3.5 years.

    PubMed

    Proyer, René T; Wellenzohn, Sara; Gander, Fabian; Ruch, Willibald

    2015-03-01

    Robust evidence exists that positive psychology interventions are effective in enhancing well-being and ameliorating depression. Comparatively little is known about the conditions under which they work best. Models describing characteristics that impact the effectiveness of positive interventions typically contain features of the person, features of the activity, and the fit between the two. This study focuses on indicators of the person × intervention fit in predicting happiness and depressive symptoms 3.5 years after completion of the intervention. A sample of 165 women completed measures of happiness and depressive symptoms before and about 3.5 years after completion of a positive intervention (random assignment to one out of nine interventions, which were aggregated for the analyses). Four fit indicators were assessed: preference, continued practice, effort, and early reactivity. Three out of four person × intervention fit indicators were positively related to happiness or negatively related to depression when controlling for pretest scores. Together, they explained 6 per cent of the variance in happiness and 10 per cent of the variance in depressive symptoms. Most tested indicators of person × intervention fit are robust predictors of happiness and depressive symptoms, even after 3.5 years. They might serve as an early indication of the effectiveness of a positive intervention. © 2014 The International Association of Applied Psychology.

  8. Robustness of serial clustering of extra-tropical cyclones to the choice of tracking method

    NASA Astrophysics Data System (ADS)

    Pinto, Joaquim G.; Ulbrich, Sven; Karremann, Melanie K.; Stephenson, David B.; Economou, Theodoros; Shaffrey, Len C.

    2016-04-01

    Cyclone families are a frequent synoptic weather feature in the Euro-Atlantic area in winter. Given appropriate large-scale conditions, the occurrence of such series (clusters) of storms may lead to large socio-economic impacts and cumulative losses. Recent studies analyzing reanalysis data using single cyclone tracking methods have shown that serial clustering of cyclones occurs on both flanks and in downstream regions of the North Atlantic storm track. This study explores the sensitivity of serial clustering to the choice of tracking method. With this aim, the IMILAST cyclone track database based on ERA-Interim data is analysed. Clustering is estimated by the dispersion (ratio of variance to mean) of winter (DJF) cyclone passages near each grid point over the Euro-Atlantic area. Results indicate that while the general pattern of clustering is identified for all methods, there are considerable differences in detail. This can primarily be attributed to the differences in the variance of cyclone counts between the methods, which range up to one order of magnitude. Nevertheless, clustering over the Eastern North Atlantic and Western Europe can be identified for all methods and can thus be generally considered as a robust feature. The statistical links between large-scale patterns like the NAO and clustering are obtained for all methods, though with different magnitudes. We conclude that the occurrence of cyclone clustering over the Eastern North Atlantic and Western Europe is largely independent of the choice of tracking method and hence of the definition of a cyclone.
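
    The dispersion statistic used here is simply the variance-to-mean ratio of seasonal cyclone counts at each grid point; a minimal sketch (grid shape and counts are invented for illustration):

```python
import numpy as np

# counts[t, j, i]: number of winter (DJF) cyclone passages near grid point (j, i) in season t
rng = np.random.default_rng(2)
counts = rng.poisson(lam=3.0, size=(36, 40, 60))      # 36 seasons on a toy 40 x 60 grid

mean = counts.mean(axis=0)
var = counts.var(axis=0, ddof=1)
dispersion = var / mean          # > 1: clustering (overdispersion), < 1: regularity

print(dispersion.shape)          # (40, 60)
print(float(dispersion.mean()))  # near 1 for a purely Poisson (unclustered) process
```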

  9. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic.

    PubMed

    Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K; Strug, Lisa J

    2014-08-01

    Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls with their expected values given the observed sequence data. We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants. An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
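
    The core substitution described above, replacing hard genotype calls with their expected values given the observed reads, can be sketched as below. The genotype-likelihood inputs and the Hardy-Weinberg prior on the minor allele frequency are illustrative assumptions about how such an expectation is commonly formed, not the authors' exact pipeline or their R script.

```python
import numpy as np

def expected_dosage(genotype_likelihoods, maf):
    """Posterior expected minor-allele dosage E[G | reads] for one variant.

    genotype_likelihoods : (n_samples, 3) array of P(reads | G = 0, 1, 2)
    maf                  : prior minor allele frequency (Hardy-Weinberg prior)
    """
    prior = np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2])
    post = genotype_likelihoods * prior            # unnormalized posterior P(G | reads)
    post /= post.sum(axis=1, keepdims=True)
    return post @ np.array([0.0, 1.0, 2.0])        # expected dosage per sample

# toy usage: three samples with different read depths / certainty
gl = np.array([[0.90, 0.09, 0.01],    # confidently homozygous reference
               [0.40, 0.35, 0.25],    # low depth: very uncertain
               [0.01, 0.80, 0.19]])   # likely heterozygous
print(expected_dosage(gl, maf=0.2))
```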

  10. Robust human machine interface based on head movements applied to assistive robotics.

    PubMed

    Perez, Elisa; López, Natalia; Orosco, Eugenio; Soria, Carlos; Mut, Vicente; Freire-Bastos, Teodiano

    2013-01-01

    This paper presents an interface that uses two different sensing techniques and combines both results through a fusion process to obtain the minimum-variance estimator of the orientation of the user's head. The sensing techniques of the interface are based on an inertial sensor and artificial vision. The orientation of the user's head is used to steer the navigation of a robotic wheelchair. A control algorithm for the assistive technology system is also presented. The system was evaluated by four individuals with severe motor disabilities, and a quantitative index was developed to objectively evaluate its performance. The results obtained are promising, since most users could perform the proposed tasks with the robotic wheelchair.
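
    A minimal sketch of minimum-variance fusion of two independent estimates of the same quantity (inverse-variance weighting); the sensor variances and yaw values below are invented for illustration, and the paper's actual fusion scheme may differ in detail.

```python
import numpy as np

def min_variance_fusion(x_imu, var_imu, x_vision, var_vision):
    """Fuse two independent estimates of the same quantity with inverse-variance weights."""
    w_imu = 1.0 / var_imu
    w_vis = 1.0 / var_vision
    fused = (w_imu * x_imu + w_vis * x_vision) / (w_imu + w_vis)
    fused_var = 1.0 / (w_imu + w_vis)      # never larger than either input variance
    return fused, fused_var

# toy usage: head yaw angle (degrees) from an inertial sensor and from artificial vision
yaw, yaw_var = min_variance_fusion(x_imu=12.0, var_imu=4.0, x_vision=9.0, var_vision=1.0)
print(yaw, yaw_var)   # fused estimate is pulled toward the lower-variance vision reading
```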

  11. Robust Human Machine Interface Based on Head Movements Applied to Assistive Robotics

    PubMed Central

    Perez, Elisa; López, Natalia; Orosco, Eugenio; Soria, Carlos; Mut, Vicente; Freire-Bastos, Teodiano

    2013-01-01

    This paper presents an interface that uses two different sensing techniques and combines both results through a fusion process to obtain the minimum-variance estimator of the orientation of the user's head. The sensing techniques of the interface are based on an inertial sensor and artificial vision. The orientation of the user's head is used to steer the navigation of a robotic wheelchair. A control algorithm for the assistive technology system is also presented. The system was evaluated by four individuals with severe motor disabilities, and a quantitative index was developed to objectively evaluate its performance. The results obtained are promising, since most users could perform the proposed tasks with the robotic wheelchair. PMID:24453877

  12. Variance and covariance estimates for weaning weight of Senepol cattle.

    PubMed

    Wright, D W; Johnson, Z B; Brown, C J; Wildeus, S

    1991-10-01

    Variance and covariance components were estimated for weaning weight from Senepol field data for use in the reduced animal model for a maternally influenced trait. The 4,634 weaning records were used to evaluate 113 sires and 1,406 dams on the island of St. Croix. Estimates of direct additive genetic variance (σ²A), maternal additive genetic variance (σ²M), covariance between direct and maternal additive genetic effects (σAM), permanent maternal environmental variance (σ²PE), and residual variance (σ²ε) were calculated by equating variances estimated from a sire-dam model and a sire-maternal grandsire model, with and without the inverse of the numerator relationship matrix (A⁻¹), to their expectations. Estimates were σ²A, 139.05 and 138.14 kg²; σ²M, 307.04 and 288.90 kg²; σAM, -117.57 and -103.76 kg²; σ²PE, -258.35 and -243.40 kg²; and σ²ε, 588.18 and 577.72 kg², with and without A⁻¹, respectively. Heritability estimates for direct additive effects (h²A) were .211 and .210 with and without A⁻¹, respectively. Heritability estimates for maternal additive effects (h²M) were .47 and .44 with and without A⁻¹, respectively. Correlations between direct and maternal effects (rAM) were -.57 and -.52 with and without A⁻¹, respectively.

  13. Inverse sequential procedures for the monitoring of time series

    NASA Technical Reports Server (NTRS)

    Radok, Uwe; Brown, Timothy

    1993-01-01

    Climate changes traditionally have been detected from long series of observations and long after they happened. The 'inverse sequential' monitoring procedure is designed to detect changes as soon as they occur. Frequency distribution parameters are estimated both from the most recent existing set of observations and from the same set augmented by 1,2,...j new observations. Individual-value probability products ('likelihoods') are then calculated which yield probabilities for erroneously accepting the existing parameter(s) as valid for the augmented data set and vice versa. A parameter change is signaled when these probabilities (or a more convenient and robust compound 'no change' probability) show a progressive decrease. New parameters are then estimated from the new observations alone to restart the procedure. The detailed algebra is developed and tested for Gaussian means and variances, Poisson and chi-square means, and linear or exponential trends; a comprehensive and interactive Fortran program is provided in the appendix.

  14. Cross-bispectrum computation and variance estimation

    NASA Technical Reports Server (NTRS)

    Lii, K. S.; Helland, K. N.

    1981-01-01

    A method for the estimation of cross-bispectra of discrete real time series is developed. The asymptotic variance properties of the bispectrum are reviewed, and a method for the direct estimation of bispectral variance is given. The symmetry properties are described which minimize the computations necessary to obtain a complete estimate of the cross-bispectrum in the right-half-plane. A procedure is given for computing the cross-bispectrum by subdividing the domain into rectangular averaging regions which help reduce the variance of the estimates and allow easy application of the symmetry relationships to minimize the computational effort. As an example of the procedure, the cross-bispectrum of a numerically generated, exponentially distributed time series is computed and compared with theory.

  15. Analytical estimation of ultrasound properties, thermal diffusivity, and perfusion using magnetic resonance-guided focused ultrasound temperature data

    PubMed Central

    Dillon, C R; Borasi, G; Payne, A

    2016-01-01

    For thermal modeling to play a significant role in treatment planning, monitoring, and control of magnetic resonance-guided focused ultrasound (MRgFUS) thermal therapies, accurate knowledge of ultrasound and thermal properties is essential. This study develops a new analytical solution for the temperature change observed in MRgFUS which can be used with experimental MR temperature data to provide estimates of the ultrasound initial heating rate, Gaussian beam variance, tissue thermal diffusivity, and Pennes perfusion parameter. Simulations demonstrate that this technique provides accurate and robust property estimates that are independent of the beam size, thermal diffusivity, and perfusion levels in the presence of realistic MR noise. The technique is also demonstrated in vivo using MRgFUS heating data in rabbit back muscle. Errors in property estimates are kept less than 5% by applying a third order Taylor series approximation of the perfusion term and ensuring the ratio of the fitting time (the duration of experimental data utilized for optimization) to the perfusion time constant remains less than one. PMID:26741344

  16. On the use of secondary capture-recapture samples to estimate temporary emigration and breeding proportions

    USGS Publications Warehouse

    Kendall, W.L.; Nichols, J.D.; North, P.M.; Nichols, J.D.

    1995-01-01

    The use of the Cormack-Jolly-Seber model under a standard sampling scheme of one sample per time period, when the Jolly-Seber assumption that all emigration is permanent does not hold, leads to the confounding of temporary emigration probabilities with capture probabilities. This biases the estimates of capture probability when temporary emigration is a completely random process, and both capture and survival probabilities when there is a temporary trap response in temporary emigration, or it is Markovian. The use of secondary capture samples over a shorter interval within each period, during which the population is assumed to be closed (Pollock's robust design), provides a second source of information on capture probabilities. This solves the confounding problem, and thus temporary emigration probabilities can be estimated. This process can be accomplished in an ad hoc fashion for completely random temporary emigration and to some extent in the temporary trap response case, but modelling the complete sampling process provides more flexibility and permits direct estimation of variances. For the case of Markovian temporary emigration, a full likelihood is required.

  17. Methods to Estimate the Between-Study Variance and Its Uncertainty in Meta-Analysis

    ERIC Educational Resources Information Center

    Veroniki, Areti Angeliki; Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian P. T.; Langan, Dean; Salanti, Georgia

    2016-01-01

    Meta-analyses are typically used to estimate the overall (mean) effect for an outcome of interest. However, inference about between-study variability, which is typically modelled using a between-study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between-study variance,…
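
    The record is truncated here. For reference, the DerSimonian and Laird moment estimator of the between-study variance τ² that it refers to can be written in a few lines; this is the textbook formula, not code from the article.

```python
import numpy as np

def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird moment estimator of the between-study variance tau^2."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)     # fixed-effect (inverse-variance) weights
    mu_fe = np.sum(w * y) / np.sum(w)                # fixed-effect pooled estimate
    q = np.sum(w * (y - mu_fe) ** 2)                 # Cochran's Q statistic
    df = y.size - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - df) / c)                    # truncated at zero

# toy usage: five study effect sizes with their within-study variances
print(dersimonian_laird_tau2([0.2, 0.5, 0.1, 0.8, 0.4], [0.04, 0.09, 0.05, 0.12, 0.06]))
```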

  18. An Analysis of Variance Approach for the Estimation of Response Time Distributions in Tests

    ERIC Educational Resources Information Center

    Attali, Yigal

    2010-01-01

    Generalizability theory and analysis of variance methods are employed, together with the concept of objective time pressure, to estimate response time distributions and the degree of time pressure in timed tests. By estimating response time variance components due to person, item, and their interaction, and fixed effects due to item types and…

  19. One-shot estimate of MRMC variance: AUC.

    PubMed

    Gallas, Brandon D

    2006-03-01

    One popular study design for estimating the area under the receiver operating characteristic curve (AUC) is the one in which a set of readers reads a set of cases: a fully crossed design in which every reader reads every case. The variability of the subsequent reader-averaged AUC has two sources: the multiple readers and the multiple cases (MRMC). In this article, we present a nonparametric estimate for the variance of the reader-averaged AUC that is unbiased and does not use resampling tools. The one-shot estimate is based on the MRMC variance derived by the mechanistic approach of Barrett et al. (2005), as well as the nonparametric variance of a single-reader AUC derived in the literature on U statistics. We investigate the bias and variance properties of the one-shot estimate through a set of Monte Carlo simulations with simulated model observers and images. The different simulation configurations vary numbers of readers and cases, amounts of image noise and internal noise, as well as how the readers are constructed. We compare the one-shot estimate to a method that uses the jackknife resampling technique with an analysis of variance model at its foundation (Dorfman et al. 1992). The name one-shot highlights that resampling is not used. The one-shot and jackknife estimators behave similarly, with the one-shot being marginally more efficient when the number of cases is small. We have derived a one-shot estimate of the MRMC variance of AUC that is based on a probabilistic foundation with limited assumptions, is unbiased, and compares favorably to an established estimate.

  20. A general unified framework to assess the sampling variance of heritability estimates using pedigree or marker-based relationships.

    PubMed

    Visscher, Peter M; Goddard, Michael E

    2015-01-01

    Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N², where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N. Copyright © 2015 by the Genetics Society of America.

  1. Determination of the optimal level for combining area and yield estimates

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Hixson, M. M.; Jobusch, C. D.

    1981-01-01

    Several levels for obtaining both area and yield estimates of corn and soybeans in Iowa were considered: county, refined strata, refined/split strata, crop reporting district (CRD), and state. Using the CCEA model form and smoothed weather data, regression coefficients at each level were derived to compute yield and its variance. Variances were also computed at the stratum level. The variance of the yield estimates was largest at the state level and smallest at the county level for both crops. The refined strata had somewhat larger variances than those associated with the refined/split strata and the CRD. For production estimates, the difference in standard deviations among levels was not large for corn, but for soybeans the standard deviation at the state level was more than 50% greater than for the other levels. The refined strata had the smallest standard deviations. The county level was not considered in the evaluation of production estimates due to a lack of county area variances.

  2. A Modified Kriging Method to Interpolate the Soil Moisture Measured by Wireless Sensor Network with the Aid of Remote Sensing Images

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Liu, Q.; Li, X.; Niu, H.; Cai, E.

    2015-12-01

    In recent years, wireless sensor networks (WSN) have emerged as a way to collect Earth observation data at relatively low cost and labor load, but their observations are still point data. To learn the spatial distribution of a land surface parameter, interpolating the point data is necessary. Taking soil moisture (SM) as an example, its spatial distribution is critical information for agricultural management and for hydrological and ecological research. This study developed a method to interpolate WSN-measured SM to acquire its spatial distribution in a 5 km × 5 km study area located in the middle reaches of the Heihe River, western China. Because SM is related to many factors such as topography, soil type, and vegetation, even the WSN observation grid is not dense enough to reflect the SM distribution pattern. Our idea is to revise the traditional Kriging algorithm, introducing spectral variables, i.e., vegetation index (VI) and albedo, from satellite imagery as supplementary information to aid the interpolation. Thus, the new Extended-Kriging algorithm operates on the combined spatial and spectral space. To run the algorithm, we first need to estimate the SM variance function, which is also extended to the combined space. As the number of WSN samples in the study area is not enough to gather robust statistics, we have to assume that the SM variance function is invariant over time. The variance function is therefore estimated from a SM map derived from the airborne CASI/TASI images acquired on July 10, 2012, and then applied to interpolate the WSN data in that season. Data analysis indicates that the new algorithm can provide more detail on the variation of land SM. Leave-one-out cross-validation is then adopted to estimate the interpolation accuracy. Although a reasonable accuracy can be achieved, the result is not yet satisfactory. Besides improving the algorithm, the uncertainties in WSN measurements may also need to be controlled in our further work.

  3. On the design of classifiers for crop inventories

    NASA Technical Reports Server (NTRS)

    Heydorn, R. P.; Takacs, H. C.

    1986-01-01

    Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper, expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.

  4. Estimation of genetic variance for macro- and micro-environmental sensitivity using double hierarchical generalized linear models.

    PubMed

    Mulder, Han A; Rönnegård, Lars; Fikse, W Freddy; Veerkamp, Roel F; Strandberg, Erling

    2013-07-04

    Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike's information criterion using h-likelihood to select the best fitting model. We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike's information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike's information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.

  5. Improving the precision of lake ecosystem metabolism estimates by identifying predictors of model uncertainty

    USGS Publications Warehouse

    Rose, Kevin C.; Winslow, Luke A.; Read, Jordan S.; Read, Emily K.; Solomon, Christopher T.; Adrian, Rita; Hanson, Paul C.

    2014-01-01

    Diel changes in dissolved oxygen are often used to estimate gross primary production (GPP) and ecosystem respiration (ER) in aquatic ecosystems. Despite the widespread use of this approach to understand ecosystem metabolism, we are only beginning to understand the degree and underlying causes of uncertainty for metabolism model parameter estimates. Here, we present a novel approach to improve the precision and accuracy of ecosystem metabolism estimates by identifying physical metrics that indicate when metabolism estimates are highly uncertain. Using datasets from seventeen instrumented GLEON (Global Lake Ecological Observatory Network) lakes, we discovered that many physical characteristics correlated with uncertainty, including PAR (photosynthetically active radiation, 400-700 nm), daily variance in Schmidt stability, and wind speed. Low PAR was a consistent predictor of high variance in GPP model parameters, but also corresponded with low ER model parameter variance. We identified a threshold (30% of clear sky PAR) below which GPP parameter variance increased rapidly and was significantly greater in nearly all lakes compared with variance on days with PAR levels above this threshold. The relationship between daily variance in Schmidt stability and GPP model parameter variance depended on trophic status, whereas daily variance in Schmidt stability was consistently positively related to ER model parameter variance. Wind speeds in the range of ~0.8-3 m s⁻¹ were consistent predictors of high variance for both GPP and ER model parameters, with greater uncertainty in eutrophic lakes. Our findings can be used to reduce ecosystem metabolism model parameter uncertainty and identify potential sources of that uncertainty.

  6. Correcting for Systematic Bias in Sample Estimates of Population Variances: Why Do We Divide by n-1?

    ERIC Educational Resources Information Center

    Mittag, Kathleen Cage

    An important topic presented in introductory statistics courses is the estimation of population parameters using samples. Students learn that when estimating population variances using sample data, we always get an underestimate of the population variance if we divide by n rather than n-1. One implication of this correction is that the degree of…
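
    The underestimation described above is easy to demonstrate numerically; a short simulation (sample size and distribution chosen arbitrarily for illustration) comparing the divide-by-n and divide-by-(n-1) variance estimators:

```python
import numpy as np

rng = np.random.default_rng(3)
true_var = 4.0
n, reps = 5, 100_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
var_biased = samples.var(axis=1, ddof=0)     # divide by n
var_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1 (Bessel's correction)

print(var_biased.mean())     # approx. true_var * (n - 1) / n = 3.2 on average
print(var_unbiased.mean())   # approx. 4.0 on average
```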

  7. A nonparametric mean-variance smoothing method to assess Arabidopsis cold stress transcriptional regulator CBF2 overexpression microarray data.

    PubMed

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request.
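
    A rough sketch of the general idea, fitting a nonparametric (lowess) curve to the gene-wise mean-variance relationship and shrinking each gene's variance toward the fitted curve. The shrinkage weight, the log-scale fit, and the simulated data are illustrative choices only; the NPMVS posterior-mean machinery itself is not reproduced here.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def smoothed_variances(expr, shrink=0.5, frac=0.3):
    """Shrink per-gene sample variances toward a lowess fit of log-variance on mean.

    expr : (n_genes, n_arrays) expression matrix
    """
    means = expr.mean(axis=1)
    variances = expr.var(axis=1, ddof=1)
    # nonlinear smoothing curve for the mean-variance relationship (on the log scale)
    fitted_log_var = lowess(np.log(variances), means, frac=frac, return_sorted=False)
    prior_var = np.exp(fitted_log_var)
    return shrink * prior_var + (1.0 - shrink) * variances

# toy usage: 2000 genes, 6 arrays, variance increasing with the mean
rng = np.random.default_rng(4)
mu = rng.uniform(2, 12, size=2000)
data = rng.normal(mu[:, None], 0.1 + 0.05 * mu[:, None], size=(2000, 6))
print(smoothed_variances(data)[:5])
```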

  8. A Nonparametric Mean-Variance Smoothing Method to Assess Arabidopsis Cold Stress Transcriptional Regulator CBF2 Overexpression Microarray Data

    PubMed Central

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181

  9. Online Estimation of Allan Variance Coefficients Based on a Neural-Extended Kalman Filter

    PubMed Central

    Miao, Zhiyong; Shen, Feng; Xu, Dingjie; He, Kunpeng; Tian, Chunmiao

    2015-01-01

    As a noise analysis method for inertial sensors, the traditional Allan variance method requires the storage of a large amount of data and manual analysis of an Allan variance graph. Although the existing online estimation methods avoid the storage of data and the painful procedure of drawing slope lines for estimation, they require complex transformations and even cause errors during the modeling of dynamic Allan variance. To solve these problems, a new state-space model that directly models the stochastic errors of inertial sensors was first established, yielding a nonlinear state-space model. Then, a neural-extended Kalman filter algorithm was used to estimate the Allan variance coefficients. The real noises of an ADIS16405 IMU and fiber optic gyro sensors were analyzed by the proposed method and the traditional methods. The experimental results show that the proposed method is more suitable than the traditional methods for estimating the Allan variance coefficients. Moreover, the proposed method effectively avoids the storage of data and can be easily implemented using an online processor. PMID:25625903
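
    For context, the traditional batch Allan variance that the online method is intended to replace can be computed directly; a minimal non-overlapping implementation for rate data (the averaging times and toy signal are illustrative assumptions):

```python
import numpy as np

def allan_variance(rate, tau0, m_values):
    """Non-overlapping Allan variance of a rate signal for cluster sizes m_values.

    rate : 1-D array of gyro/accelerometer rate samples at sampling interval tau0
    """
    rate = np.asarray(rate, dtype=float)
    result = []
    for m in m_values:
        k = rate.size // m
        means = rate[: k * m].reshape(k, m).mean(axis=1)   # cluster averages over tau = m * tau0
        avar = 0.5 * np.mean(np.diff(means) ** 2)          # Allan variance at this tau
        result.append((m * tau0, avar))
    return np.array(result)

# toy usage: white noise plus a slow random-walk drift
rng = np.random.default_rng(5)
signal = rng.normal(0, 0.1, 100_000) + np.cumsum(rng.normal(0, 1e-4, 100_000))
print(allan_variance(signal, tau0=0.01, m_values=[1, 10, 100, 1000]))
```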

  10. Improvement of Bragg peak shift estimation using dimensionality reduction techniques and predictive linear modeling

    NASA Astrophysics Data System (ADS)

    Xing, Yafei; Macq, Benoit

    2017-11-01

    With the emergence of clinical prototypes and first patient acquisitions for proton therapy, research on prompt gamma imaging is aiming to make the most use of the prompt gamma data for in vivo estimation of any shift from the expected Bragg peak (BP). The simple problem of matching the measured prompt gamma profile of each pencil beam with a reference simulation from the treatment plan is actually made complex by uncertainties which can translate into distortions during treatment. We will illustrate this challenge and demonstrate the robustness of a predictive linear model we proposed for BP shift estimation based on the principal component analysis (PCA) method. The study considered the first clinical knife-edge slit camera design in use, with anthropomorphic phantom CT data. In particular, 4115 error scenarios were simulated for the learning model. PCA was applied to the training input, randomly chosen from 500 scenarios, to eliminate data collinearities. A total variance of 99.95% was used to represent the testing input from 3615 scenarios. This model improved the BP shift estimation by an average of 63 ± 19%, in a range between -2.5% and 86%, compared with our previous profile shift (PS) method. The robustness of our method was demonstrated by a comparative study conducted by applying Poisson noise 1000 times to each profile. 67% of the cases obtained with the learning model had lower prediction errors than those obtained with the PS method. The estimation accuracy ranged between 0.31 ± 0.22 mm and 1.84 ± 8.98 mm for the learning model, while for the PS method it ranged between 0.3 ± 0.25 mm and 20.71 ± 8.38 mm.
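
    A schematic of the PCA-plus-linear-regression pipeline described above, using scikit-learn; the profile dimensions and shift targets are random placeholders, and only the 99.95% retained-variance threshold and the 500/3615 split mirror the description.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)

# toy stand-ins: prompt-gamma depth profiles (rows) and the corresponding BP shifts (mm)
train_profiles, train_shifts = rng.standard_normal((500, 200)), rng.normal(0, 5, 500)
test_profiles = rng.standard_normal((3615, 200))

# PCA retains enough components to explain 99.95% of the training variance,
# removing collinearity before the linear shift predictor is fitted
model = make_pipeline(PCA(n_components=0.9995, svd_solver="full"), LinearRegression())
model.fit(train_profiles, train_shifts)
predicted_shifts = model.predict(test_profiles)
print(predicted_shifts.shape)   # (3615,)
```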

  11. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

    PubMed

    Dazard, Jean-Eudes; Rao, J Sunil

    2012-07-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.

  12. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data

    PubMed Central

    Dazard, Jean-Eudes; Rao, J. Sunil

    2012-01-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput “omics” data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel “similarity statistic”-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called ‘MVR’ (‘Mean-Variance Regularization’), downloadable from the CRAN website. PMID:22711950

  13. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding

    PubMed Central

    2013-01-01

    Background In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. Results The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. Conclusions The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies. PMID:24314298

  14. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.

    PubMed

    Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter

    2013-12-06

    In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.
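
    The indirect estimate of predictive accuracy described above (cross-validated predictive ability divided by the square root of an estimated heritability) can be sketched as follows. The fold structure, the ridge-regression stand-in for a genomic prediction model, and the heritability value are placeholder assumptions; this illustrates the generic indirect approach rather than any one of the seven numbered methods.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def predictive_accuracy(markers, phenotypes, heritability, n_splits=5, seed=0):
    """Cross-validated predictive ability divided by sqrt(h^2) as an estimate of accuracy."""
    preds = np.empty_like(phenotypes)
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(markers):
        model = Ridge(alpha=1.0).fit(markers[train], phenotypes[train])   # stand-in for GBLUP
        preds[test] = model.predict(markers[test])
    predictive_ability = np.corrcoef(preds, phenotypes)[0, 1]
    return predictive_ability / np.sqrt(heritability)

# toy usage: 300 genotypes, 1000 markers, assumed heritability of 0.4
rng = np.random.default_rng(7)
X = rng.integers(0, 3, size=(300, 1000)).astype(float)
beta = rng.normal(0, 0.05, 1000)
y = X @ beta + rng.normal(0, np.sqrt(np.var(X @ beta) * 1.5), 300)   # gives h^2 = 0.4
print(predictive_accuracy(X, y, heritability=0.4))
```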

  15. A Robust Post-Processing Workflow for Datasets with Motion Artifacts in Diffusion Kurtosis Imaging

    PubMed Central

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X.; Wan, Mingxi

    2014-01-01

    Purpose The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). Materials and methods The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejecting artifacts, and with them the information from their gradient directions and b values, on the parameter estimation was investigated by using the mean square error (MSE). The variance of noise was used as the criterion for MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). Results The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (p<0.05). It indicated that LPCC was more sensitive in detecting motion artifacts. MSEs of all derived parameters from the reserved data after the artifact rejection were smaller than the variance of the noise. It suggested that the influence of rejected artifacts was less than the influence of noise on the precision of the derived parameters. The proposed workflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (p<0.05). Conclusion The proposed post-processing workflow reliably improved the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post-processing method for clinical applications of DKI in subjects with involuntary movements. PMID:24727862

  16. A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

    PubMed

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

    2014-01-01

    The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejecting artifacts, and with them the information from their gradient directions and b values, on the parameter estimation was investigated by using the mean square error (MSE). The variance of noise was used as the criterion for MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (p<0.05). It indicated that LPCC was more sensitive in detecting motion artifacts. MSEs of all derived parameters from the reserved data after the artifact rejection were smaller than the variance of the noise. It suggested that the influence of rejected artifacts was less than the influence of noise on the precision of the derived parameters. The proposed workflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (p<0.05). The proposed post-processing workflow reliably improved the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post-processing method for clinical applications of DKI in subjects with involuntary movements.
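
    A highly simplified 2-D sketch of the local Pearson correlation coefficient idea: correlations are computed in small local windows between a candidate image and an artifact-free reference, and the image is flagged when the aggregate local correlation is low. The window size, aggregation by the mean, the rejection threshold, and the toy images are assumptions for illustration; the published workflow operates on registered 3-D DKI volumes.

```python
import numpy as np

def local_pearson_cc(image, reference, window=8):
    """Mean of Pearson correlations computed over non-overlapping local windows."""
    cors = []
    for i in range(0, image.shape[0] - window + 1, window):
        for j in range(0, image.shape[1] - window + 1, window):
            a = image[i:i + window, j:j + window].ravel()
            b = reference[i:i + window, j:j + window].ravel()
            if a.std() > 0 and b.std() > 0:
                cors.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(cors))

# toy usage: keep a slice only if its LPCC with the reference exceeds an assumed threshold
rng = np.random.default_rng(8)
reference = rng.random((64, 64))
clean = reference + rng.normal(0, 0.05, (64, 64))
corrupted = np.roll(reference, 10, axis=0) + rng.normal(0, 0.05, (64, 64))  # simulated motion
for name, img in [("clean", clean), ("corrupted", corrupted)]:
    lpcc = local_pearson_cc(img, reference)
    print(name, round(lpcc, 3), "keep" if lpcc > 0.6 else "reject")
```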

  17. A Robust Crowdsourcing-Based Indoor Localization System.

    PubMed

    Zhou, Baoding; Li, Qingquan; Mao, Qingzhou; Tu, Wei

    2017-04-14

    WiFi fingerprinting-based indoor localization has been widely used due to its simplicity and because it can be implemented on smartphones. The major drawback of WiFi fingerprinting is that the radio map construction is very labor-intensive and time-consuming. Another drawback of WiFi fingerprinting is the Received Signal Strength (RSS) variance problem, caused by environmental changes and device diversity. RSS variance severely degrades the localization accuracy. In this paper, we propose a robust crowdsourcing-based indoor localization system (RCILS). RCILS can automatically construct the radio map using crowdsourcing data collected by smartphones. RCILS abstracts the indoor map as a semantics graph in which the edges are the possible user paths and the vertices are the locations where users may perform special activities. RCILS extracts the activity sequence contained in the trajectories by activity detection and pedestrian dead-reckoning. Based on the semantics graph and the activity sequence, crowdsourcing trajectories can be located and a radio map is constructed based on the localization results. For the RSS variance problem, RCILS uses the trajectory fingerprint model for indoor localization. During online localization, RCILS obtains an RSS sequence and realizes localization by matching the RSS sequence with the radio map. To evaluate RCILS, we apply RCILS in an office building. Experiment results demonstrate the efficiency and robustness of RCILS.

  18. A Robust Crowdsourcing-Based Indoor Localization System

    PubMed Central

    Zhou, Baoding; Li, Qingquan; Mao, Qingzhou; Tu, Wei

    2017-01-01

    WiFi fingerprinting-based indoor localization has been widely used due to its simplicity and because it can be implemented on smartphones. The major drawback of WiFi fingerprinting is that the radio map construction is very labor-intensive and time-consuming. Another drawback of WiFi fingerprinting is the Received Signal Strength (RSS) variance problem, caused by environmental changes and device diversity. RSS variance severely degrades the localization accuracy. In this paper, we propose a robust crowdsourcing-based indoor localization system (RCILS). RCILS can automatically construct the radio map using crowdsourcing data collected by smartphones. RCILS abstracts the indoor map as a semantics graph in which the edges are the possible user paths and the vertices are the locations where users may perform special activities. RCILS extracts the activity sequence contained in the trajectories by activity detection and pedestrian dead-reckoning. Based on the semantics graph and the activity sequence, crowdsourcing trajectories can be located and a radio map is constructed based on the localization results. For the RSS variance problem, RCILS uses the trajectory fingerprint model for indoor localization. During online localization, RCILS obtains an RSS sequence and realizes localization by matching the RSS sequence with the radio map. To evaluate RCILS, we apply RCILS in an office building. Experiment results demonstrate the efficiency and robustness of RCILS. PMID:28420108

  19. Estimating means and variances: The comparative efficiency of composite and grab samples.

    PubMed

    Brumelle, S; Nemetz, P; Casey, D

    1984-03-01

    This paper compares the efficiencies of two sampling techniques for estimating a population mean and variance. One procedure, called grab sampling, consists of collecting and analyzing one sample per period. The second procedure, called composite sampling, collects n samples per period which are then pooled and analyzed as a single sample. We review the well known fact that composite sampling provides a superior estimate of the mean. However, it is somewhat surprising that composite sampling does not always generate a more efficient estimate of the variance. For populations with platykurtic distributions, grab sampling gives a more efficient estimate of the variance, whereas composite sampling is better for leptokurtic distributions. These conditions on kurtosis can be related to peakedness and skewness. For example, a necessary condition for composite sampling to provide a more efficient estimate of the variance is that the population density function evaluated at the mean (i.e., f(μ)) be greater than [Formula: see text]. If [Formula: see text], then a grab sample is more efficient. In spite of this result, however, composite sampling does provide a smaller estimate of standard error than does grab sampling in the context of estimating population means.

  20. Reduced rank models for travel time estimation of low order mode pulses.

    PubMed

    Chandrayadula, Tarun K; Wage, Kathleen E; Worcester, Peter F; Dzieciuch, Matthew A; Mercer, James A; Andrew, Rex K; Howe, Bruce M

    2013-10-01

    Mode travel time estimation in the presence of internal waves (IWs) is a challenging problem. IWs perturb the sound speed, which results in travel time wander and mode scattering. A standard approach to travel time estimation is to pulse compress the broadband signal, pick the peak of the compressed time series, and average the peak time over multiple receptions to reduce variance. The peak-picking approach implicitly assumes there is a single strong arrival and does not perform well when there are multiple arrivals due to scattering. This article presents a statistical model for the scattered mode arrivals and uses the model to design improved travel time estimators. The model is based on an Empirical Orthogonal Function (EOF) analysis of the mode time series. Range-dependent simulations and data from the Long-range Ocean Acoustic Propagation Experiment (LOAPEX) indicate that the modes are represented by a small number of EOFs. The reduced-rank EOF model is used to construct a travel time estimator based on the Matched Subspace Detector (MSD). Analysis of simulation and experimental data show that the MSDs are more robust to IW scattering than peak picking. The simulation analysis also highlights how IWs affect the mode excitation by the source.
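
    A minimal sketch of the reduced-rank EOF step: empirical orthogonal functions of an ensemble of mode arrival time series are obtained from an SVD, and each noisy arrival is projected onto the leading EOFs before travel-time estimation. The rank and the toy data are illustrative assumptions; the matched subspace detector built on top of this basis is not reproduced here.

```python
import numpy as np

def leading_eofs(ensemble, rank):
    """Leading EOFs (left singular vectors) of an (n_samples x n_receptions) ensemble."""
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(anomalies, full_matrices=False)
    return u[:, :rank]                      # columns span the reduced-rank signal subspace

def project(series, eofs, mean):
    """Reduced-rank representation of one mode arrival time series."""
    return mean + eofs @ (eofs.T @ (series - mean))

# toy usage: 200 receptions of a 512-sample mode time series, well captured by a few EOFs
rng = np.random.default_rng(9)
t = np.linspace(0, 1, 512)[:, None]
ensemble = (np.sin(2 * np.pi * 5 * t) * rng.normal(1, 0.2, 200)
            + np.cos(2 * np.pi * 5 * t) * rng.normal(0, 0.2, 200)
            + rng.normal(0, 0.05, (512, 200)))
eofs = leading_eofs(ensemble, rank=3)
denoised = project(ensemble[:, 0], eofs, ensemble.mean(axis=1))
print(eofs.shape, denoised.shape)           # (512, 3) (512,)
```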

  1. Smooth empirical Bayes estimation of observation error variances in linear systems

    NASA Technical Reports Server (NTRS)

    Martz, H. F., Jr.; Lian, M. W.

    1972-01-01

    A smooth empirical Bayes estimator was developed for estimating the unknown random scale component of each of a set of observation error variances. It is shown that the estimator possesses a smaller average squared error loss than other estimators for a discrete time linear system.

  2. Robustness of methods for blinded sample size re-estimation with overdispersed count data.

    PubMed

    Schneider, Simon; Schmidli, Heinz; Friede, Tim

    2013-09-20

    Counts of events are increasingly common as primary endpoints in randomized clinical trials. With between-patient heterogeneity leading to variances in excess of the mean (referred to as overdispersion), statistical models reflecting this heterogeneity by mixtures of Poisson distributions are frequently employed. Sample size calculation in the planning of such trials requires knowledge of the nuisance parameters, that is, the control (or overall) event rate and the overdispersion parameter. Usually, there is little prior knowledge regarding these parameters in the design phase, resulting in considerable uncertainty regarding the sample size. In this situation internal pilot studies have been found very useful, and very recently several blinded procedures for sample size re-estimation have been proposed for overdispersed count data, one of which is based on an EM-algorithm. In this paper we investigate the EM-algorithm-based procedure with respect to aspects of its implementation by studying the algorithm's dependence on the choice of convergence criterion, and find that the procedure is sensitive to the choice of the stopping criterion in scenarios relevant to clinical practice. We also compare the EM-based procedure to other competing procedures regarding their operating characteristics such as sample size distribution and power. Furthermore, the robustness of these procedures to deviations from the model assumptions is explored. We find that some of the procedures are robust to at least moderate deviations. The results are illustrated using data from the US National Heart, Lung and Blood Institute sponsored Asymptomatic Cardiac Ischemia Pilot study. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Estimation of genetic connectedness diagnostics based on prediction errors without the prediction error variance-covariance matrix.

    PubMed

    Holmes, John B; Dodds, Ken G; Lee, Michael A

    2017-03-02

    An important issue in genetic evaluation is the comparability of random effects (breeding values), particularly between pairs of animals in different contemporary groups. This is usually referred to as genetic connectedness. While various measures of connectedness have been proposed in the literature, there is general agreement that the most appropriate measure is some function of the prediction error variance-covariance matrix. However, obtaining the prediction error variance-covariance matrix is computationally demanding for large-scale genetic evaluations. Many alternative statistics have been proposed that avoid the computational cost of obtaining the prediction error variance-covariance matrix, such as counts of genetic links between contemporary groups, gene flow matrices, and functions of the variance-covariance matrix of estimated contemporary group fixed effects. In this paper, we show that a correction to the variance-covariance matrix of estimated contemporary group fixed effects will produce the exact prediction error variance-covariance matrix averaged by contemporary group for univariate models in the presence of single or multiple fixed effects and one random effect. We demonstrate the correction for a series of models and show that approximations to the prediction error matrix based solely on the variance-covariance matrix of estimated contemporary group fixed effects are inappropriate in certain circumstances. Our method allows for the calculation of a connectedness measure based on the prediction error variance-covariance matrix by calculating only the variance-covariance matrix of estimated fixed effects. Since the number of fixed effects in genetic evaluation is usually orders of magnitude smaller than the number of random effect levels, the computational requirements for our method should be reduced.

  4. Development of a method of robust rain gauge network optimization based on intensity-duration-frequency results

    NASA Astrophysics Data System (ADS)

    Chebbi, A.; Bargaoui, Z. K.; da Conceição Cunha, M.

    2012-12-01

    Based on rainfall intensity-duration-frequency (IDF) curves, a robust optimization approach is proposed to identify the best locations to install new rain gauges. The advantage of robust optimization is that the resulting design solutions yield networks which behave acceptably under hydrological variability. Robust optimization can overcome the problem of selecting representative rainfall events when building the optimization process. This paper reports an original approach based on Montana IDF model parameters. The latter are assumed to be geostatistical variables and their spatial interdependence is taken into account through the adoption of cross-variograms in the kriging process. The problem of optimally locating a fixed number of new monitoring stations based on an existing rain gauge network is addressed. The objective function is based on the mean spatial kriging variance and rainfall variogram structure using a variance-reduction method. Hydrological variability was taken into account by considering and implementing several return periods to define the robust objective function. Variance minimization is performed using a simulated annealing algorithm. In addition, knowledge of the time horizon is needed for the computation of the robust objective function. A short- and a long-term horizon were studied, and optimal networks are identified for each. The method developed is applied to north Tunisia (area = 21 000 km2). Data inputs for the variogram analysis were IDF curves provided by the hydrological bureau and available for 14 tipping bucket type rain gauges. The recording period was from 1962 to 2001, depending on the station. The study concerns an imaginary network augmentation based on the network configuration in 1973, which is a very significant year in Tunisia because there was an exceptional regional flood event in March 1973. This network consisted of 13 stations and did not meet World Meteorological Organization (WMO) recommendations for the minimum spatial density. So, it is proposed to virtually augment it by 25, 50, 100 and 160%, the latter being the rate that would meet WMO requirements. Results suggest that, for a given augmentation, robust networks remain stable overall for the two time horizons.
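
    As a rough, self-contained illustration of the optimization machinery described above (this is not the authors' implementation: it uses a single synthetic variable with an exponential covariance model and a single objective, omitting the Montana IDF parameters, cross-variograms and the multiple return periods that make the objective robust), the Python sketch below augments an existing network by simulated annealing on the mean kriging variance.

      import numpy as np

      rng = np.random.default_rng(1)

      def cov(h, sill=1.0, rng_par=30.0):
          """Exponential covariance model (sill and range are illustrative values)."""
          return sill * np.exp(-h / rng_par)

      def mean_kriging_variance(stations, grid, sill=1.0, rng_par=30.0):
          """Mean simple-kriging variance over the prediction grid."""
          d_ss = np.linalg.norm(stations[:, None] - stations[None, :], axis=-1)
          d_sg = np.linalg.norm(stations[:, None] - grid[None, :], axis=-1)
          C = cov(d_ss, sill, rng_par) + 1e-9 * np.eye(len(stations))
          c = cov(d_sg, sill, rng_par)
          w = np.linalg.solve(C, c)                      # kriging weights, one column per grid node
          return np.mean(sill - np.sum(w * c, axis=0))   # sigma^2 - c' C^{-1} c, averaged over the grid

      def anneal(existing, candidates, k, grid, n_iter=2000, t0=0.1):
          """Pick k candidate sites by simulated annealing on the mean kriging variance."""
          chosen = rng.choice(len(candidates), k, replace=False)
          best = cur = mean_kriging_variance(np.vstack([existing, candidates[chosen]]), grid)
          best_set = chosen.copy()
          for i in range(n_iter):
              prop = chosen.copy()
              prop[rng.integers(k)] = rng.integers(len(candidates))   # move one new station
              if len(set(prop)) < k:
                  continue
              val = mean_kriging_variance(np.vstack([existing, candidates[prop]]), grid)
              temp = max(t0 * (1 - i / n_iter), 1e-6)
              if val < cur or rng.random() < np.exp(-(val - cur) / temp):
                  chosen, cur = prop, val
                  if val < best:
                      best, best_set = val, prop.copy()
          return candidates[best_set], best

      existing = rng.uniform(0, 100, size=(13, 2))       # existing network (13 stations, as in the study)
      candidates = rng.uniform(0, 100, size=(200, 2))    # candidate locations for new gauges
      grid = rng.uniform(0, 100, size=(400, 2))          # prediction grid
      new_sites, obj = anneal(existing, candidates, k=4, grid=grid)
      print("mean kriging variance after augmentation:", round(obj, 4))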

  5. Development of a method of robust rain gauge network optimization based on intensity-duration-frequency results

    NASA Astrophysics Data System (ADS)

    Chebbi, A.; Bargaoui, Z. K.; da Conceição Cunha, M.

    2013-10-01

    Based on rainfall intensity-duration-frequency (IDF) curves, fitted in several locations of a given area, a robust optimization approach is proposed to identify the best locations to install new rain gauges. The advantage of robust optimization is that the resulting design solutions yield networks which behave acceptably under hydrological variability. Robust optimization can overcome the problem of selecting representative rainfall events when building the optimization process. This paper reports an original approach based on Montana IDF model parameters. The latter are assumed to be geostatistical variables, and their spatial interdependence is taken into account through the adoption of cross-variograms in the kriging process. The problem of optimally locating a fixed number of new monitoring stations based on an existing rain gauge network is addressed. The objective function is based on the mean spatial kriging variance and rainfall variogram structure using a variance-reduction method. Hydrological variability was taken into account by considering and implementing several return periods to define the robust objective function. Variance minimization is performed using a simulated annealing algorithm. In addition, knowledge of the time horizon is needed for the computation of the robust objective function. A short- and a long-term horizon were studied, and optimal networks are identified for each. The method developed is applied to north Tunisia (area = 21 000 km2). Data inputs for the variogram analysis were IDF curves provided by the hydrological bureau and available for 14 tipping bucket type rain gauges. The recording period was from 1962 to 2001, depending on the station. The study concerns an imaginary network augmentation based on the network configuration in 1973, which is a very significant year in Tunisia because there was an exceptional regional flood event in March 1973. This network consisted of 13 stations and did not meet World Meteorological Organization (WMO) recommendations for the minimum spatial density. Therefore, it is proposed to augment it by 25, 50, 100 and 160% virtually, which is the rate that would meet WMO requirements. Results suggest that for a given augmentation robust networks remain stable overall for the two time horizons.

  6. Comment on Hoffman and Rovine (2007): SPSS MIXED can estimate models with heterogeneous variances.

    PubMed

    Weaver, Bruce; Black, Ryan A

    2015-06-01

    Hoffman and Rovine (Behavior Research Methods, 39:101-117, 2007) have provided a very nice overview of how multilevel models can be useful to experimental psychologists. They included two illustrative examples and provided both SAS and SPSS commands for estimating the models they reported. However, upon examining the SPSS syntax for the models reported in their Table 3, we found no syntax for models 2B and 3B, both of which have heterogeneous error variances. Instead, there is syntax that estimates similar models with homogeneous error variances and a comment stating that SPSS does not allow heterogeneous errors. But that is not correct. We provide SPSS MIXED commands to estimate models 2B and 3B with heterogeneous error variances and obtain results nearly identical to those reported by Hoffman and Rovine in their Table 3. Therefore, contrary to the comment in Hoffman and Rovine's syntax file, SPSS MIXED can estimate models with heterogeneous error variances.

  7. Kalman filter for statistical monitoring of forest cover across sub-continental regions [Symposium

    Treesearch

    Raymond L. Czaplewski

    1991-01-01

    The Kalman filter is a generalization of the composite estimator. The univariate composite estimate combines 2 prior estimates of population parameter with a weighted average where the scalar weight is inversely proportional to the variances. The composite estimator is a minimum variance estimator that requires no distributional assumptions other than estimates of the...
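
    To make the weighting explicit, a minimal sketch of the univariate composite (inverse-variance weighted) estimate described above is given here; the numbers are purely illustrative and not taken from the symposium paper.

      def composite_estimate(x1, var1, x2, var2):
          """Combine two prior estimates of the same parameter with weights
          inversely proportional to their variances (minimum-variance combination)."""
          w1, w2 = 1.0 / var1, 1.0 / var2
          est = (w1 * x1 + w2 * x2) / (w1 + w2)
          var = 1.0 / (w1 + w2)   # variance of the combined estimate
          return est, var

      # e.g. a model-based forest-cover proportion combined with a survey-based estimate
      print(composite_estimate(x1=0.42, var1=0.004, x2=0.48, var2=0.002))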

  8. A de-noising method using the improved wavelet threshold function based on noise variance estimation

    NASA Astrophysics Data System (ADS)

    Liu, Hui; Wang, Weida; Xiang, Changle; Han, Lijin; Nie, Haizhao

    2018-01-01

    The precise and efficient noise variance estimation is very important for the processing of all kinds of signals while using the wavelet transform to analyze signals and extract signal features. In view of the problem that the accuracy of traditional noise variance estimation is greatly affected by the fluctuation of noise values, this study puts forward the strategy of using the two-state Gaussian mixture model to classify the high-frequency wavelet coefficients in the minimum scale, which takes both the efficiency and accuracy into account. According to the noise variance estimation, a novel improved wavelet threshold function is proposed by combining the advantages of hard and soft threshold functions, and on the basis of the noise variance estimation algorithm and the improved wavelet threshold function, the research puts forth a novel wavelet threshold de-noising method. The method is tested and validated using random signals and bench test data of an electro-mechanical transmission system. The test results indicate that the wavelet threshold de-noising method based on the noise variance estimation shows preferable performance in processing the testing signals of the electro-mechanical transmission system: it can effectively eliminate the interference of transient signals including voltage, current, and oil pressure and maintain the dynamic characteristics of the signals favorably.
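
    The following Python sketch (using the PyWavelets package) shows the general shape of such a scheme; it is only a stand-in for the paper's method: the noise variance is estimated with the standard median-absolute-deviation rule on the finest-scale detail coefficients rather than with the two-state Gaussian mixture model, and the hard/soft compromise threshold function is a generic choice rather than the paper's improved function.

      import numpy as np
      import pywt

      def estimate_noise_sigma(signal, wavelet="db4"):
          """Noise standard deviation from the finest-scale detail coefficients (MAD rule)."""
          detail = pywt.wavedec(signal, wavelet, level=1)[-1]
          return np.median(np.abs(detail)) / 0.6745

      def compromise_threshold(c, thr, alpha=0.5):
          """A simple hard/soft compromise: zero small coefficients, partially shrink large ones."""
          shrunk = np.sign(c) * (np.abs(c) - alpha * thr)
          return np.where(np.abs(c) >= thr, shrunk, 0.0)

      def denoise(signal, wavelet="db4", level=4, alpha=0.5):
          sigma = estimate_noise_sigma(signal, wavelet)
          thr = sigma * np.sqrt(2.0 * np.log(len(signal)))      # universal threshold
          coeffs = pywt.wavedec(signal, wavelet, level=level)
          coeffs = [coeffs[0]] + [compromise_threshold(c, thr, alpha) for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)[: len(signal)]

      t = np.linspace(0, 1, 2048)
      clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sign(np.sin(2 * np.pi * 2 * t))
      noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)
      print("RMSE noisy   :", np.sqrt(np.mean((noisy - clean) ** 2)))
      print("RMSE denoised:", np.sqrt(np.mean((denoise(noisy) - clean) ** 2)))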

  9. Performance of time-varying predictors in multilevel models under an assumption of fixed or random effects.

    PubMed

    Baird, Rachel; Maxwell, Scott E

    2016-06-01

    Time-varying predictors in multilevel models are a useful tool for longitudinal research, whether they are the research variable of interest or they are controlling for variance to allow greater power for other variables. However, standard recommendations to fix the effect of time-varying predictors may make an assumption that is unlikely to hold in reality and may influence results. A simulation study illustrates that treating the time-varying predictor as fixed may allow analyses to converge, but the analyses have poor coverage of the true fixed effect when the time-varying predictor has a random effect in reality. A second simulation study shows that treating the time-varying predictor as random may have poor convergence, except when allowing negative variance estimates. Although negative variance estimates are uninterpretable, results of the simulation show that estimates of the fixed effect of the time-varying predictor are as accurate for these cases as for cases with positive variance estimates, and that treating the time-varying predictor as random and allowing negative variance estimates performs well whether the time-varying predictor is fixed or random in reality. Because of the difficulty of interpreting negative variance estimates, 2 procedures are suggested for selection between fixed-effect and random-effect models: comparing between fixed-effect and constrained random-effect models with a likelihood ratio test or fitting a fixed-effect model when an unconstrained random-effect model produces negative variance estimates. The performance of these 2 procedures is compared. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  10. Bootstrap Estimation and Testing for Variance Equality.

    ERIC Educational Resources Information Center

    Olejnik, Stephen; Algina, James

    The purpose of this study was to develop a single procedure for comparing population variances which could be used across different distribution forms. Bootstrap methodology was used to estimate the variability of the sample variance statistic when the population distribution was normal, platykurtic, or leptokurtic. The data for the study were generated and…
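
    A minimal sketch of the kind of bootstrap comparison described in this record is shown below (a percentile interval for the ratio of two sample variances; the sample sizes, distributions and replication count are illustrative assumptions, not the study's design).

      import numpy as np

      rng = np.random.default_rng(42)

      def bootstrap_variance_ratio(x, y, n_boot=10000, alpha=0.05):
          """Percentile bootstrap interval for var(x)/var(y); an interval excluding 1
          suggests unequal population variances, without assuming a distributional form."""
          ratios = np.empty(n_boot)
          for b in range(n_boot):
              xb = rng.choice(x, size=len(x), replace=True)
              yb = rng.choice(y, size=len(y), replace=True)
              ratios[b] = xb.var(ddof=1) / yb.var(ddof=1)
          return np.quantile(ratios, [alpha / 2, 1 - alpha / 2])

      x = rng.normal(0, 1.0, size=40)          # a normal sample
      y = rng.uniform(-2.5, 2.5, size=40)      # a platykurtic sample with a larger variance
      print("95% bootstrap CI for var(x)/var(y):", bootstrap_variance_ratio(x, y))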

  11. Estimation of genetic variance for macro- and micro-environmental sensitivity using double hierarchical generalized linear models

    PubMed Central

    2013-01-01

    Background Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike’s information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring. PMID:23827014

  12. Empirical Bayes estimation of undercount in the decennial census.

    PubMed

    Cressie, N

    1989-12-01

    Empirical Bayes methods are used to estimate the extent of the undercount at the local level in the 1980 U.S. census. "Grouping of like subareas from areas such as states, counties, and so on into strata is a useful way of reducing the variance of undercount estimators. By modeling the subareas within a stratum to have a common mean and variances inversely proportional to their census counts, and by taking into account sampling of the areas (e.g., by dual-system estimation), empirical Bayes estimators that compromise between the (weighted) stratum average and the sample value can be constructed. The amount of compromise is shown to depend on the relative importance of stratum variance to sampling variance. These estimators are evaluated at the state level (51 states, including Washington, D.C.) and stratified on race/ethnicity (3 strata) using data from the 1980 postenumeration survey (PEP 3-8, for the noninstitutional population)." excerpt
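
    The compromise described above is the familiar empirical Bayes shrinkage of each subarea estimate toward its stratum average, with the amount of shrinkage governed by the ratio of sampling variance to stratum variance. The sketch below is a generic version of that idea (it is not Cressie's exact model, which ties subarea variances to census counts); all numbers are illustrative.

      import numpy as np

      def eb_shrinkage(y, sampling_var):
          """Shrink direct (e.g. dual-system) undercount estimates toward their stratum mean.

          y            : direct estimates for the subareas in one stratum
          sampling_var : their sampling variances
          """
          y = np.asarray(y, dtype=float)
          sampling_var = np.asarray(sampling_var, dtype=float)
          w = 1.0 / sampling_var
          stratum_mean = np.sum(w * y) / np.sum(w)                 # precision-weighted stratum average
          # crude moment estimate of the between-subarea (stratum) variance
          stratum_var = max(np.average((y - stratum_mean) ** 2, weights=w) - np.mean(sampling_var), 0.0)
          shrink = sampling_var / (sampling_var + stratum_var)     # weight placed on the stratum average
          return shrink * stratum_mean + (1.0 - shrink) * y

      undercount = [0.031, 0.055, 0.012, 0.047]   # illustrative subarea undercount rates
      variances = [4e-4, 9e-4, 2e-4, 6e-4]
      print(eb_shrinkage(undercount, variances))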

  13. Meta-heuristic CRPS minimization for the calibration of short-range probabilistic forecasts

    NASA Astrophysics Data System (ADS)

    Mohammadi, Seyedeh Atefeh; Rahmani, Morteza; Azadi, Majid

    2016-08-01

    This paper deals with the probabilistic short-range temperature forecasts over synoptic meteorological stations across Iran using non-homogeneous Gaussian regression (NGR). NGR creates a Gaussian forecast probability density function (PDF) from the ensemble output. The mean of the normal predictive PDF is a bias-corrected weighted average of the ensemble members and its variance is a linear function of the raw ensemble variance. The coefficients for the mean and variance are estimated by minimizing the continuous ranked probability score (CRPS) during a training period. CRPS is a scoring rule for distributional forecasts. In the paper of Gneiting et al. (Mon Weather Rev 133:1098-1118, 2005), Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is used to minimize the CRPS. Since BFGS is a conventional optimization method with its own limitations, we suggest using the particle swarm optimization (PSO), a robust meta-heuristic method, to minimize the CRPS. The ensemble prediction system used in this study consists of nine different configurations of the weather research and forecasting model for 48-h forecasts of temperature during autumn and winter 2011 and 2012. The probabilistic forecasts were evaluated using several common verification scores including Brier score, attribute diagram and rank histogram. Results show that both BFGS and PSO find the optimal solution and show the same evaluation scores, but PSO can do this with a feasible random first guess and much less computational complexity.
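
    The heart of the calibration above is the closed-form CRPS of a Gaussian predictive distribution, minimized over the NGR coefficients on a training set. The sketch below uses synthetic data and SciPy's general-purpose minimizer as a stand-in for the BFGS/PSO comparison in the paper; the exponential transform that keeps the variance coefficients positive and the unconstrained member weights are simplifying assumptions.

      import numpy as np
      from scipy.stats import norm
      from scipy.optimize import minimize

      rng = np.random.default_rng(3)

      def crps_normal(y, mu, sigma):
          """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) for observation y."""
          z = (y - mu) / sigma
          return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

      def mean_crps(params, ens, obs):
          """NGR: mu = a + weighted ensemble mean, sigma^2 = c + d * ensemble variance."""
          k = ens.shape[1]
          a, b = params[0], params[1 : 1 + k]
          c, d = np.exp(params[-2]), np.exp(params[-1])   # keep variance terms positive
          mu = a + ens @ b
          sigma = np.sqrt(c + d * ens.var(axis=1, ddof=1))
          return np.mean(crps_normal(obs, mu, sigma))

      # synthetic training data: a 9-member ensemble with a warm bias and under-dispersion
      n, k = 500, 9
      truth = rng.normal(10, 3, size=n)
      ens = truth[:, None] + 1.0 + rng.normal(0, 1.5, size=(n, k))
      x0 = np.concatenate([[0.0], np.full(k, 1.0 / k), [np.log(1.0), np.log(1.0)]])
      fit = minimize(mean_crps, x0, args=(ens, truth), method="BFGS")
      print("training mean CRPS:", round(fit.fun, 3))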

  14. Psychometric Testing of the Greek Version of the Clinical Learning Environment-Teacher (CLES+T).

    PubMed

    Papastavrou, Evridiki; Dimitriadou, Maria; Tsangari, Haritini

    2015-09-01

    Clinical practice is an important part of nursing education, and robust instruments are required to evaluate the effectiveness of the hospital setting as a learning environment. The aim of the study is the psychometric testing of the Greek version of the Clinical Learning Environment+Teacher (CLES+T) scale. A total of 463 students practicing in acute care hospitals participated in the study. The reliability of the instrument was estimated with Cronbach's alpha coefficients. The construct validity was evaluated using exploratory factor analysis (EFA) with Varimax rotation. Convergent validity was examined by measuring the bivariate correlations between the scale/subscales. Content validity and semantic equivalence were examined through reviews by a panel of experts. The total scale showed high internal consistency (α=0.95). The EFA factor structure was identical to that of the original scale, had eigenvalues larger than one and explained a total of 67.4% of the variance. The factor with the highest eigenvalue and the largest percentage of variance explained was "supervisory relationship", with an original eigenvalue of 13.1 (6.8 after Varimax rotation) and an explanation of around 38% of the variance (or 20% after rotation). Convergent validity was examined by measuring the bivariate correlations between the scale and a question that measured general satisfaction. The Greek version of the CLES+T is a valid and reliable instrument that can be used to examine students' perceptions of the clinical learning environment.

  15. Variance components estimation for continuous and discrete data, with emphasis on cross-classified sampling designs

    USGS Publications Warehouse

    Gray, Brian R.; Gitzen, Robert A.; Millspaugh, Joshua J.; Cooper, Andrew B.; Licht, Daniel S.

    2012-01-01

    Variance components may play multiple roles (cf. Cox and Solomon 2003). First, magnitudes and relative magnitudes of the variances of random factors may have important scientific and management value in their own right. For example, variation in levels of invasive vegetation among and within lakes may suggest causal agents that operate at both spatial scales – a finding that may be important for scientific and management reasons. Second, variance components may also be of interest when they affect precision of means and covariate coefficients. For example, variation in the effect of water depth on the probability of aquatic plant presence in a study of multiple lakes may vary by lake. This variation will affect the precision of the average depth-presence association. Third, variance component estimates may be used when designing studies, including monitoring programs. For example, to estimate the numbers of years and of samples per year required to meet long-term monitoring goals, investigators need estimates of within and among-year variances. Other chapters in this volume (Chapters 7, 8, and 10) as well as extensive external literature outline a framework for applying estimates of variance components to the design of monitoring efforts. For example, a series of papers with an ecological monitoring theme examined the relative importance of multiple sources of variation, including variation in means among sites, years, and site-years, for the purposes of temporal trend detection and estimation (Larsen et al. 2004, and references therein).

  16. Estimation of population size using open capture-recapture models

    USGS Publications Warehouse

    McDonald, T.L.; Amstrup, Steven C.

    2001-01-01

    One of the most important needs for wildlife managers is an accurate estimate of population size. Yet, for many species, including most marine species and large mammals, accurate and precise estimation of numbers is one of the most difficult of all research challenges. Open-population capture-recapture models have proven useful in many situations to estimate survival probabilities but typically have not been used to estimate population size. We show that open-population models can be used to estimate population size by developing a Horvitz-Thompson-type estimate of population size and an estimator of its variance. Our population size estimate keys on the probability of capture at each trap occasion and therefore is quite general and can be made a function of external covariates measured during the study. Here we define the estimator and investigate its bias, variance, and variance estimator via computer simulation. Computer simulations make extensive use of real data taken from a study of polar bears (Ursus maritimus) in the Beaufort Sea. The population size estimator is shown to be useful because it was negligibly biased in all situations studied. The variance estimator is shown to be useful in all situations, but caution is warranted in cases of extreme capture heterogeneity.
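
    A minimal sketch of the Horvitz-Thompson-type calculation described above: each animal caught on an occasion stands in for 1/p animals present, where p is its estimated capture probability (supplied directly here; in the study these come from the fitted open-population model and may depend on covariates). The simple variance formula below ignores the uncertainty in the estimated probabilities, which the paper's variance estimator accounts for.

      import numpy as np

      def horvitz_thompson(capture_probs):
          """Horvitz-Thompson-type abundance estimate from per-animal capture probabilities."""
          p = np.asarray(capture_probs, dtype=float)
          n_hat = np.sum(1.0 / p)                   # each caught animal represents 1/p animals
          var_hat = np.sum((1.0 - p) / p ** 2)      # first-order variance, treating p-hat as known
          return n_hat, np.sqrt(var_hat)

      p_hat = np.array([0.35, 0.50, 0.28, 0.44, 0.61, 0.39])   # illustrative fitted capture probabilities
      est, se = horvitz_thompson(p_hat)
      print(f"N-hat = {est:.1f}  (SE ~ {se:.1f})")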

  17. Missing Value Monitoring Enhances the Robustness in Proteomics Quantitation.

    PubMed

    Matafora, Vittoria; Corno, Andrea; Ciliberto, Andrea; Bachi, Angela

    2017-04-07

    In global proteomic analysis, it is estimated that protein abundances span from millions of copies to fewer than 100 copies per cell. The challenge of protein quantitation by classic shotgun proteomic techniques lies in the missing values among peptides belonging to low-abundance proteins, which lower intra-run reproducibility and affect downstream statistical analysis. Here, we present a new analytical workflow, MvM (missing value monitoring), able to recover quantitation of missing values generated by shotgun analysis. In particular, we used confident data-dependent acquisition (DDA) quantitation only for proteins measured in all the runs, while we filled the missing values with data-independent acquisition analysis using the library previously generated in DDA. We analyzed cell cycle regulated proteins, as they are low-abundance proteins with highly dynamic expression levels. Indeed, we found that cell cycle related proteins are the major components of the missing-value-rich proteome. Using the MvM workflow, we doubled the number of robustly quantified cell cycle related proteins, and we reduced the number of missing values, achieving robust quantitation for proteins over ∼50 molecules per cell. MvM allows lower quantification variance among replicates for low-abundance proteins with respect to DDA analysis, which demonstrates the potential of this novel workflow to measure low-abundance, dynamically regulated proteins.

  18. Precipitation estimation in mountainous terrain using multivariate geostatistics. Part II: isohyetal maps

    USGS Publications Warehouse

    Hevesi, Joseph A.; Flint, Alan L.; Istok, Jonathan D.

    1992-01-01

    Values of average annual precipitation (AAP) may be important for hydrologic characterization of a potential high-level nuclear-waste repository site at Yucca Mountain, Nevada. Reliable measurements of AAP are sparse in the vicinity of Yucca Mountain, and estimates of AAP were needed for an isohyetal mapping over a 2600-square-mile watershed containing Yucca Mountain. Estimates were obtained with a multivariate geostatistical model developed using AAP and elevation data from a network of 42 precipitation stations in southern Nevada and southeastern California. An additional 1531 elevations were obtained to improve estimation accuracy. Isohyets representing estimates obtained using univariate geostatistics (kriging) defined a smooth and continuous surface. Isohyets representing estimates obtained using multivariate geostatistics (cokriging) defined an irregular surface that more accurately represented expected local orographic influences on AAP. Cokriging results included a maximum estimate within the study area of 335 mm at an elevation of 7400 ft, an average estimate of 157 mm for the study area, and an average estimate of 172 mm at eight locations in the vicinity of the potential repository site. Kriging estimates tended to be lower in comparison because the increased AAP expected for remote mountainous topography was not adequately represented by the available sample. Regression results between cokriging estimates and elevation were similar to regression results between measured AAP and elevation. The position of the cokriging 250-mm isohyet relative to the boundaries of pinyon pine and juniper woodlands provided indirect evidence of improved estimation accuracy because the cokriging result agreed well with investigations by others concerning the relationship between elevation, vegetation, and climate in the Great Basin. Calculated estimation variances were also mapped and compared to evaluate improvements in estimation accuracy. Cokriging estimation variances were reduced by an average of 54% relative to kriging variances within the study area. Cokriging reduced estimation variances at the potential repository site by 55% relative to kriging. The usefulness of an existing network of stations for measuring AAP within the study area was evaluated using cokriging variances, and twenty additional stations were located for the purpose of improving the accuracy of future isohyetal mappings. Using the expanded network of stations, the maximum cokriging estimation variance within the study area was reduced by 78% relative to the existing network, and the average estimation variance was reduced by 52%.

  19. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.

  20. Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study

    PubMed Central

    Kim, Minjung; Lamont, Andrea E.; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M. Lee

    2015-01-01

    Regression mixture models are a novel approach for modeling heterogeneous effects of predictors on an outcome. In the model building process residual variances are often disregarded and simplifying assumptions made without thorough examination of the consequences. This simulation study investigated the impact of an equality constraint on the residual variances across latent classes. We examine the consequence of constraining the residual variances on class enumeration (finding the true number of latent classes) and parameter estimates under a number of different simulation conditions meant to reflect the type of heterogeneity likely to exist in applied analyses. Results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted estimated class sizes and showed the potential to greatly impact parameter estimates in each class. Results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions were made. PMID:26139512

  1. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  2. Statistical study of EBR-II fuel elements manufactured by the cold line at Argonne-West and by Atomics International

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harkness, A. L.

    1977-09-01

    Nine elements from each batch of fuel elements manufactured for the EBR-II reactor have been analyzed for ²³⁵U content by NDA methods. These values, together with those of the manufacturer, are used to estimate the product variance and the variances of the two measuring methods. These variances are compared with the variances computed from the stipulations of the contract. A method is derived for resolving the several variances into their within-batch and between-batch components. Some of these variance components have also been estimated by independent and more familiar conventional methods for comparison.

  3. Characterization of turbulence stability through the identification of multifractional Brownian motions

    NASA Astrophysics Data System (ADS)

    Lee, K. C.

    2013-02-01

    Multifractional Brownian motions have become popular as flexible models in describing real-life signals of high-frequency features in geoscience, microeconomics, and turbulence, to name a few. The time-changing Hurst exponent, which describes regularity levels depending on time measurements, and variance, which relates to an energy level, are two parameters that characterize multifractional Brownian motions. This research suggests a combined method of estimating the time-changing Hurst exponent and variance using the local variation of sampled paths of signals. The method consists of two phases: initially estimating global variance and then accurately estimating the time-changing Hurst exponent. A simulation study shows its performance in estimation of the parameters. The proposed method is applied to characterization of atmospheric stability in which descriptive statistics from the estimated time-changing Hurst exponent and variance classify stable atmosphere flows from unstable ones.

  4. Optimal design criteria - prediction vs. parameter estimation

    NASA Astrophysics Data System (ADS)

    Waldl, Helmut

    2014-05-01

    G-optimality is a popular design criterion for optimal prediction; it tries to minimize the kriging variance over the whole design region. A G-optimal design minimizes the maximum variance of all predicted values. If we use kriging methods for prediction, it is self-evident to use the kriging variance as a measure of uncertainty for the estimates. However, the computation of the kriging variance, and even more so of the empirical kriging variance, is computationally very costly, and finding the maximum kriging variance in high-dimensional regions can be so time-demanding that, in practice, we cannot really find the G-optimal design with currently available computer equipment. We cannot always avoid this problem by using space-filling designs because small designs that minimize the empirical kriging variance are often non-space-filling. D-optimality is the design criterion related to parameter estimation. A D-optimal design maximizes the determinant of the information matrix of the estimates. D-optimality in terms of trend parameter estimation and D-optimality in terms of covariance parameter estimation yield basically different designs. The Pareto frontier of these two competing determinant criteria corresponds to designs that perform well under both criteria. Under certain conditions, searching for the G-optimal design on the above Pareto frontier yields almost as good results as searching for the G-optimal design in the whole design region; in doing so, the maximum of the empirical kriging variance has to be computed only a few times. The method is demonstrated by means of a computer simulation experiment based on data provided by the Belgian institute Management Unit of the North Sea Mathematical Models (MUMM) that describe the evolution of inorganic and organic carbon and nutrients, phytoplankton, bacteria and zooplankton in the Southern Bight of the North Sea.
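
    For an ordinary linear (trend) model the two criteria contrasted above reduce to simple matrix quantities, which the sketch below evaluates for two candidate designs; the kriging-specific and covariance-parameter aspects of the abstract are deliberately left out, so this is only a schematic illustration.

      import numpy as np

      def design_criteria(X, region):
          """Evaluate D- and G-type criteria for a linear-model design matrix X.

          X      : n x p design matrix of the candidate design
          region : m x p matrix of model vectors covering the prediction region
          """
          M = X.T @ X                                     # information matrix
          d_crit = np.linalg.det(M)                       # D-optimality: maximise det(M)
          Minv = np.linalg.inv(M)
          pred_var = np.einsum("ij,jk,ik->i", region, Minv, region)
          g_crit = pred_var.max()                         # G-optimality: minimise the max prediction variance
          return d_crit, g_crit

      # simple example: quadratic trend on [-1, 1], two candidate 5-point designs
      def model(x):
          return np.column_stack([np.ones_like(x), x, x ** 2])

      grid = model(np.linspace(-1, 1, 201))
      for pts in ([-1, -0.5, 0, 0.5, 1], [-1, -1, 0, 1, 1]):
          d, g = design_criteria(model(np.array(pts, dtype=float)), grid)
          print(pts, " det(M) =", round(d, 3), " max pred var =", round(g, 3))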

  5. Modeling additive and non-additive effects in a hybrid population using genome-wide genotyping: prediction accuracy implications

    PubMed Central

    Bouvet, J-M; Makouanzi, G; Cros, D; Vigneron, Ph

    2016-01-01

    Hybrids are broadly used in plant breeding and accurate estimation of variance components is crucial for optimizing genetic gain. Genome-wide information may be used to explore models designed to assess the extent of additive and non-additive variance and test their prediction accuracy for the genomic selection. Ten linear mixed models, involving pedigree- and marker-based relationship matrices among parents, were developed to estimate additive (A), dominance (D) and epistatic (AA, AD and DD) effects. Five complementary models, involving the gametic phase to estimate marker-based relationships among hybrid progenies, were developed to assess the same effects. The models were compared using tree height and 3303 single-nucleotide polymorphism markers from 1130 cloned individuals obtained via controlled crosses of 13 Eucalyptus urophylla females with 9 Eucalyptus grandis males. Akaike information criterion (AIC), variance ratios, asymptotic correlation matrices of estimates, goodness-of-fit, prediction accuracy and mean square error (MSE) were used for the comparisons. The variance components and variance ratios differed according to the model. Models with a parent marker-based relationship matrix performed better than those that were pedigree-based, that is, an absence of singularities, lower AIC, higher goodness-of-fit and accuracy and smaller MSE. However, AD and DD variances were estimated with high s.es. Using the same criteria, progeny gametic phase-based models performed better in fitting the observations and predicting genetic values. However, DD variance could not be separated from the dominance variance and null estimates were obtained for AA and AD effects. This study highlighted the advantages of progeny models using genome-wide information. PMID:26328760

  6. Development of a technique for estimating noise covariances using multiple observers

    NASA Technical Reports Server (NTRS)

    Bundick, W. Thomas

    1988-01-01

    Friedland's technique for estimating the unknown noise variances of a linear system using multiple observers has been extended by developing a general solution for the estimates of the variances, developing the statistics (mean and standard deviation) of these estimates, and demonstrating the solution on two examples.

  7. Parameter estimation for the exponential-normal convolution model for background correction of affymetrix GeneChip data.

    PubMed

    McGee, Monnie; Chen, Zhongxue

    2006-01-01

    There are many methods of correcting microarray data for non-biological sources of error. Authors routinely supply software or code so that interested analysts can implement their methods. Even with a thorough reading of associated references, it is not always clear how requisite parts of the method are calculated in the software packages. However, it is important to have an understanding of such details, as this understanding is necessary for proper use of the output, or for implementing extensions to the model. In this paper, the calculation of parameter estimates used in Robust Multichip Average (RMA), a popular preprocessing algorithm for Affymetrix GeneChip brand microarrays, is elucidated. The background correction method for RMA assumes that the perfect match (PM) intensities observed result from a convolution of the true signal, assumed to be exponentially distributed, and a background noise component, assumed to have a normal distribution. A conditional expectation is calculated to estimate signal. Estimates of the mean and variance of the normal distribution and the rate parameter of the exponential distribution are needed to calculate this expectation. Simulation studies show that the current estimates are flawed; therefore, new ones are suggested. We examine the performance of preprocessing under the exponential-normal convolution model using several different methods to estimate the parameters.
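
    For context, the background-correction step itself is the conditional expectation under the exponential-normal convolution; a sketch is given below, with the parameters passed in as already estimated (the estimation of μ, σ and α is exactly the issue the paper examines and is not shown here). The toy data and parameter values are illustrative.

      import numpy as np
      from scipy.stats import norm

      def background_adjust(pm, mu, sigma, alpha):
          """E[S | O = pm] under O = S + B, with S ~ Exp(rate = alpha) and B ~ N(mu, sigma^2).

          mu, sigma and alpha are assumed to have been estimated beforehand; this function
          only applies the standard conditional-expectation adjustment.
          """
          a = pm - mu - sigma ** 2 * alpha
          b = sigma
          num = norm.pdf(a / b) - norm.pdf((pm - a) / b)
          den = norm.cdf(a / b) + norm.cdf((pm - a) / b) - 1.0
          return a + b * num / den

      # toy example with known truth: background N(100, 20^2), signal Exp(rate = 1/200)
      rng = np.random.default_rng(7)
      signal = rng.exponential(200.0, size=5000)
      observed = signal + rng.normal(100.0, 20.0, size=5000)
      adjusted = background_adjust(observed, mu=100.0, sigma=20.0, alpha=1 / 200.0)
      print("mean true signal:", signal.mean().round(1), " mean adjusted:", adjusted.mean().round(1))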

  8. Consistent Small-Sample Variances for Six Gamma-Family Measures of Ordinal Association

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2009-01-01

    Gamma-family measures are bivariate ordinal correlation measures that form a family because they all reduce to Goodman and Kruskal's gamma in the absence of ties (1954). For several gamma-family indices, more than one variance estimator has been introduced. In previous research, the "consistent" variance estimator described by Cliff and…

  9. Estimating the Reliability of Single-Item Life Satisfaction Measures: Results from Four National Panel Studies

    ERIC Educational Resources Information Center

    Lucas, Richard E.; Donnellan, M. Brent

    2012-01-01

    Life satisfaction is often assessed using single-item measures. However, estimating the reliability of these measures can be difficult because internal consistency coefficients cannot be calculated. Existing approaches use longitudinal data to isolate occasion-specific variance from variance that is either completely stable or variance that…

  10. Estimation of Variance in the Case of Complex Samples.

    ERIC Educational Resources Information Center

    Groenewald, A. C.; Stoker, D. J.

    In a complex sampling scheme it is desirable to select the primary sampling units (PSUs) without replacement to prevent duplications in the sample. Since the estimation of the sampling variances is more complicated when the PSUs are selected without replacement, L. Kish (1965) recommends that the variance be calculated using the formulas…

  11. Genetic basis of between-individual and within-individual variance of docility.

    PubMed

    Martin, J G A; Pirotta, E; Petelle, M B; Blumstein, D T

    2017-04-01

    Between-individual variation in phenotypes within a population is the basis of evolution. However, evolutionary and behavioural ecologists have mainly focused on estimating between-individual variance in mean trait and neglected variation in within-individual variance, or predictability of a trait. In fact, an important assumption of mixed-effects models used to estimate between-individual variance in mean traits is that within-individual residual variance (predictability) is identical across individuals. Individual heterogeneity in the predictability of behaviours is a potentially important effect but rarely estimated and accounted for. We used 11 389 measures of docility behaviour from 1576 yellow-bellied marmots (Marmota flaviventris) to estimate between-individual variation in both mean docility and its predictability. We then implemented a double hierarchical animal model to decompose the variances of both mean trait and predictability into their environmental and genetic components. We found that individuals differed both in their docility and in their predictability of docility with a negative phenotypic covariance. We also found significant genetic variance for both mean docility and its predictability but no genetic covariance between the two. This analysis is one of the first to estimate the genetic basis of both mean trait and within-individual variance in a wild population. Our results indicate that equal within-individual variance should not be assumed. We demonstrate the evolutionary importance of the variation in the predictability of docility and illustrate potential bias in models ignoring variation in predictability. We conclude that the variability in the predictability of a trait should not be ignored, and present a coherent approach for its quantification. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  12. Interpretable inference on the mixed effect model with the Box-Cox transformation.

    PubMed

    Maruo, K; Yamaguchi, Y; Noma, H; Gosho, M

    2017-07-10

    We derived results for inference on parameters of the marginal model of the mixed effect model with the Box-Cox transformation based on the asymptotic theory approach. We also provided a robust variance estimator of the maximum likelihood estimator of the parameters of this model in consideration of the model misspecifications. Using these results, we developed an inference procedure for the difference of the model median between treatment groups at the specified occasion in the context of mixed effects models for repeated measures analysis for randomized clinical trials, which provided interpretable estimates of the treatment effect. From simulation studies, it was shown that our proposed method controlled type I error of the statistical test for the model median difference in almost all the situations and had moderate or high performance for power compared with the existing methods. We illustrated our method with cluster of differentiation 4 (CD4) data in an AIDS clinical trial, where the interpretability of the analysis results based on our proposed method is demonstrated. Copyright © 2017 John Wiley & Sons, Ltd.

  13. The evolutionary rate dynamically tracks changes in HIV-1 epidemics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maljkovic-berry, Irina; Athreya, Gayathri; Daniels, Marcus

    Large-sequence datasets provide an opportunity to investigate the dynamics of pathogen epidemics. Thus, a fast method to estimate the evolutionary rate from large and numerous phylogenetic trees becomes necessary. Based on minimizing tip height variances, we optimize the root in a given phylogenetic tree to estimate the most homogenous evolutionary rate between samples from at least two different time points. Simulations showed that the method had no bias in the estimation of evolutionary rates and that it was robust to tree rooting and topological errors. We show that the evolutionary rates of HIV-1 subtype B and C epidemics have changed over time, with the rate of evolution inversely correlated to the rate of virus spread. For subtype B, the evolutionary rate slowed down and tracked the start of the HAART era in 1996. Subtype C in Ethiopia showed an increase in the evolutionary rate when the prevalence increase markedly slowed down in 1995. Thus, we show that the evolutionary rate of HIV-1 on the population level dynamically tracks epidemic events.

  14. Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

    PubMed Central

    Guan, Yongtao; Li, Yehua; Sinha, Rajita

    2011-01-01

    In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
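
    The flavour of the method-of-moments correction can be conveyed with a toy version for a single error-prone summary covariate: the observed covariate variance is de-attenuated by the average per-subject error variance before forming the regression slope. This is a generic sketch of the idea under classical, heteroscedastic measurement error, not the authors' estimator, and the data are simulated.

      import numpy as np

      rng = np.random.default_rng(11)

      def mom_corrected_slope(w, y, err_var):
          """Method-of-moments correction of a simple regression slope when the covariate w
          is a noisy summary statistic with estimated per-subject error variances err_var
          (e.g. the variance of a subject's mean of n_i noisy daily measurements)."""
          sww = np.var(w, ddof=1)                 # observed covariate variance
          swy = np.cov(w, y, ddof=1)[0, 1]
          sxx = sww - np.mean(err_var)            # de-attenuated "true" covariate variance
          return swy / sww, swy / sxx             # naive slope, corrected slope

      # toy data: true covariate x, observed as the mean of n_i noisy daily values
      n_days = rng.integers(5, 30, size=400)
      x = rng.normal(0, 1, size=400)
      err_var = 1.5 ** 2 / n_days                 # heteroscedastic error variance of each summary
      w = x + rng.normal(0, np.sqrt(err_var))
      y = 0.8 * x + rng.normal(0, 1, size=400)
      print("naive vs corrected slope:", mom_corrected_slope(w, y, err_var))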

  15. A Hybrid One-Way ANOVA Approach for the Robust and Efficient Estimation of Differential Gene Expression with Multiple Patterns

    PubMed Central

    Mollah, Mohammad Manir Hossain; Jamal, Rahman; Mokhtar, Norfilza Mohd; Harun, Roslan; Mollah, Md. Nurul Haque

    2015-01-01

    Background Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression. Results The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA. Conclusion Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and proposed) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, for which the BetaEB method was not extended. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression. PMID:26413858

  16. Using high-resolution variant frequencies to empower clinical genome interpretation.

    PubMed

    Whiffin, Nicola; Minikel, Eric; Walsh, Roddy; O'Donnell-Luria, Anne H; Karczewski, Konrad; Ing, Alexander Y; Barton, Paul J R; Funke, Birgit; Cook, Stuart A; MacArthur, Daniel; Ware, James S

    2017-10-01

    Purpose Whole-exome and whole-genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognized as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants. Methods We present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets. Results Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, without removing true pathogenic variants (false-positive rate < 0.001). Conclusion We outline a statistically robust framework for assessing whether a variant is "too common" to be causative for a Mendelian disorder of interest. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.

  17. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  18. On the additive and dominant variance and covariance of individuals within the genomic selection scope.

    PubMed

    Vitezica, Zulma G; Varona, Luis; Legarra, Andres

    2013-12-01

    Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the "genotypic" value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
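
    A compact sketch of the two relationship matrices discussed above is given below (Python/NumPy). The additive matrix follows the usual VanRaden construction and the dominance matrix uses the classical dominance-deviation coding; coding conventions differ between papers, so this should be read as one reasonable variant rather than the exact matrices of the article, and the genotype data are simulated.

      import numpy as np

      def genomic_relationship_matrices(M):
          """Additive (G) and dominance (D) genomic relationship matrices.

          M : n x m genotype matrix coded 0/1/2 (copies of the reference allele).
          Dominance coding used here: genotype 2 -> -2q^2, 1 -> 2pq, 0 -> -2p^2.
          """
          M = np.asarray(M, dtype=float)
          p = M.mean(axis=0) / 2.0
          q = 1.0 - p
          Z = M - 2.0 * p                                  # centred additive covariates
          G = Z @ Z.T / np.sum(2.0 * p * q)
          W = np.select([M == 2, M == 1, M == 0], [-2.0 * q ** 2, 2.0 * p * q, -2.0 * p ** 2])
          D = W @ W.T / np.sum((2.0 * p * q) ** 2)
          return G, D

      rng = np.random.default_rng(5)
      geno = rng.binomial(2, rng.uniform(0.1, 0.9, size=500), size=(50, 500))   # 50 animals, 500 SNPs
      G, D = genomic_relationship_matrices(geno)
      print("mean diagonal of G:", G.diagonal().mean().round(2), " of D:", D.diagonal().mean().round(2))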

  19. Age-specific survival of male golden-cheeked warblers on the Fort Hood Military Reservation, Texas

    USGS Publications Warehouse

    Duarte, Adam; Hines, James E.; Nichols, James D.; Hatfield, Jeffrey S.; Weckerly, Floyd W.

    2014-01-01

    Population models are essential components of large-scale conservation and management plans for the federally endangered Golden-cheeked Warbler (Setophaga chrysoparia; hereafter GCWA). However, existing models are based on vital rate estimates calculated using relatively small data sets that are now more than a decade old. We estimated more current, precise adult and juvenile apparent survival (Φ) probabilities and their associated variances for male GCWAs. In addition to providing estimates for use in population modeling, we tested hypotheses about spatial and temporal variation in Φ. We assessed whether a linear trend in Φ or a change in the overall mean Φ corresponded to an observed increase in GCWA abundance during 1992-2000 and if Φ varied among study plots. To accomplish these objectives, we analyzed long-term GCWA capture-resight data from 1992 through 2011, collected across seven study plots on the Fort Hood Military Reservation using a Cormack-Jolly-Seber model structure within program MARK. We also estimated Φ process and sampling variances using a variance-components approach. Our results did not provide evidence of site-specific variation in adult Φ on the installation. Because of a lack of data, we could not assess whether juvenile Φ varied spatially. We did not detect a strong temporal association between GCWA abundance and Φ. Mean estimates of Φ for adult and juvenile male GCWAs for all years analyzed were 0.47 with a process variance of 0.0120 and a sampling variance of 0.0113 and 0.28 with a process variance of 0.0076 and a sampling variance of 0.0149, respectively. Although juvenile Φ did not differ greatly from previous estimates, our adult Φ estimate suggests previous GCWA population models were overly optimistic with respect to adult survival. These updated Φ probabilities and their associated variances will be incorporated into new population models to assist with GCWA conservation decision making.

  20. A simple and exploratory way to determine the mean-variance relationship in generalized linear models.

    PubMed

    Tsou, Tsung-Shan

    2007-03-30

    This paper introduces an exploratory way to determine how variance relates to the mean in generalized linear models. This novel method employs the robust likelihood technique introduced by Royall and Tsou. A urinary data set collected by Ginsberg et al. and the fabric data set analysed by Lee and Nelder are considered to demonstrate the applicability and simplicity of the proposed technique. Application of the proposed method could easily reveal a mean-variance relationship that would generally be left unnoticed, or that would require more complex modelling to detect. Copyright (c) 2006 John Wiley & Sons, Ltd.

  1. Empirical single sample quantification of bias and variance in Q-ball imaging.

    PubMed

    Hainline, Allison E; Nath, Vishwesh; Parvathaneni, Prasanna; Blaber, Justin A; Schilling, Kurt G; Anderson, Adam W; Kang, Hakmook; Landman, Bennett A

    2018-02-06

    The bias and variance of high angular resolution diffusion imaging methods have not been thoroughly explored in the literature and may benefit from the simulation extrapolation (SIMEX) and bootstrap techniques to estimate bias and variance of high angular resolution diffusion imaging metrics. The SIMEX approach is well established in the statistics literature and uses simulation of increasingly noisy data to extrapolate back to a hypothetical case with no noise. The bias of calculated metrics can then be computed by subtracting the SIMEX estimate from the original pointwise measurement. The SIMEX technique has been studied in the context of diffusion imaging to accurately capture the bias in fractional anisotropy measurements in DTI. Herein, we extend the application of SIMEX and bootstrap approaches to characterize bias and variance in metrics obtained from a Q-ball imaging reconstruction of high angular resolution diffusion imaging data. The results demonstrate that SIMEX and bootstrap approaches provide consistent estimates of the bias and variance of generalized fractional anisotropy, respectively. The RMSE for the generalized fractional anisotropy estimates shows a 7% decrease in white matter and an 8% decrease in gray matter when compared with the observed generalized fractional anisotropy estimates. On average, the bootstrap technique results in SD estimates that are approximately 97% of the true variation in white matter, and 86% in gray matter. Both SIMEX and bootstrap methods are flexible, estimate population characteristics based on single scans, and may be extended for bias and variance estimation on a variety of high angular resolution diffusion imaging metrics. © 2018 International Society for Magnetic Resonance in Medicine.
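
    The SIMEX recipe summarized above (re-measure the metric on data with increasing amounts of added noise, then extrapolate back to a hypothetical noise-free case) can be sketched generically. The noise model, quadratic extrapolant, and metric below are placeholder assumptions rather than the authors' implementation for Q-ball metrics.

    ```python
    import numpy as np

    def simex_bias(data, metric, sigma, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=100, seed=None):
        """Generic SIMEX sketch: add Gaussian noise of variance lambda*sigma**2,
        recompute the metric, fit a quadratic in lambda, and extrapolate to
        lambda = -1 (the hypothetical noise-free case). Returns (bias, noise-free
        estimate), with bias = observed - extrapolated. `data` is a numpy array."""
        rng = np.random.default_rng(seed)
        observed = metric(data)
        lam = np.array([0.0, *lambdas])
        means = [observed]
        for l in lambdas:
            reps = [metric(data + rng.normal(0.0, np.sqrt(l) * sigma, size=data.shape))
                    for _ in range(n_rep)]
            means.append(np.mean(reps))
        coef = np.polyfit(lam, means, deg=2)      # quadratic extrapolant
        noise_free = np.polyval(coef, -1.0)
        return observed - noise_free, noise_free

    # toy usage: bias of np.std for measurements with known noise level sigma = 0.1
    # bias, corrected = simex_bias(noisy_measurements, np.std, sigma=0.1)
    ```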

  2. Capability and Development Risk Management in System-of-Systems Architectures: A Portfolio Approach to Decision-Making

    DTIC Science & Technology

    2012-04-30

    tool that provides a means of balancing capability development against cost and interdependent risks through the use of modern portfolio theory ... (Focardi, 2007; Tutuncu & Cornuejols, 2007) that are extensions of modern portfolio and control theory. The reformulation allows for possible changes ... Acquisition: Wave Model context • An Investment Portfolio Approach – Mean-Variance Approach – Mean-Variance: A Robust Version • Concept

  3. Identification of multiple leaks in pipeline: Linearized model, maximum likelihood, and super-resolution localization

    NASA Astrophysics Data System (ADS)

    Wang, Xun; Ghidaoui, Mohamed S.

    2018-07-01

    This paper considers the problem of identifying multiple leaks in a water-filled pipeline based on inverse transient wave theory. The analytical solution to this problem involves nonlinear interaction terms between the various leaks. This paper shows analytically and numerically that these nonlinear terms are of the order of the leak sizes squared and are, thus, negligible. As a result of this simplification, a maximum likelihood (ML) scheme that identifies leak locations and leak sizes separately is formulated and tested. It is found that the ML estimation scheme is highly efficient and robust with respect to noise. In addition, the ML method is a super-resolution leak localization scheme because its resolvable leak distance (approximately 0.15λmin, where λmin is the minimum wavelength) is below the Nyquist-Shannon sampling theorem limit (0.5λmin). Moreover, the Cramér-Rao lower bound (CRLB) is derived and used to show the efficiency of the ML scheme estimates. The variance of the ML estimator approximates the CRLB, showing that the ML scheme belongs to the class of best unbiased leak localization estimators.

  4. Evidence for skipped spawning in a potamodromous cyprinid, humpback chub (Gila cypha), with implications for demographic parameter estimates

    USGS Publications Warehouse

    Pearson, Kristen Nicole; Kendall, William L.; Winkelman, Dana L.; Persons, William R.

    2015-01-01

    Our findings reveal evidence for skipped spawning in a potamodromous cyprinid, humpback chub (HBC; Gila cypha). Using closed robust design mark-recapture models, we found that, on average, spawning HBC transition to the skipped spawning state with a probability of 0.45 (95% CRI (i.e. credible interval): 0.10, 0.80) and skipped spawners remain in the skipped spawning state with a probability of 0.60 (95% CRI: 0.26, 0.83), yielding an average spawning cycle of every 2.12 years, conditional on survival. As a result, migratory skipped spawners are unavailable for detection during annual sampling events. If availability is unaccounted for, survival and detection probability estimates will be biased. Therefore, we estimated annual adult survival probability (S), while accounting for skipped spawning, and found S remained reasonably stable throughout the study period, with an average of 0.75 (95% CRI: 0.66, 0.82; process variance σ² = 0.005), while skipped spawning probability was highly dynamic (σ² = 0.306). By improving understanding of HBC spawning strategies, conservation decisions can be based on less biased estimates of survival and a more informed population model structure.

  5. Subsurface attenuation estimation using a novel hybrid method based on FWE function and power spectrum

    NASA Astrophysics Data System (ADS)

    Li, Jingnan; Wang, Shangxu; Yang, Dengfeng; Tang, Genyang; Chen, Yangkang

    2018-02-01

    Seismic waves propagating in the subsurface suffer from attenuation, which can be represented by the quality factor Q. Knowledge of Q plays a vital role in hydrocarbon exploration. Many methods to measure Q have been proposed, among which the central frequency shift (CFS) and the peak frequency shift (PFS) are commonly used. However, both methods assume a particular shape for the amplitude spectrum, which causes systematic error in Q estimation. Recently a new method to estimate Q has been proposed to overcome this disadvantage by using a frequency weighted exponential (FWE) function to fit amplitude spectra of different shapes. In the FWE method, a key procedure is to calculate the central frequency and variance of the amplitude spectrum. However, the amplitude spectrum is susceptible to noise, whereas the power spectrum is less sensitive to random noise and has better anti-noise performance. To enhance the robustness of the FWE method, we propose a novel hybrid method by combining the advantage of the FWE method and the power spectrum, which is called the improved FWE method (IFWE). The basic idea is to consider the attenuation of the power spectrum instead of the amplitude spectrum and to use a modified FWE function to fit power spectra, according to which we derive a new Q estimation formula. Tests on noisy synthetic data show that the IFWE is more robust than the FWE. Moreover, the frequency bandwidth selection in the IFWE can be more flexible than that in the FWE. The application to field vertical seismic profile data and surface seismic data further demonstrates its validity.

  6. Combining Study Outcome Measures Using Dominance Adjusted Weights

    ERIC Educational Resources Information Center

    Makambi, Kepher H.; Lu, Wenxin

    2013-01-01

    Weighting of studies in meta-analysis is usually implemented by using the estimated inverse variances of treatment effect estimates. However, there is a possibility of one study dominating other studies in the estimation process by taking on a weight that is above some upper limit. We implement an estimator of the heterogeneity variance that takes…

  7. Estimation of genetic parameters for milk yield in Murrah buffaloes by Bayesian inference.

    PubMed

    Breda, F C; Albuquerque, L G; Euclydes, R F; Bignardi, A B; Baldi, F; Torres, R A; Barbosa, L; Tonhati, H

    2010-02-01

    Random regression models were used to estimate genetic parameters for test-day milk yield in Murrah buffaloes using Bayesian inference. Data comprised 17,935 test-day milk records from 1,433 buffaloes. Twelve models were tested using different combinations of third-, fourth-, fifth-, sixth-, and seventh-order orthogonal polynomials of weeks of lactation for additive genetic and permanent environmental effects. All models included the fixed effects of contemporary group, number of daily milkings and age of cow at calving as covariate (linear and quadratic effect). In addition, residual variances were considered to be heterogeneous with 6 classes of variance. Models were selected based on the residual mean square error, weighted average of residual variance estimates, and estimates of variance components, heritabilities, correlations, eigenvalues, and eigenfunctions. Results indicated that changes in the order of fit for additive genetic and permanent environmental random effects influenced the estimation of genetic parameters. Heritability estimates ranged from 0.19 to 0.31. Genetic correlation estimates were close to unity between adjacent test-day records, but decreased gradually as the interval between test-days increased. Results from mean squared error and weighted averages of residual variance estimates suggested that a model considering sixth- and seventh-order Legendre polynomials for additive and permanent environmental effects, respectively, and 6 classes for residual variances, provided the best fit. Nevertheless, this model presented the largest degree of complexity. A more parsimonious model, with fourth- and sixth-order polynomials, respectively, for these same effects, yielded very similar genetic parameter estimates. Therefore, this last model is recommended for routine applications. Copyright 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  8. SU-F-R-40: Robustness Test of Computed Tomography Textures of Lung Tissues to Varying Scanning Protocols Using a Realistic Phantom Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, S; Markel, D; Hegyi, G

    2016-06-15

    Purpose: The reliability of computed tomography (CT) textures is an important element of radiomics analysis. This study investigates the dependency of lung CT textures on different breathing phases and changes in CT image acquisition protocols in a realistic phantom setting. Methods: We investigated 11 CT texture features for radiation-induced lung disease from 3 categories (first-order, grey level co-occurrence matrix (GLCM), and Law’s filter). A biomechanical swine lung phantom was scanned at two breathing phases (inhale/exhale) and two scanning protocols set for PET/CT and diagnostic CT scanning. Lung volumes acquired from the CT images were divided into 2-dimensional sub-regions with a grid spacing of 31 mm. The distributions of the evaluated texture features from these sub-regions were compared between the two scanning protocols and two breathing phases. The significance of each factor's effect on feature values was tested at the 95% significance level using an analysis of covariance (ANCOVA) model with interaction terms included. Robustness of a feature to a scanning factor was defined as non-significant dependence on the factor. Results: Three GLCM textures (variance, sum entropy, difference entropy) were robust to breathing changes. Two GLCM (variance, sum entropy) and 3 Law’s filter textures (S5L5, E5L5, W5L5) were robust to scanner changes. Moreover, the two GLCM textures (variance, sum entropy) were consistent across all 4 scanning conditions. First-order features, especially Hounsfield unit intensity features, presented the most drastic variation, up to 39%. Conclusion: Amongst the studied features, GLCM and Law’s filter texture features were more robust than first-order features. However, the majority of the features were modified by either breathing phase or scanner changes, suggesting a need for calibration when retrospectively comparing scans obtained at different conditions. Further investigation is necessary to identify the sensitivity of individual image acquisition parameters.

  9. Workers' compensation costs among construction workers: a robust regression analysis.

    PubMed

    Friedman, Lee S; Forst, Linda S

    2009-11-01

    Workers' compensation data are an important source for evaluating costs associated with construction injuries. We describe the characteristics of injured construction workers filing claims in Illinois between 2000 and 2005 and the factors associated with compensation costs using a robust regression model. In the final multivariable model, the cumulative percent temporary and permanent disability-measures of severity of injury-explained 38.7% of the variance of cost. Attorney costs explained only 0.3% of the variance of the dependent variable. The model used in this study clearly indicated that percent disability was the most important determinant of cost, although the method and uniformity of percent impairment allocation could be better elucidated. There is a need to integrate analytical methods that are suitable for skewed data when analyzing claim costs.

  10. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
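
    The delta-method variance estimator proposed in the abstract is not reproduced here. As a rough illustration of respecting the cluster structure, the sketch below computes kappa for pooled binary matched pairs and a leave-one-cluster-out jackknife variance, a simpler substitute device rather than the paper's estimator.

    ```python
    import numpy as np

    def kappa(pairs):
        """Cohen's kappa for binary matched-pair ratings.
        pairs: array of shape (n, 2) holding 0/1 results from the two procedures."""
        pairs = np.asarray(pairs)
        po = np.mean(pairs[:, 0] == pairs[:, 1])                 # observed agreement
        p1, p2 = pairs[:, 0].mean(), pairs[:, 1].mean()
        pe = p1 * p2 + (1 - p1) * (1 - p2)                       # chance agreement
        return (po - pe) / (1 - pe)

    def jackknife_kappa(clusters):
        """Leave-one-cluster-out jackknife variance for kappa on clustered matched
        pairs. clusters: list of (n_k, 2) arrays, one per cluster. Illustration only;
        the paper uses a nonparametric delta-method estimator instead."""
        K = len(clusters)
        k_full = kappa(np.vstack(clusters))
        k_del = np.array([kappa(np.vstack(clusters[:i] + clusters[i + 1:]))
                          for i in range(K)])
        var = (K - 1) / K * np.sum((k_del - k_del.mean()) ** 2)
        return k_full, var
    ```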

  11. On the error in crop acreage estimation using satellite (LANDSAT) data

    NASA Technical Reports Server (NTRS)

    Chhikara, R. (Principal Investigator)

    1983-01-01

    The problem of crop acreage estimation using satellite data is discussed. Bias and variance of a crop proportion estimate in an area segment obtained from the classification of its multispectral sensor data are derived as functions of the means, variances, and covariance of error rates. The linear discriminant analysis and the class proportion estimation for the two class case are extended to include a third class of measurement units, where these units are mixed on ground. Special attention is given to the investigation of mislabeling in training samples and its effect on crop proportion estimation. It is shown that the bias and variance of the estimate of a specific crop acreage proportion increase as the disparity in mislabeling rates between two classes increases. Some interaction is shown to take place, causing the bias and the variance to decrease at first and then to increase, as the mixed unit class varies in size from 0 to 50 percent of the total area segment.

  12. Data-Adaptive Bias-Reduced Doubly Robust Estimation.

    PubMed

    Vermeulen, Karel; Vansteelandt, Stijn

    2016-05-01

    Doubly robust estimators have now been proposed for a variety of target parameters in the causal inference and missing data literature. These consistently estimate the parameter of interest under a semiparametric model when one of two nuisance working models is correctly specified, regardless of which. The recently proposed bias-reduced doubly robust estimation procedure aims to partially retain this robustness in more realistic settings where both working models are misspecified. These so-called bias-reduced doubly robust estimators make use of special (finite-dimensional) nuisance parameter estimators that are designed to locally minimize the squared asymptotic bias of the doubly robust estimator in certain directions of these finite-dimensional nuisance parameters under misspecification of both parametric working models. In this article, we extend this idea to incorporate the use of data-adaptive estimators (infinite-dimensional nuisance parameters), by exploiting the bias reduction estimation principle in the direction of only one nuisance parameter. We additionally provide an asymptotic linearity theorem which gives the influence function of the proposed doubly robust estimator under correct specification of a parametric nuisance working model for the missingness mechanism/propensity score but a possibly misspecified (finite- or infinite-dimensional) outcome working model. Simulation studies confirm the desirable finite-sample performance of the proposed estimators relative to a variety of other doubly robust estimators.

  13. Procedures for estimating confidence intervals for selected method performance parameters.

    PubMed

    McClure, F D; Lee, J K

    2001-01-01

    Procedures for estimating confidence intervals (CIs) for the repeatability variance (σr²), reproducibility variance (σR² = σL² + σr²), laboratory component (σL²), and their corresponding standard deviations σr, σR, and σL, respectively, are presented. In addition, CIs for the ratio of the repeatability component to the reproducibility variance (σr²/σR²) and the ratio of the laboratory component to the reproducibility variance (σL²/σR²) are also presented.
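
    The paper's procedures for the laboratory component and the variance ratios are not reproduced here; as a minimal sketch, the snippet below gives the standard chi-square interval for the repeatability variance alone, assuming balanced data and normal errors.

    ```python
    from scipy.stats import chi2

    def repeatability_ci(s_r2, df, alpha=0.05):
        """Chi-square confidence interval for the repeatability variance.
        s_r2: pooled within-laboratory variance estimate; df: its degrees of freedom
        (for p labs with n replicates each, df = p * (n - 1)). Assumes normal errors;
        intervals for the laboratory component and ratios require the paper's methods."""
        lower = df * s_r2 / chi2.ppf(1 - alpha / 2, df)
        upper = df * s_r2 / chi2.ppf(alpha / 2, df)
        return lower, upper

    # example: s_r2 = 0.40 from 10 laboratories with 3 replicates each -> df = 20
    # print(repeatability_ci(0.40, 20))
    ```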

  14. An Alternative to Cohen's Standardized Mean Difference Effect Size: A Robust Parameter and Confidence Interval in the Two Independent Groups Case

    ERIC Educational Resources Information Center

    Algina, James; Keselman, H. J.; Penfield, Randall D.

    2005-01-01

    The authors argue that a robust version of Cohen's effect size constructed by replacing population means with 20% trimmed means and the population standard deviation with the square root of a 20% Winsorized variance is a better measure of population separation than is Cohen's effect size. The authors investigated coverage probability for…

  15. Errors in the estimation of the variance: implications for multiple-probability fluctuation analysis.

    PubMed

    Saviane, Chiara; Silver, R Angus

    2006-06-15

    Synapses play a crucial role in information processing in the brain. Amplitude fluctuations of synaptic responses can be used to extract information about the mechanisms underlying synaptic transmission and its modulation. In particular, multiple-probability fluctuation analysis can be used to estimate the number of functional release sites, the mean probability of release and the amplitude of the mean quantal response from fits of the relationship between the variance and mean amplitude of postsynaptic responses, recorded at different probabilities. To determine these quantal parameters, calculate their uncertainties and the goodness-of-fit of the model, it is important to weight the contribution of each data point in the fitting procedure. We therefore investigated the errors associated with measuring the variance by determining the best estimators of the variance of the variance and have used simulations of synaptic transmission to test their accuracy and reliability under different experimental conditions. For central synapses, which generally have a low number of release sites, the amplitude distribution of synaptic responses is not normal, thus the use of a theoretical variance of the variance based on the normal assumption is not a good approximation. However, appropriate estimators can be derived for the population and for limited sample sizes using a more general expression that involves higher moments and introducing unbiased estimators based on the h-statistics. Our results are likely to be relevant for various applications of fluctuation analysis when few channels or release sites are present.
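
    The unbiased h-statistic estimators derived in the paper are not reproduced here; as a simple point of reference, the sketch below evaluates the textbook expression for the variance of the unbiased sample variance with plug-in sample moments, which is the quantity such fitting weights are built from.

    ```python
    import numpy as np

    def var_of_sample_variance(x):
        """Plug-in estimate of Var(s^2) for the unbiased sample variance s^2, using
        Var(s^2) = (mu4 - (n - 3) / (n - 1) * mu2**2) / n with sample central moments
        substituted for mu2 and mu4. Sketch only: plug-in moments are themselves
        biased, which is why the paper develops unbiased h-statistic estimators."""
        x = np.asarray(x, dtype=float)
        n = x.size
        d = x - x.mean()
        mu2 = np.mean(d ** 2)
        mu4 = np.mean(d ** 4)
        return (mu4 - (n - 3) / (n - 1) * mu2 ** 2) / n

    # under normality this is roughly 2 * sigma**4 / (n - 1)
    ```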

  16. Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study.

    PubMed

    Kim, Minjung; Lamont, Andrea E; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M Lee

    2016-06-01

    Regression mixture models are a novel approach to modeling the heterogeneous effects of predictors on an outcome. In the model-building process, residual variances are often disregarded and simplifying assumptions are made without thorough examination of the consequences. In this simulation study, we investigated the impact of an equality constraint on the residual variances across latent classes. We examined the consequences of constraining the residual variances on class enumeration (finding the true number of latent classes) and on the parameter estimates, under a number of different simulation conditions meant to reflect the types of heterogeneity likely to exist in applied analyses. The results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted the estimated class sizes and showed the potential to greatly affect the parameter estimates in each class. These results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions are made.

  17. The Influence of Major Life Events on Economic Attitudes in a World of Gene-Environment Interplay.

    PubMed

    Hatemi, Peter K

    2013-10-01

    The role of "genes" on political attitudes has gained attention across disciplines. However, person-specific experiences have yet to be incorporated into models that consider genetic influences. Relying on a gene-environment interplay approach, this study explicates how life-events, such as losing one's job or suffering a financial loss, influence economic policy attitudes. The results indicate genetic and environmental variance on support for unions, immigration, capitalism, socialism and property tax is moderated by financial risks. Changes in the magnitude of genetic influences, however, are temporary. After two years, the phenotypic effects of the life events remain on most attitudes, but changes in the sources of individual differences do not. Univariate twin models that estimate the independent contributions of genes and environment on the variation of attitudes appear to provide robust baseline indicators of sources of individual differences. These estimates, however, are not event or day specific. In this way, genetic influences add stability, while environment cues change, and this process is continually updated.

  18. Approximate Confidence Intervals for Moment-Based Estimators of the Between-Study Variance in Random Effects Meta-Analysis

    ERIC Educational Resources Information Center

    Jackson, Dan; Bowden, Jack; Baker, Rose

    2015-01-01

    Moment-based estimators of the between-study variance are very popular when performing random effects meta-analyses. This type of estimation has many advantages including computational and conceptual simplicity. Furthermore, by using these estimators in large samples, valid meta-analyses can be performed without the assumption that the treatment…

  19. Biochemical phenotypes to discriminate microbial subpopulations and improve outbreak detection.

    PubMed

    Galar, Alicia; Kulldorff, Martin; Rudnick, Wallis; O'Brien, Thomas F; Stelling, John

    2013-01-01

    Clinical microbiology laboratories worldwide constitute an invaluable resource for monitoring emerging threats and the spread of antimicrobial resistance. We studied the growing number of biochemical tests routinely performed on clinical isolates to explore their value as epidemiological markers. Microbiology laboratory results from January 2009 through December 2011 from a 793-bed hospital stored in WHONET were examined. Variables included patient location, collection date, organism, and 47 biochemical and 17 antimicrobial susceptibility test results reported by Vitek 2. To identify biochemical tests that were particularly valuable (stable with repeat testing, but good variability across the species) or problematic (inconsistent results with repeat testing), three types of variance analyses were performed on isolates of K. pneumoniae: descriptive analysis of discordant biochemical results in same-day isolates, an average within-patient variance index, and generalized linear mixed model variance component analysis. 4,200 isolates of K. pneumoniae were identified from 2,485 patients, 32% of whom had multiple isolates. The first two variance analyses highlighted SUCT, TyrA, GlyA, and GGT as "nuisance" biochemicals for which discordant within-patient test results impacted a high proportion of patient results, while dTAG had relatively good within-patient stability with good heterogeneity across the species. Variance component analyses confirmed the relative stability of dTAG, and identified additional biochemicals such as PHOS with a large between-patient to within-patient variance ratio. A reduced subset of biochemicals improved the robustness of strain definition for carbapenem-resistant K. pneumoniae. Surveillance analyses suggest that the reduced biochemical profile could improve the timeliness and specificity of outbreak detection algorithms. The statistical approaches explored can improve the robust recognition of microbial subpopulations with routinely available biochemical test results, of value in the timely detection of outbreak clones and evolutionarily important genetic events.

  20. A Robust High-Accuracy Ultrasound Indoor Positioning System Based on a Wireless Sensor Network.

    PubMed

    Qi, Jun; Liu, Guo-Ping

    2017-11-06

    This paper describes the development and implementation of a robust high-accuracy ultrasonic indoor positioning system (UIPS). The UIPS consists of several wireless ultrasonic beacons in the indoor environment. Each of them has a fixed and known position coordinate and can collect all the transmissions from the target node or emit ultrasonic signals. Every wireless sensor network (WSN) node has two communication modules: one is WiFi, which transmits the data to the server, and the other is the radio frequency (RF) module, which is only used for time synchronization between different nodes, with accuracy up to 1 μs. The distance between the beacon and the target node is calculated by measuring the time-of-flight (TOF) for the ultrasonic signal, and then the position of the target is computed from these distances and the coordinates of the beacons. TOF estimation is the most important technique in the UIPS. A new time domain method to extract the envelope of the ultrasonic signals is presented in order to estimate the TOF. This method, with the envelope detection filter, estimates the value from the sampled values on both sides based on the least squares method (LSM). The simulation results show that the method can achieve envelope detection with a good filtering effect by means of the LSM. The highest precision and variance can reach 0.61 mm and 0.23 mm, respectively, in pseudo-range measurements with UIPS. A maximum location error of 10.2 mm is achieved in the positioning experiments for a moving robot, when UIPS works on the line-of-sight (LOS) signal.

  1. Trends in Elevated Triglyceride in Adults: United States, 2001-2012

    MedlinePlus

    ... All variance estimates accounted for the complex survey design using Taylor series linearization ( 10 ). Percentage estimates for the total adult ... al. National Health and Nutrition Examination Survey: Sample design, 2007–2010. ... KM. Taylor series methods. In: Introduction to variance estimation. 2nd ed. ...

  2. Robust portfolio selection based on asymmetric measures of variability of stock returns

    NASA Astrophysics Data System (ADS)

    Chen, Wei; Tan, Shaohua

    2009-10-01

    This paper addresses a new uncertainty set--interval random uncertainty set for robust optimization. The form of interval random uncertainty set makes it suitable for capturing the downside and upside deviations of real-world data. These deviation measures capture distributional asymmetry and lead to better optimization results. We also apply our interval random chance-constrained programming to robust mean-variance portfolio selection under interval random uncertainty sets in the elements of mean vector and covariance matrix. Numerical experiments with real market data indicate that our approach results in better portfolio performance.
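
    The paper's interval random uncertainty sets and chance constraints are specific to its robust formulation and are not reproduced here. For orientation, the sketch below solves only the nominal mean-variance problem (minimum variance at a target return, short sales allowed) that such robust versions extend.

    ```python
    import numpy as np

    def min_variance_portfolio(mu, Sigma, target_return):
        """Nominal Markowitz problem: minimise w' Sigma w subject to sum(w) = 1 and
        w' mu = target_return (short sales allowed), solved via the KKT equality
        system. The robust interval-random version replaces mu and Sigma with
        uncertainty sets, which is not shown here."""
        mu = np.asarray(mu, dtype=float)
        Sigma = np.asarray(Sigma, dtype=float)
        n = mu.size
        ones = np.ones(n)
        A = np.block([[2.0 * Sigma, ones[:, None], mu[:, None]],
                      [ones[None, :], np.zeros((1, 2))],
                      [mu[None, :], np.zeros((1, 2))]])
        b = np.concatenate([np.zeros(n), [1.0, target_return]])
        return np.linalg.solve(A, b)[:n]

    # w = min_variance_portfolio([0.05, 0.08, 0.12], np.diag([0.02, 0.05, 0.09]), 0.08)
    ```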

  3. Evaluation of three lidar scanning strategies for turbulence measurements

    NASA Astrophysics Data System (ADS)

    Newman, J. F.; Klein, P. M.; Wharton, S.; Sathe, A.; Bonin, T. A.; Chilson, P. B.; Muschinski, A.

    2015-11-01

    Several errors occur when a traditional Doppler-beam swinging (DBS) or velocity-azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.

  4. Evaluation of three lidar scanning strategies for turbulence measurements

    NASA Astrophysics Data System (ADS)

    Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia; Sathe, Ameya; Bonin, Timothy A.; Chilson, Phillip B.; Muschinski, Andreas

    2016-05-01

    Several errors occur when a traditional Doppler beam swinging (DBS) or velocity-azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.

  5. Estimating acreage by double sampling using LANDSAT data

    NASA Technical Reports Server (NTRS)

    Pont, F.; Horwitz, H.; Kauth, R. (Principal Investigator)

    1982-01-01

    Double sampling techniques employing LANDSAT data for estimating the acreage of corn and soybeans were investigated and evaluated. The evaluation was based on estimated costs and correlations between two existing procedures having differing cost/variance characteristics, and included consideration of their individual merits when coupled with a fictional 'perfect' procedure of zero bias and variance. Two features of the analysis are: (1) the simultaneous estimation of two or more crops; and (2) the imposition of linear cost constraints among two or more types of resource. A reasonably realistic operational scenario was postulated. The costs were estimated from current experience with the measurement procedures involved, and the correlations were estimated from a set of 39 LACIE-type sample segments located in the U.S. Corn Belt. For a fixed variance of the estimate, double sampling with the two existing LANDSAT measurement procedures can result in a 25% or 50% cost reduction. Double sampling which included the fictional perfect procedure results in a more cost effective combination when it is used with the lower cost/higher variance representative of the existing procedures.

  6. Modeling Multiplicative Error Variance: An Example Predicting Tree Diameter from Stump Dimensions in Baldcypress

    Treesearch

    Bernard R. Parresol

    1993-01-01

    In the context of forest modeling, it is often reasonable to assume a multiplicative heteroscedastic error structure to the data. Under such circumstances ordinary least squares no longer provides minimum variance estimates of the model parameters. Through study of the error structure, a suitable error variance model can be specified and its parameters estimated. This...
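
    The Treesearch record is truncated, but the general remedy it points to is standard: model the error variance as a power of the mean and refit by weighted least squares. The sketch below is that generic two-stage recipe for a straight-line mean model, not the author's stump-diameter model for baldcypress.

    ```python
    import numpy as np

    def two_stage_wls(x, y):
        """Two-stage weighted least squares for multiplicative (power-of-the-mean)
        heteroscedasticity, Var(e_i) = sigma^2 * mu_i**(2k), with a straight-line mean.
        Stage 1: OLS fit; estimate k by regressing log squared residuals on log fitted
        values. Stage 2: refit with weights 1 / fitted**(2k).
        Assumes positive fitted values and non-zero residuals (sketch only)."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        X = np.column_stack([np.ones_like(x), x])
        beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
        fitted = X @ beta_ols
        resid = y - fitted
        V = np.column_stack([np.ones_like(x), np.log(fitted)])
        gamma = np.linalg.lstsq(V, np.log(resid ** 2), rcond=None)[0]
        k = gamma[1] / 2.0                       # estimated power of the mean
        sw = np.sqrt(1.0 / fitted ** (2.0 * k))  # square-root weights
        beta_wls = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        return beta_wls, k
    ```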

  7. Unbiased Estimates of Variance Components with Bootstrap Procedures

    ERIC Educational Resources Information Center

    Brennan, Robert L.

    2007-01-01

    This article provides general procedures for obtaining unbiased estimates of variance components for any random-model balanced design under any bootstrap sampling plan, with the focus on designs of the type typically used in generalizability theory. The results reported here are particularly helpful when the bootstrap is used to estimate standard…

  8. Control Variate Estimators of Survivor Growth from Point Samples

    Treesearch

    Francis A. Roesch; Paul C. van Deusen

    1993-01-01

    Two estimators of the control variate type for survivor growth from remeasured point samples are proposed and compared with more familiar estimators. The large reductions in variance, observed in many cases for estimators constructed with control variates, are also realized in this application. A simulation study yielded consistent reductions in variance which were often...

  9. Internalized Homophobia as a Partial Mediator between Homophobic Bullying and Self-Esteem among Sexual Minority Youths in Quebec (Canada)

    PubMed Central

    Blais, Martin; Gervais, Jesse; Hébert, Martine

    2016-01-01

    Verbal/psychological homophobic bullying is widespread among sexual minority youths. Homophobic bullying has been associated with both high internalized homophobia and low self-esteem. The objectives were to document verbal/psychological homophobic bullying among sexual minority youths and to model the relationships between homophobic bullying, internalized homophobia and self-esteem. Method: A community sample of 300 sexual minority youths aged 14 to 22 years old was used. A structural equation model was tested using a nonlinear, robust estimator implemented in Mplus. The model postulated that homophobic bullying impacts self-esteem both directly and indirectly, via internalized homophobia. Results: 60.7% of the sample reported at least one form of verbal/psychological homophobic bullying. The model explained 29% of the variance of self-esteem, 19.6% of the variance of internalized homophobia and 5.3% of the verbal/psychological homophobic bullying. The model suggests that the relationship between verbal/psychological homophobic bullying and self-esteem is partially mediated by internalized homophobia. Conclusion: Our results underscore the importance of initiatives to prevent homophobic bullying in order to prevent its negative effects on well-being of sexual minority youths. PMID:24714888

  10. Lead Determination and Heterogeneity Analysis in Soil from a Former Firing Range

    NASA Astrophysics Data System (ADS)

    Urrutia-Goyes, Ricardo; Argyraki, Ariadne; Ornelas-Soto, Nancy

    2017-07-01

    Public places can have an unknown history of pollutant deposition. Exposure to such contaminants can create environmental and health issues. The characterization of a former firing range in Athens, Greece will allow its monitoring and encourage its remediation. This study focuses on Pb contamination at the site due to its presence in ammunition. A dense sampling design with 91 locations (10 m apart) was used to determine the spatial distribution of the element in the surface soil of the study area. Duplicate samples were also collected one meter apart from 8 random locations to estimate the heterogeneity of the site. Elemental concentrations were measured using a portable XRF device after simple sample homogenization in the field. Robust Analysis of Variance showed that the contributions to the total variance were 11% from sampling, 1% analytical, and 88% geochemical, reflecting the suitability of the technique. Moreover, the extended random uncertainty relative to the mean concentration was 91.5%, confirming the high heterogeneity of the site. Statistical analysis revealed very high contamination in the area, suggesting the need for more in-depth analysis of other contaminants and possible health risks.

  11. Internalized homophobia as a partial mediator between homophobic bullying and self-esteem among youths of sexual minorities in Quebec (Canada).

    PubMed

    Blais, Martin; Gervais, Jesse; Hébert, Martine

    2014-03-01

    Verbal/psychological homophobic bullying is widespread among youths of sexual minorities. Homophobic bullying has been associated with both high internalized homophobia and low self-esteem. The objectives were to document verbal/psychological homophobic bullying among youths of sexual minorities and model the relationships between homophobic bullying, internalized homophobia and self-esteem. A community sample of 300 youths of sexual minorities aged 14 to 22 years old was used. A structural equation model was tested using a nonlinear, robust estimator implemented in Mplus. The model postulated that homophobic bullying impacts self-esteem both directly and indirectly, via internalized homophobia. 60.7% of the sample reported at least one form of verbal/psychological homophobic bullying. The model explained 29% of the variance of self-esteem, 19.6% of the variance of internalized homophobia and 5.3% of the verbal/psychological homophobic bullying. The model suggests that the relationship between verbal/psychological homophobic bullying and self-esteem is partially mediated by internalized homophobia. The results underscore the importance of initiatives to prevent homophobic bullying in order to prevent its negative effects on the well-being of youths of sexual minorities.

  12. Seasonal and Interannual Variation of Currents and Water Properties off the Mid-East Coast of Korea

    NASA Astrophysics Data System (ADS)

    Park, J. H.; Chang, K. I.; Nam, S.

    2016-02-01

    Since 1999, physical parameters such as current, temperature, and salinity off the mid-east coast of Korea have been continuously observed from the long-term buoy station called 'East-Sea Real-time Ocean monitoring Buoy (ESROB)'. Applying harmonic analysis to 6-year-long (2007-2012) depth-averaged current data from the ESROB, a mean seasonal cycle of alongshore currents, characterized by poleward flow on average and equatorward flow in summer, is extracted which accounts for 5.8% of the variance of 40-hour low-pass filtered currents. In spite of the small variance explained, a robust seasonality of summertime equatorward reversal typifies the low-passed alongshore currents along with low-density water. To reveal the dynamics underlying the seasonal variation, each term of the linearized, depth-averaged momentum equations is estimated using the data from ESROB, adjacent tide gauge stations, and serial hydrographic stations. The result indicates that the reversal of the alongshore pressure gradient is a major driver of the equatorward reversals in summer. The reanalysis wind product (MERRA) and satellite altimeter-derived sea surface height (AVISO) data show correlated features between positive (negative) wind stress curl and sea surface depression (uplift). Quantitative estimates reveal that the wind-stress curl accounts for 42% of alongshore sea level variation. Summertime low-density water originating from the northern coastal region is a footprint of the buoyancy-driven equatorward current. An interannual variation (anomalies from the mean seasonal cycle) of alongshore currents and its possible driving mechanisms will be discussed.

  13. Deletion Diagnostics for the Generalised Linear Mixed Model with independent random effects

    PubMed Central

    Ganguli, B.; Roy, S. Sen; Naskar, M.; Malloy, E. J.; Eisen, E. A.

    2015-01-01

    The Generalised Linear Mixed Model (GLMM) is widely used for modelling environmental data. However, such data are prone to influential observations, which can distort the estimated exposure-response curve particularly in regions of high exposure. Deletion diagnostics for iterative estimation schemes commonly derive the deleted estimates based on a single iteration of the full system, holding certain pivotal quantities, such as the information matrix, constant. In this paper, we present an approximate formula for the deleted estimates and Cook’s distance for the GLMM which does not assume that the estimates of variance parameters are unaffected by deletion. The procedure allows the user to calculate standardised DFBETAs for mean as well as variance parameters. In certain cases, such as when using the GLMM as a device for smoothing, such residuals for the variance parameters are interesting in their own right. In general, the procedure leads to deleted estimates of mean parameters which are corrected for the effect of deletion on variance components, as estimation of the two sets of parameters is interdependent. The probabilistic behaviour of these residuals is investigated and a simulation-based procedure is suggested for their standardisation. The method is used to identify influential individuals in an occupational cohort exposed to silica. The results show that failure to conduct post-model-fitting diagnostics for variance components can lead to erroneous conclusions about the fitted curve and unstable confidence intervals. PMID:26626135

  14. Statistical modelling of thermal annealing of fission tracks in apatite

    NASA Astrophysics Data System (ADS)

    Laslett, G. M.; Galbraith, R. F.

    1996-12-01

    We develop an improved methodology for modelling the relationship between mean track length, temperature, and time in fission track annealing experiments. We consider "fanning Arrhenius" models, in which contours of constant mean length on an Arrhenius plot are straight lines meeting at a common point. Features of our approach are explicit use of subject matter knowledge, treating mean length as the response variable, modelling of the mean-variance relationship with two components of variance, improved modelling of the control sample, and using information from experiments in which no tracks are seen. This approach overcomes several weaknesses in previous models and provides a robust six parameter model that is widely applicable. Estimation is via direct maximum likelihood which can be implemented using a standard numerical optimisation package. Because the model is highly nonlinear, some reparameterisations are needed to achieve stable estimation and calculation of precisions. Experience suggests that precisions are more convincingly estimated from profile log-likelihood functions than from the information matrix. We apply our method to the B-5 and Sr fluorapatite data of Crowley et al. (1991) and obtain well-fitting models in both cases. For the B-5 fluorapatite, our model exhibits less fanning than that of Crowley et al. (1991), although fitted mean values above 12 μm are fairly similar. However, predictions can be different, particularly for heavy annealing at geological time scales, where our model is less retentive. In addition, the refined error structure of our model results in tighter prediction errors, and has components of error that are easier to verify or modify. For the Sr fluorapatite, our fitted model for mean lengths does not differ greatly from that of Crowley et al. (1991), but our error structure is quite different.

  15. A New Method to Compare Statistical Tree Growth Curves: The PL-GMANOVA Model and Its Application with Dendrochronological Data

    PubMed Central

    Ricker, Martin; Peña Ramírez, Víctor M.; von Rosen, Dietrich

    2014-01-01

    Growth curves are monotonically increasing functions fitted to repeated measurements of the same subjects over time. The classical growth curve model in the statistical literature is the Generalized Multivariate Analysis of Variance (GMANOVA) model. In order to model the tree trunk radius (r) over time (t) of trees on different sites, GMANOVA is combined here with the adapted PL regression model Q = A·T + E, where A is the initial relative growth to be estimated and E is an error term for each tree and time point; the transformation of r into Q involves the turning point radius (TPR) of the sigmoid curve and an estimated calibrating time-radius point. Advantages of the approach are that growth rates can be compared among growth curves with different turning point radiuses and different starting points, hidden outliers are easily detectable, the method is statistically robust, and heteroscedasticity of the residuals among time points is allowed. The model was implemented with dendrochronological data of 235 Pinus montezumae trees on ten Mexican volcano sites to calculate comparison intervals for the estimated initial relative growth A. One site (at the Popocatépetl volcano) stood out, with A being 3.9 times the value of the site with the slowest-growing trees. Calculating variance components for the initial relative growth, 34% of the growth variation was found among sites, 31% among trees, and 35% over time. Without the Popocatépetl site, the numbers changed to 7%, 42%, and 51%. Further explanation of differences in growth would need to focus on factors that vary within sites and over time. PMID:25402427

  16. Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data

    PubMed Central

    Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark

    2010-01-01

    Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development, however this approach suffers from several disadvantages including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed for noise to be more accurately removed. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets while dopamine prediction was more sensitive to noise. PMID:20527815

  17. On the estimation variance for the specific Euler-Poincaré characteristic of random networks.

    PubMed

    Tscheschel, A; Stoyan, D

    2003-07-01

    The specific Euler number is an important topological characteristic in many applications. It is considered here for the case of random networks, which may appear in microscopy either as primary objects of investigation or as secondary objects describing in an approximate way other structures such as, for example, porous media. For random networks there is a simple and natural estimator of the specific Euler number. For its estimation variance, a simple Poisson approximation is given. It is based on the general exact formula for the estimation variance. In two examples of quite different nature and topology application of the formulas is demonstrated.

  18. An empirical analysis of the distribution of overshoots in a stationary Gaussian stochastic process

    NASA Technical Reports Server (NTRS)

    Carter, M. C.; Madison, M. W.

    1973-01-01

    The frequency distribution of overshoots in a stationary Gaussian stochastic process is analyzed. The primary processes involved in this analysis are computer simulation and statistical estimation. Computer simulation is used to simulate stationary Gaussian stochastic processes that have selected autocorrelation functions. An analysis of the simulation results reveals a frequency distribution for overshoots with a functional dependence on the mean and variance of the process. Statistical estimation is then used to estimate the mean and variance of a process. It is shown that, given an autocorrelation function and estimates of the mean and variance of the process, a frequency distribution for the number of overshoots can be estimated.
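
    As a minimal companion to the simulation described, the sketch below generates a stationary Gaussian AR(1) series (one particular choice of autocorrelation function, not necessarily one used in the study) and counts overshoots of a threshold, i.e. upcrossings.

    ```python
    import numpy as np

    def count_overshoots(threshold, n=100_000, rho=0.9, seed=None):
        """Simulate a zero-mean, unit-variance stationary Gaussian AR(1) process with
        lag-1 autocorrelation rho, and count overshoots (upcrossings) of `threshold`.
        Sketch only; the study considered several selected autocorrelation functions."""
        rng = np.random.default_rng(seed)
        e = rng.normal(0.0, np.sqrt(1.0 - rho ** 2), size=n)   # innovations
        x = np.empty(n)
        x[0] = rng.normal()
        for t in range(1, n):
            x[t] = rho * x[t - 1] + e[t]
        above = x > threshold
        return int(np.count_nonzero(~above[:-1] & above[1:]))

    # e.g. count_overshoots(2.0) counts excursions above two standard deviations
    ```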

  19. Robust Magnetotelluric Impedance Estimation

    NASA Astrophysics Data System (ADS)

    Sutarno, D.

    2010-12-01

    Robust magnetotelluric (MT) response function estimators are now in standard use by the induction community. Properly devised and applied, these have the ability to reduce the influence of unusual data (outliers). The estimators always yield impedance estimates which are better than conventional least squares (LS) estimates because 'real' MT data almost never satisfy the statistical assumptions of Gaussianity and stationarity upon which normal spectral analysis is based. This paper discusses the development and application to MT data of robust estimation procedures which can be classified as M-estimators. Starting with a description of the estimators, special attention is given to the recent development of bounded-influence robust estimation, including utilization of the Hilbert Transform (HT) operation on causal MT impedance functions. The resulting robust performances are illustrated using synthetic as well as real MT data.

  20. Detection of rheumatoid arthritis by evaluation of normalized variances of fluorescence time correlation functions

    NASA Astrophysics Data System (ADS)

    Dziekan, Thomas; Weissbach, Carmen; Voigt, Jan; Ebert, Bernd; MacDonald, Rainer; Bahner, Malte L.; Mahler, Marianne; Schirner, Michael; Berliner, Michael; Berliner, Birgitt; Osel, Jens; Osel, Ilka

    2011-07-01

    Fluorescence imaging using the dye indocyanine green as a contrast agent was investigated in a prospective clinical study for the detection of rheumatoid arthritis. Normalized variances of correlated time series of fluorescence intensities describing the bolus kinetics of the contrast agent in certain regions of interest were analyzed to differentiate healthy from inflamed finger joints. These values are determined using a robust, parameter-free algorithm. We found that the normalized variance of correlation functions improves the differentiation between healthy joints of volunteers and joints with rheumatoid arthritis of patients by about 10% compared to, e.g., ratios of areas under the curves of raw data.

  1. Robust optimization of supersonic ORC nozzle guide vanes

    NASA Astrophysics Data System (ADS)

    Bufi, Elio A.; Cinnella, Paola

    2017-03-01

    An efficient Robust Optimization (RO) strategy is developed for the design of 2D supersonic Organic Rankine Cycle turbine expanders. Dense-gas effects are non-negligible for this application and are taken into account by describing the thermodynamics with the Peng-Robinson-Stryjek-Vera equation of state. The design methodology combines an Uncertainty Quantification (UQ) loop based on a Bayesian kriging model of the system response to the uncertain parameters, used to approximate statistics (mean and variance) of the uncertain system output; a CFD solver; and a multi-objective non-dominated sorting genetic algorithm (NSGA), also based on a kriging surrogate of the multi-objective fitness function, along with an adaptive infill strategy for surrogate enrichment at each generation of the NSGA. The objective functions are the average and variance of the isentropic efficiency. The blade shape is parametrized by means of a Free Form Deformation (FFD) approach. The robust optimal blades are compared to the baseline design (based on the Method of Characteristics) and to a blade obtained by means of a deterministic CFD-based optimization.

  2. On the asymptotic standard error of a class of robust estimators of ability in dichotomous item response models.

    PubMed

    Magis, David

    2014-11-01

    In item response theory, the classical estimators of ability are highly sensitive to response disturbances and can return strongly biased estimates of the true underlying ability level. Robust methods were introduced to lessen the impact of such aberrant responses on the estimation process. The computation of asymptotic (i.e., large-sample) standard errors (ASE) for these robust estimators, however, has not yet been fully considered. This paper focuses on a broad class of robust ability estimators, defined by an appropriate selection of the weight function and the residual measure, for which the ASE is derived from the theory of estimating equations. The maximum likelihood (ML) and the robust estimators, together with their estimated ASEs, are then compared in a simulation study by generating random guessing disturbances. It is concluded that both the estimators and their ASE perform similarly in the absence of random guessing, while the robust estimator and its estimated ASE are less biased and outperform their ML counterparts in the presence of random guessing with large impact on the item response process. © 2013 The British Psychological Society.

  3. Extensions of output variance constrained controllers to hard constraints

    NASA Technical Reports Server (NTRS)

    Skelton, R.; Zhu, G.

    1989-01-01

    Covariance Controllers assign specified matrix values to the state covariance. A number of robustness results are directly related to the covariance matrix. The conservatism in known upper bounds on the H infinity, L infinity, and L (sub 2) norms for stability and disturbance robustness of linear uncertain systems using covariance controllers is illustrated with examples. These results are illustrated for continuous and discrete time systems.

  4. Robust Methods for Moderation Analysis with a Two-Level Regression Model.

    PubMed

    Yang, Miao; Yuan, Ke-Hai

    2016-01-01

    Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.

  5. Estimating unconsolidated sediment cover thickness by using the horizontal distance to a bedrock outcrop as secondary information

    NASA Astrophysics Data System (ADS)

    Kitterød, Nils-Otto

    2017-08-01

    Unconsolidated sediment cover thickness (D) above bedrock was estimated by using a publicly available well database from Norway, GRANADA. General challenges associated with such databases typically involve clustering and bias. However, if information about the horizontal distance to the nearest bedrock outcrop (L) is included, does the spatial estimation of D improve? This idea was tested by comparing two cross-validation results: ordinary kriging (OK), where L was disregarded, and co-kriging (CK), where the cross-covariance between D and L was included. The analysis showed only minor differences between OK and CK with respect to differences between estimated and true values. However, the CK results generally gave lower estimation variance than the OK results. All observations were declustered and transformed to standard normal probability density functions before estimation and back-transformed for the cross-validation analysis. The semivariogram analysis gave correlation lengths for D and L of approximately 10 and 6 km. These correlations reduce the estimation variance in the cross-validation analysis because more than 50 % of the data material had two or more observations within a radius of 5 km. The small-scale variance of D, however, was about 50 % of the total variance, which gave an accuracy of less than 60 % for most of the cross-validation cases. Despite the noisy character of the observations, the analysis demonstrated that L can be used as secondary information to reduce the estimation variance of D.

  6. Genetic control of residual variance of yearling weight in Nellore beef cattle.

    PubMed

    Iung, L H S; Neves, H H R; Mulder, H A; Carvalheiro, R

    2017-04-01

    There is evidence for genetic variability in residual variance of livestock traits, which offers the potential for selection for increased uniformity of production. Different statistical approaches have been employed to study this topic; however, little is known about the concordance between them. The aim of our study was to investigate the genetic heterogeneity of residual variance on yearling weight (YW; 291.15 ± 46.67) in a Nellore beef cattle population; to compare the results of the statistical approaches, the two-step approach and the double hierarchical generalized linear model (DHGLM); and to evaluate the effectiveness of power transformation to accommodate scale differences. The comparison was based on genetic parameters, accuracy of EBV for residual variance, and cross-validation to assess predictive performance of both approaches. A total of 194,628 yearling weight records from 625 sires were used in the analysis. The results supported the hypothesis of genetic heterogeneity of residual variance on YW in Nellore beef cattle and the opportunity of selection, measured through the genetic coefficient of variation of residual variance (0.10 to 0.12 for the two-step approach and 0.17 for DHGLM, using an untransformed data set). However, low estimates of genetic variance associated with positive genetic correlations between mean and residual variance (about 0.20 for two-step and 0.76 for DHGLM for an untransformed data set) limit the genetic response to selection for uniformity of production while simultaneously increasing YW itself. Moreover, large sire families are needed to obtain accurate estimates of genetic merit for residual variance, as indicated by the low heritability estimates (<0.007). Box-Cox transformation was able to decrease the dependence of the variance on the mean and decreased the estimates of genetic parameters for residual variance. The transformation reduced but did not eliminate all the genetic heterogeneity of residual variance, highlighting its presence beyond the scale effect. The DHGLM showed higher predictive ability of EBV for residual variance and therefore should be preferred over the two-step approach.

  7. Robust Least-Squares Support Vector Machine With Minimization of Mean and Variance of Modeling Error.

    PubMed

    Lu, Xinjiang; Liu, Wenbo; Zhou, Chuang; Huang, Minghui

    2017-06-13

    The least-squares support vector machine (LS-SVM) is a popular data-driven modeling method and has been successfully applied to a wide range of applications. However, it has some disadvantages, including being ineffective at handling non-Gaussian noise as well as being sensitive to outliers. In this paper, a robust LS-SVM method is proposed and is shown to have more reliable performance when modeling a nonlinear system under conditions where Gaussian or non-Gaussian noise is present. The construction of a new objective function allows for a reduction of the mean of the modeling error as well as the minimization of its variance, and it does not constrain the mean of the modeling error to zero. This differs from the traditional LS-SVM, which uses a worst-case scenario approach in order to minimize the modeling error and constrains the mean of the modeling error to zero. In doing so, the proposed method takes the modeling error distribution information into consideration and is thus less conservative and more robust in regards to random noise. A solving method is then developed in order to determine the optimal parameters for the proposed robust LS-SVM. An additional analysis indicates that the proposed LS-SVM gives a smaller weight to a large-error training sample and a larger weight to a small-error training sample, and is thus more robust than the traditional LS-SVM. The effectiveness of the proposed robust LS-SVM is demonstrated using both artificial and real life cases.

  8. Robust radio interferometric calibration using the t-distribution

    NASA Astrophysics Data System (ADS)

    Kazemi, S.; Yatawatta, S.

    2013-10-01

    A major stage of radio interferometric data processing is calibration or the estimation of systematic errors in the data and the correction for such errors. A stochastic error (noise) model is assumed, and in most cases, this underlying model is assumed to be Gaussian. However, outliers in the data due to interference or due to errors in the sky model would have adverse effects on processing based on a Gaussian noise model. Most of the shortcomings of calibration such as the loss in flux or coherence, and the appearance of spurious sources, could be attributed to the deviations of the underlying noise model. In this paper, we propose to improve the robustness of calibration by using a noise model based on Student's t-distribution. Student's t-noise is a special case of Gaussian noise when the variance is unknown. Unlike Gaussian-noise-model-based calibration, traditional least-squares minimization would not directly extend to a case when we have a Student's t-noise model. Therefore, we use a variant of the expectation-maximization algorithm, called the expectation-conditional maximization either algorithm, when we have a Student's t-noise model and use the Levenberg-Marquardt algorithm in the maximization step. We give simulation results to show the robustness of the proposed calibration method as opposed to traditional Gaussian-noise-model-based calibration, especially in preserving the flux of weaker sources that are not included in the calibration model.
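
    The practical effect of a Student's t noise model can be sketched for a plain linear model: in an EM-style iteration each residual receives a latent weight (nu + 1) / (nu + r^2 / sigma^2), so outlying samples are automatically down-weighted. The fragment below only illustrates that weighting idea, not the expectation-conditional maximization either / Levenberg-Marquardt solver used for the nonlinear calibration problem; the fixed degrees of freedom nu is an assumption.

      # Sketch of EM-style down-weighting under a Student's t noise model,
      # shown for a linear model rather than the nonlinear calibration problem.
      import numpy as np

      def t_em_regression(X, y, nu=3.0, n_iter=50):
          b = np.linalg.lstsq(X, y, rcond=None)[0]
          sigma2 = np.var(y - X @ b)
          for _ in range(n_iter):
              r = y - X @ b
              w = (nu + 1.0) / (nu + r**2 / sigma2)      # E-step: latent precision weights
              Xw = X * w[:, None]
              b = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # M-step: weighted least squares
              sigma2 = np.sum(w * (y - X @ b) ** 2) / len(y)
          return b, sigma2

      rng = np.random.default_rng(2)
      X = np.column_stack([np.ones(300), rng.normal(size=300)])
      y = X @ np.array([0.5, -1.0]) + rng.standard_t(df=3, size=300)
      print(t_em_regression(X, y)[0])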

  9. Woodcock-Johnson-III, Kaufman Adolescent and Adult Intelligence Test (KAIT), Kaufman Assessment Battery for Children (KABC), and Differential Ability Scales (DAS) support Carroll but not Cattell-Horn.

    PubMed

    Cucina, Jeffrey M; Howardson, Garett N

    2017-08-01

    Recently emerging evidence suggests that the dominant structural model of mental abilities-the Cattell-Horn-Carroll (CHC) model-may not adequately account for observed scores for mental abilities batteries, leading scholars to call into question the model's validity. Establishing the robustness of these findings is important since CHC is the foundation for several contemporary mental abilities test batteries, such as the Woodcock-Johnson III (WJ-III). Using confirmatory factor analysis, we investigated CHC's robustness across 4 archival samples of mental abilities test battery data, including the WJ-III, the Kaufman Adolescent & Adult Intelligence Test (KAIT), the Kaufman Assessment Battery for Children (KABC), and the Differential Ability Scales (DAS). We computed omega hierarchical (ωH) and omega subscale (ωS) coefficients for g and the broad factors, which estimated the relationship of composite scores to g and the broad factors, respectively. Across all 4 samples, we found strong evidence for a general ability, g. We additionally found evidence for 3 to 9 residualized, orthogonal broad abilities existing independently of g, many of which also explained reliable variance in test battery scores that cannot be accounted for by g alone. The reliabilities of these broad factors, however, were less than desirable (i.e., <.80) and achieving desirable reliabilities would be practically infeasible (e.g., requiring excessively large numbers of subtests). Our results, and those of CHC critics, are wholly consistent with Carroll's model. Essentially, both g and orthogonal broad abilities are required to explain variance in mental abilities test battery scores, which is consistent with Carroll but not Cattell-Horn. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  10. Optimal distribution of integration time for intensity measurements in Stokes polarimetry.

    PubMed

    Li, Xiaobo; Liu, Tiegen; Huang, Bingjing; Song, Zhanjie; Hu, Haofeng

    2015-10-19

    We consider the typical Stokes polarimetry system, which performs four intensity measurements to estimate a Stokes vector. We show that if the total integration time of the intensity measurements is fixed, the variance of the Stokes vector estimator depends on the distribution of the integration time over the four intensity measurements. Therefore, by optimizing the distribution of integration time, the variance of the Stokes vector estimator can be decreased. In this paper, we obtain the closed-form solution of the optimal distribution of integration time by employing the Lagrange multiplier method. According to the theoretical analysis and a real-world experiment, the total variance of the Stokes vector estimator can be decreased by about 40% in the case discussed in this paper. The method proposed in this paper can effectively decrease the measurement variance and thus statistically improve the measurement accuracy of the polarimetric system.
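
    The structure of the optimization can be illustrated with a simplified allocation problem of the same form (the paper's actual variance expression for the Stokes estimator is not reproduced here): if the variance contribution of the i-th intensity measurement scales as sigma_i^2 / t_i, minimizing the total variance subject to a fixed total integration time T gives, via a Lagrange multiplier,

      \min_{t_1,\dots,t_4}\ \sum_{i=1}^{4}\frac{\sigma_i^2}{t_i}
      \quad\text{subject to}\quad \sum_{i=1}^{4} t_i = T
      \qquad\Longrightarrow\qquad
      t_i^{*} = T\,\frac{\sigma_i}{\sum_{j=1}^{4}\sigma_j},

    so more integration time is allocated to the noisier intensity channels; the closed-form solution in the paper plays the analogous role for the exact Stokes-vector variance.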

  11. Optimal distribution of integration time for intensity measurements in degree of linear polarization polarimetry.

    PubMed

    Li, Xiaobo; Hu, Haofeng; Liu, Tiegen; Huang, Bingjing; Song, Zhanjie

    2016-04-04

    We consider the degree of linear polarization (DOLP) polarimetry system, which performs two intensity measurements at orthogonal polarization states to estimate DOLP. We show that if the total integration time of the intensity measurements is fixed, the variance of the DOLP estimator depends on the distribution of integration time between the two intensity measurements. Therefore, by optimizing the distribution of integration time, the variance of the DOLP estimator can be decreased. In this paper, we obtain a closed-form solution of the optimal distribution of integration time in an approximate way by employing the Delta method and the Lagrange multiplier method. According to the theoretical analyses and real-world experiments, it is shown that the variance of the DOLP estimator can be decreased for any value of DOLP. The method proposed in this paper can effectively decrease the measurement variance and thus statistically improve the measurement accuracy of the polarimetry system.

  12. A comparison of maximum likelihood and other estimators of eigenvalues from several correlated Monte Carlo samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beer, M.

    1980-12-01

    The maximum likelihood method for the multivariate normal distribution is applied to the case of several individual eigenvalues. Correlated Monte Carlo estimates of the eigenvalue are assumed to follow this prescription and aspects of the assumption are examined. Monte Carlo cell calculations using the SAM-CE and VIM codes for the TRX-1 and TRX-2 benchmark reactors, and SAM-CE full core results, are analyzed with this method. Variance reductions of a few percent to a factor of 2 are obtained from maximum likelihood estimation as compared with the simple average and the minimum variance individual eigenvalue. The numerical results verify that the use of sample variances and correlation coefficients in place of the corresponding population statistics still leads to nearly minimum variance estimation for a sufficient number of histories and aggregates.
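
    For a common quantity estimated by several correlated Monte Carlo runs, the multivariate-normal maximum-likelihood combination reduces to a generalized-least-squares weighted average; the sketch below uses an invented covariance matrix purely for illustration and is not taken from the cited calculations.

      # Sketch: ML / GLS combination of correlated estimates of one quantity,
      # mu_hat = (1' S^-1 x) / (1' S^-1 1), with illustrative numbers.
      import numpy as np

      x = np.array([1.002, 0.998, 1.005])            # correlated eigenvalue estimates
      S = np.array([[4.0, 2.0, 1.0],
                    [2.0, 5.0, 2.0],
                    [1.0, 2.0, 6.0]]) * 1e-6         # assumed covariance matrix of the estimates
      ones = np.ones_like(x)
      Sinv = np.linalg.inv(S)
      w = Sinv @ ones / (ones @ Sinv @ ones)         # GLS weights (sum to 1)
      mu_hat = w @ x
      var_hat = 1.0 / (ones @ Sinv @ ones)           # variance of the combined estimate
      print(mu_hat, var_hat)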

  13. Estimation variance bounds of importance sampling simulations in digital communication systems

    NASA Technical Reports Server (NTRS)

    Lu, D.; Yao, K.

    1991-01-01

    In practical applications of importance sampling (IS) simulation, two basic problems are encountered, that of determining the estimation variance and that of evaluating the proper IS parameters needed in the simulations. The authors derive new upper and lower bounds on the estimation variance which are applicable to IS techniques. The upper bound is simple to evaluate and may be minimized by the proper selection of the IS parameter. Thus, lower and upper bounds on the improvement ratio of various IS techniques relative to the direct Monte Carlo simulation are also available. These bounds are shown to be useful and computationally simple to obtain. Based on the proposed technique, one can readily find practical suboptimum IS parameters. Numerical results indicate that these bounding techniques are useful for IS simulations of linear and nonlinear communication systems with intersymbol interference in which bit error rate and IS estimation variances cannot be obtained readily using prior techniques.
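
    The quantities that such bounds concern can be illustrated with a generic importance-sampling estimate of a small tail probability and its sample estimation variance; the bounding technique itself is not reproduced, and the Gaussian example and biasing density below are illustrative assumptions.

      # Generic importance-sampling estimate of P(X > a) for X ~ N(0,1),
      # using a shifted proposal N(a,1). The sample variance of the weighted
      # indicators is the estimation variance that such bounds address.
      import numpy as np
      from scipy.stats import norm

      a, n = 4.0, 100_000
      rng = np.random.default_rng(3)
      z = rng.normal(loc=a, scale=1.0, size=n)       # draws from the biased density
      w = norm.pdf(z) / norm.pdf(z, loc=a)           # likelihood ratios
      h = (z > a) * w                                # weighted indicator
      p_hat = h.mean()
      var_hat = h.var(ddof=1) / n                    # estimation variance of p_hat
      print(p_hat, var_hat, 1.0 - norm.cdf(a))       # compare with the exact value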

  14. Calibrating SALT: a sampling scheme to improve estimates of suspended sediment yield

    Treesearch

    Robert B. Thomas

    1986-01-01

    Abstract - SALT (Selection At List Time) is a variable probability sampling scheme that provides unbiased estimates of suspended sediment yield and its variance. SALT performs better than standard schemes, which do not provide variance estimates. Sampling probabilities are based on a sediment rating function which promotes greater sampling intensity during periods of high...

  15. Evaluation of three lidar scanning strategies for turbulence measurements

    DOE PAGES

    Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia; ...

    2016-05-03

    Several errors occur when a traditional Doppler beam swinging (DBS) or velocity–azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.

  16. Evaluation of three lidar scanning strategies for turbulence measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia

    Several errors occur when a traditional Doppler beam swinging (DBS) or velocity–azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.

  17. A two step Bayesian approach for genomic prediction of breeding values.

    PubMed

    Shariati, Mohammad M; Sørensen, Peter; Janss, Luc

    2012-05-21

    In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each 150 markers were assigned to one group with a common variance. In further analyses, subsets of 1500 and 450 markers with largest effects in step 2 were kept in the prediction model. Grouping markers outperformed SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. A prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.

  18. Monthly hydroclimatology of the continental United States

    NASA Astrophysics Data System (ADS)

    Petersen, Thomas; Devineni, Naresh; Sankarasubramanian, A.

    2018-04-01

    Physical/semi-empirical models that do not require any calibration are greatly needed for estimating hydrological fluxes at ungauged sites. We develop semi-empirical models for estimating the mean and variance of monthly streamflow based on a Taylor series approximation of a lumped, physically based water balance model. The proposed models require the mean and variance of monthly precipitation and potential evapotranspiration, the co-variability of precipitation and potential evapotranspiration, and regionally calibrated parameters for catchment retention sensitivity, atmospheric moisture uptake sensitivity, groundwater partitioning, and maximum soil moisture holding capacity. Estimates of the mean and variance of monthly streamflow from the semi-empirical equations are compared with observed estimates for 1373 catchments in the continental United States. Analyses show that the proposed models explain the spatial variability in monthly moments for basins at lower elevations. A regionalization of parameters for each water resources region shows good agreement between observed and model-estimated moments during January, February, March, and April for the mean, and in all months except May and June for the variance. Thus, the proposed relationships could be employed for understanding and estimating the monthly hydroclimatology of ungauged basins using regional parameters.
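
    The Taylor-series device referred to above is the standard second-order propagation of moments through a smooth water-balance relation, written here generically as Q = f(P, E) for streamflow as a function of precipitation and potential evapotranspiration (the paper's specific f, parameters, and covariance terms are not reproduced):

      \mathrm{E}[Q] \approx f(\mu_P,\mu_E)
        + \tfrac{1}{2} f_{PP}\,\sigma_P^2 + f_{PE}\,\sigma_{PE} + \tfrac{1}{2} f_{EE}\,\sigma_E^2,
      \qquad
      \mathrm{Var}[Q] \approx f_P^2\,\sigma_P^2 + f_E^2\,\sigma_E^2 + 2\,f_P f_E\,\sigma_{PE},

    where subscripts on f denote partial derivatives evaluated at the means, so that only the mean, variance, and covariance of monthly precipitation and potential evapotranspiration are needed as inputs.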

  19. Detection of gene-environment interaction in pedigree data using genome-wide genotypes.

    PubMed

    Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V

    2016-12-01

    Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits.

  20. Turbulence Variance Characteristics in the Unstable Atmospheric Boundary Layer above Flat Pine Forest

    NASA Astrophysics Data System (ADS)

    Asanuma, Jun

    Variances of the velocity components and scalars are important as indicators of turbulence intensity. They can also be used to estimate surface fluxes in several types of "variance methods", and the estimated fluxes can be regional values if the variances from which they are calculated are regionally representative measurements. Motivated by these considerations, variances measured by an aircraft in the unstable ABL over a flat pine forest during HAPEX-Mobilhy were analyzed within the context of similarity scaling arguments. The variances of temperature and vertical velocity within the atmospheric surface layer were found to follow the Monin-Obukhov similarity (MOS) theory closely, and to yield reasonable estimates of the surface sensible heat fluxes when used in variance methods. This validates the variance methods with aircraft measurements. On the other hand, the specific humidity variances were influenced by the surface heterogeneity and clearly fail to obey MOS. A simple analysis based on the similarity law for free convection produced a comprehensible and quantitative picture of the effect of surface flux heterogeneity on the statistical moments, and revealed that variances of the active and passive scalars become dissimilar because of their different roles in turbulence. The analysis also indicated that the mean quantities are affected by the heterogeneity as well, but to a lesser extent than the variances. The temperature variances in the mixed layer (ML) were examined by using a generalized top-down bottom-up diffusion model with several combinations of velocity scales and inversion flux models. The results showed that the surface shear stress exerts considerable influence on the lower ML. ML variance methods were also tested with the temperature and vertical velocity variances, and their feasibility was investigated. Finally, the variances in the ML were analyzed in terms of the local similarity concept; the results confirmed the original hypothesis of Panofsky and McCormick that local scaling in terms of the local buoyancy flux defines the lower bound of the moments.

  1. Genomic estimation of additive and dominance effects and impact of accounting for dominance on accuracy of genomic evaluation in sheep populations.

    PubMed

    Moghaddar, N; van der Werf, J H J

    2017-12-01

    The objectives of this study were to estimate the additive and dominance variance component of several weight and ultrasound scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes and then to investigate the effect of fitting additive and dominance effects on accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on "average information restricted maximum likelihood" using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitting model in the combined cross-bred population. This study showed a substantial dominance genetic variance for weight and ultrasound scanned body composition traits particularly in cross-bred population; however, improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.

  2. On the multiple imputation variance estimator for control-based and delta-adjusted pattern mixture models.

    PubMed

    Tang, Yongqiang

    2017-12-01

    Control-based pattern mixture models (PMM) and delta-adjusted PMMs are commonly used as sensitivity analyses in clinical trials with non-ignorable dropout. These PMMs assume that the statistical behavior of outcomes varies by pattern in the experimental arm in the imputation procedure, but the imputed data are typically analyzed by a standard method such as the primary analysis model. In the multiple imputation (MI) inference, Rubin's variance estimator is generally biased when the imputation and analysis models are uncongenial. One objective of the article is to quantify the bias of Rubin's variance estimator in the control-based and delta-adjusted PMMs for longitudinal continuous outcomes. These PMMs assume the same observed data distribution as the mixed effects model for repeated measures (MMRM). We derive analytic expressions for the MI treatment effect estimator and the associated Rubin's variance in these PMMs and MMRM as functions of the maximum likelihood estimator from the MMRM analysis and the observed proportion of subjects in each dropout pattern when the number of imputations is infinite. The asymptotic bias is generally small or negligible in the delta-adjusted PMM, but can be sizable in the control-based PMM. This indicates that the inference based on Rubin's rule is approximately valid in the delta-adjusted PMM. A simple variance estimator is proposed to ensure asymptotically valid MI inferences in these PMMs, and compared with the bootstrap variance. The proposed method is illustrated by the analysis of an antidepressant trial, and its performance is further evaluated via a simulation study. © 2017, The International Biometric Society.
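
    For reference, the Rubin variance estimator whose bias is quantified above combines the within- and between-imputation variances in the usual way: with m imputations, point estimates Q_l and complete-data variances U_l,

      \bar{Q} = \frac{1}{m}\sum_{l=1}^{m} Q_l, \qquad
      \bar{U} = \frac{1}{m}\sum_{l=1}^{m} U_l, \qquad
      B = \frac{1}{m-1}\sum_{l=1}^{m}\bigl(Q_l-\bar{Q}\bigr)^2, \qquad
      T_{\mathrm{Rubin}} = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B,

    and it is this T that can be biased when the imputation model (the PMM) and the analysis model (e.g., the MMRM) are uncongenial.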

  3. Advances in the meta-analysis of heterogeneous clinical trials II: The quality effects model.

    PubMed

    Doi, Suhail A R; Barendregt, Jan J; Khan, Shahjahan; Thalib, Lukman; Williams, Gail M

    2015-11-01

    This article examines the performance of the updated quality effects (QE) estimator for meta-analysis of heterogeneous studies. It is shown that this approach leads to a decreased mean squared error (MSE) of the estimator while maintaining the nominal level of coverage probability of the confidence interval. Extensive simulation studies confirm that this approach leads to the maintenance of the correct coverage probability of the confidence interval, regardless of the level of heterogeneity, as well as a lower observed variance compared to the random effects (RE) model. The QE model is robust to subjectivity in quality assessment down to completely random entry, in which case its MSE equals that of the RE estimator. When the proposed QE method is applied to a meta-analysis of magnesium for myocardial infarction data, the pooled mortality odds ratio (OR) becomes 0.81 (95% CI 0.61-1.08) which favors the larger studies but also reflects the increased uncertainty around the pooled estimate. In comparison, under the RE model, the pooled mortality OR is 0.71 (95% CI 0.57-0.89) which is less conservative than that of the QE results. The new estimation method has been implemented into the free meta-analysis software MetaXL which allows comparison of alternative estimators and can be downloaded from www.epigear.com. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Sampling hazelnuts for aflatoxin: uncertainty associated with sampling, sample preparation, and analysis.

    PubMed

    Ozay, Guner; Seyhan, Ferda; Yilmaz, Aysun; Whitaker, Thomas B; Slate, Andrew B; Giesbrecht, Francis

    2006-01-01

    The variability associated with the aflatoxin test procedure used to estimate aflatoxin levels in bulk shipments of hazelnuts was investigated. Sixteen 10 kg samples of shelled hazelnuts were taken from each of 20 lots that were suspected of aflatoxin contamination. The total variance associated with testing shelled hazelnuts was estimated and partitioned into sampling, sample preparation, and analytical variance components. Each variance component increased as aflatoxin concentration (either B1 or total) increased. With the use of regression analysis, mathematical expressions were developed to model the relationship between aflatoxin concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific aflatoxin concentration. The sampling, sample preparation, and analytical variances associated with estimating aflatoxin in a hazelnut lot at a total aflatoxin level of 10 ng/g and using a 10 kg sample, a 50 g subsample, dry comminution with a Robot Coupe mill, and a high-performance liquid chromatographic analytical method are 174.40, 0.74, and 0.27, respectively. The sampling, sample preparation, and analytical steps of the aflatoxin test procedure accounted for 99.4, 0.4, and 0.2% of the total variability, respectively.
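
    The percentage contributions quoted above follow directly from the three variance components; a minimal check, using the values reported in the abstract:

      # Percentage share of each variance component at 10 ng/g total aflatoxin
      # (component values taken from the abstract).
      sampling, prep, analysis = 174.40, 0.74, 0.27
      total = sampling + prep + analysis
      for name, v in [("sampling", sampling), ("sample preparation", prep), ("analysis", analysis)]:
          print(f"{name}: {100 * v / total:.1f}%")   # ~99.4%, 0.4%, 0.2%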

  5. Uncertainty importance analysis using parametric moment ratio functions.

    PubMed

    Wei, Pengfei; Lu, Zhenzhou; Song, Jingwen

    2014-02-01

    This article presents a new importance analysis framework, called parametric moment ratio function, for measuring the reduction of model output uncertainty when the distribution parameters of inputs are changed, and the emphasis is put on the mean and variance ratio functions with respect to the variances of model inputs. The proposed concepts efficiently guide the analyst to achieve a targeted reduction on the model output mean and variance by operating on the variances of model inputs. The unbiased and progressive unbiased Monte Carlo estimators are also derived for the parametric mean and variance ratio functions, respectively. Only a set of samples is needed for implementing the proposed importance analysis by the proposed estimators, thus the computational cost is free of input dimensionality. An analytical test example with highly nonlinear behavior is introduced for illustrating the engineering significance of the proposed importance analysis technique and verifying the efficiency and convergence of the derived Monte Carlo estimators. Finally, the moment ratio function is applied to a planar 10-bar structure for achieving a targeted 50% reduction of the model output variance. © 2013 Society for Risk Analysis.

  6. A Multilevel AR(1) Model: Allowing for Inter-Individual Differences in Trait-Scores, Inertia, and Innovation Variance.

    PubMed

    Jongerling, Joran; Laurenceau, Jean-Philippe; Hamaker, Ellen L

    2015-01-01

    In this article we consider a multilevel first-order autoregressive [AR(1)] model with random intercepts, random autoregression, and random innovation variance (i.e., the level 1 residual variance). Including random innovation variance is an important extension of the multilevel AR(1) model for two reasons. First, between-person differences in innovation variance are important from a substantive point of view, in that they capture differences in sensitivity and/or exposure to unmeasured internal and external factors that influence the process. Second, using simulation methods we show that modeling the innovation variance as fixed across individuals, when it should be modeled as a random effect, leads to biased parameter estimates. Additionally, we use simulation methods to compare maximum likelihood estimation to Bayesian estimation of the multilevel AR(1) model and investigate the trade-off between the number of individuals and the number of time points. We provide an empirical illustration by applying the extended multilevel AR(1) model to daily positive affect ratings from 89 married women over the course of 42 consecutive days.
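
    A minimal sketch of the data-generating model described above, with person-specific trait score (intercept), inertia (autoregression), and innovation variance, where the log innovation variance is given a random effect so that variances stay positive; all hyperparameter values are illustrative assumptions.

      # Simulate a multilevel AR(1) process with random intercept, random
      # autoregression, and random (log-normal) innovation variance.
      import numpy as np

      rng = np.random.default_rng(4)
      n_persons, n_times = 89, 42
      mu = rng.normal(3.0, 0.5, n_persons)                          # random trait scores
      phi = np.clip(rng.normal(0.3, 0.1, n_persons), -0.95, 0.95)   # random inertia
      log_sig2 = rng.normal(np.log(0.4), 0.3, n_persons)            # random innovation variance
      y = np.zeros((n_persons, n_times))
      for i in range(n_persons):
          sd = np.sqrt(np.exp(log_sig2[i]))
          y[i, 0] = mu[i] + rng.normal(0.0, sd)
          for t in range(1, n_times):
              y[i, t] = mu[i] + phi[i] * (y[i, t - 1] - mu[i]) + rng.normal(0.0, sd)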

  7. Impact of clinical input variable uncertainties on ten-year atherosclerotic cardiovascular disease risk using new pooled cohort equations.

    PubMed

    Gupta, Himanshu; Schiros, Chun G; Sharifov, Oleg F; Jain, Apurva; Denney, Thomas S

    2016-08-31

    The recently released American College of Cardiology/American Heart Association (ACC/AHA) guideline recommends the Pooled Cohort equations for evaluating the atherosclerotic cardiovascular risk of individuals. The impact of clinical input variable uncertainties on the estimates of ten-year cardiovascular risk based on the ACC/AHA guidelines is not known. Using a publicly available National Health and Nutrition Examination Survey dataset (2005-2010), we computed maximum and minimum ten-year cardiovascular risks by assuming clinically relevant variations/uncertainties in input age (0-1 year) and ±10 % variation in total cholesterol, high-density lipoprotein cholesterol, and systolic blood pressure, and by assuming a uniform distribution of the variance of each variable. We analyzed the changes in risk category compared to the actual inputs at the 5 % and 7.5 % risk limits, as these limits define the thresholds for consideration of drug therapy in the new guidelines. The new Pooled Cohort equations for risk estimation were implemented in a custom software package. Based on our input variances, changes in risk category were possible in up to 24 % of the population cohort at both the 5 % and 7.5 % risk boundary limits. This trend was consistently noted across all subgroups except African American males, where most of the cohort had ≥7.5 % baseline risk regardless of the variation in the variables. Uncertainties in the input variables can alter the risk categorization. The impact of these variances on the ten-year risk needs to be incorporated into the patient/clinician discussion and clinical decision making. Incorporating good clinical practices for the measurement of critical clinical variables and robust standardization of laboratory parameters to more stringent reference standards is extremely important for successful implementation of the new guidelines. Furthermore, the ability to customize the risk calculator inputs to better represent unique clinical circumstances specific to individual needs would be highly desirable in future versions of the risk calculator.

  8. Robust estimation for partially linear models with large-dimensional covariates

    PubMed Central

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2014-01-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087

  9. Robust estimation for partially linear models with large-dimensional covariates.

    PubMed

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.

  10. A bivariate contaminated binormal model for robust fitting of proper ROC curves to a pair of correlated, possibly degenerate, ROC datasets.

    PubMed

    Zhai, Xuetong; Chakraborty, Dev P

    2017-06-01

    The objective was to design and implement a bivariate extension to the contaminated binormal model (CBM) to fit paired receiver operating characteristic (ROC) datasets, possibly degenerate, with proper ROC curves. Paired datasets yield two correlated ratings per case. Degenerate datasets have no interior operating points, and proper ROC curves do not inappropriately cross the chance diagonal. The existing method, developed more than three decades ago, utilizes a bivariate extension to the binormal model, implemented in CORROC2 software, which yields improper ROC curves and cannot fit degenerate datasets. CBM can fit proper ROC curves to unpaired (i.e., yielding one rating per case) and degenerate datasets, and there is a clear scientific need to extend it to handle paired datasets. In CBM, nondiseased cases are modeled by a probability density function (pdf) consisting of a unit variance peak centered at zero. Diseased cases are modeled with a mixture distribution whose pdf consists of two unit variance peaks, one centered at positive μ with integrated probability α, the mixing fraction parameter, corresponding to the fraction of diseased cases where the disease was visible to the radiologist, and one centered at zero, with integrated probability (1-α), corresponding to disease that was not visible. It is shown that: (a) for nondiseased cases the bivariate extension is a unit-variances bivariate normal distribution centered at (0,0) with a specified correlation ρ1; (b) for diseased cases the bivariate extension is a mixture distribution with four peaks, corresponding to disease not visible in either condition, disease visible in only one condition (contributing two peaks), and disease visible in both conditions. An expression for the likelihood function is derived. A maximum likelihood estimation (MLE) algorithm, CORCBM, was implemented in the R programming language that yields parameter estimates, the covariance matrix of the parameters, and other statistics. A limited simulation validation of the method was performed. CORCBM and CORROC2 were applied to two datasets containing nine readers each contributing paired interpretations. CORCBM successfully fitted the data for all readers, whereas CORROC2 failed to fit a degenerate dataset. All fits were visually reasonable. All CORCBM fits were proper, whereas all CORROC2 fits were improper. CORCBM and CORROC2 were in agreement (a) in declaring only one of the nine readers as having significantly different performances in the two modalities; (b) in estimating higher correlations for diseased cases than for nondiseased ones; and (c) in finding that the intermodality correlation estimates for nondiseased cases were consistent between the two methods. All CORCBM fits yielded higher area under the curve (AUC) than the CORROC2 fits, consistent with the fact that a proper ROC model like CORCBM is based on a likelihood-ratio-equivalent decision variable and consequently yields higher performance than the binormal-model-based CORROC2. The method gave satisfactory fits to four simulated datasets. CORCBM is a robust method for fitting paired ROC datasets, always yielding proper ROC curves, and is able to fit degenerate datasets. © 2017 American Association of Physicists in Medicine.

  11. An adaptive displacement estimation algorithm for improved reconstruction of thermal strain.

    PubMed

    Ding, Xuan; Dutta, Debaditya; Mahmoud, Ahmed M; Tillman, Bryan; Leers, Steven A; Kim, Kang

    2015-01-01

    Thermal strain imaging (TSI) can be used to differentiate between lipid and water-based tissues in atherosclerotic arteries. However, detecting small lipid pools in vivo requires accurate and robust displacement estimation over a wide range of displacement magnitudes. Phase-shift estimators such as Loupas' estimator and time-shift estimators such as normalized cross-correlation (NXcorr) are commonly used to track tissue displacements. However, Loupas' estimator is limited by phase-wrapping and NXcorr performs poorly when the SNR is low. In this paper, we present an adaptive displacement estimation algorithm that combines both Loupas' estimator and NXcorr. We evaluated this algorithm using computer simulations and an ex vivo human tissue sample. Using 1-D simulation studies, we showed that when the displacement magnitude induced by thermal strain was >λ/8 and the electronic system SNR was >25.5 dB, the NXcorr displacement estimate was less biased than the estimate found using Loupas' estimator. On the other hand, when the displacement magnitude was ≤λ/4 and the electronic system SNR was ≤25.5 dB, Loupas' estimator had less variance than NXcorr. We used these findings to design an adaptive displacement estimation algorithm. Computer simulations of TSI showed that the adaptive displacement estimator was less biased than either Loupas' estimator or NXcorr. Strain reconstructed from the adaptive displacement estimates improved the strain SNR by 43.7 to 350% and the spatial accuracy by 1.2 to 23.0% (P < 0.001). An ex vivo human tissue study provided results that were comparable to computer simulations. The results of this study showed that a novel displacement estimation algorithm, which combines two different displacement estimators, yielded improved displacement estimation and resulted in improved strain reconstruction.
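
    The findings above suggest a selection rule of roughly the following form. This is only a hedged sketch of that logic; the published algorithm's exact rule and the two underlying estimators (here passed in as precomputed values) are not reproduced, and the example numbers are illustrative.

      # Hedged sketch of an adaptive choice between a phase-shift (Loupas-type)
      # and a time-shift (NXcorr-type) displacement estimate, based on the
      # regimes reported above; thresholds and inputs are illustrative.
      def adaptive_displacement(loupas_estimate, nxcorr_estimate, snr_db,
                                wavelength, snr_threshold_db=25.5):
          large_shift = abs(nxcorr_estimate) > wavelength / 8.0
          if large_shift and snr_db > snr_threshold_db:
              return nxcorr_estimate      # less biased for large shifts at high SNR
          return loupas_estimate          # lower variance for small shifts / low SNR

      # Illustrative call: a displacement larger than one eighth of the wavelength at high SNR.
      print(adaptive_displacement(4.8e-5, 5.0e-5, snr_db=30.0, wavelength=3.0e-4))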

  12. An Adaptive Displacement Estimation Algorithm for Improved Reconstruction of Thermal Strain

    PubMed Central

    Ding, Xuan; Dutta, Debaditya; Mahmoud, Ahmed M.; Tillman, Bryan; Leers, Steven A.; Kim, Kang

    2014-01-01

    Thermal strain imaging (TSI) can be used to differentiate between lipid and water-based tissues in atherosclerotic arteries. However, detecting small lipid pools in vivo requires accurate and robust displacement estimation over a wide range of displacement magnitudes. Phase-shift estimators such as Loupas’ estimator and time-shift estimators like normalized cross-correlation (NXcorr) are commonly used to track tissue displacements. However, Loupas’ estimator is limited by phase-wrapping and NXcorr performs poorly when the signal-to-noise ratio (SNR) is low. In this paper, we present an adaptive displacement estimation algorithm that combines both Loupas’ estimator and NXcorr. We evaluated this algorithm using computer simulations and an ex-vivo human tissue sample. Using 1-D simulation studies, we showed that when the displacement magnitude induced by thermal strain was >λ/8 and the electronic system SNR was >25.5 dB, the NXcorr displacement estimate was less biased than the estimate found using Loupas’ estimator. On the other hand, when the displacement magnitude was ≤λ/4 and the electronic system SNR was ≤25.5 dB, Loupas’ estimator had less variance than NXcorr. We used these findings to design an adaptive displacement estimation algorithm. Computer simulations of TSI using Field II showed that the adaptive displacement estimator was less biased than either Loupas’ estimator or NXcorr. Strain reconstructed from the adaptive displacement estimates improved the strain SNR by 43.7–350% and the spatial accuracy by 1.2–23.0% (p < 0.001). An ex-vivo human tissue study provided results that were comparable to computer simulations. The results of this study showed that a novel displacement estimation algorithm, which combines two different displacement estimators, yielded improved displacement estimation and results in improved strain reconstruction. PMID:25585398

  13. Robust time and frequency domain estimation methods in adaptive control

    NASA Technical Reports Server (NTRS)

    Lamaire, Richard Orville

    1987-01-01

    A robust identification method was developed for use in an adaptive control system. The type of estimator is called the robust estimator, since it is robust to the effects of both unmodeled dynamics and an unmeasurable disturbance. The development of the robust estimator was motivated by a need to provide guarantees in the identification part of an adaptive controller. To enable the design of a robust control system, a nominal model as well as a frequency-domain bounding function on the modeling uncertainty associated with this nominal model must be provided. Two estimation methods are presented for finding parameter estimates, and, hence, a nominal model. One of these methods is based on the well developed field of time-domain parameter estimation. In a second method of finding parameter estimates, a type of weighted least-squares fitting to a frequency-domain estimated model is used. The frequency-domain estimator is shown to perform better, in general, than the time-domain parameter estimator. In addition, a methodology for finding a frequency-domain bounding function on the disturbance is used to compute a frequency-domain bounding function on the additive modeling error due to the effects of the disturbance and the use of finite-length data. The performance of the robust estimator in both open-loop and closed-loop situations is examined through the use of simulations.

  14. Asymptotic Effect of Misspecification in the Random Part of the Multilevel Model

    ERIC Educational Resources Information Center

    Berkhof, Johannes; Kampen, Jarl Kennard

    2004-01-01

    The authors examine the asymptotic effect of omitting a random coefficient in the multilevel model and derive expressions for the change in (a) the variance components estimator and (b) the estimated variance of the fixed effects estimator. They apply the method of moments, which yields a closed form expression for the omission effect. In…

  15. Sampling in freshwater environments: suspended particle traps and variability in the final data.

    PubMed

    Barbizzi, Sabrina; Pati, Alessandra

    2008-11-01

    This paper reports a practical method for estimating measurement uncertainty, including the sampling contribution, derived from the approach implemented by Ramsey for soil investigations. The methodology has been applied to estimate the measurement uncertainty (sampling and analysis) of the (137)Cs activity concentration (Bq kg(-1)) and the total carbon content (%) in suspended particle sampling in a freshwater ecosystem. Uncertainty estimates for the between-locations, sampling, and analysis components have been evaluated. For the considered measurands, the relative expanded measurement uncertainties are 12.3% for (137)Cs and 4.5% for total carbon. For (137)Cs, the measurement (sampling + analysis) variance gives the major contribution to the total variance, while for total carbon the spatial variance is the dominant contributor to the total variance. The limitations and advantages of this basic method are discussed.

  16. Systems Engineering Programmatic Estimation Using Technology Variance

    NASA Technical Reports Server (NTRS)

    Mog, Robert A.

    2000-01-01

    Unique and innovative system programmatic estimation is conducted using the variance of the packaged technologies. Covariance analysis is performed on the subsystems and components comprising the system of interest. Technological "return" and "variation" parameters are estimated. These parameters are combined with the model error to arrive at a measure of system development stability. The resulting estimates provide valuable information concerning the potential cost growth of the system under development.

  17. Heat and solute tracers: how do they compare in heterogeneous aquifers?

    PubMed

    Irvine, Dylan J; Simmons, Craig T; Werner, Adrian D; Graf, Thomas

    2015-04-01

    A comparison of groundwater velocity in heterogeneous aquifers estimated from hydraulic methods, heat and solute tracers was made using numerical simulations. Aquifer heterogeneity was described by geostatistical properties of the Borden, Cape Cod, North Bay, and MADE aquifers. Both heat and solute tracers displayed little systematic under- or over-estimation in velocity relative to a hydraulic control. The worst cases were under-estimates of 6.63% for solute and 2.13% for the heat tracer. Both under- and over-estimation of velocity from the heat tracer relative to the solute tracer occurred. Differences between the estimates from the tracer methods increased as the mean velocity decreased, owing to differences in rates of molecular diffusion and thermal conduction. The variance in estimated velocity using all methods increased as the variance in log-hydraulic conductivity (K) and correlation length scales increased. The variance in velocity for each scenario was remarkably small when compared to σ2 ln(K) for all methods tested. The largest variability identified was for the solute tracer where 95% of velocity estimates ranged by a factor of 19 in simulations where 95% of the K values varied by almost four orders of magnitude. For the same K-fields, this range was a factor of 11 for the heat tracer. The variance in estimated velocity was always lowest when using heat as a tracer. The study results suggest that a solute tracer will provide more understanding about the variance in velocity caused by aquifer heterogeneity and a heat tracer provides a better approximation of the mean velocity. © 2013, National Ground Water Association.

  18. Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application.

    PubMed

    Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D

    2016-06-15

    The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, in investigating the accuracy of a biomarker to discriminate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rates) only, and it can often be clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show the accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on whether inference is based on the AUC or the pAUC, we can make a different decision on the prognostic ability of the same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
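
    As a point of reference, a nonparametric pAUC can be obtained by trapezoidal integration of the empirical ROC curve over a restricted false-positive-rate range. The sketch below assumes scikit-learn and does not implement the authors' U-statistic variance estimator; names and the default cut-off are illustrative.

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve

    def partial_auc(y_true, scores, max_fpr=0.2):
        """Nonparametric partial AUC over FPR in [0, max_fpr]
        (i.e. specificities above 1 - max_fpr), by trapezoidal
        integration of the empirical ROC curve."""
        fpr, tpr, _ = roc_curve(y_true, scores)
        # interpolate the ROC curve at the cut-off so the area stops exactly there
        tpr_cut = np.interp(max_fpr, fpr, tpr)
        keep = fpr <= max_fpr
        fpr_p = np.append(fpr[keep], max_fpr)
        tpr_p = np.append(tpr[keep], tpr_cut)
        return np.trapz(tpr_p, fpr_p)
    ```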

  19. Estimation of Additive, Dominance, and Imprinting Genetic Variance Using Genomic Data

    PubMed Central

    Lopes, Marcos S.; Bastiaansen, John W. M.; Janss, Luc; Knol, Egbert F.; Bovenhuis, Henk

    2015-01-01

    Traditionally, exploration of genetic variance in humans, plants, and livestock species has been limited mostly to the use of additive effects estimated using pedigree data. However, with the development of dense panels of single-nucleotide polymorphisms (SNPs), the exploration of genetic variation of complex traits is moving from quantifying the resemblance between family members to the dissection of genetic variation at individual loci. With SNPs, we were able to quantify the contribution of additive, dominance, and imprinting variance to the total genetic variance by using a SNP regression method. The method was validated in simulated data and applied to three traits (number of teats, backfat, and lifetime daily gain) in three purebred pig populations. In simulated data, the estimates of additive, dominance, and imprinting variance were very close to the simulated values. In real data, dominance effects account for a substantial proportion of the total genetic variance (up to 44%) for these traits in these populations. The contribution of imprinting to the total phenotypic variance of the evaluated traits was relatively small (1–3%). Our results indicate a strong relationship between additive variance explained per chromosome and chromosome length, which has been described previously for other traits in other species. We also show that a similar linear relationship exists for dominance and imprinting variance. These novel results improve our understanding of the genetic architecture of the evaluated traits and show promise for applying the SNP regression method to other traits and species, including human diseases. PMID:26438289

  20. Concerns about a variance approach to X-ray diffractometric estimation of microfibril angle in wood

    Treesearch

    Steve P. Verrill; David E. Kretschmann; Victoria L. Herian; Michael C. Wiemann; Harry A. Alden

    2011-01-01

    In this article, we raise three technical concerns about Evans’ 1999 Appita Journal “variance approach” to estimating microfibril angle (MFA). The first concern is associated with the approximation of the variance of an X-ray intensity half-profile by a function of the MFA and the natural variability of the MFA. The second concern is associated with the approximation...

  1. Concerns about a variance approach to the X-ray diffractometric estimation of microfibril angle in wood

    Treesearch

    Steve P. Verrill; David E. Kretschmann; Victoria L. Herian; Michael Wiemann; Harry A. Alden

    2010-01-01

    In this paper we raise three technical concerns about Evans’s 1999 Appita Journal “variance approach” to estimating microfibril angle. The first concern is associated with the approximation of the variance of an X-ray intensity half-profile by a function of the microfibril angle and the natural variability of the microfibril angle, S2...

  2. Real-time yield estimation based on deep learning

    NASA Astrophysics Data System (ADS)

    Rahnemoonfar, Maryam; Sheppard, Clay

    2017-05-01

    Crop yield estimation is an important task in product management and marketing. Accurate yield prediction helps farmers make better decisions on cultivation practices, plant disease prevention, and the size of the harvest labor force. The current practice of yield estimation, based on manual counting of fruits, is a very time-consuming and expensive process and is not practical for large fields. Robotic systems, including unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), provide an efficient, cost-effective, flexible, and scalable solution for product management and yield prediction. Recently, large volumes of data have been gathered from agricultural fields; however, efficient analysis of those data remains a challenging task. Computer vision approaches to automatic counting of fruits or flowers currently face several challenges, including occlusion caused by leaves, branches, or other fruits, variance in natural illumination, and scale. In this paper, a novel deep convolutional network algorithm was developed to facilitate accurate yield prediction and automatic counting of fruits and vegetables in images. Our method is robust to occlusion, shadow, uneven illumination, and scale. Experimental results in comparison to the state-of-the-art show the effectiveness of our algorithm.

  3. Joint scale-change models for recurrent events and failure time.

    PubMed

    Xu, Gongjun; Chiou, Sy Han; Huang, Chiung-Yu; Wang, Mei-Cheng; Yan, Jun

    2017-01-01

    Recurrent event data arise frequently in various fields such as biomedical sciences, public health, engineering, and social sciences. In many instances, the observation of the recurrent event process can be stopped by the occurrence of a correlated failure event, such as treatment failure and death. In this article, we propose a joint scale-change model for the recurrent event process and the failure time, where a shared frailty variable is used to model the association between the two types of outcomes. In contrast to the popular Cox-type joint modeling approaches, the regression parameters in the proposed joint scale-change model have marginal interpretations. The proposed approach is robust in the sense that no parametric assumption is imposed on the distribution of the unobserved frailty and that we do not need the strong Poisson-type assumption for the recurrent event process. We establish consistency and asymptotic normality of the proposed semiparametric estimators under suitable regularity conditions. To estimate the corresponding variances of the estimators, we develop a computationally efficient resampling-based procedure. Simulation studies and an analysis of hospitalization data from the Danish Psychiatric Central Register illustrate the performance of the proposed method.

  4. Doubly robust nonparametric inference on the average treatment effect.

    PubMed

    Benkeser, D; Carone, M; Laan, M J Van Der; Gilbert, P B

    2017-12-01

    Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different methods for constructing such estimators and investigate the extent to which they may be modified to also allow doubly robust inference. We find that while targeted minimum loss-based estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need for and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly robust inference in other problems.
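
    For orientation, a standard augmented inverse-probability-weighted (AIPW) doubly robust estimator with simple parametric nuisance models might look like the sketch below. This is not the targeted minimum loss-based construction discussed in the paper, and the influence-curve standard error assumes both nuisance models are reasonably specified; all names are illustrative.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    def aipw_ate(X, a, y):
        """Augmented IPW (doubly robust) estimate of E[Y(1)] - E[Y(0)].
        Consistent if either the propensity model or the outcome model
        is correctly specified."""
        ps = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
        mu1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X)
        mu0 = LinearRegression().fit(X[a == 0], y[a == 0]).predict(X)
        psi1 = mu1 + a * (y - mu1) / ps
        psi0 = mu0 + (1 - a) * (y - mu0) / (1 - ps)
        ate = np.mean(psi1 - psi0)
        se = np.std(psi1 - psi0, ddof=1) / np.sqrt(len(y))   # influence-curve SE
        return ate, se
    ```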

  5. Approximate median regression for complex survey data with skewed response.

    PubMed

    Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi

    2016-12-01

    The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, namely stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.

  6. Approximate Median Regression for Complex Survey Data with Skewed Response

    PubMed Central

    Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi

    2016-01-01

    Summary: The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, namely stratification, multistage sampling, and weighting. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. PMID:27062562

  7. Log-polar mapping-based scale space tracking with adaptive target response

    NASA Astrophysics Data System (ADS)

    Li, Dongdong; Wen, Gongjian; Kuai, Yangliu; Zhang, Ximing

    2017-05-01

    Correlation filter-based tracking has exhibited impressive robustness and accuracy in recent years. Standard correlation filter-based trackers are restricted to translation estimation and equipped with fixed target response. These trackers produce an inferior performance when encountered with a significant scale variation or appearance change. We propose a log-polar mapping-based scale space tracker with an adaptive target response. This tracker transforms the scale variation of the target in the Cartesian space into a shift along the logarithmic axis in the log-polar space. A one-dimensional scale correlation filter is learned online to estimate the shift along the logarithmic axis. With the log-polar representation, scale estimation is achieved accurately without a multiresolution pyramid. To achieve an adaptive target response, a variance of the Gaussian function is computed from the response map and updated online with a learning rate parameter. Our log-polar mapping-based scale correlation filter and adaptive target response can be combined with any correlation filter-based trackers. In addition, the scale correlation filter can be extended to a two-dimensional correlation filter to achieve joint estimation of the scale variation and in-plane rotation. Experiments performed on an OTB50 benchmark demonstrate that our tracker achieves superior performance against state-of-the-art trackers.

  8. A Bayesian model averaging approach for estimating the relative risk of mortality associated with heat waves in 105 U.S. cities.

    PubMed

    Bobb, Jennifer F; Dominici, Francesca; Peng, Roger D

    2011-12-01

    Estimating the risks heat waves pose to human health is a critical part of assessing the future impact of climate change. In this article, we propose a flexible class of time series models to estimate the relative risk of mortality associated with heat waves and conduct Bayesian model averaging (BMA) to account for the multiplicity of potential models. Applying these methods to data from 105 U.S. cities for the period 1987-2005, we identify those cities having a high posterior probability of increased mortality risk during heat waves, examine the heterogeneity of the posterior distributions of mortality risk across cities, assess sensitivity of the results to the selection of prior distributions, and compare our BMA results to a model selection approach. Our results show that no single model best predicts risk across the majority of cities, and that for some cities heat-wave risk estimation is sensitive to model choice. Although model averaging leads to posterior distributions with increased variance as compared to statistical inference conditional on a model obtained through model selection, we find that the posterior mean of heat wave mortality risk is robust to accounting for model uncertainty over a broad class of models. © 2011, The International Biometric Society.
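
    As a minimal sketch of the model-averaging arithmetic, the snippet below uses BIC-based approximate posterior model weights rather than the paper's fully Bayesian time-series models; the extra between-model term in the variance is what widens the averaged posterior relative to conditioning on a single selected model. All names are illustrative.

    ```python
    import numpy as np

    def bma_from_bic(bics, estimates, variances):
        """Approximate Bayesian model averaging with BIC-based posterior
        model weights. Returns the model-averaged estimate and a variance
        that adds between-model spread to the within-model variances."""
        bics = np.asarray(bics, dtype=float)
        estimates = np.asarray(estimates, dtype=float)
        variances = np.asarray(variances, dtype=float)
        w = np.exp(-0.5 * (bics - bics.min()))
        w /= w.sum()
        est = np.sum(w * estimates)
        var = np.sum(w * (variances + (estimates - est) ** 2))
        return est, var, w
    ```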

  9. Smoothed Spectra, Ogives, and Error Estimates for Atmospheric Turbulence Data

    NASA Astrophysics Data System (ADS)

    Dias, Nelson Luís

    2018-01-01

    A systematic evaluation is conducted of the smoothed spectrum, which is a spectral estimate obtained by averaging over a window of contiguous frequencies. The technique is extended to the ogive, as well as to the cross-spectrum. It is shown that, combined with existing variance estimates for the periodogram, the variance—and therefore the random error—associated with these estimates can be calculated in a straightforward way. The smoothed spectra and ogives are biased estimates; with simple power-law analytical models, correction procedures are devised, as well as a global constraint that enforces Parseval's identity. Several new results are thus obtained: (1) The analytical variance estimates compare well with the sample variance calculated for the Bartlett spectrum and the variance of the inertial subrange of the cospectrum is shown to be relatively much larger than that of the spectrum. (2) Ogives and spectra estimates with reduced bias are calculated. (3) The bias of the smoothed spectrum and ogive is shown to be negligible at the higher frequencies. (4) The ogives and spectra thus calculated have better frequency resolution than the Bartlett spectrum, with (5) gradually increasing variance and relative error towards the low frequencies. (6) Power-law identification and extraction of the rate of dissipation of turbulence kinetic energy are possible directly from the ogive. (7) The smoothed cross-spectrum is a valid inner product and therefore an acceptable candidate for coherence and spectral correlation coefficient estimation by means of the Cauchy-Schwarz inequality. The quadrature, phase function, coherence function and spectral correlation function obtained from the smoothed spectral estimates compare well with the classical ones derived from the Bartlett spectrum.
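
    A minimal sketch of the basic operation, assuming a uniform smoothing window: average the periodogram over contiguous Fourier frequencies and accumulate the ogive. The bias corrections, Parseval constraint, and cross-spectral extensions described above are not included; names are illustrative.

    ```python
    import numpy as np

    def smoothed_spectrum(x, fs, width=11):
        """Smoothed spectral estimate: average the raw periodogram over a
        window of `width` contiguous Fourier frequencies. The relative
        random error of each smoothed ordinate is roughly 1/sqrt(width)."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        pgram = (np.abs(np.fft.rfft(x)) ** 2) / (fs * n)   # raw periodogram
        kernel = np.ones(width) / width
        smooth = np.convolve(pgram, kernel, mode="same")
        df = freqs[1] - freqs[0]
        ogive = np.cumsum(smooth[::-1])[::-1] * df          # integral from f to Nyquist
        return freqs, smooth, ogive
    ```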

  10. An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

    PubMed

    Xu, Maoqi; Chen, Liang

    2018-01-01

    The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomic studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.


  11. Trends in one-year cumulative incidence of death between 2005 and 2013 among patients initiating antiretroviral therapy in Uganda.

    PubMed

    Bebell, Lisa M; Siedner, Mark J; Musinguzi, Nicholas; Boum, Yap; Bwana, Bosco M; Muyindike, Winnie; Hunt, Peter W; Martin, Jeffrey N; Bangsberg, David R

    2017-07-01

    Recent ecological data demonstrate improving outcomes for HIV-infected people in sub-Saharan Africa. Recently, Uganda has experienced a resurgence in HIV incidence and prevalence, but trends in HIV-related deaths have not been well described. Data were collected through the Uganda AIDS Rural Treatment Outcomes (UARTO) Study, an observational longitudinal cohort of Ugandan adults initiating antiretroviral therapy (ART) between 2005 and 2013. We calculated cumulative incidence of death within one year of ART initiation, and fit Poisson models with robust variance estimators to estimate the effect of enrollment period on one-year risk of death and loss to follow-up. Of 760 persons in UARTO who started ART, 30 deaths occurred within one year of ART initiation (cumulative incidence 3.9%, 95% confidence interval [CI] 2.7-5.6%). Risk of death was highest for those starting ART in 2005 (13.0%, 95% CI 6.0-24.0%), decreased in 2006-2007 to 4% (95% CI 2.0-6.0%), and did not change thereafter (P = 0.61). These results were robust to adjustment for age, sex, CD4 cell count, viral load, asset wealth, baseline depression, and body mass index. Here, we demonstrate that one-year cumulative incidence of death was high just after free ART rollout, decreased the following year, and remained low thereafter. Once established, ART programs in President's Emergency Fund for AIDS Relief-supported countries can maintain high-quality care.
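
    The "Poisson model with robust variance" device (modified Poisson regression for risk ratios) can be sketched with statsmodels as below; the variable names are hypothetical and the covariate set is only indicative of the adjustment described above.

    ```python
    import numpy as np
    import statsmodels.api as sm

    def modified_poisson_rr(X, death):
        """Poisson regression with a robust (sandwich) covariance, a common
        way to obtain risk ratios for a binary outcome.

        X: covariate matrix (e.g. enrollment period indicator plus
        adjustment variables); death: 0/1 one-year mortality outcome."""
        X = sm.add_constant(X)
        fit = sm.GLM(death, X, family=sm.families.Poisson()).fit(cov_type="HC0")
        rr = np.exp(fit.params)          # risk ratios
        ci = np.exp(fit.conf_int())      # 95% CIs on the ratio scale
        return rr, ci, fit
    ```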

  12. Minimum variance geographic sampling

    NASA Technical Reports Server (NTRS)

    Terrell, G. R. (Principal Investigator)

    1980-01-01

    Resource inventories require samples with geographical scatter, sometimes not as widely spaced as would be hoped. A simple model of correlation over distances is used to construct a minimum variance unbiased estimator of population means. The fitting procedure is illustrated with data used to estimate Missouri corn acreage.

  13. Detection of gene–environment interaction in pedigree data using genome-wide genotypes

    PubMed Central

    Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V

    2016-01-01

    Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits. PMID:27436263

  14. A Robust High-Accuracy Ultrasound Indoor Positioning System Based on a Wireless Sensor Network

    PubMed Central

    Qi, Jun; Liu, Guo-Ping

    2017-01-01

    This paper describes the development and implementation of a robust high-accuracy ultrasonic indoor positioning system (UIPS). The UIPS consists of several wireless ultrasonic beacons in the indoor environment. Each of them has a fixed and known position coordinate and can collect all the transmissions from the target node or emit ultrasonic signals. Every wireless sensor network (WSN) node has two communication modules: one is WiFi, which transmits the data to the server, and the other is the radio frequency (RF) module, which is only used for time synchronization between different nodes, with accuracy up to 1 μs. The distance between the beacon and the target node is calculated by measuring the time-of-flight (TOF) of the ultrasonic signal, and then the position of the target is computed from these distances and the coordinates of the beacons. TOF estimation is the most important technique in the UIPS. A new time-domain method to extract the envelope of the ultrasonic signals is presented in order to estimate the TOF. This method, with the envelope detection filter, estimates the envelope value from the sampled values on both sides based on the least squares method (LSM). The simulation results show that the method can achieve envelope detection with a good filtering effect by means of the LSM. The highest precision and variance can reach 0.61 mm and 0.23 mm, respectively, in pseudo-range measurements with UIPS. A maximum location error of 10.2 mm is achieved in the positioning experiments for a moving robot, when UIPS works on the line-of-sight (LOS) signal. PMID:29113126

  15. Behavior of sensitivities in the one-dimensional advection-dispersion equation: Implications for parameter estimation and sampling design

    USGS Publications Warehouse

    Knopman, Debra S.; Voss, Clifford I.

    1987-01-01

    The spatial and temporal variability of sensitivities has a significant impact on parameter estimation and sampling design for studies of solute transport in porous media. Physical insight into the behavior of sensitivities is offered through an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation. When parameters are estimated in regression models of one-dimensional transport, the spatial and temporal variability in sensitivities influences variance and covariance of parameter estimates. Several principles account for the observed influence of sensitivities on parameter uncertainty. (1) Information about a physical parameter may be most accurately gained at points in space and time with a high sensitivity to the parameter. (2) As the distance of observation points from the upstream boundary increases, maximum sensitivity to velocity during passage of the solute front increases and the consequent estimate of velocity tends to have lower variance. (3) The frequency of sampling must be “in phase” with the S shape of the dispersion sensitivity curve to yield the most information on dispersion. (4) The sensitivity to the dispersion coefficient is usually at least an order of magnitude less than the sensitivity to velocity. (5) The assumed probability distribution of random error in observations of solute concentration determines the form of the sensitivities. (6) If variance in random error in observations is large, trends in sensitivities of observation points may be obscured by noise and thus have limited value in predicting variance in parameter estimates among designs. (7) Designs that minimize the variance of one parameter may not necessarily minimize the variance of other parameters. (8) The time and space interval over which an observation point is sensitive to a given parameter depends on the actual values of the parameters in the underlying physical system.

  16. A comparison of selection at list time and time-stratified sampling for estimating suspended sediment loads

    Treesearch

    Robert B. Thomas; Jack Lewis

    1993-01-01

    Time-stratified sampling of sediment for estimating suspended load is introduced and compared to selection at list time (SALT) sampling. Both methods provide unbiased estimates of load and variance. The magnitude of the variance of the two methods is compared using five storm populations of suspended sediment flux derived from turbidity data. Under like conditions,...

  17. Estimation of the biserial correlation and its sampling variance for use in meta-analysis.

    PubMed

    Jacobs, Perke; Viechtbauer, Wolfgang

    2017-06-01

    Meta-analyses are often used to synthesize the findings of studies examining the correlational relationship between two continuous variables. When only dichotomous measurements are available for one of the two variables, the biserial correlation coefficient can be used to estimate the product-moment correlation between the two underlying continuous variables. Unlike the point-biserial correlation coefficient, biserial correlation coefficients can therefore be integrated with product-moment correlation coefficients in the same meta-analysis. The present article describes the estimation of the biserial correlation coefficient for meta-analytic purposes and reports simulation results comparing different methods for estimating the coefficient's sampling variance. The findings indicate that commonly employed methods yield inconsistent estimates of the sampling variance across a broad range of research situations. In contrast, consistent estimates can be obtained using two methods that appear to be unknown in the meta-analytic literature. A variance-stabilizing transformation for the biserial correlation coefficient is described that allows for the construction of confidence intervals for individual coefficients with close to nominal coverage probabilities in most of the examined conditions. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
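
    A minimal sketch of the usual conversion from the point-biserial to the biserial correlation, assuming an underlying normal latent variable behind the dichotomized measure; the competing sampling-variance formulas compared in the paper are not reproduced here.

    ```python
    import numpy as np
    from scipy.stats import norm, pearsonr

    def biserial_from_point_biserial(x_binary, y):
        """Biserial correlation between an artificially dichotomized
        variable (0/1) and a continuous variable y."""
        x = np.asarray(x_binary)
        p = x.mean()                      # proportion in the 'upper' group
        r_pb = pearsonr(x, y)[0]          # point-biserial correlation
        h = norm.pdf(norm.ppf(p))         # normal ordinate at the cut point
        return r_pb * np.sqrt(p * (1 - p)) / h
    ```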

  18. Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number.

    PubMed

    Fragkos, Konstantinos C; Tsagris, Michail; Frangos, Christos C

    2014-01-01

    The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
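
    For context, Rosenthal's fail-safe number itself is a one-line calculation from the studies' one-tailed z-values; the confidence-interval machinery developed in the paper is separate from this point estimate. A minimal sketch:

    ```python
    import numpy as np
    from scipy.stats import norm

    def rosenthal_fail_safe_n(p_values, alpha=0.05):
        """Rosenthal's fail-safe number: how many unpublished null studies
        would be needed to raise the combined (Stouffer) one-tailed
        p-value above alpha."""
        z = norm.isf(np.asarray(p_values))      # one-tailed z for each study
        k = len(z)
        z_alpha = norm.isf(alpha)               # 1.645 for alpha = 0.05
        n_fs = (z.sum() ** 2) / (z_alpha ** 2) - k
        return max(n_fs, 0.0)
    ```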

  19. Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number

    PubMed Central

    Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.

    2014-01-01

    The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470

  20. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates.

    PubMed

    LeDell, Erin; Petersen, Maya; van der Laan, Mark

    In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
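
    A sketch of the influence-curve idea for a single (non-cross-validated) empirical AUC, which reduces to the familiar DeLong-type variance; the cross-validated extension in the paper averages analogous influence values across validation folds. Names are illustrative.

    ```python
    import numpy as np

    def auc_with_ic_variance(y, scores):
        """Empirical AUC and an influence-curve (DeLong-type) variance
        estimate from one sample of binary labels y and scores."""
        y = np.asarray(y).astype(bool)
        pos, neg = np.asarray(scores)[y], np.asarray(scores)[~y]
        n1, n0 = len(pos), len(neg)
        # pairwise comparison kernel with ties counted as 1/2
        greater = (pos[:, None] > neg[None, :]).astype(float)
        ties = (pos[:, None] == neg[None, :]).astype(float)
        psi = greater + 0.5 * ties
        auc = psi.mean()
        v10 = psi.mean(axis=1)           # structural components for positives
        v01 = psi.mean(axis=0)           # structural components for negatives
        var = v10.var(ddof=1) / n1 + v01.var(ddof=1) / n0
        return auc, var
    ```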

  1. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates

    PubMed Central

    Petersen, Maya; van der Laan, Mark

    2015-01-01

    In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC. PMID:26279737

  2. Analysis of genetic effects of nuclear-cytoplasmic interaction on quantitative traits: genetic model for diploid plants.

    PubMed

    Han, Lide; Yang, Jian; Zhu, Jun

    2007-06-01

    A genetic model was proposed for simultaneously analyzing genetic effects of nuclear, cytoplasm, and nuclear-cytoplasmic interaction (NCI) as well as their genotype by environment (GE) interaction for quantitative traits of diploid plants. In the model, the NCI effects were further partitioned into additive and dominance nuclear-cytoplasmic interaction components. Mixed linear model approaches were used for statistical analysis. On the basis of diallel cross designs, Monte Carlo simulations showed that the genetic model was robust for estimating variance components under several situations without specific effects. Random genetic effects were predicted by an adjusted unbiased prediction (AUP) method. Data on four quantitative traits (boll number, lint percentage, fiber length, and micronaire) in Upland cotton (Gossypium hirsutum L.) were analyzed as a worked example to show the effectiveness of the model.

  3. Toward a More Robust Pruning Procedure for MLP Networks

    NASA Technical Reports Server (NTRS)

    Stepniewski, Slawomir W.; Jorgensen, Charles C.

    1998-01-01

    Choosing a proper neural network architecture is a problem of great practical importance. Smaller models mean not only simpler designs but also lower variance for parameter estimation and network prediction. The widespread utilization of neural networks in modeling highlights an issue in human factors. The procedure of building neural models should find an appropriate level of model complexity in a more or less automatic fashion to make it less prone to human subjectivity. In this paper we present a Singular Value Decomposition based node elimination technique and enhanced implementation of the Optimal Brain Surgeon algorithm. Combining both methods creates a powerful pruning engine that can be used for tuning feedforward connectionist models. The performance of the proposed method is demonstrated by adjusting the structure of a multi-input multi-output model used to calibrate a six-component wind tunnel strain gage.

  4. Just Google It: An Approach on Word Frequencies Based on Online Search Result.

    PubMed

    Moret-Tatay, Carmen; Gamermann, Daniel; Murphy, Michael; Kuzmičová, Anezka

    2018-01-01

    Word frequency is one of the most robust factors in the literature on word processing, based on the lexical corpus of a language. However, different sources might be used in order to determine the actual frequency of each word. Recent research has determined frequencies based on movie subtitles, Twitter, blog posts, or newspapers. In this paper, we examine a determination of these frequencies based on the World Wide Web. For this purpose, a Python script was developed to obtain frequencies of a word through online search results. These frequencies were employed to estimate lexical decision times in comparison to the traditional frequencies in a lexical decision task. It was found that the Google frequencies predict reaction times comparably to the traditional frequencies. Still, the explained variance was higher for the traditional database.

  5. The Diesel Exhaust in Miners Study: V. Evaluation of the Exposure Assessment Methods

    PubMed Central

    Stewart, Patricia A.; Vermeulen, Roel; Coble, Joseph B.; Blair, Aaron; Schleiff, Patricia; Lubin, Jay H.; Attfield, Mike; Silverman, Debra T.

    2012-01-01

    Exposure to respirable elemental carbon (REC), a component of diesel exhaust (DE), was assessed for an epidemiologic study investigating the association between DE and mortality, particularly from lung cancer, among miners at eight mining facilities from the date of dieselization (1947–1967) through 1997. To provide insight into the quality of the estimates for use in the epidemiologic analyses, several approaches were taken to evaluate the exposure assessment process and the quality of the estimates. An analysis of variance was conducted to evaluate the variability of 1998–2001 REC measurements within and between exposure groups of underground jobs. Estimates for the surface exposure groups were evaluated to determine if the arithmetic means (AMs) of the REC measurements increased with increased proximity to, or use of, diesel-powered equipment, which was the basis on which the surface groups were formed. Estimates of carbon monoxide (CO) (another component of DE) air concentrations in 1976–1977, derived from models developed to predict estimated historical exposures, were compared to 1976–1977 CO measurement data that had not been used in the model development. Alternative sets of estimates were developed to investigate the robustness of various model assumptions. These estimates were based on prediction models using: (i) REC medians rather than AMs, (ii) a different CO:REC proportionality than a 1:1 relation, and (iii) 5-year averages of historical CO measurements rather than modeled historical CO measurements and DE-related determinants. The analysis of variance found that in three of the facilities, most of the between-group variability in the underground measurements was explained by the use of job titles. There was relatively little between-group variability in the other facilities. The estimated REC AMs for the surface exposure groups rose overall from 1 to 5 μg m−3 as proximity to, and use of, diesel equipment increased. The alternative estimates overall were highly correlated (∼0.9) with the primary set of estimates. The median of the relative differences between the 1976–1977 CO measurement means and the 1976–1977 estimates for six facilities was 29%. Comparison of estimated CO air concentrations from the facility-specific prediction models with historical CO measurement data found an overall agreement similar to that observed in other epidemiologic studies. Other evaluations of components of the exposure assessment process found moderate to excellent agreement. Thus, the overall evidence suggests that the estimates were likely accurate representations of historical personal exposure levels to DE and are useful for epidemiologic analyses. PMID:22383674

  6. A review of statistical estimators for risk-adjusted length of stay: analysis of the Australian and new Zealand Intensive Care Adult Patient Data-Base, 2008-2009.

    PubMed

    Moran, John L; Solomon, Patricia J

    2012-05-16

    For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with log dependent variable. Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008-2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R2, concordance correlation coefficient (CCC), mean absolute error [MAE]) were established for each estimator. The data-set consisted of 111663 patients from 131 ICUs; with mean(SD) age 60.6(18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length-of-stay was 3.4(5.1) (median 1.8, range (0.17-60)) days and demonstrated marked kurtosis and right skew (29.4 and 4.4 respectively). BIC showed considerable spread, from a maximum of 509801 (OLS-raw scale) to a minimum of 210286 (LMM). R2 ranged from 0.22 (LMM) to 0.17 and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152(0.019, 0.285), consistent with a fractional-root function. For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.

  7. Overlap between treatment and control distributions as an effect size measure in experiments.

    PubMed

    Hedges, Larry V; Olkin, Ingram

    2016-03-01

    The proportion π of treatment group observations that exceed the control group mean has been proposed as an effect size measure for experiments that randomly assign independent units into 2 groups. We give the exact distribution of a simple estimator of π based on the standardized mean difference and use it to study the small sample bias of this estimator. We also give the minimum variance unbiased estimator of π under 2 models, one in which the variance of the mean difference is known and one in which the variance is unknown. We show how to use the relation between the standardized mean difference and the overlap measure to compute confidence intervals for π and show that these results can be used to obtain unbiased estimators, large sample variances, and confidence intervals for 3 related effect size measures based on the overlap. Finally, we show how the effect size π can be used in a meta-analysis. (c) 2016 APA, all rights reserved.
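
    A minimal sketch of the simple estimator discussed above: estimate the standardized mean difference and map it through the normal CDF, assuming normality and equal variances. The small-sample bias correction and exact distribution results from the paper are not included.

    ```python
    import numpy as np
    from scipy.stats import norm

    def overlap_pi(treatment, control):
        """Estimate pi, the proportion of treatment observations exceeding
        the control mean: pi_hat = Phi(d), with d the pooled standardized
        mean difference."""
        t, c = np.asarray(treatment, float), np.asarray(control, float)
        sp = np.sqrt(((len(t) - 1) * t.var(ddof=1) + (len(c) - 1) * c.var(ddof=1))
                     / (len(t) + len(c) - 2))
        d = (t.mean() - c.mean()) / sp
        return norm.cdf(d)
    ```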

  8. Adaptive Green-Kubo estimates of transport coefficients from molecular dynamics based on robust error analysis.

    PubMed

    Jones, Reese E; Mandadapu, Kranthi K

    2012-04-21

    We present a rigorous Green-Kubo methodology for calculating transport coefficients based on on-the-fly estimates of: (a) statistical stationarity of the relevant process, and (b) error in the resulting coefficient. The methodology uses time samples efficiently across an ensemble of parallel replicas to yield accurate estimates, which is particularly useful for estimating the thermal conductivity of semi-conductors near their Debye temperatures where the characteristic decay times of the heat flux correlation functions are large. Employing and extending the error analysis of Zwanzig and Ailawadi [Phys. Rev. 182, 280 (1969)] and Frenkel [in Proceedings of the International School of Physics "Enrico Fermi", Course LXXV (North-Holland Publishing Company, Amsterdam, 1980)] to the integral of correlation, we are able to provide tight theoretical bounds for the error in the estimate of the transport coefficient. To demonstrate the performance of the method, four test cases of increasing computational cost and complexity are presented: the viscosity of Ar and water, and the thermal conductivity of Si and GaN. In addition to producing accurate estimates of the transport coefficients for these materials, this work demonstrates precise agreement of the computed variances in the estimates of the correlation and the transport coefficient with the extended theory based on the assumption that fluctuations follow a Gaussian process. The proposed algorithm in conjunction with the extended theory enables the calculation of transport coefficients with the Green-Kubo method accurately and efficiently.
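
    Stripped of the on-the-fly stationarity checks and error bounds, the underlying Green-Kubo calculation is an integral of a time-correlation function. The sketch below shows it for the self-diffusion coefficient from a single particle's velocity series; the ensemble-of-replicas averaging and plateau detection described above are left out, and names are illustrative.

    ```python
    import numpy as np

    def green_kubo_diffusion(vel, dt):
        """Green-Kubo self-diffusion coefficient from one velocity series:
        D = (1/3) * integral of <v(0).v(t)> dt.  `vel` has shape (n_steps, 3)."""
        vel = np.asarray(vel, dtype=float)
        n = len(vel)
        nlag = n // 2
        acf = np.zeros(nlag)
        for lag in range(nlag):
            # velocity autocorrelation averaged over time origins
            acf[lag] = np.mean(np.sum(vel[:n - lag] * vel[lag:], axis=1))
        running = np.cumsum((acf[:-1] + acf[1:]) / 2.0) * dt   # trapezoidal integral
        return running[-1] / 3.0, acf, running
    ```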

  9. Adaptive Green-Kubo estimates of transport coefficients from molecular dynamics based on robust error analysis

    NASA Astrophysics Data System (ADS)

    Jones, Reese E.; Mandadapu, Kranthi K.

    2012-04-01

    We present a rigorous Green-Kubo methodology for calculating transport coefficients based on on-the-fly estimates of: (a) statistical stationarity of the relevant process, and (b) error in the resulting coefficient. The methodology uses time samples efficiently across an ensemble of parallel replicas to yield accurate estimates, which is particularly useful for estimating the thermal conductivity of semi-conductors near their Debye temperatures where the characteristic decay times of the heat flux correlation functions are large. Employing and extending the error analysis of Zwanzig and Ailawadi [Phys. Rev. 182, 280 (1969)] and Frenkel [in Proceedings of the International School of Physics "Enrico Fermi", Course LXXV (North-Holland Publishing Company, Amsterdam, 1980)] to the integral of correlation, we are able to provide tight theoretical bounds for the error in the estimate of the transport coefficient. To demonstrate the performance of the method, four test cases of increasing computational cost and complexity are presented: the viscosity of Ar and water, and the thermal conductivity of Si and GaN. In addition to producing accurate estimates of the transport coefficients for these materials, this work demonstrates precise agreement of the computed variances in the estimates of the correlation and the transport coefficient with the extended theory based on the assumption that fluctuations follow a Gaussian process. The proposed algorithm in conjunction with the extended theory enables the calculation of transport coefficients with the Green-Kubo method accurately and efficiently.

  10. Robust Tracking of Small Displacements with a Bayesian Estimator

    PubMed Central

    Dumont, Douglas M.; Byram, Brett C.

    2016-01-01

    Radiation-force-based elasticity imaging describes a group of techniques that use acoustic radiation force (ARF) to displace tissue in order to obtain qualitative or quantitative measurements of tissue properties. Because ARF-induced displacements are on the order of micrometers, tracking these displacements in vivo can be challenging. Previously, it has been shown that Bayesian-based estimation can overcome some of the limitations of a traditional displacement estimator like normalized cross-correlation (NCC). In this work, we describe a Bayesian framework that combines a generalized Gaussian-Markov random field (GGMRF) prior with an automated method for selecting the prior’s width. We then evaluate its performance in the context of tracking the micrometer-order displacements encountered in an ARF-based method like acoustic radiation force impulse (ARFI) imaging. The results show that bias, variance, and mean-square error performance vary with prior shape and width, and that an almost one order-of-magnitude reduction in mean-square error can be achieved by the estimator at the automatically-selected prior width. Lesion simulations show that the proposed estimator has a higher contrast-to-noise ratio but lower contrast than NCC, median-filtered NCC, and the previous Bayesian estimator, with a non-Gaussian prior shape having better lesion-edge resolution than a Gaussian prior. In vivo results from a cardiac, radiofrequency ablation ARFI imaging dataset show quantitative improvements in lesion contrast-to-noise ratio over NCC as well as the previous Bayesian estimator. PMID:26529761

  11. A framework for the meta-analysis of Bland-Altman studies based on a limits of agreement approach.

    PubMed

    Tipton, Elizabeth; Shuster, Jonathan

    2017-10-15

    Bland-Altman method comparison studies are common in the medical sciences and are used to compare a new measure to a gold-standard (often costlier or more invasive) measure. The distribution of these differences is summarized by two statistics, the 'bias' and standard deviation, and these measures are combined to provide estimates of the limits of agreement (LoA). When these LoA are within the bounds of clinically insignificant differences, the new non-invasive measure is preferred. Very often, multiple Bland-Altman studies have been conducted comparing the same two measures, and random-effects meta-analysis provides a means to pool these estimates. We provide a framework for the meta-analysis of Bland-Altman studies, including methods for estimating the LoA and measures of uncertainty (i.e., confidence intervals). Importantly, these LoA are likely to be wider than those typically reported in Bland-Altman meta-analyses. Frequently, Bland-Altman studies report results based on repeated measures designs but do not properly adjust for this design in the analysis. Meta-analyses of Bland-Altman studies frequently exclude these studies for this reason. We provide a meta-analytic approach that allows inclusion of estimates from these studies. This includes adjustments to the estimate of the standard deviation and a method for pooling the estimates based upon robust variance estimation. An example is included based on a previously published meta-analysis. Copyright © 2017 John Wiley & Sons, Ltd.
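
    A sketch of the single-study quantities that would feed such a meta-analysis: the bias, the SD of the differences, the limits of agreement, and approximate confidence intervals for the limits using a commonly quoted large-sample standard error. This is not the repeated-measures adjustment or the pooling step described above; names are illustrative.

    ```python
    import numpy as np
    from scipy.stats import t as t_dist

    def limits_of_agreement(method_a, method_b, level=0.95):
        """Single-study Bland-Altman summary: bias, SD of differences,
        limits of agreement, and simple CIs for each limit."""
        d = np.asarray(method_a, float) - np.asarray(method_b, float)
        n = len(d)
        bias, sd = d.mean(), d.std(ddof=1)
        z = 1.96
        loa = (bias - z * sd, bias + z * sd)
        # approximate standard error of each limit
        se_loa = sd * np.sqrt(1.0 / n + z ** 2 / (2 * (n - 1)))
        tcrit = t_dist.ppf(0.5 + level / 2, n - 1)
        ci_lower = (loa[0] - tcrit * se_loa, loa[0] + tcrit * se_loa)
        ci_upper = (loa[1] - tcrit * se_loa, loa[1] + tcrit * se_loa)
        return bias, sd, loa, (ci_lower, ci_upper)
    ```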

  12. Technical Note: Introduction of variance component analysis to setup error analysis in radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matsuo, Yukinori, E-mail: ymatsuo@kuhp.kyoto-u.ac.

    Purpose: The purpose of this technical note is to introduce variance component analysis to the estimation of systematic and random components in setup error of radiotherapy. Methods: Balanced data according to the one-factor random effect model were assumed. Results: Analysis-of-variance (ANOVA)-based computation was applied to estimate the values and their confidence intervals (CIs) for systematic and random errors and the population mean of setup errors. The conventional method overestimates systematic error, especially in hypofractionated settings. The CI for systematic error becomes much wider than that for random error. The ANOVA-based estimation can be extended to a multifactor model considering multiple causes of setup errors (e.g., interpatient, interfraction, and intrafraction). Conclusions: Variance component analysis may lead to novel applications to setup error analysis in radiotherapy.
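
    A minimal sketch of the one-factor ANOVA computation described in the note, assuming balanced data (the same number of fractions per patient); the function name and array layout are illustrative.

    ```python
    import numpy as np

    def setup_error_components(errors):
        """One-factor random-effects ANOVA for setup errors.
        `errors` has shape (n_patients, n_fractions); returns the population
        mean, the systematic (between-patient) SD and the random
        (within-patient) SD."""
        errors = np.asarray(errors, dtype=float)
        p, f = errors.shape
        patient_means = errors.mean(axis=1)
        grand_mean = errors.mean()
        ms_within = ((errors - patient_means[:, None]) ** 2).sum() / (p * (f - 1))
        ms_between = f * ((patient_means - grand_mean) ** 2).sum() / (p - 1)
        sigma_random = np.sqrt(ms_within)
        sigma_systematic = np.sqrt(max((ms_between - ms_within) / f, 0.0))
        return grand_mean, sigma_systematic, sigma_random
    ```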

  13. An efficient sampling approach for variance-based sensitivity analysis based on the law of total variance in the successive intervals without overlapping

    NASA Astrophysics Data System (ADS)

    Yun, Wanying; Lu, Zhenzhou; Jiang, Xian

    2018-06-01

    To efficiently execute variance-based global sensitivity analysis, the law of total variance over successive, non-overlapping intervals is first proved, and an efficient space-partition sampling-based approach is then proposed on this basis. By partitioning the sample points of the output into different subsets according to the different inputs, the proposed approach can evaluate all the main effects concurrently from a single group of sample points. In addition, there is no need to optimize the partition scheme. The maximum length of the subintervals decreases as the number of sample points of the model input variables increases, which ensures that the convergence condition of the space-partition approach is satisfied. Furthermore, a new interpretation of the partitioning idea is given from the perspective of the variance ratio function. Finally, three test examples and one engineering application are employed to demonstrate the accuracy, efficiency, and robustness of the proposed approach.
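
    A minimal sketch of a space-partition estimate of the main-effect (first-order) indices from a single sample, using the law of total variance over equal-count partitions of each input; this follows the general idea rather than the paper's exact scheme, and all names are illustrative.

    ```python
    import numpy as np

    def main_effect_indices(X, y, n_bins=20):
        """First-order sensitivity indices S_i = Var(E[Y|X_i]) / Var(Y),
        estimated by partitioning one sample into successive,
        non-overlapping intervals of each input."""
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        var_y = y.var()
        indices = []
        for i in range(X.shape[1]):
            order = np.argsort(X[:, i])
            bins = np.array_split(y[order], n_bins)     # equal-count partitions
            cond_means = np.array([b.mean() for b in bins])
            weights = np.array([len(b) for b in bins]) / len(y)
            var_cond_mean = np.sum(weights * (cond_means - y.mean()) ** 2)
            indices.append(var_cond_mean / var_y)
        return np.array(indices)
    ```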

  14. Multistep estimators of the between-study variance: The relationship with the Paule-Mandel estimator.

    PubMed

    van Aert, Robbie C M; Jackson, Dan

    2018-04-26

    A wide variety of estimators of the between-study variance are available in random-effects meta-analysis. Many, but not all, of these estimators are based on the method of moments. The DerSimonian-Laird estimator is widely used in applications, but the Paule-Mandel estimator is an alternative that is now recommended. Recently, DerSimonian and Kacker have developed two-step moment-based estimators of the between-study variance. We extend these two-step estimators so that multiple (more than two) steps are used. We establish the surprising result that the multistep estimator tends towards the Paule-Mandel estimator as the number of steps becomes large. Hence, the iterative scheme underlying our new multistep estimator provides a hitherto unknown relationship between two-step estimators and Paule-Mandel estimator. Our analysis suggests that two-step estimators are not necessarily distinct estimators in their own right; instead, they are quantities that are closely related to the usual iterative scheme that is used to calculate the Paule-Mandel estimate. The relationship that we establish between the multistep and Paule-Mandel estimator is another justification for the use of the latter estimator. Two-step and multistep estimators are perhaps best conceptualized as approximate Paule-Mandel estimators. © 2018 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
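
    For reference, the Paule-Mandel estimator itself is the root of a simple moment equation, usually found iteratively as sketched below; the multistep DerSimonian-Kacker construction that the paper relates to it is not shown, and names are illustrative.

    ```python
    import numpy as np

    def paule_mandel_tau2(y, v, tol=1e-10, max_iter=100):
        """Paule-Mandel estimator of the between-study variance tau^2.
        Iterates until the weighted residual sum of squares Q equals its
        expectation (k - 1) under the random-effects model."""
        y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)
        k = len(y)
        tau2 = 0.0
        for _ in range(max_iter):
            w = 1.0 / (v + tau2)
            mu = np.sum(w * y) / np.sum(w)
            q = np.sum(w * (y - mu) ** 2)
            if abs(q - (k - 1)) < tol:
                break
            if q < k - 1 and tau2 == 0.0:
                break                    # estimate truncated at zero
            # Newton-type update: increase tau2 if Q > k-1, decrease otherwise
            step = (q - (k - 1)) / np.sum(w ** 2 * (y - mu) ** 2)
            tau2 = max(tau2 + step, 0.0)
        return tau2
    ```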

  15. Analysis of conditional genetic effects and variance components in developmental genetics.

    PubMed

    Zhu, J

    1995-12-01

    A genetic model with additive-dominance effects and genotype x environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t-1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects.

  16. Analysis of Conditional Genetic Effects and Variance Components in Developmental Genetics

    PubMed Central

    Zhu, J.

    1995-01-01

    A genetic model with additive-dominance effects and genotype X environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t - 1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects. PMID:8601500

  17. Biochemical Phenotypes to Discriminate Microbial Subpopulations and Improve Outbreak Detection

    PubMed Central

    Galar, Alicia; Kulldorff, Martin; Rudnick, Wallis; O'Brien, Thomas F.; Stelling, John

    2013-01-01

    Background: Clinical microbiology laboratories worldwide constitute an invaluable resource for monitoring emerging threats and the spread of antimicrobial resistance. We studied the growing number of biochemical tests routinely performed on clinical isolates to explore their value as epidemiological markers. Methodology/Principal Findings: Microbiology laboratory results from January 2009 through December 2011 from a 793-bed hospital stored in WHONET were examined. Variables included patient location, collection date, organism, and 47 biochemical and 17 antimicrobial susceptibility test results reported by Vitek 2. To identify biochemical tests that were particularly valuable (stable with repeat testing, but good variability across the species) or problematic (inconsistent results with repeat testing), three types of variance analyses were performed on isolates of K. pneumoniae: descriptive analysis of discordant biochemical results in same-day isolates, an average within-patient variance index, and generalized linear mixed model variance component analysis. Results: 4,200 isolates of K. pneumoniae were identified from 2,485 patients, 32% of whom had multiple isolates. The first two variance analyses highlighted SUCT, TyrA, GlyA, and GGT as “nuisance” biochemicals for which discordant within-patient test results impacted a high proportion of patient results, while dTAG had relatively good within-patient stability with good heterogeneity across the species. Variance component analyses confirmed the relative stability of dTAG, and identified additional biochemicals such as PHOS with a large between-patient to within-patient variance ratio. A reduced subset of biochemicals improved the robustness of strain definition for carbapenem-resistant K. pneumoniae. Surveillance analyses suggest that the reduced biochemical profile could improve the timeliness and specificity of outbreak detection algorithms. Conclusions: The statistical approaches explored can improve the robust recognition of microbial subpopulations with routinely available biochemical test results, of value in the timely detection of outbreak clones and evolutionarily important genetic events. PMID:24391936

  18. Automatic segmentation for brain MR images via a convex optimized segmentation and bias field correction coupled model.

    PubMed

    Chen, Yunjie; Zhao, Bo; Zhang, Jianwei; Zheng, Yuhui

    2014-09-01

    Accurate segmentation of magnetic resonance (MR) images remains challenging mainly due to the intensity inhomogeneity, which is also commonly known as bias field. Recently, active contour models with geometric information constraints have been applied; however, most of them deal with the bias field by using a necessary pre-processing step before segmentation of MR data. This paper presents a novel automatic variational method, which can segment brain MR images while simultaneously correcting the bias field, even in images with high intensity inhomogeneities. We first define a function for clustering the image pixels in a smaller neighborhood. The cluster centers in this objective function have a multiplicative factor that estimates the bias within the neighborhood. In order to reduce the effect of the noise, the local intensity variations are described by Gaussian distributions with different means and variances. Then, the objective functions are integrated over the entire domain. To obtain the global optimum and make the results independent of the initialization of the algorithm, we reconstructed the energy function to be convex and solved it by using the Split Bregman method. A salient advantage of our method is that its result is independent of initialization, which allows robust and fully automated application. Our method is able to estimate the bias of quite general profiles, even in 7T MR images. Moreover, our model can also distinguish regions with similar intensity distributions but different variances. The proposed method has been rigorously validated with images acquired on a variety of imaging modalities with promising results. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Estimators for Two Measures of Association for Set Correlation.

    ERIC Educational Resources Information Center

    Cohen, Jacob; Nee, John C. M.

    1984-01-01

    Two measures of association between sets of variables have been proposed for set correlation: the proportion of generalized variance, and the proportion of additive variance. Because these measures are strongly positively biased, approximate expected values and estimators of these measures are derived and checked. (Author/BW)

  20. Robust estimation approach for blind denoising.

    PubMed

    Rabie, Tamer

    2005-11-01

    This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.

  1. Estimation of within-stratum variance for sample allocation: Foreign commodity production forecasting

    NASA Technical Reports Server (NTRS)

    Chhikara, R. S.; Perry, C. R., Jr. (Principal Investigator)

    1980-01-01

    The problem of determining the stratum variances required for an optimum sample allocation for remotely sensed crop surveys is investigated, with emphasis on an approach based on the concept of stratum variance as a function of the sampling unit size. A methodology using existing and easily available historical statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to estimating stratum variances for wheat in the U.S. Great Plains and is evaluated on the basis of the numerical results obtained. It is shown that the proposed technique is viable and performs satisfactorily with the use of a conservative value (smaller than the expected value) for the field size and with the use of crop statistics from the small political division level.
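
    For context, once initial stratum variance estimates are available, the optimum (Neyman) allocation that this kind of survey design presupposes follows directly; the stratum sizes and standard deviations below are placeholders, not values from the wheat survey.

```python
import numpy as np

def neyman_allocation(N_h, S_h, n_total):
    """Allocate n_total sample units to strata in proportion to N_h * S_h."""
    weights = N_h * S_h
    n_h = n_total * weights / weights.sum()
    return np.round(n_h).astype(int)

# hypothetical strata: segment counts and estimated standard deviations
N_h = np.array([1200, 800, 500])      # population units per stratum (made up)
S_h = np.array([15.0, 25.0, 40.0])    # estimated stratum std. devs (made up)
print(neyman_allocation(N_h, S_h, n_total=100))   # -> [31 34 34]
```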

  2. Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle.

    PubMed

    Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

    2016-12-01

    The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.

  3. Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle

    PubMed Central

    Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

    2016-01-01

    The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran. PMID:26954192

  4. Advanced Communication Processing Techniques Held in Ruidoso, New Mexico on 14-17 May 1989

    DTIC Science & Technology

    1990-01-01

    Criteria: probability of detection and false alarm; variances of parameter estimators; probability of correct classification and rejection. [Remainder of record is garbled OCR text from the scanned proceedings and is not recoverable.]

  5. Evaluation and recommendation of sensitivity analysis methods for application to Stochastic Human Exposure and Dose Simulation models.

    PubMed

    Mokhtari, Amirhossein; Christopher Frey, H; Zheng, Junyu

    2006-11-01

    Sensitivity analyses of exposure or risk models can help identify the most significant factors to aid in risk management or to prioritize additional research to reduce uncertainty in the estimates. However, sensitivity analysis is challenged by non-linearity, interactions between inputs, and multiple days or time scales. Selected sensitivity analysis methods are evaluated with respect to their applicability to human exposure models with such features using a testbed. The testbed is a simplified version of the US Environmental Protection Agency's Stochastic Human Exposure and Dose Simulation (SHEDS) model. The methods evaluated include the Pearson and Spearman correlation, sample and rank regression, analysis of variance, Fourier amplitude sensitivity test (FAST), and Sobol's method. The first five methods are known as "sampling-based" techniques, whereas the latter two methods are known as "variance-based" techniques. The main objective of the test cases was to identify the main and total contributions of individual inputs to the output variance. Sobol's method and FAST directly quantified these measures of sensitivity. Results show that sensitivity of an input typically changed when evaluated under different time scales (e.g., daily versus monthly). All methods provided similar insights regarding less important inputs; however, Sobol's method and FAST provided more robust insights with respect to sensitivity of important inputs compared to the sampling-based techniques. Thus, the sampling-based methods can be used in a screening step to identify unimportant inputs, followed by application of more computationally intensive refined methods to a smaller set of inputs. The implications of time variation in sensitivity results for risk management are briefly discussed.

  6. Improved uncertainty quantification in nondestructive assay for nonproliferation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burr, Tom; Croft, Stephen; Jarman, Ken

    2016-12-01

    This paper illustrates methods to improve uncertainty quantification (UQ) for non-destructive assay (NDA) measurements used in nuclear nonproliferation. First, it is shown that current bottom-up UQ applied to calibration data is not always adequate, for three main reasons: (1) Because there are errors in both the predictors and the response, calibration involves a ratio of random quantities, and calibration data sets in NDA usually consist of only a modest number of samples (3–10); therefore, asymptotic approximations involving quantities needed for UQ such as means and variances are often not sufficiently accurate; (2) Common practice overlooks that calibration implies a partitioning of total error into random and systematic error, and (3) In many NDA applications, test items exhibit non-negligible departures in physical properties from calibration items, so model-based adjustments are used, but item-specific bias remains in some data. Therefore, improved bottom-up UQ using calibration data should predict the typical magnitude of item-specific bias, and the suggestion is to do so by including sources of item-specific bias in synthetic calibration data that is generated using a combination of modeling and real calibration data. Second, for measurements of the same nuclear material item by both the facility operator and international inspectors, current empirical (top-down) UQ is described for estimating operator and inspector systematic and random error variance components. A Bayesian alternative is introduced that easily accommodates constraints on variance components, and is more robust than current top-down methods to the underlying measurement error distributions.

  7. The Influence of Major Life Events on Economic Attitudes in a World of Gene-Environment Interplay

    PubMed Central

    Hatemi, Peter K.

    2014-01-01

    The role of “genes” on political attitudes has gained attention across disciplines. However, person-specific experiences have yet to be incorporated into models that consider genetic influences. Relying on a gene-environment interplay approach, this study explicates how life-events, such as losing one’s job or suffering a financial loss, influence economic policy attitudes. The results indicate genetic and environmental variance on support for unions, immigration, capitalism, socialism and property tax is moderated by financial risks. Changes in the magnitude of genetic influences, however, are temporary. After two years, the phenotypic effects of the life events remain on most attitudes, but changes in the sources of individual differences do not. Univariate twin models that estimate the independent contributions of genes and environment on the variation of attitudes appear to provide robust baseline indicators of sources of individual differences. These estimates, however, are not event or day specific. In this way, genetic influences add stability, while environment cues change, and this process is continually updated. PMID:24860199

  8. Meta-analysis in clinical trials revisited.

    PubMed

    DerSimonian, Rebecca; Laird, Nan

    2015-11-01

    In this paper, we revisit a 1986 article we published in this Journal, Meta-Analysis in Clinical Trials, where we introduced a random-effects model to summarize the evidence about treatment efficacy from a number of related clinical trials. Because of its simplicity and ease of implementation, our approach has been widely used (with more than 12,000 citations to date) and the "DerSimonian and Laird method" is now often referred to as the 'standard approach' or a 'popular' method for meta-analysis in medical and clinical research. The method is especially useful for providing an overall effect estimate and for characterizing the heterogeneity of effects across a series of studies. Here, we review the background that led to the original 1986 article, briefly describe the random-effects approach for meta-analysis, explore its use in various settings and trends over time and recommend a refinement to the method using a robust variance estimator for testing overall effect. We conclude with a discussion of repurposing the method for Big Data meta-analysis and Genome Wide Association Studies for studying the importance of genetic variants in complex diseases. Published by Elsevier Inc.
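
    A schematic version of the random-effects summary with a robust variance for the overall effect, in the spirit of the refinement mentioned in this record: the code below computes the moment-based between-study variance, the pooled effect, and both a model-based and a sandwich-type standard error. The data are invented and the implementation is a sketch, not the authors' code.

```python
import numpy as np

def random_effects_with_robust_se(y, v):
    """Moment-based tau^2, pooled random-effects estimate, and both the
    model-based and a robust (sandwich-type) SE for the pooled effect."""
    k = len(y)
    w = 1.0 / v
    mu_fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - mu_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)

    w_star = 1.0 / (v + tau2)
    mu = np.sum(w_star * y) / np.sum(w_star)
    se_model = np.sqrt(1.0 / np.sum(w_star))
    # robust SE: empirical variability of the weighted residuals
    se_robust = np.sqrt(np.sum(w_star ** 2 * (y - mu) ** 2)) / np.sum(w_star)
    return tau2, mu, se_model, se_robust

# invented example: 5 trial effect estimates and within-trial variances
y = np.array([0.2, 0.5, -0.1, 0.4, 0.3])
v = np.array([0.04, 0.05, 0.06, 0.03, 0.05])
print(random_effects_with_robust_se(y, v))
```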

  9. Meta-Analysis in Clinical Trials Revisited

    PubMed Central

    Laird, Nan

    2015-01-01

    In this paper, we revisit a 1986 article we published in this Journal, Meta-Analysis in Clinical Trials, where we introduced a random-effect model to summarize the evidence about treatment efficacy from a number of related clinical trials. Because of its simplicity and ease of implementation, our approach has been widely used (with more than 12,000 citations to date) and the “DerSimonian and Laird method” is now often referred to as the ‘standard approach’ or a ‘popular’ method for meta-analysis in medical and clinical research. The method is especially useful for providing an overall effect estimate and for characterizing the heterogeneity of effects across a series of studies. Here, we review the background that led to the original 1986 article, briefly describe the random-effects approach for meta-analysis, explore its use in various settings and trends over time and recommend a refinement to the method using a robust variance estimator for testing overall effect. We conclude with a discussion of repurposing the method for Big Data meta-analysis and Genome Wide Association Studies for studying the importance of genetic variants in complex diseases. PMID:26343745

  10. Maternal socioeconomic factors and adverse perinatal outcomes in two birth cohorts, 1997/98 and 2010, in São Luís, Brazil.

    PubMed

    Cavalcante, Nádia Carenina Nunes; Simões, Vanda Maria Ferreira; Ribeiro, Marizélia Rodrigues Costa; Lamy-Filho, Fernando; Barbieri, Marco Antonio; Bettiol, Heloisa; Silva, Antônio Augusto Moura da

    2017-01-01

    Several studies have identified social inequalities in low birth weight (LBW), preterm birth (PTB), and intrauterine growth restriction (IUGR), which, in recent years, have diminished or disappeared in certain locations. The aims were to estimate the LBW, PTB, and IUGR rates in São Luís, Maranhão, Brazil, in 2010, and to check for associations between socioeconomic factors and these indicators. This study is based on a birth cohort performed in São Luís and included 5,051 singleton hospital births in 2010. The chi-square test was used for proportion comparisons, while simple and multiple Poisson regression models with robust error variance were used to estimate relative risks. LBW, PTB and IUGR rates were 7.5%, 12.2%, and 10.3%, respectively. LBW was higher in low-income families, while PTB and IUGR were not associated with socioeconomic factors. The absence or weakness of associations between these indicators and social inequality points to improvements in health care and/or in social conditions in São Luís.
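
    The analysis strategy described here (Poisson regression with a robust error variance to obtain relative risks for a binary outcome) can be reproduced with standard tools; the sketch below uses simulated data, not the cohort data, and assumes the HC0 sandwich covariance option available in statsmodels.

```python
import numpy as np
import statsmodels.api as sm

# simulated data: binary outcome (e.g., LBW) and one binary exposure
rng = np.random.default_rng(1)
n = 2000
exposure = rng.integers(0, 2, size=n)
p = np.where(exposure == 1, 0.12, 0.08)          # true relative risk = 1.5
outcome = rng.binomial(1, p)

X = sm.add_constant(exposure)
# Poisson regression with a robust (sandwich) variance yields relative-risk
# estimates for binary outcomes without the convergence problems of
# log-binomial models
model = sm.GLM(outcome, X, family=sm.families.Poisson())
result = model.fit(cov_type="HC0")               # robust variance (assumed option)
print(np.exp(result.params))                     # relative risk estimates
print(result.bse)                                # robust standard errors
```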

  11. Risks of Death and Severe Disease in Patients With Middle East Respiratory Syndrome Coronavirus, 2012-2015.

    PubMed

    Rivers, Caitlin M; Majumder, Maimuna S; Lofgren, Eric T

    2016-09-15

    Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging pathogen, first recognized in 2012, with a high case fatality risk, no vaccine, and no treatment beyond supportive care. We estimated the relative risks of death and severe disease among MERS-CoV patients in the Middle East between 2012 and 2015 for several risk factors, using Poisson regression with robust variance and a bootstrap-based expectation maximization algorithm to handle extensive missing data. Increased age and underlying comorbidity were risk factors for both death and severe disease, while cases arising in Saudi Arabia were more likely to be severe. Cases occurring later in the emergence of MERS-CoV and among health-care workers were less serious. This study represents an attempt to estimate risk factors for an emerging infectious disease using open data and to address some of the uncertainty surrounding MERS-CoV epidemiology. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. A Simple Method for Deriving the Confidence Regions for the Penalized Cox’s Model via the Minimand Perturbation†

    PubMed Central

    Lin, Chen-Yen; Halabi, Susan

    2017-01-01

    We propose a minimand perturbation method to derive the confidence regions for the regularized estimators for the Cox’s proportional hazards model. Although the regularized estimation procedure produces a more stable point estimate, it remains challenging to provide an interval estimator or an analytic variance estimator for the associated point estimate. Based on the sandwich formula, the current variance estimator provides a simple approximation, but its finite sample performance is not entirely satisfactory. Besides, the sandwich formula can only provide variance estimates for the non-zero coefficients. In this article, we present a generic description for the perturbation method and then introduce a computation algorithm using the adaptive least absolute shrinkage and selection operator (LASSO) penalty. Through simulation studies, we demonstrate that our method can better approximate the limiting distribution of the adaptive LASSO estimator and produces more accurate inference compared with the sandwich formula. The simulation results also indicate the possibility of extending the applications to the adaptive elastic-net penalty. We further demonstrate our method using data from a phase III clinical trial in prostate cancer. PMID:29326496

  13. A Simple Method for Deriving the Confidence Regions for the Penalized Cox's Model via the Minimand Perturbation.

    PubMed

    Lin, Chen-Yen; Halabi, Susan

    2017-01-01

    We propose a minimand perturbation method to derive the confidence regions for the regularized estimators for the Cox's proportional hazards model. Although the regularized estimation procedure produces a more stable point estimate, it remains challenging to provide an interval estimator or an analytic variance estimator for the associated point estimate. Based on the sandwich formula, the current variance estimator provides a simple approximation, but its finite sample performance is not entirely satisfactory. Besides, the sandwich formula can only provide variance estimates for the non-zero coefficients. In this article, we present a generic description for the perturbation method and then introduce a computation algorithm using the adaptive least absolute shrinkage and selection operator (LASSO) penalty. Through simulation studies, we demonstrate that our method can better approximate the limiting distribution of the adaptive LASSO estimator and produces more accurate inference compared with the sandwich formula. The simulation results also indicate the possibility of extending the applications to the adaptive elastic-net penalty. We further demonstrate our method using data from a phase III clinical trial in prostate cancer.

  14. Comparing Students With and Without Reading Difficulties on Reading Comprehension Assessments: A Meta-Analysis.

    PubMed

    Collins, Alyson A; Lindström, Esther R; Compton, Donald L

    Researchers have increasingly investigated sources of variance in reading comprehension test scores, particularly with students with reading difficulties (RD). The purpose of this meta-analysis was to determine if the achievement gap between students with RD and typically developing (TD) students varies as a function of different reading comprehension response formats (e.g., multiple choice, cloze). A systematic literature review identified 82 eligible studies. All studies administered reading comprehension assessments to students with RD and TD students in Grades K-12. Hedges' g standardized mean difference effect sizes were calculated, and random effects robust variance estimation techniques were used to aggregate average weighted effect sizes for each response format. Results indicated that the achievement gap between students with RD and TD students was larger for some response formats (e.g., picture selection ES g = -1.80) than others (e.g., retell ES g = -0.60). Moreover, for multiple-choice, cloze, and open-ended question response formats, single-predictor metaregression models explored potential moderators of heterogeneity in effect sizes. No clear patterns, however, emerged in regard to moderators of heterogeneity in effect sizes across response formats. Findings suggest that the use of different response formats may lead to variability in the achievement gap between students with RD and TD students.
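
    The aggregation step described in this record, averaging dependent effect sizes with a robust (cluster-based) variance, can be sketched as a sandwich estimator around a weighted mean, clustering effect sizes by study. The data are fabricated and the working weights are simplified; this is an illustration of the general technique, not the study's analysis code.

```python
import numpy as np

def rve_weighted_mean(es, var, study):
    """Weighted mean effect size with a cluster-robust (sandwich) variance,
    clustering effect sizes by study to allow within-study dependence."""
    w = 1.0 / var                      # simplified working weights
    mu = np.sum(w * es) / np.sum(w)
    resid = es - mu
    # sum the weighted residuals within each study (cluster), then square
    cluster_sums = np.array([np.sum(w[study == s] * resid[study == s])
                             for s in np.unique(study)])
    var_robust = np.sum(cluster_sums ** 2) / np.sum(w) ** 2
    return mu, np.sqrt(var_robust)

# fabricated example: 7 effect sizes nested in 4 studies
es    = np.array([-1.1, -0.9, -0.5, -0.7, -1.4, -0.6, -0.8])
var   = np.array([0.05, 0.06, 0.04, 0.05, 0.08, 0.03, 0.04])
study = np.array([1, 1, 2, 3, 3, 3, 4])
print(rve_weighted_mean(es, var, study))
```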

  15. Effects of after-school programs with at-risk youth on attendance and externalizing behaviors: a systematic review and meta-analysis.

    PubMed

    Kremer, Kristen P; Maynard, Brandy R; Polanin, Joshua R; Vaughn, Michael G; Sarteschi, Christine M

    2015-03-01

    The popularity, demand, and increased federal and private funding for after-school programs have resulted in a marked increase in after-school programs over the past two decades. After-school programs are used to prevent adverse outcomes, decrease risks, or improve functioning with at-risk youth in several areas, including academic achievement, crime and behavioral problems, socio-emotional functioning, and school engagement and attendance; however, the evidence of effects of after-school programs remains equivocal. This systematic review and meta-analysis, following Campbell Collaboration guidelines, examined the effects of after-school programs on externalizing behaviors and school attendance with at-risk students. A systematic search for published and unpublished literature resulted in the inclusion of 24 studies. A total of 64 effect sizes (16 for attendance outcomes; 49 for externalizing behavior outcomes) extracted from 31 reports were included in the meta-analysis using robust variance estimation to handle dependencies among effect sizes. Mean effects were small and non-significant for attendance and externalizing behaviors. A moderate to large amount of heterogeneity was present; however, no moderator variable tested explained the variance between studies. Significant methodological shortcomings were identified across the corpus of studies included in this review. Implications for practice, policy and research are discussed.

  16. A longitudinal study of posttraumatic stress symptoms and their predictors in rescue workers after a firework factory disaster.

    PubMed

    Ask, Elklit; Gudmundsdottir, Drifa

    2014-01-01

    This is a follow-up study of rescue workers who participated in the primary rescue during and immediately after the explosion of a firework factory. We aimed to estimate the possible PTSD prevalence at five and 18 months post disaster, and to determine whether the level of PTSD symptoms at 18 months could be predicted from factors measured at five months. We included measures of posttraumatic symptoms, social support, locus of control and demographic questions. The possible PTSD prevalence rose from 1.6% (n = 465) at five months post disaster to 3.1% (n = 130) at 18 months. A hierarchical linear regression predicted 59% of PTSD symptom variance at 18 months post disaster. In the final regression, somatization explained the greatest part of the symptom variance (42%), followed by locus of control (29%) and major life events prior to and right after the disaster (23%). Rescue workers seemed to be relatively robust to traumatic exposure: the prevalence of possible PTSD in our study was even lower than in previous studies, probably because of the less severe consequences of the disaster studied. Furthermore, we found that PTSD symptom level at 18 months post disaster was highly predicted by psychological factors, particularly by somatization. However, further investigations of traumatic responding are required in this population.

  17. Effects of After-School Programs with At-Risk Youth on Attendance and Externalizing Behaviors: A Systematic Review and Meta-Analysis

    PubMed Central

    Maynard, Brandy R.; Polanin, Joshua R.; Vaughn, Michael G.; Sarteschi, Christine M.

    2015-01-01

    The popularity, demand, and increased federal and private funding for after-school programs have resulted in a marked increase in after-school programs over the past two decades. After-school programs are used to prevent adverse outcomes, decrease risks, or improve functioning with at-risk youth in several areas, including academic achievement, crime and behavioral problems, socio-emotional functioning, and school engagement and attendance; however, the evidence of effects of after-school programs remains equivocal. This systematic review and meta-analysis, following Campbell Collaboration guidelines, examined the effects of after-school programs on externalizing behaviors and school attendance with at-risk students. A systematic search for published and unpublished literature resulted in the inclusion of 24 studies. A total of 64 effect sizes (16 for attendance outcomes; 49 for externalizing behavior outcomes) extracted from 31 reports were included in the meta-analysis using robust variance estimation to handle dependencies among effect sizes. Mean effects were small and non-significant for attendance and externalizing behaviors. A moderate to large amount of heterogeneity was present; however, no moderator variable tested explained the variance between studies. Significant methodological shortcomings were identified across the corpus of studies included in this review. Implications for practice, policy and research are discussed. PMID:25416228

  18. Variance Estimation Using Replication Methods in Structural Equation Modeling with Complex Sample Data

    ERIC Educational Resources Information Center

    Stapleton, Laura M.

    2008-01-01

    This article discusses replication sampling variance estimation techniques that are often applied in analyses using data from complex sampling designs: jackknife repeated replication, balanced repeated replication, and bootstrapping. These techniques are used with traditional analyses such as regression, but are currently not used with structural…
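
    One of the replication ideas mentioned in this record, the jackknife, is easy to sketch for a simple statistic: the statistic is recomputed with each observation (or replicate group) deleted, and the variability of those replicates gives the variance estimate. Replication designs for complex samples add stratum and PSU structure, but the core recomputation is the same. Everything below is illustrative.

```python
import numpy as np

def jackknife_variance(data, statistic):
    """Delete-one jackknife variance estimate for an arbitrary statistic."""
    n = len(data)
    replicates = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)

rng = np.random.default_rng(2)
sample = rng.normal(loc=10.0, scale=3.0, size=50)
# for the mean, the jackknife SE matches the usual s / sqrt(n)
print(np.sqrt(jackknife_variance(sample, np.mean)))
print(sample.std(ddof=1) / np.sqrt(len(sample)))
```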

  19. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.

  20. An improved state-parameter analysis of ecosystem models using data assimilation

    USGS Publications Warehouse

    Chen, M.; Liu, S.; Tieszen, L.L.; Hollinger, D.Y.

    2008-01-01

    Much of the effort spent in developing data assimilation methods for carbon dynamics analysis has focused on estimating optimal values for either model parameters or state variables. The main weakness of estimating parameter values alone (i.e., without considering state variables) is that all errors from input, output, and model structure are attributed to model parameter uncertainties. On the other hand, the accuracy of estimating state variables may be lowered if the temporal evolution of parameter values is not incorporated. This research develops a smoothed ensemble Kalman filter (SEnKF) by combining ensemble Kalman filter with kernel smoothing technique. SEnKF has following characteristics: (1) to estimate simultaneously the model states and parameters through concatenating unknown parameters and state variables into a joint state vector; (2) to mitigate dramatic, sudden changes of parameter values in parameter sampling and parameter evolution process, and control narrowing of parameter variance which results in filter divergence through adjusting smoothing factor in kernel smoothing algorithm; (3) to assimilate recursively data into the model and thus detect possible time variation of parameters; and (4) to address properly various sources of uncertainties stemming from input, output and parameter uncertainties. The SEnKF is tested by assimilating observed fluxes of carbon dioxide and environmental driving factor data from an AmeriFlux forest station located near Howland, Maine, USA, into a partition eddy flux model. Our analysis demonstrates that model parameters, such as light use efficiency, respiration coefficients, minimum and optimum temperatures for photosynthetic activity, and others, are highly constrained by eddy flux data at daily-to-seasonal time scales. The SEnKF stabilizes parameter values quickly regardless of the initial values of the parameters. Potential ecosystem light use efficiency demonstrates a strong seasonality. Results show that the simultaneous parameter estimation procedure significantly improves model predictions. Results also show that the SEnKF can dramatically reduce the variance in state variables stemming from the uncertainty of parameters and driving variables. The SEnKF is a robust and effective algorithm in evaluating and developing ecosystem models and in improving the understanding and quantification of carbon cycle parameters and processes. © 2008 Elsevier B.V.

  1. Detecting and quantifying stellar magnetic fields. Sparse Stokes profile approximation using orthogonal matching pursuit

    NASA Astrophysics Data System (ADS)

    Carroll, T. A.; Strassmeier, K. G.

    2014-03-01

    Context. In recent years, we have seen a rapidly growing number of stellar magnetic field detections for various types of stars. Many of these magnetic fields are estimated from spectropolarimetric observations (Stokes V) by using the so-called center-of-gravity (COG) method. Unfortunately, the accuracy of this method rapidly deteriorates with increasing noise and thus calls for a more robust procedure that combines signal detection and field estimation. Aims: We introduce an estimation method that provides not only the effective or mean longitudinal magnetic field from an observed Stokes V profile but also uses the net absolute polarization of the profile to obtain an estimate of the apparent (i.e., velocity resolved) absolute longitudinal magnetic field. Methods: By combining the COG method with an orthogonal-matching-pursuit (OMP) approach, we were able to decompose observed Stokes profiles with an overcomplete dictionary of wavelet-basis functions to reliably reconstruct the observed Stokes profiles in the presence of noise. The elementary wave functions of the sparse reconstruction process were utilized to estimate the effective longitudinal magnetic field and the apparent absolute longitudinal magnetic field. A multiresolution analysis complements the OMP algorithm to provide a robust detection and estimation method. Results: An extensive Monte-Carlo simulation confirms the reliability and accuracy of the magnetic OMP approach, where a mean error of under 2% is found. Its full potential is obtained for heavily noise-corrupted Stokes profiles with signal-to-noise variance ratios down to unity. In this case, the conventional COG method yields a mean error for the effective longitudinal magnetic field of up to 50%, whereas the OMP method gives a maximum error of 18%. It is, moreover, shown that even in the case of very small residual noise on a level between 10^-3 and 10^-5, a regime reached by current multiline reconstruction techniques, the conventional COG method incorrectly interprets a large portion of the residual noise as a magnetic field, with values of up to 100 G. The magnetic OMP method, on the other hand, remains largely unaffected by the noise; regardless of the noise level, the maximum error is no greater than 0.7 G.

  2. Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology

    PubMed Central

    Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal

    2009-01-01

    The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators. PMID:20174455

  3. Approximate sample size formulas for the two-sample trimmed mean test with unequal variances.

    PubMed

    Luh, Wei-Ming; Guo, Jiin-Huarng

    2007-05-01

    Yuen's two-sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non-normality and unequal sample sizes. Given the specified alpha and the power (1-beta), the minimum sample size needed by the proposed formulas under various conditions is less than is given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.
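
    For reference, Yuen's statistic itself is straightforward to compute from the trimmed means and winsorized variances; the sketch below uses 20% trimming and synthetic data, and is not the sample-size derivation of this article. (Recent SciPy versions are believed to expose the same test via ttest_ind with a trim argument, but the manual version is shown to make the formula explicit.)

```python
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Yuen's two-sample trimmed-mean test allowing unequal variances."""
    def pieces(a):
        a = np.asarray(a)
        n = len(a)
        g = int(np.floor(trim * n))          # observations trimmed from each tail
        h = n - 2 * g                        # observations retained
        tmean = stats.trim_mean(a, trim)
        winsorized = np.asarray(stats.mstats.winsorize(a, limits=(trim, trim)))
        d = (n - 1) * np.var(winsorized, ddof=1) / (h * (h - 1))
        return tmean, d, h
    tm_x, d_x, h_x = pieces(x)
    tm_y, d_y, h_y = pieces(y)
    t = (tm_x - tm_y) / np.sqrt(d_x + d_y)
    df = (d_x + d_y) ** 2 / (d_x ** 2 / (h_x - 1) + d_y ** 2 / (h_y - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(0.8, 3.0, 20)        # group with much larger variance
print(yuen_test(a, b))
```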

  4. A frequency-domain estimator for use in adaptive control systems

    NASA Technical Reports Server (NTRS)

    Lamaire, Richard O.; Valavani, Lena; Athans, Michael; Stein, Gunter

    1991-01-01

    This paper presents a frequency-domain estimator that can identify both a parametrized nominal model of a plant as well as a frequency-domain bounding function on the modeling error associated with this nominal model. This estimator, which we call a robust estimator, can be used in conjunction with a robust control-law redesign algorithm to form a robust adaptive controller.

  5. [Theory, method and application of method R on estimation of (co)variance components].

    PubMed

    Liu, Wen-Zhong

    2004-07-01

    Theory, method and application of Method R for estimation of (co)variance components were reviewed to promote appropriate use of the method. Estimation requires R values, which are regressions of predicted random effects calculated from the complete dataset on predicted random effects calculated from random subsets of the same data. By using a multivariate iteration algorithm based on a transformation matrix, combined with the preconditioned conjugate gradient method to solve the mixed model equations, the computational efficiency of Method R is much improved. Method R is computationally inexpensive, and the sampling errors and approximate credible intervals of estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data, and biased estimates in small datasets. As an alternative method, Method R can be used in larger datasets. It is necessary to study its theoretical properties and broaden its application range further.

  6. Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches

    PubMed Central

    Hauschild, Anne-Christin; Kopczynski, Dominik; D’Addario, Marianna; Baumbach, Jörg Ingo; Rahmann, Sven; Baumbach, Jan

    2013-01-01

    Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, different steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors’ results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the power of the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications. PMID:24957992

  7. Peak detection method evaluation for ion mobility spectrometry by using machine learning approaches.

    PubMed

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna; Baumbach, Jörg Ingo; Rahmann, Sven; Baumbach, Jan

    2013-04-16

    Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, different steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the power of the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.

  8. On the Performance of Maximum Likelihood versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

    ERIC Educational Resources Information Center

    Beauducel, Andre; Herzberg, Philipp Yorck

    2006-01-01

    This simulation study compared maximum likelihood (ML) estimation with weighted least squares means and variance adjusted (WLSMV) estimation. The study was based on confirmatory factor analyses with 1, 2, 4, and 8 factors, based on 250, 500, 750, and 1,000 cases, and on 5, 10, 20, and 40 variables with 2, 3, 4, 5, and 6 categories. There was no…

  9. Estimation and Simulation of Slow Crack Growth Parameters from Constant Stress Rate Data

    NASA Technical Reports Server (NTRS)

    Salem, Jonathan A.; Weaver, Aaron S.

    2003-01-01

    Closed form, approximate functions for estimating the variances and degrees-of-freedom associated with the slow crack growth parameters n, D, B, and A(sup *) as measured using constant stress rate ('dynamic fatigue') testing were derived by using propagation of errors. Estimates made with the resulting functions and slow crack growth data for a sapphire window were compared to the results of Monte Carlo simulations. The functions for estimation of the variances of the parameters were derived both with and without logarithmic transformation of the initial slow crack growth equations. The transformation was performed to make the functions both more linear and more normal. Comparison of the Monte Carlo results and the closed form expressions derived with propagation of errors indicated that linearization is not required for good estimates of the variances of parameters n and D by the propagation of errors method. However, good estimates of the variances of the parameters B and A(sup *) could only be made when the starting slow crack growth equation was transformed and the coefficients of variation of the input parameters were not too large. This was partially a result of the skewed distributions of B and A(sup *). Parametric variation of the input parameters was used to determine an acceptable range for using closed form approximate equations derived from propagation of errors.
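
    The propagation-of-errors step used in this record can be illustrated generically: the variance of a function of estimated parameters is approximated from the gradient of the function and the parameter covariance matrix (the first-order delta method). The function and covariance values below are placeholders, not the slow-crack-growth expressions.

```python
import numpy as np

def delta_method_variance(func, theta, cov, eps=1e-6):
    """First-order propagation of errors: Var[f(theta)] ~ g' Cov g,
    with the gradient g estimated by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    grad = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps * max(1.0, abs(theta[i]))
        grad[i] = (func(theta + step) - func(theta - step)) / (2 * step[i])
    return grad @ cov @ grad

# placeholder example: an arbitrary derived quantity f(n, D) = log10(D) - n
f = lambda p: np.log10(p[1]) - p[0]
theta = np.array([20.0, 150.0])                  # hypothetical parameter estimates
cov = np.array([[4.0, 0.3], [0.3, 25.0]])        # hypothetical covariance matrix
print(delta_method_variance(f, theta, cov))
```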

  10. A Probabilistic Mass Estimation Algorithm for a Novel 7- Channel Capacitive Sample Verification Sensor

    NASA Technical Reports Server (NTRS)

    Wolf, Michael

    2012-01-01

    A document describes an algorithm created to estimate the mass placed on a sample verification sensor (SVS) designed for lunar or planetary robotic sample return missions. A novel SVS measures the capacitance between a rigid bottom plate and an elastic top membrane in seven locations. As additional sample material (soil and/or small rocks) is placed on the top membrane, the deformation of the membrane increases the capacitance. The mass estimation algorithm addresses both the calibration of each SVS channel, and also addresses how to combine the capacitances read from each of the seven channels into a single mass estimate. The probabilistic approach combines the channels according to the variance observed during the training phase, and provides not only the mass estimate, but also a value for the certainty of the estimate. SVS capacitance data is collected for known masses under a wide variety of possible loading scenarios, though in all cases, the distribution of sample within the canister is expected to be approximately uniform. A capacitance-vs-mass curve is fitted to this data, and is subsequently used to determine the mass estimate for the single channel's capacitance reading during the measurement phase. This results in seven different mass estimates, one for each SVS channel. Moreover, the variance of the calibration data is used to place a Gaussian probability distribution function (pdf) around this mass estimate. To blend these seven estimates, the seven pdfs are combined into a single Gaussian distribution function, providing the final mean and variance of the estimate. This blending technique essentially takes the final estimate as an average of the estimates of the seven channels, weighted by the inverse of the channel's variance.
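
    The blending rule described, averaging the seven per-channel Gaussian estimates with weights equal to the inverse of each channel's variance, is ordinary inverse-variance weighting. A tiny sketch with made-up channel estimates (not mission data):

```python
import numpy as np

def fuse_gaussian_estimates(means, variances):
    """Combine independent Gaussian estimates by inverse-variance weighting."""
    w = 1.0 / np.asarray(variances)
    fused_var = 1.0 / w.sum()                      # variance of the fused estimate
    fused_mean = fused_var * np.sum(w * np.asarray(means))
    return fused_mean, fused_var

# made-up per-channel mass estimates (g) and their calibration variances
channel_means = [101.0, 98.5, 100.2, 99.0, 102.3, 100.8, 99.6]
channel_vars  = [4.0,   2.5,  3.0,   5.0,  6.0,   2.0,   3.5]
print(fuse_gaussian_estimates(channel_means, channel_vars))
```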

  11. Accounting for nonsampling error in estimates of HIV epidemic trends from antenatal clinic sentinel surveillance

    PubMed Central

    Eaton, Jeffrey W.; Bao, Le

    2017-01-01

    Objectives: The aim of the study was to propose and demonstrate an approach to allow additional nonsampling uncertainty about HIV prevalence measured at antenatal clinic sentinel surveillance (ANC-SS) in model-based inferences about trends in HIV incidence and prevalence. Design: Mathematical model fitted to surveillance data with Bayesian inference. Methods: We introduce a variance inflation parameter σ_infl^2 that accounts for the uncertainty of nonsampling errors in ANC-SS prevalence. It is additive to the sampling error variance. Three approaches are tested for estimating σ_infl^2 using ANC-SS and household survey data from 40 subnational regions in nine countries in sub-Saharan Africa, as defined in UNAIDS 2016 estimates. Methods were compared using in-sample fit and out-of-sample prediction of ANC-SS data, fit to household survey prevalence data, and the computational implications. Results: Introducing the additional variance parameter σ_infl^2 increased the error variance around ANC-SS prevalence observations by a median of 2.7 times (interquartile range 1.9–3.8). Using only sampling error in ANC-SS prevalence (σ_infl^2 = 0), coverage of 95% prediction intervals was 69% in out-of-sample prediction tests. This increased to 90% after introducing the additional variance parameter σ_infl^2. The revised probabilistic model improved model fit to household survey prevalence and increased epidemic uncertainty intervals most during the early epidemic period before 2005. Estimating σ_infl^2 did not increase the computational cost of model fitting. Conclusions: We recommend estimating nonsampling error in ANC-SS as an additional parameter in Bayesian inference using the Estimation and Projection Package model. This approach may prove useful for incorporating other data sources such as routine prevalence from Prevention of mother-to-child transmission testing into future epidemic estimates. PMID:28296801

  12. Simultaneous estimation of cross-validation errors in least squares collocation applied for statistical testing and evaluation of the noise variance components

    NASA Astrophysics Data System (ADS)

    Behnabian, Behzad; Mashhadi Hossainali, Masoud; Malekzadeh, Ahad

    2018-02-01

    The cross-validation technique is a popular method to assess and improve the quality of prediction by least squares collocation (LSC). We present a formula for direct estimation of the vector of cross-validation errors (CVEs) in LSC, which is much faster than element-wise CVE computation. We show that a quadratic form of CVEs follows a Chi-squared distribution. Furthermore, an a posteriori noise variance factor is derived from the quadratic form of CVEs. In order to detect blunders in the observations, the estimated standardized CVE is proposed as the test statistic, which can be applied when noise variances are known or unknown. We use LSC together with the methods proposed in this research for interpolation of crustal subsidence in the northern coast of the Gulf of Mexico. The results show that after detection and removal of outliers, the root mean square (RMS) of CVEs and the estimated noise standard deviation are reduced by about 51 and 59%, respectively. In addition, the RMS of the LSC prediction error at data points and the RMS of the estimated noise of observations are decreased by 39 and 67%, respectively. However, the RMS of the LSC prediction error on a regular grid of interpolation points covering the area is only reduced by about 4%, which is a consequence of the sparse distribution of data points in this case study. The influence of gross errors on LSC prediction results is also investigated by lower cutoff CVEs. It is indicated that after elimination of outliers, the RMS of this type of error is also reduced by 19.5% for a 5 km radius of vicinity. We propose a method using standardized CVEs for classification of the dataset into three groups with presumed different noise variances. The noise variance components for each of the groups are estimated using the restricted maximum-likelihood method via the Fisher scoring technique. Finally, LSC assessment measures were computed for the estimated heterogeneous noise variance model and compared with those of the homogeneous model. The advantage of the proposed method is the reduction in estimated noise levels for those groups with the fewer number of noisy data points.
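
    The idea of computing all cross-validation errors at once rather than element-wise has a familiar linear-model analogue: for a linear smoother the leave-one-out residual equals the ordinary residual divided by one minus the corresponding diagonal element of the hat (influence) matrix. The sketch below verifies that identity for ordinary least squares on simulated data; it is an analogy, not the LSC formula of this paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# hat matrix and ordinary residuals
H = X @ np.linalg.solve(X.T @ X, X.T)
resid = y - H @ y

# direct formula for all leave-one-out (cross-validation) errors at once
cve_direct = resid / (1.0 - np.diag(H))

# brute-force check: refit with each observation deleted
cve_loop = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    cve_loop[i] = y[i] - X[i] @ beta

print(np.allclose(cve_direct, cve_loop))   # True
```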

  13. Characterizing nonconstant instrumental variance in emerging miniaturized analytical techniques.

    PubMed

    Noblitt, Scott D; Berg, Kathleen E; Cate, David M; Henry, Charles S

    2016-04-07

    Measurement variance is a crucial aspect of quantitative chemical analysis. Variance directly affects important analytical figures of merit, including detection limit, quantitation limit, and confidence intervals. Most reported analyses for emerging analytical techniques implicitly assume constant variance (homoskedasticity) by using unweighted regression calibrations. Despite the assumption of constant variance, it is known that most instruments exhibit heteroskedasticity, where variance changes with signal intensity. Ignoring nonconstant variance results in suboptimal calibrations, invalid uncertainty estimates, and incorrect detection limits. Three techniques where homoskedasticity is often assumed were covered in this work to evaluate whether heteroskedasticity had a significant quantitative impact: naked-eye, distance-based detection using paper-based analytical devices (PADs), cathodic stripping voltammetry (CSV) with disposable carbon-ink electrode devices, and microchip electrophoresis (MCE) with conductivity detection. Despite these techniques representing a wide range of chemistries and precision, heteroskedastic behavior was confirmed for each. The general variance forms were analyzed, and recommendations for accounting for nonconstant variance are discussed. Monte Carlo simulations of instrument responses were performed to quantify the benefits of weighted regression, and the sensitivity to uncertainty in the variance function was tested. Results show that heteroskedasticity should be considered during development of new techniques; even moderate uncertainty (30%) in the variance function still results in weighted regression outperforming unweighted regressions. We recommend utilizing the power model of variance because it is easy to apply, requires little additional experimentation, and produces higher-precision results and more reliable uncertainty estimates than assuming homoskedasticity. Copyright © 2016 Elsevier B.V. All rights reserved.
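
    The recommended workflow, a power model of variance feeding a weighted calibration, can be sketched as follows: estimate replicate variances at each calibration level, fit var ≈ a·(signal)^b on a log-log scale, and use the reciprocal of the fitted variance as regression weights. The calibration data below are simulated placeholders, not measurements from the paper.

```python
import numpy as np

# replicate calibration signals at each concentration level (simulated)
conc = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
signals = {c: 5.0 * c + np.random.default_rng(int(c)).normal(0, 0.5 + 0.2 * c, 5)
           for c in conc}

means = np.array([signals[c].mean() for c in conc])
variances = np.array([signals[c].var(ddof=1) for c in conc])

# fit the power model  var = a * signal^b  by linear regression on the logs
b, log_a = np.polyfit(np.log(means), np.log(variances), 1)
weights = 1.0 / (np.exp(log_a) * means ** b)

# weighted linear calibration: signal = slope * conc + intercept
W = np.diag(weights)
X = np.column_stack([conc, np.ones_like(conc)])
slope, intercept = np.linalg.solve(X.T @ W @ X, X.T @ W @ means)
print(slope, intercept)
```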

  14. Estimation of base temperatures for nine weed species.

    PubMed

    Steinmaus, S J; Prather, T S; Holt, J S

    2000-02-01

    Experiments were conducted to test several methods for estimating low-temperature thresholds for seed germination. Temperature responses of nine weeds common in annual agroecosystems were assessed in temperature gradient experiments. Species included summer annuals (Amaranthus albus, A. palmeri, Digitaria sanguinalis, Echinochloa crus-galli, Portulaca oleracea, and Setaria glauca), winter annuals (Hirschfeldia incana and Sonchus oleraceus), and Conyza canadensis, which is classified as a summer or winter annual. The temperature below which development ceases (Tbase) was estimated as the x-intercept of four conventional germination rate indices regressed on temperature, by repeated probit analysis, and by a mathematical approach. An overall Tbase estimate for each species was the average across indices weighted by the reciprocal of the variance associated with the estimate. Germination rates increased linearly with temperature between 15 °C and 30 °C for all species. Consistent estimates of Tbase were obtained for most species using several indices. The most statistically robust and biologically relevant method was the reciprocal time to median germination, which can also be used to estimate other biologically meaningful parameters. The mean Tbase for summer annuals (13.8 °C) was higher than that for winter annuals (8.3 °C). The two germination response characteristics, Tbase and slope (rate), influence a species' germination behaviour in the field since the germination-inhibiting effects of a high Tbase may be offset by the germination-promoting effects of a rapid germination response to temperature. Estimates of Tbase may be incorporated into predictive thermal time models to assist weed control practitioners in making management decisions.
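
    A minimal sketch of the underlying computation, with hypothetical numbers rather than the study's data: Tbase is taken as the x-intercept of a linear fit of a germination rate index on temperature, its variance is approximated by the delta method, and estimates from several indices are combined by inverse-variance weighting as described above.

    ```python
    import numpy as np

    # Hypothetical germination-rate index (1 / days to median germination) vs temperature.
    temps = np.array([15.0, 18.0, 21.0, 24.0, 27.0, 30.0])
    rate  = np.array([0.02, 0.06, 0.10, 0.14, 0.19, 0.23])   # per day

    # Linear fit rate = m*T + c; Tbase is the x-intercept -c/m.
    A = np.column_stack([temps, np.ones_like(temps)])
    coef, res, *_ = np.linalg.lstsq(A, rate, rcond=None)
    m, c = coef
    tbase = -c / m

    # Approximate variance of Tbase by the delta method from the coefficient covariance.
    dof = len(temps) - 2
    sigma2 = res[0] / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    grad = np.array([c / m**2, -1.0 / m])            # gradient of -c/m w.r.t. (m, c)
    var_tbase = grad @ cov @ grad

    # Combining several indices: inverse-variance weighted mean of their Tbase estimates.
    tbase_est = np.array([7.9, 8.4, 8.1])
    tbase_var = np.array([0.40, 0.25, 0.60])
    w = 1.0 / tbase_var
    tbase_overall = np.sum(w * tbase_est) / np.sum(w)
    print(tbase, var_tbase, tbase_overall)
    ```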

  15. Impact of the Fano Factor on Position and Energy Estimation in Scintillation Detectors.

    PubMed

    Bora, Vaibhav; Barrett, Harrison H; Jha, Abhinav K; Clarkson, Eric

    2015-02-01

    The Fano factor for an integer-valued random variable is defined as the ratio of its variance to its mean. Light from various scintillation crystals has been reported to have Fano factors from sub-Poisson (Fano factor < 1) to super-Poisson (Fano factor > 1). For a given mean, a smaller Fano factor implies a smaller variance and thus less noise. We investigated whether lower noise in the scintillation light results in better spatial and energy resolutions. The impact of the Fano factor on the estimation of position of interaction and energy deposited in simple gamma-camera geometries is estimated by two methods: calculating the Cramér-Rao bound and estimating the variance of a maximum likelihood estimator. The methods are consistent with each other and indicate that when estimating the position of interaction and energy deposited by a gamma-ray photon, the Fano factor of a scintillator does not affect the spatial resolution. A smaller Fano factor results in a better energy resolution.
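
    The definition is simple to compute from repeated counts; the sketch below (illustrative Python, not the authors' simulation code) contrasts Poisson light (Fano factor near 1) with a binomial stand-in for sub-Poisson light, whose Fano factor is 1 - p.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def fano(counts):
        """Fano factor of an integer-valued sample: variance / mean."""
        counts = np.asarray(counts)
        return counts.var(ddof=1) / counts.mean()

    # Poisson light: Fano factor ~ 1
    poisson_counts = rng.poisson(lam=2000, size=10_000)

    # A simple way to emulate sub-Poisson light (Fano < 1): binomial counts,
    # whose Fano factor is 1 - p (here 0.5), with the same mean of 2000.
    sub_poisson_counts = rng.binomial(n=4000, p=0.5, size=10_000)

    print(fano(poisson_counts))       # close to 1
    print(fano(sub_poisson_counts))   # close to 0.5
    ```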

  16. Analysis of categorical moderators in mixed-effects meta-analysis: Consequences of using pooled versus separate estimates of the residual between-studies variances.

    PubMed

    Rubio-Aparicio, María; Sánchez-Meca, Julio; López-López, José Antonio; Botella, Juan; Marín-Martínez, Fulgencio

    2017-11-01

    Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta-analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between-studies variance on the statistical performance of the Q_B(P) and Q_B(S) tests for subgroup analyses assuming a mixed-effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between-studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between-studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates. © 2017 The British Psychological Society.
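
    A simplified Python sketch of the comparison (not the study's simulation code; the pooling rule here is a simple df-weighted average, used only for illustration): DerSimonian-Laird estimates of the residual between-studies variance are computed per subgroup, optionally pooled, and the between-subgroups Q statistic is referred to a chi-squared distribution.

    ```python
    import numpy as np
    from scipy import stats

    def dl_tau2(y, v):
        """DerSimonian-Laird estimate of the between-studies variance."""
        w = 1.0 / v
        q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
        c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
        return max(0.0, (q - (len(y) - 1)) / c)

    def subgroup_qb(y, v, g, pooled_tau2=True):
        """Between-subgroups Q statistic under a mixed-effects model.

        y : effect sizes, v : within-study variances, g : subgroup labels.
        With pooled_tau2=True a single residual tau^2 (df-weighted average of
        the subgroup estimates) is used; otherwise each subgroup keeps its own.
        """
        groups = np.unique(g)
        tau2s = {k: dl_tau2(y[g == k], v[g == k]) for k in groups}
        if pooled_tau2:
            dfs = np.array([np.sum(g == k) - 1 for k in groups], dtype=float)
            pooled = np.sum(dfs * np.array([tau2s[k] for k in groups])) / np.sum(dfs)
            tau2s = {k: pooled for k in groups}

        means, weights = [], []
        for k in groups:
            w = 1.0 / (v[g == k] + tau2s[k])
            means.append(np.sum(w * y[g == k]) / np.sum(w))
            weights.append(np.sum(w))
        means, weights = np.array(means), np.array(weights)
        grand = np.sum(weights * means) / np.sum(weights)
        qb = np.sum(weights * (means - grand) ** 2)
        return qb, stats.chi2.sf(qb, df=len(groups) - 1)

    # toy usage: 10 studies per subgroup
    rng = np.random.default_rng(0)
    v = rng.uniform(0.02, 0.08, size=20)
    g = np.repeat(["A", "B"], 10)
    y = np.where(g == "A", 0.2, 0.5) + rng.normal(0, np.sqrt(v + 0.01))
    print(subgroup_qb(y, v, g, pooled_tau2=True))
    print(subgroup_qb(y, v, g, pooled_tau2=False))
    ```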

  17. Robust linear discriminant models to solve financial crisis in banking sectors

    NASA Astrophysics Data System (ADS)

    Lim, Yai-Fung; Yahaya, Sharipah Soaad Syed; Idris, Faoziah; Ali, Hazlina; Omar, Zurni

    2014-12-01

    Linear discriminant analysis (LDA) is a widely used technique in pattern classification via a rule that minimizes the probability of misclassifying cases into their respective categories. However, the performance of the classical estimators in LDA depends strongly on the assumptions of normality and homoscedasticity. Several robust estimators for LDA, such as the Minimum Covariance Determinant (MCD), S-estimators and the Minimum Volume Ellipsoid (MVE), have been proposed by many authors to alleviate the non-robustness of the classical estimates. In this paper, we investigate the financial crisis of the Malaysian banking institutions using robust LDA and classical LDA methods. Our objective is to distinguish the "distress" and "non-distress" banks in Malaysia by using the LDA models. The hit ratio is used to validate the predictive accuracy of the LDA models. The performance of LDA is evaluated by estimating the misclassification rate via the apparent error rate. The results and comparisons show that the robust estimators provide better performance than the classical estimators for LDA.
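
    A minimal sketch of one such robust LDA variant, using scikit-learn's MCD estimator as a plug-in replacement for the classical class means and pooled covariance (synthetic two-class data standing in for the bank dataset; the hit ratio below is the apparent, resubstitution accuracy).

    ```python
    import numpy as np
    from sklearn.covariance import MinCovDet
    from sklearn.datasets import make_classification

    # Hypothetical two-class data standing in for "distress" / "non-distress" banks.
    X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                               n_classes=2, random_state=0)

    def robust_lda_fit(X, y):
        """Plug-in LDA using MCD location/scatter in place of the classical
        sample means and pooled covariance."""
        classes = np.unique(y)
        means, covs, priors, ns = [], [], [], []
        for c in classes:
            mcd = MinCovDet(random_state=0).fit(X[y == c])
            means.append(mcd.location_)
            covs.append(mcd.covariance_)
            ns.append((y == c).sum())
            priors.append((y == c).mean())
        ns = np.array(ns, dtype=float)
        pooled = sum(n * S for n, S in zip(ns, covs)) / ns.sum()
        return classes, np.array(means), pooled, np.array(priors)

    def robust_lda_predict(model, X):
        classes, means, pooled, priors = model
        P = np.linalg.inv(pooled)
        # linear discriminant scores: x' P mu_c - 0.5 mu_c' P mu_c + log prior_c
        scores = X @ P @ means.T - 0.5 * np.sum(means @ P * means, axis=1) + np.log(priors)
        return classes[np.argmax(scores, axis=1)]

    model = robust_lda_fit(X, y)
    hit_ratio = np.mean(robust_lda_predict(model, X) == y)   # apparent (resubstitution) accuracy
    print(hit_ratio)
    ```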

  18. Robust Fault Detection Using Robust Z1 Estimation and Fuzzy Logic

    NASA Technical Reports Server (NTRS)

    Curry, Tramone; Collins, Emmanuel G., Jr.; Selekwa, Majura; Guo, Ten-Huei (Technical Monitor)

    2001-01-01

    This research considers the application of robust Z1 estimation in conjunction with fuzzy logic to robust fault detection for an aircraft flight control system. It begins with the development of robust Z1 estimators based on multiplier theory and then develops a fixed threshold approach to fault detection (FD). It then considers the use of fuzzy logic for robust residual evaluation and FD. Due to modeling errors and unmeasurable disturbances, it is difficult to distinguish between the effects of an actual fault and those caused by uncertainty and disturbance. Hence, it is the aim of a robust FD system to be sensitive to faults while remaining insensitive to uncertainty and disturbances. While fixed thresholds only allow a decision on whether a fault has or has not occurred, it is more valuable to have the residual evaluation lead to a conclusion related to the degree of, or probability of, a fault. Fuzzy logic is a viable means of determining the degree of a fault and allows the introduction of human observations that may not be incorporated in the rigorous threshold theory. Hence, fuzzy logic can provide a more reliable and informative fault detection process. Using an aircraft flight control system, the results of FD using robust Z1 estimation with a fixed threshold are demonstrated. FD that combines robust Z1 estimation and fuzzy logic is also demonstrated. It is seen that combining the robust estimator with fuzzy logic proves to be advantageous in increasing the sensitivity to smaller faults while remaining insensitive to uncertainty and disturbances.

  19. Effect of correlated observation error on parameters, predictions, and uncertainty

    USGS Publications Warehouse

    Tiedeman, Claire; Green, Christopher T.

    2013-01-01

    Correlations among observation errors are typically omitted when calculating observation weights for model calibration by inverse methods. We explore the effects of omitting these correlations on estimates of parameters, predictions, and uncertainties. First, we develop a new analytical expression for the difference in parameter variance estimated with and without error correlations for a simple one-parameter two-observation inverse model. Results indicate that omitting error correlations from both the weight matrix and the variance calculation can either increase or decrease the parameter variance, depending on the values of error correlation (ρ) and the ratio of dimensionless scaled sensitivities (rdss). For small ρ, the difference in variance is always small, but for large ρ, the difference varies widely depending on the sign and magnitude of rdss. Next, we consider a groundwater reactive transport model of denitrification with four parameters and correlated geochemical observation errors that are computed by an error-propagation approach that is new for hydrogeologic studies. We compare parameter estimates, predictions, and uncertainties obtained with and without the error correlations. Omitting the correlations modestly to substantially changes parameter estimates, and causes both increases and decreases of parameter variances, consistent with the analytical expression. Differences in predictions for the models calibrated with and without error correlations can be greater than parameter differences when both are considered relative to their respective confidence intervals. These results indicate that including observation error correlations in weighting for nonlinear regression can have important effects on parameter estimates, predictions, and their respective uncertainties.
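
    The core of the analytical comparison can be illustrated in a few lines (hypothetical sensitivities and error covariance, not the study's values): the parameter variance estimated with the full error covariance in the weight matrix is contrasted with the variance estimated when the off-diagonal correlation is omitted.

    ```python
    import numpy as np

    # one parameter, two observations; s1, s2 play the role of the
    # (dimensionless scaled) sensitivities, rho is the error correlation
    s1, s2 = 1.0, 0.4
    X = np.array([[s1], [s2]])

    sigma1, sigma2, rho = 1.0, 1.0, 0.8
    Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                      [rho * sigma1 * sigma2, sigma2**2            ]])

    # parameter variance estimated WITH the error correlation in the weight matrix
    var_with = np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X)[0, 0]

    # parameter variance estimated when the correlation is OMITTED (diagonal weights)
    W_diag = np.diag(1.0 / np.diag(Sigma))
    var_without = np.linalg.inv(X.T @ W_diag @ X)[0, 0]

    print(var_with, var_without)   # the sign and size of the difference depend on rho and s2/s1
    ```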

  20. Estimating Variances of Horizontal Wind Fluctuations in Stable Conditions

    NASA Astrophysics Data System (ADS)

    Luhar, Ashok K.

    2010-05-01

    Information concerning the average wind speed and the variances of lateral and longitudinal wind velocity fluctuations is required by dispersion models to characterise turbulence in the atmospheric boundary layer. When the winds are weak, the scalar average wind speed and the vector average wind speed need to be clearly distinguished and both lateral and longitudinal wind velocity fluctuations assume equal importance in dispersion calculations. We examine commonly-used methods of estimating these variances from wind-speed and wind-direction statistics measured separately, for example, by a cup anemometer and a wind vane, and evaluate the implied relationship between the scalar and vector wind speeds, using measurements taken under low-wind stable conditions. We highlight several inconsistencies inherent in the existing formulations and show that the widely-used assumption that the lateral velocity variance is equal to the longitudinal velocity variance is not necessarily true. We derive improved relations for the two variances, and although data under stable stratification are considered for comparison, our analysis is applicable more generally.
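
    For reference, the sketch below (synthetic record, illustrative conventions, not the paper's formulations) shows the quantities being distinguished: the scalar versus vector average wind speed, the longitudinal and lateral velocity variances obtained by rotating into the mean-wind frame, and the commonly used cup-and-vane approximation sigma_v ≈ U·sigma_theta.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # synthetic weak-wind record: speed (m/s) and direction (meteorological, converted to radians)
    speed = np.abs(rng.normal(1.0, 0.5, size=3600))
    direction = np.deg2rad(rng.normal(180.0, 40.0, size=3600))

    # scalar vs vector average wind speed
    u = -speed * np.sin(direction)          # eastward component
    v = -speed * np.cos(direction)          # northward component
    scalar_mean = speed.mean()
    vector_mean = np.hypot(u.mean(), v.mean())

    # rotate into the mean-wind frame to get longitudinal/lateral components
    theta = np.arctan2(u.mean(), v.mean())
    along  = u * np.sin(theta) + v * np.cos(theta)   # longitudinal
    across = u * np.cos(theta) - v * np.sin(theta)   # lateral
    sigma_u2 = along.var(ddof=1)
    sigma_v2 = across.var(ddof=1)

    # a commonly used approximation from cup/vane statistics: sigma_v ~= scalar_mean * sigma_theta
    sigma_v_approx = scalar_mean * direction.std(ddof=1)
    print(scalar_mean, vector_mean, sigma_u2, sigma_v2, sigma_v_approx**2)
    ```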

  1. Improvement of Prediction Ability for Genomic Selection of Dairy Cattle by Including Dominance Effects

    PubMed Central

    Sun, Chuanyu; VanRaden, Paul M.; Cole, John B.; O'Connell, Jeffrey R.

    2014-01-01

    Dominance may be an important source of non-additive genetic variance for many traits of dairy cattle. However, nearly all prediction models for dairy cattle have included only additive effects because of the limited number of cows with both genotypes and phenotypes. The role of dominance in the Holstein and Jersey breeds was investigated for eight traits: milk, fat, and protein yields; productive life; daughter pregnancy rate; somatic cell score; fat percent and protein percent. Additive and dominance variance components were estimated and then used to estimate additive and dominance effects of single nucleotide polymorphisms (SNPs). The predictive abilities of three models with both additive and dominance effects and a model with additive effects only were assessed using ten-fold cross-validation. One procedure estimated dominance values, and another estimated dominance deviations; calculation of the dominance relationship matrix was different for the two methods. The third approach enlarged the dataset by including cows with genotype probabilities derived using genotyped ancestors. For yield traits, dominance variance accounted for 5 and 7% of total variance for Holsteins and Jerseys, respectively; using dominance deviations resulted in smaller dominance and larger additive variance estimates. For non-yield traits, dominance variances were very small for both breeds. For yield traits, including additive and dominance effects fit the data better than including only additive effects; average correlations between estimated genetic effects and phenotypes showed that prediction accuracy increased when both effects rather than just additive effects were included. No corresponding gains in prediction ability were found for non-yield traits. Including cows with derived genotype probabilities from genotyped ancestors did not improve prediction accuracy. The largest additive effects were located on chromosome 14 near DGAT1 for yield traits for both breeds; those SNPs also showed the largest dominance effects for fat yield (both breeds) as well as for Holstein milk yield. PMID:25084281

  2. Nonparametric evaluation of quantitative traits in population-based association studies when the genetic model is unknown.

    PubMed

    Konietschke, Frank; Libiger, Ondrej; Hothorn, Ludwig A

    2012-01-01

    Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.

  3. Structural changes and out-of-sample prediction of realized range-based variance in the stock market

    NASA Astrophysics Data System (ADS)

    Gong, Xu; Lin, Boqiang

    2018-03-01

    This paper aims to examine the effects of structural changes on forecasting the realized range-based variance in the stock market. Considering structural changes in variance in the stock market, we develop the HAR-RRV-SC model on the basis of the HAR-RRV model. Subsequently, the HAR-RRV and HAR-RRV-SC models are used to forecast the realized range-based variance of S&P 500 Index. We find that there are many structural changes in variance in the U.S. stock market, and the period after the financial crisis contains more structural change points than the period before the financial crisis. The out-of-sample results show that the HAR-RRV-SC model significantly outperforms the HAR-RRV model when they are employed to forecast the 1-day, 1-week, and 1-month realized range-based variances, which means that structural changes can improve out-of-sample prediction of realized range-based variance. The out-of-sample results remain robust across the alternative rolling fixed-window, the alternative threshold value in ICSS algorithm, and the alternative benchmark models. More importantly, we believe that considering structural changes can help improve the out-of-sample performances of most of other existing HAR-RRV-type models in addition to the models used in this paper.
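
    For orientation, a minimal Python sketch of the baseline HAR-RRV regression (synthetic series; the HAR-RRV-SC model additionally includes structural-change terms identified by the ICSS algorithm, which are not reproduced here): the next day's realized range-based variance is regressed on daily, weekly and monthly averages of past values.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    rrv = np.abs(rng.normal(1.0, 0.3, size=1500))   # stand-in daily realized range-based variance

    def har_design(x, horizons=(1, 5, 22)):
        """Lagged daily/weekly/monthly averages used as HAR regressors."""
        max_h = max(horizons)
        rows, target = [], []
        for t in range(max_h, len(x) - 1):
            rows.append([np.mean(x[t - h + 1: t + 1]) for h in horizons])
            target.append(x[t + 1])
        return np.column_stack([np.ones(len(rows)), np.array(rows)]), np.array(target)

    X, y = har_design(rrv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # out-of-sample one-step-ahead forecast from the most recent daily/weekly/monthly averages
    x_new = np.array([1.0] + [np.mean(rrv[-h:]) for h in (1, 5, 22)])
    print(beta, x_new @ beta)
    ```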

  4. Asymptotic properties of Pearson's rank-variate correlation coefficient under contaminated Gaussian model.

    PubMed

    Ma, Rubao; Xu, Weichao; Zhang, Yun; Ye, Zhongfu

    2014-01-01

    This paper investigates the robustness properties of Pearson's rank-variate correlation coefficient (PRVCC) in scenarios where one channel is corrupted by impulsive noise and the other is impulsive noise-free. As shown in our previous work, these scenarios, which are frequently encountered in radar and/or sonar, can be well emulated by a particular bivariate contaminated Gaussian model (CGM). Under this CGM, we establish the asymptotic closed forms of the expectation and variance of PRVCC by means of the well-known Delta method. To gain a deeper understanding, we also compare PRVCC with two other classical correlation coefficients, i.e., Spearman's rho (SR) and Kendall's tau (KT), in terms of the root mean squared error (RMSE). Monte Carlo simulations not only verify our theoretical findings, but also reveal the advantage of PRVCC through an example of estimating the time delay in this particular impulsive noise environment.

  5. Computational methods in the development of a knowledge-based system for the prediction of solid catalyst performance.

    PubMed

    Procelewska, Joanna; Galilea, Javier Llamas; Clerc, Frederic; Farrusseng, David; Schüth, Ferdi

    2007-01-01

    The objective of this work is the construction of a correlation between characteristics of heterogeneous catalysts, encoded in a descriptor vector, and their experimentally measured performances in the propene oxidation reaction. In this paper the key issue in the modeling process, namely the selection of adequate input variables, is explored. Several data-driven feature selection strategies were applied in order to obtain an estimate of the differences in variance and information content of various attributes and to compare their relative importance. Quantitative property-activity relationship techniques using probabilistic neural networks have been used for the creation of various semi-empirical models. Finally, a robust classification model was obtained that assigns solid compounds, described by the selected attributes, to an appropriate performance class in the model reaction. The results indicate that mathematical support for the primary attribute set proposed by chemists is highly desirable.

  6. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  7. Robust Alternatives to the Standard Deviation in Processing of Physics Experimental Data

    NASA Astrophysics Data System (ADS)

    Shulenin, V. P.

    2016-10-01

    Properties of robust estimators of the scale parameter are studied. It is noted that the median of absolute deviations and the modified estimator of the average Gini difference have asymptotically normal distributions and bounded influence functions, are B-robust, and hence, unlike the standard deviation, are protected against outliers in the sample. Results comparing estimators of the scale parameter are given for a Gaussian model with contamination. An adaptive variant of the modified estimator of the average Gini difference is considered.
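
    A small Python illustration of the estimators discussed (the paper's modified Gini-difference estimator is not reproduced; the plain Gini mean difference and the normal-consistent MAD are shown for reference against the standard deviation on contaminated data).

    ```python
    import numpy as np
    from itertools import combinations

    def mad_scale(x):
        """Median of absolute deviations, scaled to be consistent for a
        normal distribution (factor 1.4826)."""
        x = np.asarray(x)
        return 1.4826 * np.median(np.abs(x - np.median(x)))

    def gini_mean_difference(x):
        """Mean absolute difference over all pairs (unmodified Gini difference)."""
        x = np.asarray(x)
        return np.mean([abs(a - b) for a, b in combinations(x, 2)])

    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=200)
    contaminated = np.concatenate([clean, rng.normal(0.0, 10.0, size=10)])  # ~5% outliers

    print(np.std(clean, ddof=1), mad_scale(clean), gini_mean_difference(clean))
    print(np.std(contaminated, ddof=1), mad_scale(contaminated), gini_mean_difference(contaminated))
    ```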

  8. Comparing Bayesian estimates of genetic differentiation of molecular markers and quantitative traits: an application to Pinus sylvestris.

    PubMed

    Waldmann, P; García-Gil, M R; Sillanpää, M J

    2005-06-01

    Comparison of the level of differentiation at neutral molecular markers (estimated as F(ST) or G(ST)) with the level of differentiation at quantitative traits (estimated as Q(ST)) has become a standard tool for inferring that there is differential selection between populations. We estimated Q(ST) of timing of bud set from a latitudinal cline of Pinus sylvestris with a Bayesian hierarchical variance component method utilizing the information on the pre-estimated population structure from neutral molecular markers. Unfortunately, the between-family variances differed substantially between populations that resulted in a bimodal posterior of Q(ST) that could not be compared in any sensible way with the unimodal posterior of the microsatellite F(ST). In order to avoid publishing studies with flawed Q(ST) estimates, we recommend that future studies should present heritability estimates for each trait and population. Moreover, to detect variance heterogeneity in frequentist methods (ANOVA and REML), it is of essential importance to check also that the residuals are normally distributed and do not follow any systematically deviating trends.

  9. Kriging analysis of mean annual precipitation, Powder River Basin, Montana and Wyoming

    USGS Publications Warehouse

    Karlinger, M.R.; Skrivan, James A.

    1981-01-01

    Kriging is a statistical estimation technique for regionalized variables which exhibit an autocorrelation structure. Such structure can be described by a semi-variogram of the observed data. The kriging estimate at any point is a weighted average of the data, where the weights are determined using the semi-variogram and an assumed drift, or lack of drift, in the data. Block, or areal, estimates can also be calculated. The kriging algorithm, based on unbiased and minimum-variance estimates, involves a linear system of equations to calculate the weights. Kriging variances can then be used to give confidence intervals of the resulting estimates. Mean annual precipitation in the Powder River basin, Montana and Wyoming, is an important variable when considering restoration of coal-strip-mining lands of the region. Two kriging analyses involving data at 60 stations were made--one assuming no drift in precipitation, and one a partial quadratic drift simulating orographic effects. Contour maps of estimates of mean annual precipitation were similar for both analyses, as were the corresponding contours of kriging variances. Block estimates of mean annual precipitation were made for two subbasins. Runoff estimates were 1-2 percent of the kriged block estimates. (USGS)
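
    A minimal ordinary-kriging sketch (no drift, hypothetical spherical semivariogram and station values, not the Powder River data) showing how the weights, the point estimate, and the kriging variance follow from the semivariogram.

    ```python
    import numpy as np

    def spherical_gamma(h, nugget=0.0, sill=1.0, rng_=50.0):
        """Spherical semivariogram model."""
        h = np.asarray(h, dtype=float)
        g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
        return np.where(h >= rng_, sill, np.where(h == 0.0, 0.0, g))

    def ordinary_kriging(coords, values, target, **vario):
        """Point estimate and kriging variance by ordinary kriging (no drift)."""
        n = len(values)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        G = np.zeros((n + 1, n + 1))
        G[:n, :n] = spherical_gamma(d, **vario)        # semivariogram among data points
        G[:n, n] = G[n, :n] = 1.0                      # unbiasedness (Lagrange) row/column
        rhs = np.append(spherical_gamma(np.linalg.norm(coords - target, axis=1), **vario), 1.0)
        sol = np.linalg.solve(G, rhs)
        w, mu = sol[:n], sol[n]
        estimate = w @ values
        kriging_variance = w @ rhs[:n] + mu
        return estimate, kriging_variance

    coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
    precip = np.array([300.0, 320.0, 310.0, 335.0])    # hypothetical mean annual precipitation
    print(ordinary_kriging(coords, precip, np.array([5.0, 5.0]), nugget=5.0, sill=100.0, rng_=40.0))
    ```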

  10. Use of weaning management group as a random effect for a more robust estimation of genetic parameters for post-weaning traits in Nellore cattle.

    PubMed

    Pedrosa, V B; Eler, J P; Ferraz, J B S; Groeneveld, E

    2014-02-21

    Data from 69,525 animals were used to compare two types of analyses, one of them having the weaning management group (WEMANG) included as an effect in the contemporary group (F_WEMANG) and the other considering the weaning management group as a random effect, not related to the mathematical model (R_WEMANG) for post-weaning traits. The components of (co)variance were estimated for pre-weaning traits (birth weight and weaning weight) and for post-weaning traits [scrotal circumference (SC), weight gain from weaning to 18 months of age (WG) and muscle score (MUSC)] in Nellore cattle, based on a complete animal model. Heritability of SC, WG and MUSC for the F_WEMANG model was equal to 0.46 ± 0.02, 0.38 ± 0.03 and 0.26 ± 0.01, and for the R_WEMANG model it was 0.45 ± 0.02, 0.31 ± 0.03 and 0.25 ± 0.01, respectively. Genetic correlations between all the studied traits varied between 0.07 ± 0.01 and 0.77 ± 0.03 in F_WEMANG and between 0.02 ± 0.01 and 0.76 ± 0.04 in R_WEMANG. The R_WEMANG model allowed a decrease in the number of contemporary groups as well as an increase in the number of observations per group without significant alterations in heritability coefficients for the post-weaning traits. Consequently, the analysis became more robust and avoided having contemporary groups with low variability.

  11. Previous Estimates of Mitochondrial DNA Mutation Level Variance Did Not Account for Sampling Error: Comparing the mtDNA Genetic Bottleneck in Mice and Humans

    PubMed Central

    Wonnapinij, Passorn; Chinnery, Patrick F.; Samuels, David C.

    2010-01-01

    In cases of inherited pathogenic mitochondrial DNA (mtDNA) mutations, a mother and her offspring generally have large and seemingly random differences in the amount of mutated mtDNA that they carry. Comparisons of measured mtDNA mutation level variance values have become an important issue in determining the mechanisms that cause these large random shifts in mutation level. These variance measurements have been made with samples of quite modest size, which should be a source of concern because higher-order statistics, such as variance, are poorly estimated from small sample sizes. We have developed an analysis of the standard error of variance from a sample of size n, and we have defined error bars for variance measurements based on this standard error. We calculate variance error bars for several published sets of measurements of mtDNA mutation level variance and show how the addition of the error bars alters the interpretation of these experimental results. We compare variance measurements from human clinical data and from mouse models and show that the mutation level variance is clearly higher in the human data than it is in the mouse models at both the primary oocyte and offspring stages of inheritance. We discuss how the standard error of variance can be used in the design of experiments measuring mtDNA mutation level variance. Our results show that variance measurements based on fewer than 20 measurements are generally unreliable and ideally more than 50 measurements are required to reliably compare variances with less than a 2-fold difference. PMID:20362273
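
    The paper derives its own standard-error expression for a sample variance (not reproduced here); the sketch below shows two generic ways to attach error bars to a variance estimate, a normal-theory approximation and a bootstrap, applied to a hypothetical small heteroplasmy sample.

    ```python
    import numpy as np

    def se_variance_normal_theory(x):
        """Standard error of the sample variance if the data were normal:
        Var(s^2) = 2 * sigma^4 / (n - 1)."""
        x = np.asarray(x)
        n = len(x)
        return np.sqrt(2.0 / (n - 1)) * x.var(ddof=1)

    def se_variance_bootstrap(x, n_boot=5000, seed=0):
        """Distribution-free bootstrap standard error of the sample variance."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x)
        boot = [rng.choice(x, size=len(x), replace=True).var(ddof=1) for _ in range(n_boot)]
        return np.std(boot, ddof=1)

    rng = np.random.default_rng(1)
    heteroplasmy = rng.beta(2.0, 5.0, size=15)        # hypothetical mutation-level sample, n = 15
    print(se_variance_normal_theory(heteroplasmy), se_variance_bootstrap(heteroplasmy))
    ```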

  12. Robust location and spread measures for nonparametric probability density function estimation.

    PubMed

    López-Rubio, Ezequiel

    2009-10-01

    Robustness against outliers is a desirable property of any unsupervised learning scheme. In particular, probability density estimators benefit from incorporating this feature. A possible strategy to achieve this goal is to substitute the sample mean and the sample covariance matrix by more robust location and spread estimators. Here we use the L1-median to develop a nonparametric probability density function (PDF) estimator. We prove its most relevant properties, and we show its performance in density estimation and classification applications.
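
    The L1-median (geometric median) itself has a simple fixed-point computation; a minimal Weiszfeld-iteration sketch is given below with synthetic contaminated data, illustrating its insensitivity to gross outliers compared with the sample mean.

    ```python
    import numpy as np

    def l1_median(X, tol=1e-8, max_iter=500):
        """Geometric (L1) median of the rows of X via Weiszfeld's algorithm."""
        X = np.asarray(X, dtype=float)
        m = X.mean(axis=0)                          # start from the sample mean
        for _ in range(max_iter):
            d = np.linalg.norm(X - m, axis=1)
            d = np.where(d < 1e-12, 1e-12, d)       # guard against a data point coinciding with m
            w = 1.0 / d
            m_new = (w[:, None] * X).sum(axis=0) / w.sum()
            if np.linalg.norm(m_new - m) < tol:
                return m_new
            m = m_new
        return m

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),
                      rng.normal(20.0, 1.0, size=(5, 2))])   # 5% gross outliers
    print(data.mean(axis=0), l1_median(data))                # the mean is pulled, the L1-median is not
    ```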

  13. Applied Prevalence Ratio estimation with different Regression models: An example from a cross-national study on substance use research.

    PubMed

    Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina

    2016-06-14

    The aim was to examine the differences between the Prevalence Ratio (PR) and the Odds Ratio (OR) in a cross-sectional study and to provide tools to calculate the PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants of 16 European countries participating in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were: age, educational level and country of residence. The PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimates were compared with the OR calculated using logistic regression models. Prevalence of hazardous drinkers varied among countries. Generally, men have a higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. The estimated PR was identical regardless of the method and statistical package used. However, the OR overestimated the PR, to an extent depending on the prevalence of hazardous drinking in the country. In cross-sectional studies, where comparisons between countries with differences in the prevalence of the disease or condition are made, it is advisable to use PR instead of OR.
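
    A minimal Python analogue of the modified (robust) Poisson approach (hypothetical data chosen so that the PR is near the reported 1.43; assumes a statsmodels version whose GLM.fit accepts cov_type for sandwich standard errors): the PR is the exponential of the Poisson coefficient, and a logistic fit is shown for comparison with the OR.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # hypothetical cross-sectional data: hazardous drinking (binary) vs gender
    n = 5000
    male = rng.integers(0, 2, size=n)
    p = np.where(male == 1, 0.30, 0.21)               # prevalences chosen so PR ~ 1.43
    drink = rng.binomial(1, p)

    X = sm.add_constant(pd.DataFrame({"male": male}))

    # modified Poisson regression: Poisson family + robust (sandwich) variance
    fit = sm.GLM(drink, X, family=sm.families.Poisson()).fit(cov_type="HC0")
    pr = np.exp(fit.params["male"])
    ci = np.exp(fit.conf_int().loc["male"])
    print(f"PR = {pr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")

    # for comparison, logistic regression gives an odds ratio that overstates the PR
    logit = sm.GLM(drink, X, family=sm.families.Binomial()).fit()
    print("OR =", np.exp(logit.params["male"]))
    ```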

  14. Premature death of adult adoptees: analyses of a case-cohort sample.

    PubMed

    Petersen, Liselotte; Andersen, Per Kragh; Sørensen, Thorkild I A

    2005-05-01

    Genetic and environmental influence on risk of premature death in adulthood was investigated by estimating the associations in total and cause-specific mortality of adult Danish adoptees and their biological and adoptive parents. Among all 14,425 non-familial adoptions formally granted in Denmark during the period 1924 through 1947, we selected the study population according to a case-cohort sampling design. As the case-control design, the case-cohort design has the advantage of economic data collection and little loss in statistical efficiency, but the case-cohort sample has the additional advantages that rate ratio estimates may be obtained, and re-use of the cohort sample in future studies of other outcomes is possible. Analyses were performed using Kalbfleisch and Lawless's estimator for hazard ratio, and robust estimation for variances. In the main analyses the sample was restricted to birth years of the adoptees 1924 and after, and age of transfer to the adoptive parents before 7 years, and age at death was restricted to 16 to 70 years. The results showed a higher mortality among adoptees, whose biological parents died in the age range of 16 to 70 years; this was significant for deaths from natural causes, vascular causes and all causes. No influence was seen from early death of adoptive parents, regardless of cause of death. (c) 2005 Wiley-Liss, Inc.

  15. Variances and uncertainties of the sample laboratory-to-laboratory variance (S_L^2) and standard deviation (S_L) associated with an interlaboratory study.

    PubMed

    McClure, Foster D; Lee, Jung K

    2012-01-01

    The validation process for an analytical method usually employs an interlaboratory study conducted as a balanced completely randomized model involving a specified number of randomly chosen laboratories, each analyzing a specified number of randomly allocated replicates. For such studies, formulas to obtain approximate unbiased estimates of the variance and uncertainty of the sample laboratory-to-laboratory (lab-to-lab) STD (S_L) have been developed primarily to account for the uncertainty of S_L when there is a need to develop an uncertainty budget that includes the uncertainty of S_L. For the sake of completeness on this topic, formulas to estimate the variance and uncertainty of the sample lab-to-lab variance (S_L^2) were also developed. In some cases, it was necessary to derive the formulas based on an approximate distribution for S_L^2.

  16. Posed versus spontaneous facial expressions are modulated by opposite cerebral hemispheres.

    PubMed

    Ross, Elliott D; Pulusu, Vinay K

    2013-05-01

    Clinical research has indicated that the left face is more expressive than the right face, suggesting that modulation of facial expressions is lateralized to the right hemisphere. The findings, however, are controversial because the results explain, on average, approximately 4% of the data variance. Using high-speed videography, we sought to determine if movement-onset asymmetry was a more powerful research paradigm than terminal movement asymmetry. The results were very robust, explaining up to 70% of the data variance. Posed expressions began overwhelmingly on the right face whereas spontaneous expressions began overwhelmingly on the left face. This dichotomy was most robust for upper facial expressions. In addition, movement-onset asymmetries did not predict terminal movement asymmetries, which were not significantly lateralized. The results support recent neuroanatomic observations that upper versus lower facial movements have different forebrain motor representations and recent behavioral constructs that posed versus spontaneous facial expressions are modulated preferentially by opposite cerebral hemispheres and that spontaneous facial expressions are graded rather than non-graded movements. Published by Elsevier Ltd.

  17. Software for the grouped optimal aggregation technique

    NASA Technical Reports Server (NTRS)

    Brown, P. M.; Shaw, G. W. (Principal Investigator)

    1982-01-01

    The grouped optimal aggregation technique produces minimum-variance, unbiased estimates of acreage and production for countries, zones (states), or any designated collection of acreage strata. It uses yield predictions, historical acreage information, and direct acreage estimates from satellite data. The acreage strata are grouped in such a way that the ratio model over historical acreage provides a smaller variance than if the model were applied to each individual stratum. An optimal weighting matrix, based on historical acreages, provides the link between incomplete direct acreage estimates and the total, current acreage estimate.

  18. Statistics of some atmospheric turbulence records relevant to aircraft response calculations

    NASA Technical Reports Server (NTRS)

    Mark, W. D.; Fischer, R. W.

    1981-01-01

    Methods for characterizing atmospheric turbulence are described. The methods illustrated include maximum likelihood estimation of the integral scale and intensity of records obeying the von Karman transverse power spectral form, constrained least-squares estimation of the parameters of a parametric representation of autocorrelation functions, estimation of the power spectral density of the instantaneous variance of a record with temporally fluctuating variance, and estimation of the probability density functions of various turbulence components. Descriptions of the computer programs used in the computations are given, and a full listing of these programs is included.

  19. Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2017-01-01

    Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

  20. Use of a threshold animal model to estimate calving ease and stillbirth (co)variance components for US Holsteins

    USDA-ARS?s Scientific Manuscript database

    (Co)variance components for calving ease and stillbirth in US Holsteins were estimated using a single-trait threshold animal model and two different sets of data edits. Six sets of approximately 250,000 records each were created by randomly selecting herd codes without replacement from the data used...

  1. Latent transition models with latent class predictors: attention deficit hyperactivity disorder subtypes and high school marijuana use

    PubMed Central

    Reboussin, Beth A.; Ialongo, Nicholas S.

    2011-01-01

    Summary Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder which is most often diagnosed in childhood with symptoms often persisting into adulthood. Elevated rates of substance use disorders have been evidenced among those with ADHD, but recent research focusing on the relationship between subtypes of ADHD and specific drugs is inconsistent. We propose a latent transition model (LTM) to guide our understanding of how drug use progresses, in particular marijuana use, while accounting for the measurement error that is often found in self-reported substance use data. We extend the LTM to include a latent class predictor to represent empirically derived ADHD subtypes that do not rely on meeting specific diagnostic criteria. We begin by fitting two separate latent class analysis (LCA) models by using second-order estimating equations: a longitudinal LCA model to define stages of marijuana use, and a cross-sectional LCA model to define ADHD subtypes. The LTM model parameters describing the probability of transitioning between the LCA-defined stages of marijuana use and the influence of the LCA-defined ADHD subtypes on these transition rates are then estimated by using a set of first-order estimating equations given the LCA parameter estimates. A robust estimate of the LTM parameter variance that accounts for the variation due to the estimation of the two sets of LCA parameters is proposed. Solving three sets of estimating equations enables us to determine the underlying latent class structures independently of the model for the transition rates and simplifying assumptions about the correlation structure at each stage reduces the computational complexity. PMID:21461139

  2. Estimating scaled treatment effects with multiple outcomes.

    PubMed

    Kennedy, Edward H; Kangovi, Shreya; Mitra, Nandita

    2017-01-01

    In classical study designs, the aim is often to learn about the effects of a treatment or intervention on a single outcome; in many modern studies, however, data on multiple outcomes are collected and it is of interest to explore effects on multiple outcomes simultaneously. Such designs can be particularly useful in patient-centered research, where different outcomes might be more or less important to different patients. In this paper, we propose scaled effect measures (via potential outcomes) that translate effects on multiple outcomes to a common scale, using mean-variance and median-interquartile range based standardizations. We present efficient, nonparametric, doubly robust methods for estimating these scaled effects (and weighted average summary measures), and for testing the null hypothesis that treatment affects all outcomes equally. We also discuss methods for exploring how treatment effects depend on covariates (i.e., effect modification). In addition to describing efficiency theory for our estimands and the asymptotic behavior of our estimators, we illustrate the methods in a simulation study and a data analysis. Importantly, and in contrast to much of the literature concerning effects on multiple outcomes, our methods are nonparametric and can be used not only in randomized trials to yield increased efficiency, but also in observational studies with high-dimensional covariates to reduce confounding bias.

  3. A stochastic approach to quantifying the blur with uncertainty estimation for high-energy X-ray imaging systems

    DOE PAGES

    Fowler, Michael J.; Howard, Marylesa; Luttman, Aaron; ...

    2015-06-03

    One of the primary causes of blur in a high-energy X-ray imaging system is the shape and extent of the radiation source, or ‘spot’. It is important to be able to quantify the size of the spot as it provides a lower bound on the recoverable resolution for a radiograph, and penumbral imaging methods – which involve the analysis of blur caused by a structured aperture – can be used to obtain the spot’s spatial profile. We present a Bayesian approach for estimating the spot shape that, unlike variational methods, is robust to the initial choice of parameters. The posterior is obtained from a normal likelihood, which was constructed from a weighted least squares approximation to a Poisson noise model, and prior assumptions that enforce both smoothness and non-negativity constraints. A Markov chain Monte Carlo algorithm is used to obtain samples from the target posterior, and the reconstruction and uncertainty estimates are the computed mean and variance of the samples, respectively. Lastly, synthetic data-sets are used to demonstrate accurate reconstruction, while real data taken with high-energy X-ray imaging systems are used to demonstrate applicability and feasibility.

  4. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    PubMed

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.

  5. A new method to compare statistical tree growth curves: the PL-GMANOVA model and its application with dendrochronological data.

    PubMed

    Ricker, Martin; Peña Ramírez, Víctor M; von Rosen, Dietrich

    2014-01-01

    Growth curves are monotonically increasing functions that repeatedly measure the same subjects over time. The classical growth curve model in the statistical literature is the Generalized Multivariate Analysis of Variance (GMANOVA) model. In order to model the tree trunk radius (r) over time (t) of trees on different sites, GMANOVA is combined here with the adapted PL regression model Q = A·T + E, where for b ≠ 0: Q = Ei[-b·r] - Ei[-b·r1], and for b = 0: Q = Ln[r/r1]; here A is the initial relative growth to be estimated, T = t - t1, and E is an error term for each tree and time point. Furthermore, Ei[-b·r] = ∫(Exp[-b·r]/r)dr and b = -1/TPR, with TPR being the turning point radius in a sigmoid curve, and r1 at t1 is an estimated calibrating time-radius point. Advantages of the approach are that growth rates can be compared among growth curves with different turning point radii and different starting points, hidden outliers are easily detectable, the method is statistically robust, and heteroscedasticity of the residuals among time points is allowed. The model was implemented with dendrochronological data of 235 Pinus montezumae trees on ten Mexican volcano sites to calculate comparison intervals for the estimated initial relative growth A. One site (at the Popocatépetl volcano) stood out, with A being 3.9 times the value of the site with the slowest-growing trees. Calculating variance components for the initial relative growth, 34% of the growth variation was found among sites, 31% among trees, and 35% over time. Without the Popocatépetl site, the numbers changed to 7%, 42%, and 51%. Further explanation of differences in growth would need to focus on factors that vary within sites and over time.
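
    The transformed response Q is straightforward to evaluate with the exponential integral; a short sketch (illustrative radii, not the dendrochronological data) using scipy.special.expi:

    ```python
    import numpy as np
    from scipy.special import expi

    def q_transform(r, r1, b):
        """Transformed trunk radius used as the response in the PL regression:
        Q = Ei[-b*r] - Ei[-b*r1] when b != 0, and Q = ln(r/r1) when b = 0."""
        r, r1 = np.asarray(r, dtype=float), float(r1)
        if b == 0.0:
            return np.log(r / r1)
        return expi(-b * r) - expi(-b * r1)

    # example: turning-point radius 10 cm -> b = -1/TPR = -0.1; calibration radius r1 = 1 cm
    radii = np.array([1.0, 2.5, 5.0, 10.0, 15.0])
    print(q_transform(radii, r1=1.0, b=-0.1))
    ```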

  6. Some New Results on Grubbs’ Estimators.

    DTIC Science & Technology

    1983-06-01

    DENNIS A. BRINDLEY AND RALPH A. BRADLEY. Consider a two-way classification with n rows and r columns and the usual model of analysis of variance, except that the error components of the model may have heterogeneous variances by columns. Grubbs provided unbiased estimators of these column variances that depend ... of observations y_ij, i = 1, ..., n, j = 1, ..., r, and the model y_ij = mu_i + beta_j + e_ij, (1) where mu_i represents the mean response of row i and beta_j represents ...

  7. Terrain Classification on Venus from Maximum-Likelihood Inversion of Parameterized Models of Topography, Gravity, and their Relation

    NASA Astrophysics Data System (ADS)

    Eggers, G. L.; Lewis, K. W.; Simons, F. J.; Olhede, S.

    2013-12-01

    Venus does not possess a plate-tectonic system like that observed on Earth, and many surface features--such as tesserae and coronae--lack terrestrial equivalents. To understand Venus' tectonics is to understand its lithosphere, requiring a study of topography and gravity, and how they relate. Past studies of topography dealt with mapping and classification of visually observed features, and studies of gravity dealt with inverting the relation between topography and gravity anomalies to recover surface density and elastic thickness in either the space (correlation) or the spectral (admittance, coherence) domain. In the former case, geological features could be delineated but not classified quantitatively. In the latter case, rectangular or circular data windows were used, lacking geological definition. While the estimates of lithospheric strength on this basis were quantitative, they lacked robust error estimates. Here, we remapped the surface into 77 regions visually and qualitatively defined from a combination of Magellan topography, gravity, and radar images. We parameterize the spectral covariance of the observed topography, treating it as a Gaussian process assumed to be stationary over the mapped regions, using a three-parameter isotropic Matern model, and perform maximum-likelihood-based inversions for the parameters. We discuss the parameter distribution across the Venusian surface and across terrain types such as coronae, dorsae, tesserae, and their relation with mean elevation and latitudinal position. We find that the three-parameter model, while mathematically established and applicable to Venus topography, is overparameterized, and thus reduce the results to a two-parameter description of the peak spectral variance and the range-to-half-peak variance (as a function of the wavenumber). With this reduction, the clustering of geological region types in two-parameter space becomes promising. Finally, we perform inversions for the JOINT spectral variance of topography and gravity, in which the INITIAL loading by topography retains the Matern form but the FINAL topography and gravity are the result of flexural compensation. In our modeling, we pay explicit attention to finite-field spectral estimation effects (and their remedy via tapering), and to the implementation of statistical tests (for anisotropy, for initial-loading process correlation, to ascertain the proper density contrasts and interface depth in a two-layer model), robustness assessment and uncertainty quantification, as well as to algorithmic intricacies related to low-dimensional but poorly scaled maximum-likelihood inversions. We conclude that Venusian geomorphic terrains are well described by their 2-D topographic and gravity (cross-)power spectra, and the spectral properties of distinct geologic provinces on Venus are worth quantifying via maximum-likelihood-based methods under idealized three-parameter Matern distributions. Analysis of fitted parameters and the fitted-data residuals reveals natural variability in the (sub)surface properties on Venus, as well as some directional anisotropy. Geologic regions tend to cluster according to terrain type in our parameter space, which we analyze to confirm their shared geologic histories and utilize for guidance in ongoing mapping efforts of Venus and other terrestrial bodies.

  8. Direct and Absolute Quantification of over 1800 Yeast Proteins via Selected Reaction Monitoring*

    PubMed Central

    Lawless, Craig; Holman, Stephen W.; Brownridge, Philip; Lanthaler, Karin; Harman, Victoria M.; Watkins, Rachel; Hammond, Dean E.; Miller, Rebecca L.; Sims, Paul F. G.; Grant, Christopher M.; Eyers, Claire E.; Beynon, Robert J.

    2016-01-01

    Defining intracellular protein concentration is critical in molecular systems biology. Although strategies for determining relative protein changes are available, defining robust absolute values in copies per cell has proven significantly more challenging. Here we present a reference data set quantifying over 1800 Saccharomyces cerevisiae proteins by direct means using protein-specific stable-isotope labeled internal standards and selected reaction monitoring (SRM) mass spectrometry, far exceeding any previous study. This was achieved by careful design of over 100 QconCAT recombinant proteins as standards, defining 1167 proteins in terms of copies per cell and upper limits on a further 668, with robust CVs routinely less than 20%. The selected reaction monitoring-derived proteome is compared with existing quantitative data sets, highlighting the disparities between methodologies. Coupled with a quantification of the transcriptome by RNA-seq taken from the same cells, these data support revised estimates of several fundamental molecular parameters: a total protein count of ∼100 million molecules-per-cell, a median of ∼1000 proteins-per-transcript, and a linear model of protein translation explaining 70% of the variance in translation rate. This work contributes a “gold-standard” reference yeast proteome (including 532 values based on high quality, dual peptide quantification) that can be widely used in systems models and for other comparative studies. PMID:26750110

  9. Adjustment of Measurements with Multiplicative Errors: Error Analysis, Estimates of the Variance of Unit Weight, and Effect on Volume Estimation from LiDAR-Type Digital Elevation Models

    PubMed Central

    Shi, Yun; Xu, Peiliang; Peng, Junhuan; Shi, Chuang; Liu, Jingnan

    2014-01-01

    Modern observation technology has verified that measurement errors can be proportional to the true values of measurements such as GPS, VLBI baselines and LiDAR. Observational models of this type are called multiplicative error models. This paper extends the work of Xu and Shimada, published in 2000, on multiplicative error models to the analytical error analysis of quantities of practical interest and estimates of the variance of unit weight. We analytically derive the variance-covariance matrices of the three least squares (LS) adjustments, the adjusted measurements and the corrections of measurements in multiplicative error models. For quality evaluation, we construct five estimators for the variance of unit weight in association with the three LS adjustment methods. Although LiDAR measurements are contaminated with multiplicative random errors, LiDAR-based digital elevation models (DEM) have been constructed as if they were subject to additive random errors. We will simulate a model landslide, which is assumed to be surveyed with LiDAR, and investigate the effect of LiDAR-type multiplicative error measurements on DEM construction and its effect on the estimate of landslide mass volume from the constructed DEM. PMID:24434880

  10. Uncertainty in Population Estimates for Endangered Animals and Improving the Recovery Process

    PubMed Central

    Haines, Aaron M.; Zak, Matthew; Hammond, Katie; Scott, J. Michael; Goble, Dale D.; Rachlow, Janet L.

    2013-01-01

    Simple Summary The objective of our study was to evaluate the mention of uncertainty (i.e., variance) associated with population size estimates within U.S. recovery plans for endangered animals. To do this we reviewed all finalized recovery plans for listed terrestrial vertebrate species. We found that more recent recovery plans reported more estimates of population size and uncertainty. Also, bird and mammal recovery plans reported more estimates of population size and uncertainty. We recommend that updated recovery plans combine uncertainty of population size estimates with a minimum detectable difference to aid in successful recovery. Abstract United States recovery plans contain biological information for a species listed under the Endangered Species Act and specify recovery criteria to provide basis for species recovery. The objective of our study was to evaluate whether recovery plans provide uncertainty (e.g., variance) with estimates of population size. We reviewed all finalized recovery plans for listed terrestrial vertebrate species to record the following data: (1) if a current population size was given, (2) if a measure of uncertainty or variance was associated with current estimates of population size and (3) if population size was stipulated for recovery. We found that 59% of completed recovery plans specified a current population size, 14.5% specified a variance for the current population size estimate and 43% specified population size as a recovery criterion. More recent recovery plans reported more estimates of current population size, uncertainty and population size as a recovery criterion. Also, bird and mammal recovery plans reported more estimates of population size and uncertainty compared to reptiles and amphibians. We suggest the use of calculating minimum detectable differences to improve confidence when delisting endangered animals and we identified incentives for individuals to get involved in recovery planning to improve access to quantitative data. PMID:26479531

  11. Combining the Hanning windowed interpolated FFT in both directions

    NASA Astrophysics Data System (ADS)

    Chen, Kui Fu; Li, Yan Feng

    2008-06-01

    The interpolated fast Fourier transform (IFFT) has been proposed as a way to eliminate the picket fence effect (PFE) of the fast Fourier transform. The modulus-based IFFT, cited in most relevant references, makes use of only the 1st and 2nd highest spectral lines. An approach using three principal spectral lines is proposed. This new approach combines both directions of the complex-spectrum-based IFFT with the Hanning window. The optimal weight that minimizes the estimation variance is established from a first-order Taylor series expansion of the noise interference. A numerical simulation is carried out, and the results are compared with the Cramer-Rao bound. It is demonstrated that the proposed approach has a lower estimation variance than the two-spectral-line approach. The improvement depends on the extent to which the sampling deviates from the coherent condition; at best, the variance is reduced by 2/7. However, it is also shown that the estimation variance of the Hanning-windowed IFFT is significantly higher than that without windowing.
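
    For context, the widely cited two-spectral-line, modulus-based estimate with a Hann window (the baseline the proposed three-line approach improves on) can be sketched as follows; the correction delta = (2*alpha - 1)/(alpha + 1), with alpha the ratio of the two largest adjacent bin magnitudes, is the standard Hann-window result, applied here to a synthetic sinusoid rather than the paper's simulation.

    ```python
    import numpy as np

    def ifft_hann_two_line(x, fs):
        """Two-spectral-line, modulus-based interpolated FFT frequency estimate
        for a Hann-windowed sinusoid (the baseline the paper improves upon)."""
        n = len(x)
        X = np.abs(np.fft.rfft(x * np.hanning(n)))
        k = np.argmax(X[1:-1]) + 1                   # highest spectral line (skip DC / Nyquist)
        # pick the larger neighbour as the 2nd line and orient the correction accordingly
        sign = 1 if X[k + 1] >= X[k - 1] else -1
        alpha = X[k + sign] / X[k]
        delta = sign * (2.0 * alpha - 1.0) / (alpha + 1.0)   # fractional-bin correction for Hann
        return (k + delta) * fs / n

    fs, f_true = 1000.0, 123.4
    t = np.arange(2048) / fs
    signal = np.sin(2 * np.pi * f_true * t)
    print(ifft_hann_two_line(signal, fs))            # close to 123.4 despite the picket fence effect
    ```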

  12. Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

    NASA Astrophysics Data System (ADS)

    Reis, D. S.; Stedinger, J. R.; Martins, E. S.

    2005-10-01

    This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.

  13. A regularized auxiliary particle filtering approach for system state estimation and battery life prediction

    NASA Astrophysics Data System (ADS)

    Liu, Jie; Wang, Wilson; Ma, Fai

    2011-07-01

    System current state estimation (or condition monitoring) and future state prediction (or failure prognostics) constitute the core elements of condition-based maintenance programs. For complex systems whose internal state variables are either inaccessible to sensors or hard to measure under normal operational conditions, inference has to be made from indirect measurements using approaches such as Bayesian learning. In recent years, the auxiliary particle filter (APF) has gained popularity in Bayesian state estimation; the APF technique, however, has some potential limitations in real-world applications. For example, the diversity of the particles may deteriorate when the process noise is small, and the variance of the importance weights could become extremely large when the likelihood varies dramatically over the prior. To tackle these problems, a regularized auxiliary particle filter (RAPF) is developed in this paper for system state estimation and forecasting. This RAPF aims to improve the performance of the APF through two innovative steps: (1) regularize the approximating empirical density and redraw samples from a continuous distribution so as to diversify the particles; and (2) smooth out the rather diffused proposals by a rejection/resampling approach so as to improve the robustness of particle filtering. The effectiveness of the proposed RAPF technique is evaluated through simulations of a nonlinear/non-Gaussian benchmark model for state estimation. It is also implemented for a real application in the remaining useful life (RUL) prediction of lithium-ion batteries.
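
    The sketch below is not the authors' RAPF (it has no auxiliary stage and no rejection/resampling smoothing); it is a plain bootstrap particle filter on a common nonlinear/non-Gaussian benchmark model of the kind mentioned above, with the regularization idea, redrawing resampled particles from a kernel-smoothed density, implemented as a Gaussian jitter with a Silverman bandwidth. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, k):   # nonlinear state transition of the benchmark model
    return 0.5 * x + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * k)

def h(x):      # quadratic measurement function
    return x**2 / 20.0

T, n_part, q, r = 50, 500, 10.0, 1.0
x_true, y = np.zeros(T), np.zeros(T)
x = 0.0
for k in range(T):
    x = f(x, k) + np.sqrt(q) * rng.standard_normal()
    x_true[k] = x
    y[k] = h(x) + np.sqrt(r) * rng.standard_normal()

# bootstrap particle filter with a post-resampling regularization (jitter) step
particles = 2.0 * rng.standard_normal(n_part)
est = np.zeros(T)
for k in range(T):
    particles = f(particles, k) + np.sqrt(q) * rng.standard_normal(n_part)
    logw = -0.5 * (y[k] - h(particles)) ** 2 / r
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[k] = np.sum(w * particles)
    particles = particles[rng.choice(n_part, size=n_part, p=w)]   # multinomial resampling
    # regularization: perturb with a kernel bandwidth so duplicated particles diversify
    bw = 1.06 * particles.std() * n_part ** (-1 / 5)
    particles = particles + bw * rng.standard_normal(n_part)

print("RMSE:", np.sqrt(np.mean((est - x_true) ** 2)))
```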

  14. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    USGS Publications Warehouse

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-01-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.

  15. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    NASA Astrophysics Data System (ADS)

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-12-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
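
    As a concrete illustration of the within (fixed-effects) idea described above, the sketch below estimates a single rainfall elasticity after demeaning each basin's observations, which removes time-invariant basin effects. It is a generic one-regressor version, not the USGS regional models; the variable names and simulated values are hypothetical.

```python
import numpy as np

def fixed_effects_slope(y, x, group):
    """One-regressor fixed-effects (within) panel estimator.

    y, x  : 1-D arrays of observations (e.g., log low-flow and log rainfall)
    group : entity label (e.g., basin id) for each observation
    """
    y, x = np.asarray(y, float), np.asarray(x, float)
    yd, xd = y.copy(), x.copy()
    for g in np.unique(group):
        m = group == g
        yd[m] -= y[m].mean()          # demeaning removes the basin fixed effect
        xd[m] -= x[m].mean()
    beta = np.sum(xd * yd) / np.sum(xd * xd)
    resid = yd - beta * xd
    n_groups = len(np.unique(group))
    se = np.sqrt(resid.var(ddof=n_groups + 1) / np.sum(xd * xd))
    return beta, se

# toy panel: 3 basins, 10 years each, common rainfall elasticity of 0.8
rng = np.random.default_rng(1)
group = np.repeat([0, 1, 2], 10)
x = rng.normal(3.0, 0.3, 30)
basin_effect = np.array([0.0, 1.0, -0.5])[group]   # unobserved heterogeneity
y = basin_effect + 0.8 * x + rng.normal(0, 0.1, 30)
print(fixed_effects_slope(y, x, group))             # slope near 0.8 with its standard error
```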

  16. Importance sampling variance reduction for the Fokker–Planck rarefied gas particle method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collyer, B.S.; Connaughton, C.

    The Fokker–Planck approximation to the Boltzmann equation, solved numerically by stochastic particle schemes, is used to provide estimates for rarefied gas flows. This paper presents a variance reduction technique for a stochastic particle method that is able to greatly reduce the uncertainty of the estimated flow fields when the characteristic speed of the flow is small in comparison to the thermal velocity of the gas. The method relies on importance sampling, requiring minimal changes to the basic stochastic particle scheme. We test the importance sampling scheme on a homogeneous relaxation, planar Couette flow and a lid-driven-cavity flow, and find that our method is able to greatly reduce the noise of estimated quantities. Significantly, we find that as the characteristic speed of the flow decreases, the variance of the noisy estimators becomes independent of the characteristic speed.
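
    The record's scheme is specific to the Fokker–Planck particle method, but the underlying principle can be illustrated generically: when the quantity of interest is dominated by rare contributions, drawing from a better-placed proposal and reweighting by the density ratio cuts the estimator variance dramatically. The tail-probability example below is a stand-in for illustration only, not the gas-dynamics estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# quantity of interest: a small tail probability P(X > 4), X ~ N(0, 1)
# naive Monte Carlo estimator
x = rng.standard_normal(n)
naive = (x > 4).astype(float)

# importance sampling: draw from N(4, 1) and reweight by the density ratio phi(z)/phi(z - 4)
z = rng.normal(4.0, 1.0, n)
w = np.exp(-0.5 * z**2) / np.exp(-0.5 * (z - 4.0) ** 2)
is_est = (z > 4) * w

for name, s in [("naive MC", naive), ("importance sampling", is_est)]:
    print(f"{name}: estimate={s.mean():.2e}  std error={s.std(ddof=1) / np.sqrt(n):.2e}")
```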

  17. Multilevel models for estimating incremental net benefits in multinational studies.

    PubMed

    Grieve, Richard; Nixon, Richard; Thompson, Simon G; Cairns, John

    2007-08-01

    Multilevel models (MLMs) have been recommended for estimating incremental net benefits (INBs) in multicentre cost-effectiveness analysis (CEA). However, these models have assumed that the INBs are exchangeable and that there is a common variance across all centres. This paper examines the plausibility of these assumptions by comparing various MLMs for estimating the mean INB in a multinational CEA. The results showed that the MLMs that assumed the INBs were exchangeable and had a common variance led to incorrect inferences. The MLMs that included covariates to allow for systematic differences across the centres, and estimated different variances in each centre, made more plausible assumptions, fitted the data better and led to more appropriate inferences. We conclude that the validity of assumptions underlying MLMs used in CEA need to be critically evaluated before reliable conclusions can be drawn. Copyright 2006 John Wiley & Sons, Ltd.

  18. RESPONDENT-DRIVEN SAMPLING AS MARKOV CHAIN MONTE CARLO

    PubMed Central

    GOEL, SHARAD; SALGANIK, MATTHEW J.

    2013-01-01

    Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present respondent-driven sampling as Markov chain Monte Carlo (MCMC) importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating respondent-driven sampling studies. PMID:19572381

  19. Respondent-driven sampling as Markov chain Monte Carlo.

    PubMed

    Goel, Sharad; Salganik, Matthew J

    2009-07-30

    Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present RDS as Markov chain Monte Carlo importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which the sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating RDS studies.

  20. Endogenous pain modulation in chronic orofacial pain: a systematic review and meta-analysis.

    PubMed

    Moana-Filho, Estephan J; Herrero Babiloni, Alberto; Theis-Mahon, Nicole R

    2018-06-15

    Abnormal endogenous pain modulation was suggested as a potential mechanism for chronic pain, i.e., increased pain facilitation and/or impaired pain inhibition underlying symptom manifestation. Endogenous pain modulation function can be tested using psychophysical methods such as temporal summation of pain (TSP) and conditioned pain modulation (CPM), which assess pain facilitation and inhibition, respectively. Several studies have investigated endogenous pain modulation function in patients with nonparoxysmal orofacial pain (OFP) and reported mixed results. This study aimed to provide, through a qualitative and quantitative synthesis of the available literature, overall estimates for TSP/CPM responses in patients with OFP relative to controls. MEDLINE, Embase, and the Cochrane databases were searched, and references were screened independently by 2 raters. Twenty-six studies were included for qualitative review, and 22 studies were included for meta-analysis. Traditional meta-analysis and robust variance estimation were used to synthesize overall estimates for standardized mean difference. The overall standardized estimate for TSP was 0.30 (95% confidence interval: 0.11-0.49; P = 0.002), with moderate between-study heterogeneity (Q [df = 17] = 41.8, P = 0.001; I² = 70.2%). Conditioned pain modulation's estimated overall effect size was large but above the significance threshold (estimate = 1.36; 95% confidence interval: -0.09 to 2.81; P = 0.066), with very large heterogeneity (Q [df = 8] = 108.3, P < 0.001; I² = 98.0%). Sensitivity analyses did not affect the overall estimate for TSP; for CPM, the overall estimate became significant if specific random-effect models were used or if the most influential study was removed. Publication bias was not present for TSP studies, whereas it substantially influenced CPM's overall estimate. These results suggest increased pain facilitation and a trend toward pain inhibition impairment in patients with nonparoxysmal OFP.
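
    For readers unfamiliar with the quantities reported above (pooled effect size, Q, I²), the sketch below implements standard DerSimonian-Laird random-effects pooling of standardized mean differences. It does not reproduce the robust variance estimation used in the record to handle dependent effect sizes, and the study-level inputs are made up.

```python
import numpy as np

def dersimonian_laird(yi, vi):
    """Random-effects pooled estimate from effect sizes yi with sampling variances vi."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    w = 1.0 / vi
    fixed = np.sum(w * yi) / np.sum(w)
    q = np.sum(w * (yi - fixed) ** 2)                  # Cochran's Q
    df = len(yi) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100                  # heterogeneity percentage
    w_re = 1.0 / (vi + tau2)
    est = np.sum(w_re * yi) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return est, (est - 1.96 * se, est + 1.96 * se), tau2, i2

# hypothetical standardized mean differences and variances from a handful of studies
yi = [0.41, 0.12, 0.55, 0.20, 0.33]
vi = [0.04, 0.03, 0.06, 0.02, 0.05]
print(dersimonian_laird(yi, vi))
```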

  1. Impact of reduced marker set estimation of genomic relationship matrices on genomic selection for feed efficiency in Angus cattle.

    PubMed

    Rolf, Megan M; Taylor, Jeremy F; Schnabel, Robert D; McKay, Stephanie D; McClure, Matthew C; Northcutt, Sally L; Kerley, Monty S; Weaber, Robert L

    2010-04-19

    Molecular estimates of breeding value are expected to increase selection response due to improvements in the accuracy of selection and a reduction in generation interval, particularly for traits that are difficult or expensive to record or are measured late in life. Several statistical methods for incorporating molecular data into breeding value estimation have been proposed, however, most studies have utilized simulated data in which the generated linkage disequilibrium may not represent the targeted livestock population. A genomic relationship matrix was developed for 698 Angus steers and 1,707 Angus sires using 41,028 single nucleotide polymorphisms and breeding values were estimated using feed efficiency phenotypes (average daily feed intake, residual feed intake, and average daily gain) recorded on the steers. The number of SNPs needed to accurately estimate a genomic relationship matrix was evaluated in this population. Results were compared to estimates produced from pedigree-based mixed model analysis of 862 Angus steers with 34,864 identified paternal relatives but no female ancestors. Estimates of additive genetic variance and breeding value accuracies were similar for AFI and RFI using the numerator and genomic relationship matrices despite fewer animals in the genomic analysis. Bootstrap analyses indicated that 2,500-10,000 markers are required for robust estimation of genomic relationship matrices in cattle. This research shows that breeding values and their accuracies may be estimated for commercially important sires for traits recorded in experimental populations without the need for pedigree data to establish identity by descent between members of the commercial and experimental populations when at least 2,500 SNPs are available for the generation of a genomic relationship matrix.
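
    A common way such a genomic relationship matrix is constructed is VanRaden's method 1, sketched below from a 0/1/2 genotype matrix. The construction shown here is an assumption about the general approach; the record's comparison of marker subsets and its bootstrap analyses are not reproduced, and the toy data are simulated.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix (VanRaden, method 1).

    genotypes: (n_animals, n_snps) array of allele counts coded 0/1/2.
    """
    M = np.asarray(genotypes, float)
    p = M.mean(axis=0) / 2.0                 # allele frequency per SNP
    Z = M - 2.0 * p                          # centre by twice the allele frequency
    denom = 2.0 * np.sum(p * (1.0 - p))      # expected variance under Hardy-Weinberg
    return Z @ Z.T / denom

# toy example: 5 animals, 200 SNPs with random allele frequencies
rng = np.random.default_rng(0)
p_true = rng.uniform(0.1, 0.9, 200)
geno = rng.binomial(2, p_true, size=(5, 200))
G = vanraden_grm(geno)
print(np.round(G, 2))     # diagonal elements expected near 1 for unrelated, non-inbred animals
```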

  2. Dose-Dependent Effects of Statins for Patients with Aneurysmal Subarachnoid Hemorrhage: Meta-Regression Analysis.

    PubMed

    To, Minh-Son; Prakash, Shivesh; Poonnoose, Santosh I; Bihari, Shailesh

    2018-05-01

    The study uses meta-regression analysis to quantify the dose-dependent effects of statin pharmacotherapy on vasospasm, delayed ischemic neurologic deficits (DIND), and mortality in aneurysmal subarachnoid hemorrhage. Prospective, retrospective observational studies, and randomized controlled trials (RCTs) were retrieved by a systematic database search. Summary estimates were expressed as absolute risk (AR) for a given statin dose or control (placebo). Meta-regression using inverse variance weighting and robust variance estimation was performed to assess the effect of statin dose on transformed AR in a random effects model. Dose-dependence of predicted AR with 95% confidence interval (CI) was recovered by using Miller's Freeman-Tukey inverse. The database search and study selection criteria yielded 18 studies (2594 patients) for analysis. These included 12 RCTs, 4 retrospective observational studies, and 2 prospective observational studies. Twelve studies investigated simvastatin, whereas the remaining studies investigated atorvastatin, pravastatin, or pitavastatin, with simvastatin-equivalent doses ranging from 20 to 80 mg. Meta-regression revealed dose-dependent reductions in Freeman-Tukey-transformed AR of vasospasm (slope coefficient -0.00404, 95% CI -0.00720 to -0.00087; P = 0.0321), DIND (slope coefficient -0.00316, 95% CI -0.00586 to -0.00047; P = 0.0392), and mortality (slope coefficient -0.00345, 95% CI -0.00623 to -0.00067; P = 0.0352). The present meta-regression provides weak evidence for dose-dependent reductions in vasospasm, DIND and mortality associated with acute statin use after aneurysmal subarachnoid hemorrhage. However, the analysis was limited by substantial heterogeneity among individual studies. Greater dosing strategies are a potential consideration for future RCTs. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Method for hyperspectral imagery exploitation and pixel spectral unmixing

    NASA Technical Reports Server (NTRS)

    Lin, Ching-Fang (Inventor)

    2003-01-01

    An efficient hybrid approach to exploit hyperspectral imagery and unmix spectral pixels. This hybrid approach uses a genetic algorithm to solve for the abundance vector of the first pixel of a hyperspectral image cube. This abundance vector is used as the initial state in a robust filter to derive the abundance estimate for the next pixel. By using a Kalman filter, the abundance estimate for a pixel can be obtained in a single iteration, which is much faster than the genetic algorithm. The output of the robust filter is fed to the genetic algorithm again to derive an accurate abundance estimate for the current pixel. Using the robust filter solution as the starting point of the genetic algorithm speeds up its evolution. After obtaining the accurate abundance estimate, the procedure moves to the next pixel, using the output of the genetic algorithm as the previous state estimate to derive the abundance estimate for this pixel with the robust filter, and again uses the genetic algorithm to refine the estimate efficiently based on the robust filter solution. This iteration continues until all pixels in the hyperspectral image cube have been processed.

  4. Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise

    NASA Astrophysics Data System (ADS)

    Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.

    2004-05-01

    A robust version of Lee local statistic filter able to effectively suppress the mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteristics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.
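
    A sketch of the conventional Lee local-statistics filter that the record modifies is given below; a robust variant would additionally detect and replace impulse-corrupted pixels (for example with a local median) before computing the local statistics. The window size and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, win=7, sigma_n=0.1):
    """Basic Lee local-statistics filter for unit-mean multiplicative noise.

    sigma_n is the standard deviation of the multiplicative noise.
    """
    img = img.astype(float)
    mean = uniform_filter(img, win)
    mean_sq = uniform_filter(img * img, win)
    var = np.maximum(mean_sq - mean**2, 0.0)
    cn2 = sigma_n**2                               # squared noise coefficient of variation
    cy2 = var / np.maximum(mean**2, 1e-12)         # squared local coefficient of variation
    gain = np.clip(1.0 - cn2 / np.maximum(cy2, 1e-12), 0.0, 1.0)
    return mean + gain * (img - mean)              # shrink toward the local mean in flat areas

# toy usage: speckled step image
rng = np.random.default_rng(0)
clean = np.ones((64, 64)); clean[:, 32:] = 2.0
noisy = clean * rng.normal(1.0, 0.1, clean.shape)
print(np.var(noisy - clean), np.var(lee_filter(noisy) - clean))   # residual error drops
```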

  5. Does an uneven sample size distribution across settings matter in cross-classified multilevel modeling? Results of a simulation study.

    PubMed

    Milliren, Carly E; Evans, Clare R; Richmond, Tracy K; Dunn, Erin C

    2018-06-06

    Recent advances in multilevel modeling allow for modeling non-hierarchical levels (e.g., youth in non-nested schools and neighborhoods) using cross-classified multilevel models (CCMM). Current practice is to cluster samples from one context (e.g., schools) and utilize the observations however they are distributed from the second context (e.g., neighborhoods). However, it is unknown whether an uneven distribution of sample size across these contexts leads to incorrect estimates of random effects in CCMMs. Using the school and neighborhood data structure in Add Health, we examined the effect of neighborhood sample size imbalance on the estimation of variance parameters in models predicting BMI. We differentially assigned students from a given school to neighborhoods within that school's catchment area using three scenarios of (im)balance. 1000 random datasets were simulated for each of five combinations of school- and neighborhood-level variance and imbalance scenarios, for a total of 15,000 simulated data sets. For each simulation, we calculated 95% CIs for the variance parameters to determine whether the true simulated variance fell within the interval. Across all simulations, the "true" school and neighborhood variance parameters were estimated 93-96% of the time. Only 5% of models failed to capture neighborhood variance; 6% failed to capture school variance. These results suggest that there is no systematic bias in the ability of CCMM to capture the true variance parameters regardless of the distribution of students across neighborhoods. Ongoing efforts to use CCMM are warranted and can proceed without concern for the sample imbalance across contexts. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Relating the Hadamard Variance to MCS Kalman Filter Clock Estimation

    NASA Technical Reports Server (NTRS)

    Hutsell, Steven T.

    1996-01-01

    The Global Positioning System (GPS) Master Control Station (MCS) currently makes significant use of the Allan Variance. This two-sample variance equation has proven excellent as a handy, understandable tool, both for time domain analysis of GPS cesium frequency standards, and for fine tuning the MCS's state estimation of these atomic clocks. The Allan Variance does not explicitly converge for the noise types of alpha less than or equal to minus 3 and can be greatly affected by frequency drift. Because GPS rubidium frequency standards exhibit non-trivial aging and aging noise characteristics, the basic Allan Variance analysis must be augmented in order to (a) compensate for a dynamic frequency drift, and (b) characterize two additional noise types, specifically alpha = minus 3, and alpha = minus 4. As the GPS program progresses, we will utilize a larger percentage of rubidium frequency standards than ever before. Hence, GPS rubidium clock characterization will require more attention than ever before. The three-sample variance, commonly referred to as a renormalized Hadamard Variance, is unaffected by linear frequency drift, converges for alpha greater than minus 5, and thus has utility for modeling noise in GPS rubidium frequency standards. This paper demonstrates the potential of Hadamard Variance analysis in GPS operations, and presents an equation that relates the Hadamard Variance to the MCS's Kalman filter process noises.
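
    The overlapping estimators of both variances are simple to compute from phase data, and the sketch below also illustrates the key property mentioned above: a linear frequency drift inflates the Allan variance at long averaging times while leaving the Hadamard variance essentially unaffected. The clock noise levels are arbitrary, and the relation to the MCS Kalman filter process noises derived in the paper is not reproduced.

```python
import numpy as np

def allan_var(phase, tau0, m=1):
    """Overlapping Allan variance from phase (time-error) samples."""
    x = np.asarray(phase, float)
    n = len(x)
    d = x[2 * m:n] - 2 * x[m:n - m] + x[:n - 2 * m]                       # second differences
    return np.sum(d**2) / (2.0 * (m * tau0) ** 2 * (n - 2 * m))

def hadamard_var(phase, tau0, m=1):
    """Overlapping Hadamard (three-sample) variance; insensitive to linear frequency drift."""
    x = np.asarray(phase, float)
    n = len(x)
    d = x[3 * m:n] - 3 * x[2 * m:n - m] + 3 * x[m:n - 2 * m] - x[:n - 3 * m]
    return np.sum(d**2) / (6.0 * (m * tau0) ** 2 * (n - 3 * m))

# white-FM clock with a linear frequency drift: the drift dominates the Allan
# variance at long tau but is removed by the Hadamard third difference
rng = np.random.default_rng(0)
tau0, n = 1.0, 10_000
freq = 1e-12 * rng.standard_normal(n) + 1e-15 * np.arange(n)   # noise + drift
phase = np.cumsum(freq) * tau0
for m in (1, 100, 1000):
    print(m, allan_var(phase, tau0, m), hadamard_var(phase, tau0, m))
```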

  7. Extreme Response Style: Which Model Is Best?

    ERIC Educational Resources Information Center

    Leventhal, Brian

    2017-01-01

    More robust and rigorous psychometric models, such as multidimensional Item Response Theory models, have been advocated for survey applications. However, item responses may be influenced by construct-irrelevant variance factors such as preferences for extreme response options. Through empirical and simulation methods, this study evaluates the use…

  8. Analysis of signal-dependent sensor noise on JPEG 2000-compressed Sentinel-2 multi-spectral images

    NASA Astrophysics Data System (ADS)

    Uss, M.; Vozel, B.; Lukin, V.; Chehdi, K.

    2017-10-01

    The processing chain of Sentinel-2 MultiSpectral Instrument (MSI) data involves filtering and compression stages that modify MSI sensor noise. As a result, noise in Sentinel-2 Level-1C data distributed to users becomes processed. We demonstrate that the processed noise variance model is bivariate: noise variance depends on image intensity (caused by the signal-dependency of photon counting detectors) and on signal-to-noise ratio (SNR; caused by filtering/compression). To provide information on the processed noise parameters, which is missing in Sentinel-2 metadata, we propose to use a blind noise parameter estimation approach. Existing methods are restricted to a univariate noise model. Therefore, we propose an extension of the existing vcNI+fBm blind noise parameter estimation method to a multivariate noise model, mvcNI+fBm, and apply it to each band of Sentinel-2A data. The obtained results clearly demonstrate that noise variance is affected by filtering/compression for SNR less than about 15. Processed noise variance is reduced by a factor of 2-5 in homogeneous areas as compared to the noise variance for high SNR values. Estimates of the noise variance model parameters are provided for each Sentinel-2A band. The Sentinel-2A MSI Level-1C noise models obtained in this paper could be useful for end users and researchers working in a variety of remote sensing applications.

  9. Proportion of general factor variance in a hierarchical multiple-component measuring instrument: a note on a confidence interval estimation procedure.

    PubMed

    Raykov, Tenko; Zinbarg, Richard E

    2011-05-01

    A confidence interval construction procedure for the proportion of explained variance by a hierarchical, general factor in a multi-component measuring instrument is outlined. The method provides point and interval estimates for the proportion of total scale score variance that is accounted for by the general factor, which could be viewed as common to all components. The approach may also be used for testing composite (one-tailed) or simple hypotheses about this proportion, and is illustrated with a pair of examples. ©2010 The British Psychological Society.

  10. Analysis of quantitative data obtained from toxicity studies showing non-normal distribution.

    PubMed

    Kobayashi, Katsumi

    2005-05-01

    The data obtained from toxicity studies are examined for homogeneity of variance, but, usually, they are not examined for normal distribution. In this study I examined the measured items of a carcinogenicity/chronic toxicity study with rats for both homogeneity of variance and normal distribution. It was observed that a lot of hematology and biochemistry items showed non-normal distribution. For testing normal distribution of the data obtained from toxicity studies, the data of the concurrent control group may be examined, and for the data that show a non-normal distribution, non-parametric tests with robustness may be applied.

  11. Covariate selection with group lasso and doubly robust estimation of causal effects

    PubMed Central

    Koch, Brandon; Vock, David M.; Wolfson, Julian

    2017-01-01

    Summary The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this paper, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry. PMID:28636276

  12. Covariate selection with group lasso and doubly robust estimation of causal effects.

    PubMed

    Koch, Brandon; Vock, David M; Wolfson, Julian

    2018-03-01

    The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this article, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry. © 2017, The International Biometric Society.
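
    The standard doubly robust (augmented IPW) estimator into which the GLiDeR-selected variables are plugged can be sketched as follows, here with plain logistic and linear working models and no variable selection at all; it is a minimal illustration, not the GLiDeR procedure, and the simulated data and true effect are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ace(y, a, X):
    """Augmented IPW (doubly robust) estimate of the average causal effect.

    Consistent if either the outcome regressions or the propensity model is
    correctly specified.
    """
    ps = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
    m1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X)
    m0 = LinearRegression().fit(X[a == 0], y[a == 0]).predict(X)
    psi = (m1 - m0
           + a * (y - m1) / ps
           - (1 - a) * (y - m0) / (1 - ps))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

# simulated confounded data: true effect of treatment a on outcome y is 2.0
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
y = 2.0 * a + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
print(aipw_ace(y, a, X))       # estimate near 2.0 with its standard error
```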

  13. Robust optimization based upon statistical theory.

    PubMed

    Sobotta, B; Söhn, M; Alber, M

    2010-08-01

    Organ movement is still the biggest challenge in cancer treatment despite advances in online imaging. Due to the resulting geometric uncertainties, the delivered dose cannot be predicted precisely at treatment planning time. Consequently, all associated dose metrics (e.g., EUD and maxDose) are random variables with a patient-specific probability distribution. The method that the authors propose makes these distributions the basis of the optimization and evaluation process. The authors start from a model of motion derived from patient-specific imaging. On a multitude of geometry instances sampled from this model, a dose metric is evaluated. The resulting pdf of this dose metric is termed outcome distribution. The approach optimizes the shape of the outcome distribution based on its mean and variance. This is in contrast to the conventional optimization of a nominal value (e.g., PTV EUD) computed on a single geometry instance. The mean and variance allow for an estimate of the expected treatment outcome along with the residual uncertainty. Besides being applicable to the target, the proposed method also seamlessly includes the organs at risk (OARs). The likelihood that a given value of a metric is reached in the treatment is predicted quantitatively. This information reveals potential hazards that may occur during the course of the treatment, thus helping the expert to find the right balance between the risk of insufficient normal tissue sparing and the risk of insufficient tumor control. By feeding this information to the optimizer, outcome distributions can be obtained where the probability of exceeding a given OAR maximum and that of falling short of a given target goal can be minimized simultaneously. The method is applicable to any source of residual motion uncertainty in treatment delivery. Any model that quantifies organ movement and deformation in terms of probability distributions can be used as basis for the algorithm. Thus, it can generate dose distributions that are robust against interfraction and intrafraction motion alike, effectively removing the need for indiscriminate safety margins.

  14. Adding a Parameter Increases the Variance of an Estimated Regression Function

    ERIC Educational Resources Information Center

    Withers, Christopher S.; Nadarajah, Saralees

    2011-01-01

    The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…

  15. A Measure for the Reliability of a Rating Scale Based on Longitudinal Clinical Trial Data

    ERIC Educational Resources Information Center

    Laenen, Annouschka; Alonso, Ariel; Molenberghs, Geert

    2007-01-01

    A new measure for reliability of a rating scale is introduced, based on the classical definition of reliability, as the ratio of the true score variance and the total variance. Clinical trial data can be employed to estimate the reliability of the scale in use, whenever repeated measurements are taken. The reliability is estimated from the…

  16. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  17. Noise parameter estimation for poisson corrupted images using variance stabilization transforms.

    PubMed

    Jin, Xiaodan; Xu, Zhenyu; Hirakawa, Keigo

    2014-03-01

    Noise is present in all images captured by real-world image sensors. The Poisson distribution is said to model the stochastic nature of the photon arrival process and agrees with the distribution of measured pixel values. We propose a method for estimating unknown noise parameters from Poisson-corrupted images using properties of variance stabilization. With a significantly lower computational complexity and improved stability, the proposed estimation technique yields noise parameters that are comparable in accuracy to the state-of-the-art methods.
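
    The record's estimator differs in detail, but the core idea, that variance stabilization ties the unknown noise parameters to observable image statistics, can be illustrated generically: for a gain-scaled Poisson image the local variance grows linearly with the local mean, the slope recovers the gain, and the Anscombe transform 2*sqrt(x + 3/8) then has roughly unit variance at all intensities. The scene and gain below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a Poisson-corrupted image with an unknown sensor gain:
# y = gain * Poisson(flux), so var(y) = gain * mean(y)
gain_true = 2.5
flux = np.tile(np.linspace(5, 200, 64), (64, 1))          # each column has constant flux
img = gain_true * rng.poisson(flux).astype(float)

# estimate the gain as the slope of the per-column variance vs. mean relationship
means = img.mean(axis=0)
variances = img.var(axis=0, ddof=1)
gain_hat = np.polyfit(means, variances, 1)[0]
print("estimated gain:", gain_hat)                        # close to 2.5

# Anscombe variance-stabilizing transform on the gain-corrected counts
t = 2.0 * np.sqrt(img / gain_hat + 3.0 / 8.0)
print("stabilized variances:", t.var(axis=0, ddof=1)[::16])   # all near 1, independent of intensity
```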

  18. Remediating Non-Positive Definite State Covariances for Collision Probability Estimation

    NASA Technical Reports Server (NTRS)

    Hall, Doyle T.; Hejduk, Matthew D.; Johnson, Lauren C.

    2017-01-01

    The NASA Conjunction Assessment Risk Analysis team estimates the probability of collision (Pc) for a set of Earth-orbiting satellites. The Pc estimation software processes satellite position+velocity states and their associated covariance matrices. On occasion, the software encounters non-positive definite (NPD) state covariances, which can adversely affect or prevent the Pc estimation process. Interpolation inaccuracies appear to account for the majority of such covariances, although other mechanisms contribute also. This paper investigates the origin of NPD state covariance matrices, three different methods for remediating these covariances when and if necessary, and the associated effects on the Pc estimation process.
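
    One standard remediation, which is not necessarily among the three methods evaluated in the paper, is to symmetrize the matrix and clip its negative eigenvalues, as sketched below on an illustrative 3x3 covariance.

```python
import numpy as np

def remediate_covariance(cov, eps=0.0):
    """Repair a non-positive definite covariance by clipping negative eigenvalues.

    Other strategies (e.g., projecting to the nearest correlation matrix or
    inflating the diagonal) exist; this is just one simple option.
    """
    cov = 0.5 * (cov + cov.T)                     # enforce symmetry first
    vals, vecs = np.linalg.eigh(cov)
    vals_clipped = np.clip(vals, eps, None)       # remove negative modes
    return (vecs * vals_clipped) @ vecs.T         # reassemble V diag(vals) V^T

# a slightly NPD 3x3 "state covariance" (e.g., produced by interpolation error)
c = np.array([[4.0, 3.9, 0.0],
              [3.9, 4.0, 0.0],
              [0.0, 0.0, -1e-6]])
print(np.linalg.eigvalsh(c))                            # one negative eigenvalue
print(np.linalg.eigvalsh(remediate_covariance(c)))      # all non-negative after remediation
```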

  19. Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments

    PubMed Central

    Erkoc, Ali; Emiroglu, Esra

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of the biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set. PMID:25202738

  20. Graphical evaluation of the ridge-type robust regression estimators in mixture experiments.

    PubMed

    Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of the biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set.

  1. An Analysis of Variance Framework for Matrix Sampling.

    ERIC Educational Resources Information Center

    Sirotnik, Kenneth

    Significant cost savings can be achieved with the use of matrix sampling in estimating population parameters from psychometric data. The statistical design is intuitively simple, using the framework of the two-way classification analysis of variance technique. For example, the mean and variance are derived from the performance of a certain grade…

  2. Comparison of variance estimators for meta-analysis of instrumental variable estimates

    PubMed Central

    Schmidt, AF; Hingorani, AD; Jefferis, BJ; White, J; Groenwold, RHH; Dudbridge, F

    2016-01-01

    Abstract Background: Mendelian randomization studies perform instrumental variable (IV) analysis using genetic IVs. Results of individual Mendelian randomization studies can be pooled through meta-analysis. We explored how different variance estimators influence the meta-analysed IV estimate. Methods: Two versions of the delta method (IV before or after pooling), four bootstrap estimators, a jack-knife estimator and a heteroscedasticity-consistent (HC) variance estimator were compared using simulation. Two types of meta-analyses were compared, a two-stage meta-analysis pooling results, and a one-stage meta-analysis pooling datasets. Results: Using a two-stage meta-analysis, coverage of the point estimate using bootstrapped estimators deviated from nominal levels at weak instrument settings and/or outcome probabilities ≤ 0.10. The jack-knife estimator was the least biased resampling method, the HC estimator often failed at outcome probabilities ≤ 0.50 and overall the delta method estimators were the least biased. In the presence of between-study heterogeneity, the delta method before meta-analysis performed best. Using a one-stage meta-analysis all methods performed equally well and better than two-stage meta-analysis of greater or equal size. Conclusions: In the presence of between-study heterogeneity, two-stage meta-analyses should preferentially use the delta method before meta-analysis. Weak instrument bias can be reduced by performing a one-stage meta-analysis. PMID:27591262
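
    For reference, the sketch below shows the simplest of the compared variants: the delta method applied to each study's Wald ratio before pooling with inverse-variance weights, under the assumption that the instrument-outcome and instrument-exposure estimates are independent. The per-study summary statistics are hypothetical.

```python
import numpy as np

def wald_ratio(beta_zy, se_zy, beta_zx, se_zx):
    """IV (Wald ratio) estimate with its first-order delta-method variance.

    Ignores the covariance between the two coefficient estimates and
    higher-order terms, as in the simplest delta-method variant.
    """
    est = beta_zy / beta_zx
    var = se_zy**2 / beta_zx**2 + beta_zy**2 * se_zx**2 / beta_zx**4
    return est, var

def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted (two-stage) pooling of per-study IV estimates."""
    w = 1.0 / np.asarray(variances, float)
    est = np.sum(w * np.asarray(estimates, float)) / np.sum(w)
    return est, np.sqrt(1.0 / np.sum(w))

# hypothetical per-study summaries: (instrument-outcome beta, SE, instrument-exposure beta, SE)
studies = [(0.020, 0.006, 0.10, 0.01),
           (0.015, 0.005, 0.09, 0.01),
           (0.030, 0.010, 0.12, 0.02)]
ivs = [wald_ratio(*s) for s in studies]
print(fixed_effect_pool([e for e, _ in ivs], [v for _, v in ivs]))
```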

  3. Parametric correlation functions to model the structure of permanent environmental (co)variances in milk yield random regression models.

    PubMed

    Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G

    2009-09-01

    The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.

  4. On the impact of relatedness on SNP association analysis.

    PubMed

    Gross, Arnd; Tönjes, Anke; Scholz, Markus

    2017-12-06

    When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicated manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. We derive explicit and easy-to-apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation, as shown for the example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. The next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with a sample size similar to that of the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced depending on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate.

  5. On robust parameter estimation in brain-computer interfacing

    NASA Astrophysics Data System (ADS)

    Samek, Wojciech; Nakajima, Shinichi; Kawanabe, Motoaki; Müller, Klaus-Robert

    2017-12-01

    Objective. The reliable estimation of parameters such as mean or covariance matrix from noisy and high-dimensional observations is a prerequisite for successful application of signal processing and machine learning algorithms in brain-computer interfacing (BCI). This challenging task becomes significantly more difficult if the data set contains outliers, e.g. due to subject movements, eye blinks or loose electrodes, as they may heavily bias the estimation and the subsequent statistical analysis. Although various robust estimators have been developed to tackle the outlier problem, they ignore important structural information in the data and thus may not be optimal. Typical structural elements in BCI data are the trials consisting of a few hundred EEG samples and indicating the start and end of a task. Approach. This work discusses the parameter estimation problem in BCI and introduces a novel hierarchical view on robustness which naturally comprises different types of outlierness occurring in structured data. Furthermore, the class of minimum divergence estimators is reviewed and a robust mean and covariance estimator for structured data is derived and evaluated with simulations and on a benchmark data set. Main results. The results show that state-of-the-art BCI algorithms benefit from robustly estimated parameters. Significance. Since parameter estimation is an integral part of various machine learning algorithms, the presented techniques are applicable to many problems beyond BCI.
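
    As a generic illustration of the kind of robust covariance estimation discussed above (ignoring the trial structure that the proposed hierarchical estimators exploit), the sketch below contrasts the ordinary sample covariance with the high-breakdown minimum covariance determinant estimator on data containing a few gross outlier samples. The dimensions, covariance, and outlier levels are arbitrary.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

# EEG-like toy data: 500 samples of 4 channels, with 5% gross outlier samples
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.6, 0.2, 0.0],
                     [0.6, 1.0, 0.3, 0.1],
                     [0.2, 0.3, 1.0, 0.4],
                     [0.0, 0.1, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(4), true_cov, size=500)
outliers = rng.choice(500, size=25, replace=False)
X[outliers] += rng.normal(0, 20, size=(25, 4))            # e.g., loose-electrode artefacts

emp = EmpiricalCovariance().fit(X)
mcd = MinCovDet(random_state=0).fit(X)                    # high-breakdown robust estimate
for name, est in [("sample covariance", emp), ("minimum covariance determinant", mcd)]:
    err = np.linalg.norm(est.covariance_ - true_cov)
    print(f"{name}: ||estimate - truth|| = {err:.2f}")    # the robust estimate is far closer
```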

  6. Heritability of physical activity traits in Brazilian families: the Baependi Heart Study

    PubMed Central

    2011-01-01

    Background It is commonly recognized that physical activity has familial aggregation; however, the genetic influences on physical activity phenotypes are not well characterized. This study aimed to (1) estimate the heritability of physical activity traits in Brazilian families; and (2) investigate whether genetic and environmental variance components contribute differently to the expression of these phenotypes in males and females. Methods The sample that constitutes the Baependi Heart Study is comprised of 1,693 individuals in 95 Brazilian families. The phenotypes were self-reported in a questionnaire based on the WHO-MONICA instrument. Variance component approaches, implemented in the SOLAR (Sequential Oligogenic Linkage Analysis Routines) computer package, were applied to estimate the heritability and to evaluate the heterogeneity of variance components by gender on the studied phenotypes. Results The heritability estimates were intermediate (35%) for weekly physical activity among non-sedentary subjects (weekly PA_NS), and low (9-14%) for sedentarism, weekly physical activity (weekly PA), and level of daily physical activity (daily PA). Significant evidence for heterogeneity in variance components by gender was observed for the sedentarism and weekly PA phenotypes. No significant gender differences in genetic or environmental variance components were observed for the weekly PA_NS trait. The daily PA phenotype was predominantly influenced by environmental factors, with larger effects in males than in females. Conclusions Heritability estimates for physical activity phenotypes in this sample of the Brazilian population were significant in both males and females, and varied from low to intermediate magnitude. Significant evidence for heterogeneity in variance components by gender was observed. These data add to the knowledge of the physical activity traits in the Brazilian study population, and are concordant with the notion of significant biological determination in active behavior. PMID:22126647

  7. Heritability of physical activity traits in Brazilian families: the Baependi Heart Study.

    PubMed

    Horimoto, Andréa R V R; Giolo, Suely R; Oliveira, Camila M; Alvim, Rafael O; Soler, Júlia P; de Andrade, Mariza; Krieger, José E; Pereira, Alexandre C

    2011-11-29

    It is commonly recognized that physical activity has familial aggregation; however, the genetic influences on physical activity phenotypes are not well characterized. This study aimed to (1) estimate the heritability of physical activity traits in Brazilian families; and (2) investigate whether genetic and environmental variance components contribute differently to the expression of these phenotypes in males and females. The sample that constitutes the Baependi Heart Study is comprised of 1,693 individuals in 95 Brazilian families. The phenotypes were self-reported in a questionnaire based on the WHO-MONICA instrument. Variance component approaches, implemented in the SOLAR (Sequential Oligogenic Linkage Analysis Routines) computer package, were applied to estimate the heritability and to evaluate the heterogeneity of variance components by gender on the studied phenotypes. The heritability estimates were intermediate (35%) for weekly physical activity among non-sedentary subjects (weekly PA_NS), and low (9-14%) for sedentarism, weekly physical activity (weekly PA), and level of daily physical activity (daily PA). Significant evidence for heterogeneity in variance components by gender was observed for the sedentarism and weekly PA phenotypes. No significant gender differences in genetic or environmental variance components were observed for the weekly PA_NS trait. The daily PA phenotype was predominantly influenced by environmental factors, with larger effects in males than in females. Heritability estimates for physical activity phenotypes in this sample of the Brazilian population were significant in both males and females, and varied from low to intermediate magnitude. Significant evidence for heterogeneity in variance components by gender was observed. These data add to the knowledge of the physical activity traits in the Brazilian study population, and are concordant with the notion of significant biological determination in active behavior.

  8. GPZ: non-stationary sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric redshifts

    NASA Astrophysics Data System (ADS)

    Almosallam, Ibrahim A.; Jarvis, Matt J.; Roberts, Stephen J.

    2016-10-01

    The next generation of cosmology experiments will be required to use photometric redshifts rather than spectroscopic redshifts. Obtaining accurate and well-characterized photometric redshift distributions is therefore critical for Euclid, the Large Synoptic Survey Telescope and the Square Kilometre Array. However, determining accurate variance predictions alongside single point estimates is crucial, as they can be used to optimize the sample of galaxies for the specific experiment (e.g. weak lensing, baryon acoustic oscillations, supernovae), trading off between completeness and reliability in the galaxy sample. The various sources of uncertainty in measurements of the photometry and redshifts put a lower bound on the accuracy that any model can hope to achieve. The intrinsic uncertainty associated with estimates is often non-uniform and input-dependent, commonly known in statistics as heteroscedastic noise. However, existing approaches are susceptible to outliers, do not take into account variance induced by non-uniform data density and in most cases require manual tuning of many parameters. In this paper, we present a Bayesian machine learning approach that jointly optimizes the model with respect to both the predictive mean and variance, which we refer to as Gaussian processes for photometric redshifts (GPZ). The predictive variance of the model takes into account both the variance due to data density and photometric noise. Using the Sloan Digital Sky Survey (SDSS) DR12 data, we show that our approach substantially outperforms other machine learning methods for photo-z estimation and their associated variance, such as TPZ and ANNZ2. We provide MATLAB and PYTHON implementations that are available for download at https://github.com/OxfordML/GPz.

  9. Uncertainty Estimation using Bootstrapped Kriging Predictions for Precipitation Isoscapes

    NASA Astrophysics Data System (ADS)

    Ma, C.; Bowen, G. J.; Vander Zanden, H.; Wunder, M.

    2017-12-01

    Isoscapes are spatial models representing the distribution of stable isotope values across landscapes. Isoscapes of hydrogen and oxygen in precipitation are now widely used in a diversity of fields, including geology, biology, hydrology, and atmospheric science. To generate isoscapes, geostatistical methods are typically applied to extend predictions from limited data measurements. Kriging is a popular method in isoscape modeling, but quantifying the uncertainty associated with the resulting isoscapes is challenging. Applications that use precipitation isoscapes to determine sample origin require estimation of uncertainty. Here we present a simple bootstrap method (SBM) to estimate the mean and uncertainty of the kriged isoscape and compare these results with a generalized bootstrap method (GBM) applied in previous studies. We used hydrogen isotopic data from IsoMAP to explore these two approaches for estimating uncertainty. We conducted 10 simulations for each bootstrap method and found that SBM results in more kriging predictions (9/10) compared to GBM (4/10). Predictions from SBM were closer to the original prediction generated without bootstrapping and had less variance than GBM. SBM was tested on different datasets from IsoMAP with different numbers of observation sites. We determined that predictions from the datasets with fewer than 40 observation sites using SBM were more variable than the original prediction. The approaches we used for estimating uncertainty will be compiled in an R package that is under development. We expect that these robust estimates of precipitation isoscape uncertainty can be applied in diagnosing the origin of samples ranging from various types of water to migratory animals, food products, and humans.

  10. Multi-site study of additive genetic effects on fractional anisotropy of cerebral white matter: Comparing meta and megaanalytical approaches for data pooling.

    PubMed

    Kochunov, Peter; Jahanshad, Neda; Sprooten, Emma; Nichols, Thomas E; Mandl, René C; Almasy, Laura; Booth, Tom; Brouwer, Rachel M; Curran, Joanne E; de Zubicaray, Greig I; Dimitrova, Rali; Duggirala, Ravi; Fox, Peter T; Hong, L Elliot; Landman, Bennett A; Lemaitre, Hervé; Lopez, Lorna M; Martin, Nicholas G; McMahon, Katie L; Mitchell, Braxton D; Olvera, Rene L; Peterson, Charles P; Starr, John M; Sussmann, Jessika E; Toga, Arthur W; Wardlaw, Joanna M; Wright, Margaret J; Wright, Susan N; Bastin, Mark E; McIntosh, Andrew M; Boomsma, Dorret I; Kahn, René S; den Braber, Anouk; de Geus, Eco J C; Deary, Ian J; Hulshoff Pol, Hilleke E; Williamson, Douglas E; Blangero, John; van 't Ent, Dennis; Thompson, Paul M; Glahn, David C

    2014-07-15

    Combining datasets across independent studies can boost statistical power by increasing the numbers of observations and can achieve more accurate estimates of effect sizes. This is especially important for genetic studies where a large number of observations are required to obtain sufficient power to detect and replicate genetic effects. There is a need to develop and evaluate methods for joint-analytical analyses of rich datasets collected in imaging genetics studies. The ENIGMA-DTI consortium is developing and evaluating approaches for obtaining pooled estimates of heritability through meta- and mega-genetic analytical approaches, to estimate the general additive genetic contributions to the intersubject variance in fractional anisotropy (FA) measured from diffusion tensor imaging (DTI). We used the ENIGMA-DTI data harmonization protocol for uniform processing of DTI data from multiple sites. We evaluated this protocol in five family-based cohorts providing data from a total of 2248 children and adults (ages: 9-85) collected with various imaging protocols. We used the imaging genetics analysis tool, SOLAR-Eclipse, to combine twin and family data from Dutch, Australian and Mexican-American cohorts into one large "mega-family". We showed that heritability estimates may vary from one cohort to another. We used two meta-analytical (the sample-size and standard-error weighted) approaches and a mega-genetic analysis to calculate heritability estimates across populations. We performed leave-one-out analysis of the joint estimates of heritability, removing a different cohort each time to understand the estimate variability. Overall, meta- and mega-genetic analyses of heritability produced robust estimates of heritability. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Robustness of statistical tests for multiplicative terms in the additive main effects and multiplicative interaction model for cultivar trials.

    PubMed

    Piepho, H P

    1995-03-01

    The additive main effects and multiplicative interaction (AMMI) model is frequently used in the analysis of multilocation trials. In the analysis of such data it is of interest to decide how many of the multiplicative interaction terms are significant. Several tests for this task are available, all of which assume that errors are normally distributed with a common variance. This paper investigates the robustness of several tests (Gollob, F_GH1, F_GH2, F_R) to departures from these assumptions. It is concluded that, because of its better robustness, the F_R test is preferable. If the other tests are to be used, preliminary tests for the validity of assumptions should be performed.
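
    For context, the multiplicative interaction terms that these tests assess are obtained from a singular value decomposition of the doubly centred genotype-by-environment residuals. The sketch below extracts those terms from a simulated table of means; the significance tests themselves (Gollob, F_GH1, F_GH2, F_R) are not implemented.

```python
# Minimal sketch of extracting AMMI multiplicative interaction terms via SVD of
# the doubly centred genotype-by-environment residuals. Data are simulated;
# deciding how many terms are significant is the job of the tests discussed above.
import numpy as np

rng = np.random.default_rng(1)
means = rng.normal(5.0, 1.0, size=(8, 6))          # 8 genotypes x 6 environments

# Additive main effects (grand mean, genotype and environment effects).
grand = means.mean()
g_eff = means.mean(axis=1, keepdims=True) - grand
e_eff = means.mean(axis=0, keepdims=True) - grand

# Interaction residuals and their singular value decomposition.
resid = means - grand - g_eff - e_eff
u, s, vt = np.linalg.svd(resid, full_matrices=False)

# Share of the interaction sum of squares captured by each multiplicative term.
ss_share = s**2 / np.sum(s**2)
print(np.round(ss_share, 3))
```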

  12. Parameter estimation in 3D affine and similarity transformation: implementation of variance component estimation

    NASA Astrophysics Data System (ADS)

    Amiri-Simkooei, A. R.

    2018-01-01

    Three-dimensional (3D) coordinate transformations, generally consisting of origin shifts, axes rotations, scale changes, and skew parameters, are widely used in many geomatics applications. Although in some geodetic applications simplified transformation models are used based on the assumption of small transformation parameters, in other fields of application such parameters are indeed large. The algorithms of two recent papers on the weighted total least-squares (WTLS) problem are used for the 3D coordinate transformation. The methodology can be applied when the transformation parameters are large, in which case no approximate values of the parameters are required. Direct linearization of the rotation and scale parameters is thus not required. The WTLS formulation is employed to take into consideration errors in both the start and target systems in the estimation of the transformation parameters. Two of the well-known 3D transformation methods, namely affine (12, 9, and 8 parameters) and similarity (7 and 6 parameters) transformations, can be handled using the WTLS theory subject to hard constraints. Because the method can be formulated by the standard least-squares theory with constraints, the covariance matrix of the transformation parameters can be provided directly. The above characteristics of the 3D coordinate transformation are implemented in the presence of different variance components, which are estimated using least-squares variance component estimation. In particular, the estimability of the variance components is investigated. The efficacy of the proposed formulation is verified on two real data sets.
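
    As a point of reference for the kind of transformation being estimated, the sketch below recovers a 7-parameter (scale, rotation, translation) similarity transformation with the ordinary closed-form least-squares solution (Umeyama's method). It ignores errors in the start-system coordinates and the variance components, so it is not the paper's WTLS formulation; the synthetic point pairs are invented for illustration.

```python
# Ordinary least-squares estimate of a 7-parameter 3D similarity transformation
# (scale, rotation, translation), used here only to illustrate the model being
# fitted; it is NOT the constrained WTLS method of the paper.
import numpy as np

def similarity_transform(src, dst):
    """Closed-form estimate of scale c, rotation R, translation t with
    dst ~ c * R @ src + t (Umeyama's least-squares solution)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src0, dst0 = src - mu_s, dst - mu_d
    sigma2 = (src0 ** 2).sum() / len(src)        # mean squared norm of centred src
    cov = dst0.T @ src0 / len(src)               # 3x3 cross-covariance
    u, d, vt = np.linalg.svd(cov)
    s = np.eye(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[2, 2] = -1.0                           # guard against reflections
    rot = u @ s @ vt
    c = np.trace(np.diag(d) @ s) / sigma2
    t = mu_d - c * rot @ mu_s
    return c, rot, t

# Recover a known transform from slightly noisy point pairs (values invented).
rng = np.random.default_rng(7)
src = rng.uniform(-100.0, 100.0, size=(20, 3))
theta = np.deg2rad(30.0)
true_rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
dst = 1.5 * src @ true_rot.T + np.array([10.0, -5.0, 3.0])
dst += rng.normal(0.0, 0.01, size=dst.shape)
scale, rotation, shift = similarity_transform(src, dst)
```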

  13. Variance Estimation for NAEP Data Using a Resampling-Based Approach: An Application of Cognitive Diagnostic Models. Research Report. ETS RR-10-26

    ERIC Educational Resources Information Center

    Hsieh, Chueh-an; Xu, Xueli; von Davier, Matthias

    2010-01-01

    This paper presents an application of a jackknifing approach to variance estimation of ability inferences for groups of students, using a multidimensional discrete model for item response data. The data utilized to demonstrate the approach come from the National Assessment of Educational Progress (NAEP). In contrast to the operational approach…
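
    The delete-one jackknife underlying such resampling-based variance estimates has a simple textbook form, sketched below for the mean of a small invented sample. NAEP's operational replicate weighting is considerably more elaborate, so this is only the generic estimator, not the paper's procedure.

```python
# Generic delete-one jackknife variance estimate for a statistic, shown for the
# mean of an invented sample; treat this purely as the textbook form.
import numpy as np

def jackknife_variance(x, stat=np.mean):
    n = len(x)
    # Recompute the statistic with each observation left out in turn.
    theta_i = np.array([stat(np.delete(x, i)) for i in range(n)])
    theta_bar = theta_i.mean()
    return (n - 1) / n * np.sum((theta_i - theta_bar) ** 2)

scores = np.array([212.0, 237.0, 225.0, 248.0, 231.0, 219.0, 244.0])
print(jackknife_variance(scores))   # jackknife variance of the sample mean
```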

  14. Weighting by Inverse Variance or by Sample Size in Random-Effects Meta-Analysis

    ERIC Educational Resources Information Center

    Marin-Martinez, Fulgencio; Sanchez-Meca, Julio

    2010-01-01

    Most of the statistical procedures in meta-analysis are based on the estimation of average effect sizes from a set of primary studies. The optimal weight for averaging a set of independent effect sizes is the inverse variance of each effect size, but in practice these weights have to be estimated, being affected by sampling error. When assuming a…
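
    The two weighting schemes being compared can be written in a few lines. The sketch below computes a random-effects pooled effect size with inverse-variance weights (using a DerSimonian-Laird between-study variance) and with sample-size weights; the study effects, variances, and sample sizes are invented for illustration.

```python
# Hedged sketch of the two weighting schemes compared above for a random-effects
# meta-analysis: inverse-variance weights (with a DerSimonian-Laird tau^2) versus
# simple sample-size weights. Effect sizes below are invented.
import numpy as np

d = np.array([0.30, 0.10, 0.45, 0.25, 0.60])   # study effect sizes
v = np.array([0.02, 0.05, 0.03, 0.01, 0.04])   # their sampling variances
n = np.array([120, 60, 90, 200, 70])           # study sample sizes

# Fixed-effect (inverse-variance) pooled estimate, then DerSimonian-Laird tau^2.
w_fe = 1.0 / v
d_fe = np.sum(w_fe * d) / np.sum(w_fe)
q = np.sum(w_fe * (d - d_fe) ** 2)
c = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
tau2 = max(0.0, (q - (len(d) - 1)) / c)

# Random-effects pooled estimates under the two weighting schemes.
w_iv = 1.0 / (v + tau2)
d_iv = np.sum(w_iv * d) / np.sum(w_iv)          # inverse-variance weighted
d_n = np.sum(n * d) / np.sum(n)                 # sample-size weighted
```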

  15. Selection enhanced estimates of µ-calpain, calpastatin, and diacylglycerol O-acyltransferase 1 genetic effects on pre-weaning performance, carcass quality traits, and residual variance of tenderness in composite ... cattle

    USDA-ARS?s Scientific Manuscript database

    Selection of the composite MARC III population for markers allowed better estimates of effects and inheritance of markers for targeted carcass quality traits (n=254) and nontargeted traits and an evaluation of SNP specific residual variance models for tenderness. Genotypic effects of CAPN1 haplotyp...

  16. Adaptive torque estimation of robot joint with harmonic drive transmission

    NASA Astrophysics Data System (ADS)

    Shi, Zhiguo; Li, Yuankai; Liu, Guangjun

    2017-11-01

    Robot joint torque estimation using input and output position measurements is a promising technique, but the result may be affected by load variation of the joint. In this paper, a torque estimation method with adaptive robustness and optimality adjustment according to load variation is proposed for a robot joint with harmonic drive transmission. Based on a harmonic drive model and a redundant adaptive robust Kalman filter (RARKF), the proposed approach adapts the optimality and robustness of the torque estimation filter to load variation by self-tuning the filtering gain and self-switching the filtering mode between optimal and robust. The redundant factor of RARKF is designed as a function of the motor current, both for tolerating modeling error and for load-dependent switching of the filtering mode. The proposed joint torque estimation method has been experimentally studied in comparison with a commercial torque sensor and two representative filtering methods. The results have demonstrated the effectiveness of the proposed torque estimation technique.
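
    The optimal/robust mode switching described above can be illustrated with a scalar Kalman filter that inflates its innovation covariance whenever the normalized innovation exceeds a gate. The sketch below is only a generic illustration of that idea, not the paper's RARKF: the random-walk torque model, threshold, and inflation factor are assumptions.

```python
# Generic sketch of optimal/robust switching in a scalar Kalman filter: inflate
# the innovation covariance (robust mode) when the normalized innovation exceeds
# a threshold. Model and parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

q, r = 1e-4, 0.05          # process and measurement noise variances (assumed)
threshold = 3.0            # normalized-innovation gate for switching to robust mode
inflation = 10.0           # extra measurement-noise inflation in robust mode (assumed)

x_est, p_est = 0.0, 1.0    # state estimate (e.g. joint torque) and its variance
true_torque = np.concatenate([np.zeros(50), 2.0 * np.ones(50)])   # sudden load change
measurements = true_torque + rng.normal(0.0, np.sqrt(r), size=100)

estimates = []
for z in measurements:
    p_pred = p_est + q                 # predict (random-walk torque model)
    innov = z - x_est                  # innovation
    s = p_pred + r                     # nominal innovation covariance
    if innov**2 / s > threshold**2:    # switch to robust mode on large innovations
        s = p_pred + inflation * r
    k = p_pred / s                     # filtering gain
    x_est = x_est + k * innov
    p_est = (1.0 - k) * p_pred
    estimates.append(x_est)
```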

  17. Experimental design for dynamics identification of cellular processes.

    PubMed

    Dinh, Vu; Rundell, Ann E; Buzzard, Gregery T

    2014-03-01

    We address the problem of using nonlinear models to design experiments to characterize the dynamics of cellular processes by using the approach of the Maximally Informative Next Experiment (MINE), which was introduced in W. Dong et al. (PLoS ONE 3(8):e3105, 2008) and independently in M.M. Donahue et al. (IET Syst. Biol. 4:249-262, 2010). In this approach, existing data is used to define a probability distribution on the parameters; the next measurement point is the one that yields the largest model output variance with this distribution. Building upon this approach, we introduce the Expected Dynamics Estimator (EDE), which is the expected value using this distribution of the output as a function of time. We prove the consistency of this estimator (uniform convergence to true dynamics) even when the chosen experiments cluster in a finite set of points. We extend this proof of consistency to various practical assumptions on noisy data and moderate levels of model mismatch. Through the derivation and proof, we develop a relaxed version of MINE that is more computationally tractable and robust than the original formulation. The results are illustrated with numerical examples on two nonlinear ordinary differential equation models of biomolecular and cellular processes.
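
    A minimal version of the MINE selection rule is easy to state: draw parameters from the current distribution, simulate the model output at candidate measurement times, and pick the time with the largest output variance; the mean over draws is the Expected Dynamics Estimator. The sketch below does this for an invented one-parameter decay model standing in for the cellular-process models of the paper.

```python
# Minimal sketch of the MINE rule and the Expected Dynamics Estimator for an
# invented one-parameter decay model; the model and parameter distribution are
# illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(3)
t_candidates = np.linspace(0.1, 10.0, 50)                    # candidate measurement times
k_samples = rng.lognormal(mean=-1.0, sigma=0.5, size=200)    # parameter distribution draws

def simulate(k):
    """Output of dy/dt = -k*y, y(0) = 1, at the candidate times."""
    sol = solve_ivp(lambda t, y: -k * y, (0.0, 10.0), [1.0], t_eval=t_candidates)
    return sol.y[0]

outputs = np.array([simulate(k) for k in k_samples])   # (n_samples, n_times)

# Expected Dynamics Estimator: mean output over the parameter distribution.
ede = outputs.mean(axis=0)

# MINE: the next measurement time is where the output variance is largest.
next_time = t_candidates[np.argmax(outputs.var(axis=0))]
```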

  18. A negentropy minimization approach to adaptive equalization for digital communication systems.

    PubMed

    Choi, Sooyong; Lee, Te-Won

    2004-07-01

    In this paper, we introduce and investigate a new adaptive equalization method based on minimizing the approximate negentropy of the estimation error for a finite-length equalizer. We consider an approximate negentropy using nonpolynomial expansions of the estimation error as a new performance criterion to improve on a linear equalizer based on the minimum mean squared error (MMSE) criterion. Negentropy includes higher order statistical information, and its minimization provides improved convergence, performance, and accuracy compared to traditional methods such as MMSE in terms of bit error rate (BER). The proposed negentropy minimization (NEGMIN) equalizer has two kinds of solutions, the MMSE solution and another solution, depending on the ratio of the normalization parameters. The NEGMIN equalizer has the best BER performance when the ratio of the normalization parameters is adjusted to maximize the output power (variance) of the NEGMIN equalizer. Simulation experiments show that the BER performance of the NEGMIN equalizer with the non-MMSE solution has similar characteristics to the adaptive minimum bit error rate (AMBER) equalizer. The main advantage of the proposed equalizer is that it needs significantly fewer training symbols than the AMBER equalizer. Furthermore, the proposed equalizer is more robust to nonlinear distortions than the MMSE equalizer.
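
    For orientation, the linear MMSE baseline that NEGMIN is compared against can be realized as an LMS-trained finite-length equalizer, sketched below on an invented dispersive channel with BPSK symbols. The negentropy criterion itself is not implemented; the channel taps, step size, and equalizer length are assumptions.

```python
# LMS-trained finite-length linear equalizer (stochastic MMSE baseline) on a toy
# dispersive channel; this illustrates the baseline, not the NEGMIN criterion.
import numpy as np

rng = np.random.default_rng(4)
symbols = rng.choice([-1.0, 1.0], size=5000)              # BPSK training symbols
channel = np.array([0.3, 1.0, 0.3])                       # toy dispersive channel
received = np.convolve(symbols, channel, mode="same")
received += rng.normal(0.0, 0.1, size=received.shape)     # additive noise

n_taps, mu, delay = 11, 0.01, 5
w = np.zeros(n_taps)
for i in range(n_taps, len(received)):
    x = received[i - n_taps:i][::-1]      # equalizer input vector, most recent first
    y = w @ x                             # equalizer output
    e = symbols[i - delay] - y            # estimation error against delayed symbol
    w += mu * e * x                       # LMS (stochastic MMSE) update
```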

  19. Turbulence characterization by studying laser beam wandering in a differential tracking motion setup

    NASA Astrophysics Data System (ADS)

    Pérez, Darío G.; Zunino, Luciano; Gulich, Damián; Funes, Gustavo; Garavaglia, Mario

    2009-09-01

    The Differential Image Motion Monitor (DIMM) is a standard and widely used instrument for astronomical seeing measurements. The seeing values are estimated from the variance of the differential image motion over two equal small pupils some distance apart. The twin pupils are usually cut in a mask on the entrance pupil of the telescope. As a differential method, it has the advantage of being immune to tracking errors, eliminating the effect of erratic motion of the telescope. The Differential Laser Tracking Motion (DLTM) method introduced here is inspired by the same idea. Two identical laser beams are propagated through a path of air in turbulent motion, and at the end of the path their wander is registered by two position-sensitive detectors at a rate of 800 samples per second. The time series generated from the difference of the pair of laser beam centroid coordinates are then analyzed using multifractal detrended fluctuation analysis. Measurements were performed in the laboratory with synthetic turbulence, changing the relative separation of the beams for different turbulent regimes. The dependence on these parameters, and the robustness of our estimators, are compared with those of the non-differential method. This method is an improvement over previous approaches that study beam wandering.
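
    The benefit of the differential measurement can be seen in a toy simulation: two centroid tracks share a slow common drift but have independent turbulence-induced wander, and differencing removes the drift while preserving the differential wander. The signal models below are illustrative assumptions and do not reproduce the experimental setup.

```python
# Toy illustration of why the differential measurement is immune to common drift:
# the shared slow drift cancels in the difference of the two centroid tracks.
# All signal models here are invented for illustration.
import numpy as np

rng = np.random.default_rng(5)
fs, n = 800, 8000                               # 800 samples/s, 10 s of data
t = np.arange(n) / fs

drift = 5.0 * np.sin(2 * np.pi * 0.05 * t)      # common slow drift (e.g. mount motion)
wander1 = np.cumsum(rng.normal(0, 0.05, n))     # independent wander, beam 1
wander2 = np.cumsum(rng.normal(0, 0.05, n))     # independent wander, beam 2

beam1 = drift + wander1
beam2 = drift + wander2
diff = beam1 - beam2                            # differential track: drift cancels

print(np.var(beam1), np.var(diff))              # single-beam vs differential variance
```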

  20. Motion adaptive Kalman filter for super-resolution

    NASA Astrophysics Data System (ADS)

    Richter, Martin; Nasse, Fabian; Schröder, Hartmut

    2011-01-01

    Superresolution is a sophisticated strategy to enhance the image quality of both low and high resolution video, performing tasks like artifact reduction, scaling, and sharpness enhancement in one algorithm, all of which reconstruct high frequency components (above the Nyquist frequency) in some way. Recursive superresolution algorithms in particular can meet high quality requirements because they control the video output using a feedback loop and adapt the result in the next iteration. In addition to excellent output quality, temporal recursive methods are very hardware efficient and therefore attractive even for real-time video processing. A very promising approach is the use of Kalman filters as proposed by Farsiu et al. Reliable motion estimation is crucial for the performance of superresolution. Therefore, robust global motion models are mainly used, but this also limits the applicability of superresolution algorithms. Handling sequences with complex object motion is thus essential for a wider field of application. Hence, this paper proposes improvements that extend the Kalman filter approach using motion-adaptive variance estimation and segmentation techniques. Experiments confirm the potential of our proposal for ideal and real video sequences with complex motion and further compare its performance to state-of-the-art methods like trainable filters.
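
    The motion-adaptive variance idea can be sketched as a per-pixel temporal Kalman update whose process-noise variance is inflated wherever the frame difference signals motion, so static regions are filtered strongly while moving regions follow the new frame. The thresholds and variances below are assumptions, and the superresolution reconstruction step itself is omitted.

```python
# Per-pixel temporal Kalman update with motion-adaptive process-noise variance.
# This is only a hedged illustration of the adaptation idea, not the paper's
# superresolution algorithm; thresholds and variances are assumed.
import numpy as np

def temporal_update(estimate, variance, frame, r=25.0, q_static=0.5, q_motion=50.0,
                    motion_thresh=15.0):
    """One Kalman time-step per pixel with motion-adaptive process noise."""
    motion = np.abs(frame - estimate) > motion_thresh       # crude motion mask
    q = np.where(motion, q_motion, q_static)                # adaptive process noise
    p_pred = variance + q                                   # predict
    k = p_pred / (p_pred + r)                               # Kalman gain per pixel
    new_estimate = estimate + k * (frame - estimate)        # update with new frame
    new_variance = (1.0 - k) * p_pred
    return new_estimate, new_variance

# Usage: run over a sequence of grayscale frames (float arrays of equal shape).
rng = np.random.default_rng(6)
frames = [rng.normal(128, 5, size=(64, 64)) for _ in range(10)]
est, var = frames[0].copy(), np.full((64, 64), 25.0)
for f in frames[1:]:
    est, var = temporal_update(est, var, f)
```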
