On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
Estimating integrated variance in the presence of microstructure noise using linear regression
NASA Astrophysics Data System (ADS)
Holý, Vladimír
2017-07-01
Using high-frequency financial data to estimate the integrated variance of asset prices is beneficial, but as the number of observations increases, so-called microstructure noise appears. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as a test for the presence of the noise. Our method uses linear regression in which realized variances estimated from different data subsamples act as the dependent variable and the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
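The regression idea in this abstract can be sketched numerically. The example below is only an illustration under assumptions that are not taken from the paper: i.i.d. Gaussian noise added to a Brownian log-price, for which the expected realized variance from n returns is approximately IV + 2·n·ω² (ω² being the noise variance). Under that assumption, regressing realized variance on the number of observations gives an intercept near the integrated variance. The function names and simulation settings are hypothetical.

```python
import numpy as np

def realized_variance(log_prices):
    """Sum of squared log-price increments."""
    return float(np.sum(np.diff(log_prices) ** 2))

def estimate_iv_by_regression(log_prices, steps):
    """Regress subsample realized variances on their number of returns.

    Assumes i.i.d. noise, so E[RV_n] ~ IV + 2 * n * noise_var; the intercept
    then estimates IV and the slope estimates 2 * noise_var.
    """
    n_returns, rvs = [], []
    for k in steps:
        sub = log_prices[::k]              # keep every k-th observation
        n_returns.append(len(sub) - 1)
        rvs.append(realized_variance(sub))
    slope, intercept = np.polyfit(n_returns, rvs, 1)
    return float(intercept), float(slope)

# Simulated day: hypothetical values chosen only for illustration.
rng = np.random.default_rng(0)
n = 23400
true_iv, noise_var = 1.0, 1e-4
efficient = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(true_iv / n), n))])
observed = efficient + rng.normal(0.0, np.sqrt(noise_var), n + 1)

naive_rv = realized_variance(observed)     # heavily biased upward by the noise
iv_hat, slope_hat = estimate_iv_by_regression(observed, range(1, 21))
print(naive_rv, iv_hat, slope_hat)
```

In this sketch a clearly positive slope is what signals the presence of noise; the paper's actual test statistic and the other noise structures it considers are not reproduced here.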
Smooth empirical Bayes estimation of observation error variances in linear systems
NASA Technical Reports Server (NTRS)
Martz, H. F., Jr.; Lian, M. W.
1972-01-01
A smooth empirical Bayes estimator was developed for estimating the unknown random scale component of each of a set of observation error variances. It is shown that the estimator possesses a smaller average squared error loss than other estimators for a discrete time linear system.
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
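The claim can be checked numerically for ordinary least squares. The sketch below assumes a homoscedastic linear model with error variance σ², so that the variance of the fitted value at a point x0 is σ²·x0'(X'X)⁻¹x0, and compares that factor before and after appending one predictor. The data and the evaluation point are made up for illustration.

```python
import numpy as np

def fitted_value_variance_factor(X, x0):
    """Return x0' (X'X)^{-1} x0, i.e. Var(fitted value at x0) / sigma^2 under OLS."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return float(x0 @ XtX_inv @ x0)

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
z = rng.normal(size=n)                      # the added predictor

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([np.ones(n), x, z])

# Same evaluation point in the shared coordinates; the z-coordinate is arbitrary.
x0_small = np.array([1.0, 0.5])
x0_big = np.array([1.0, 0.5, -1.0])

v_small = fitted_value_variance_factor(X_small, x0_small)
v_big = fitted_value_variance_factor(X_big, x0_big)
print(v_small, v_big)                       # v_big is never smaller than v_small
```

The comparison concerns variance only; if the smaller model omits a relevant predictor, its fitted values are biased, which this sketch does not measure.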
Mulder, Han A; Rönnegård, Lars; Fikse, W Freddy; Veerkamp, Roel F; Strandberg, Erling
2013-07-04
Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike's information criterion using h-likelihood to select the best fitting model. We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike's information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. 
Using Akaike's information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.
Development of a technique for estimating noise covariances using multiple observers
NASA Technical Reports Server (NTRS)
Bundick, W. Thomas
1988-01-01
Friedland's technique for estimating the unknown noise variances of a linear system using multiple observers has been extended by developing a general solution for the estimates of the variances, developing the statistics (mean and standard deviation) of these estimates, and demonstrating the solution on two examples.
Li, Xiaobo; Hu, Haofeng; Liu, Tiegen; Huang, Bingjing; Song, Zhanjie
2016-04-04
We consider the degree of linear polarization (DOLP) polarimetry system, which performs two intensity measurements at orthogonal polarization states to estimate DOLP. We show that if the total integration time of the intensity measurements is fixed, the variance of the DOLP estimator depends on how the integration time is distributed between the two intensity measurements. Therefore, by optimizing the distribution of integration time, the variance of the DOLP estimator can be decreased. In this paper, we obtain an approximate closed-form solution for the optimal distribution of integration time by employing the delta method and the method of Lagrange multipliers. Theoretical analyses and real-world experiments show that the variance of the DOLP estimator can be decreased for any value of DOLP. The method proposed in this paper can effectively decrease the measurement variance and thus statistically improve the measurement accuracy of the polarimetry system.
Trends in Elevated Triglyceride in Adults: United States, 2001-2012
... All variance estimates accounted for the complex survey design using Taylor series linearization ( 10 ). Percentage estimates for the total adult ... al. National Health and Nutrition Examination Survey: Sample design, 2007–2010. ... KM. Taylor series methods. In: Introduction to variance estimation. 2nd ed. ...
Rönnegård, L; Felleki, M; Fikse, W F; Mulder, H A; Strandberg, E
2013-04-01
Trait uniformity, or micro-environmental sensitivity, may be studied through individual differences in residual variance. These differences appear to be heritable, and the need exists, therefore, to fit models to predict breeding values explaining differences in residual variance. The aim of this paper is to estimate breeding values for micro-environmental sensitivity (vEBV) in milk yield and somatic cell score, and their associated variance components, on a large dairy cattle data set having more than 1.6 million records. Estimation of variance components, ordinary breeding values, and vEBV was performed using standard variance component estimation software (ASReml), applying the methodology for double hierarchical generalized linear models. Estimation using ASReml took less than 7 d on a Linux server. The genetic standard deviations for residual variance were 0.21 and 0.22 for somatic cell score and milk yield, respectively, which indicate moderate genetic variance for residual variance and imply that a standard deviation change in vEBV for one of these traits would alter the residual variance by 20%. This study shows that estimation of variance components, estimated breeding values and vEBV, is feasible for large dairy cattle data sets using standard variance component estimation software. The possibility to select for uniformity in Holstein dairy cattle based on these estimates is discussed. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
On the impact of relatedness on SNP association analysis.
Gross, Arnd; Tönjes, Anke; Scholz, Markus
2017-12-06
When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error, and the power of the test is also affected in a more complicated manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. We derive explicit and easy-to-apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. 
When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate.
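For readers unfamiliar with genomic control, the correction that the abstract argues against can be sketched as follows. This is the commonly used recipe (divide every 1-df chi-square statistic by an inflation factor estimated from the median), not the variance inflation formula derived in the paper, which the abstract does not state. The input statistics are made-up numbers.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for 1-df association tests.

    lambda is the observed median divided by the theoretical median,
    clipped below at 1 so that statistics are never inflated.
    """
    stats = np.asarray(chi2_stats, dtype=float)
    lam = max(1.0, float(np.median(stats)) / CHI2_1DF_MEDIAN)
    return lam, stats / lam

# Hypothetical test statistics for five SNPs.
lam, corrected = genomic_control([0.1, 0.5, 0.91, 2.0, 6.0])
print(lam, corrected)
```

Because every statistic is divided by the same factor, truly associated SNPs are deflated along with the null ones, which is consistent with the loss of power the abstract reports.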
Deletion Diagnostics for the Generalised Linear Mixed Model with independent random effects
Ganguli, B.; Roy, S. Sen; Naskar, M.; Malloy, E. J.; Eisen, E. A.
2015-01-01
The Generalised Linear Mixed Model (GLMM) is widely used for modelling environmental data. However, such data are prone to influential observations which can distort the estimated exposure-response curve particularly in regions of high exposure. Deletion diagnostics for iterative estimation schemes commonly derive the deleted estimates based on a single iteration of the full system holding certain pivotal quantities such as the information matrix to be constant. In this paper, we present an approximate formula for the deleted estimates and Cook’s distance for the GLMM which does not assume that the estimates of variance parameters are unaffected by deletion. The procedure allows the user to calculate standardised DFBETAs for mean as well as variance parameters. In certain cases, such as when using the GLMM as a device for smoothing, such residuals for the variance parameters are interesting in their own right. In general, the procedure leads to deleted estimates of mean parameters which are corrected for the effect of deletion on variance components as estimation of the two sets of parameters is interdependent. The probabilistic behaviour of these residuals is investigated and a simulation based procedure suggested for their standardisation. The method is used to identify influential individuals in an occupational cohort exposed to silica. The results show that failure to conduct post model fitting diagnostics for variance components can lead to erroneous conclusions about the fitted curve and unstable confidence intervals. PMID:26626135
Estimation and Simulation of Slow Crack Growth Parameters from Constant Stress Rate Data
NASA Technical Reports Server (NTRS)
Salem, Jonathan A.; Weaver, Aaron S.
2003-01-01
Closed form, approximate functions for estimating the variances and degrees of freedom associated with the slow crack growth parameters n, D, B, and A* as measured using constant stress rate ('dynamic fatigue') testing were derived by using propagation of errors. Estimates made with the resulting functions and slow crack growth data for a sapphire window were compared to the results of Monte Carlo simulations. The functions for estimation of the variances of the parameters were derived both with and without logarithmic transformation of the initial slow crack growth equations. The transformation was performed to make the functions both more linear and more normal. Comparison of the Monte Carlo results and the closed form expressions derived with propagation of errors indicated that linearization is not required for good estimates of the variances of parameters n and D by the propagation of errors method. However, good estimates of the variances of the parameters B and A* could only be made when the starting slow crack growth equation was transformed and the coefficients of variation of the input parameters were not too large. This was partially a result of the skewed distributions of B and A*. Parametric variation of the input parameters was used to determine an acceptable range for using closed form approximate equations derived from propagation of errors.
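The comparison between propagation of errors and Monte Carlo simulation can be illustrated for the parameter n. The sketch assumes the usual constant stress rate relation, in which the slope of log strength against log stress rate equals 1/(n + 1); the abstract does not spell this relation out, and the slope value and its standard deviation below are hypothetical.

```python
import numpy as np

def scg_exponent(slope):
    """n from the regression slope, assuming slope = 1 / (n + 1)."""
    return 1.0 / slope - 1.0

def scg_exponent_sd(slope, slope_sd):
    """First-order propagation of errors: SD(n) ~ |dn/dslope| * SD(slope)."""
    return slope_sd / slope ** 2

slope, slope_sd = 0.05, 0.002              # hypothetical regression output
n_hat = scg_exponent(slope)
n_sd = scg_exponent_sd(slope, slope_sd)

# Monte Carlo check: draw slopes from a normal distribution and transform them.
rng = np.random.default_rng(2)
draws = scg_exponent(rng.normal(slope, slope_sd, 200_000))
mc_sd = float(np.std(draws, ddof=1))
print(n_hat, n_sd, mc_sd)
```

With a small coefficient of variation on the slope the two standard deviations agree closely; the abstract's caution about B and A* concerns cases where the transformed quantity is strongly skewed and the first-order approximation degrades.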
The Efficiency of Split Panel Designs in an Analysis of Variance Model
Wang, Wei-Guo; Liu, Hai-Jun
2016-01-01
We consider split panel design efficiency in analysis of variance models, that is, the determination of the optimal proportion of cross-section series in the full sample so as to minimize the variances of best linear unbiased estimators of linear combinations of parameters. An orthogonal matrix is constructed to obtain manageable expressions for the variances. On this basis, we derive a theorem for analyzing split panel design efficiency irrespective of interest and budget parameters. Additionally, the relative efficiency of an estimator based on the split panel with respect to an estimator based on a pure panel or a pure cross-section is presented. The analysis shows that the gains from a split panel can be quite substantial. We further consider the efficiency of split panel design, given a budget, and transform it into a constrained nonlinear integer programming problem. Specifically, an efficient algorithm is designed to solve this problem. Moreover, we combine one-at-a-time designs and factorial designs to illustrate the algorithm's efficiency with an empirical example concerning monthly consumer expenditure on food in 1985 in the Netherlands, and the efficient ranges of the algorithm parameters are given to ensure a good solution. PMID:27163447
Pare, Guillaume; Mao, Shihong; Deng, Wei Q.
2016-01-01
Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519
Estimating the variance for heterogeneity in arm-based network meta-analysis.
Piepho, Hans-Peter; Madden, Laurence V; Roger, James; Payne, Roger; Williams, Emlyn R
2018-04-19
Network meta-analysis can be implemented by using arm-based or contrast-based models. Here we focus on arm-based models and fit them using generalized linear mixed model procedures. Full maximum likelihood (ML) estimation leads to biased trial-by-treatment interaction variance estimates for heterogeneity. Thus, our objective is to investigate alternative approaches to variance estimation that reduce bias compared with full ML. Specifically, we use penalized quasi-likelihood/pseudo-likelihood and hierarchical (h) likelihood approaches. In addition, we consider a novel model modification that yields estimators akin to the residual maximum likelihood estimator for linear mixed models. The proposed methods are compared by simulation, and 2 real datasets are used for illustration. Simulations show that penalized quasi-likelihood/pseudo-likelihood and h-likelihood reduce bias and yield satisfactory coverage rates. Sum-to-zero restriction and baseline contrasts for random trial-by-treatment interaction effects, as well as a residual ML-like adjustment, also reduce bias compared with an unconstrained model when ML is used, but coverage rates are not quite as good. Penalized quasi-likelihood/pseudo-likelihood and h-likelihood are therefore recommended. Copyright © 2018 John Wiley & Sons, Ltd.
Spence, Jeffrey S; Brier, Matthew R; Hart, John; Ferree, Thomas C
2013-03-01
Linear statistical models are used very effectively to assess task-related differences in EEG power spectral analyses. Mixed models, in particular, accommodate more than one variance component in a multisubject study, where many trials of each condition of interest are measured on each subject. Generally, intra- and intersubject variances are both important to determine correct standard errors for inference on functions of model parameters, but it is often assumed that intersubject variance is the most important consideration in a group study. In this article, we show that, under common assumptions, estimates of some functions of model parameters, including estimates of task-related differences, are properly tested relative to the intrasubject variance component only. A substantial gain in statistical power can arise from the proper separation of variance components when there is more than one source of variability. We first develop this result analytically, then show how it benefits a multiway factoring of spectral, spatial, and temporal components from EEG data acquired in a group of healthy subjects performing a well-studied response inhibition task. Copyright © 2011 Wiley Periodicals, Inc.
The variance of the locally measured Hubble parameter explained with different estimators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Odderskov, Io; Hannestad, Steen; Brandbyge, Jacob, E-mail: isho07@phys.au.dk, E-mail: sth@phys.au.dk, E-mail: jacobb@phys.au.dk
We study the expected variance of measurements of the Hubble constant, H_0, as calculated in either linear perturbation theory or using non-linear velocity power spectra derived from N-body simulations. We compare the variance with that obtained by carrying out mock observations in the N-body simulations, and show that the estimator typically used for the local Hubble constant in studies based on perturbation theory is different from the one used in studies based on N-body simulations. The latter gives larger weight to distant sources, which explains why studies based on N-body simulations tend to obtain a smaller variance than that found from studies based on the power spectrum. Although both approaches result in a variance too small to explain the discrepancy between the value of H_0 from CMB measurements and the value measured in the local universe, these considerations are important in light of the percent determination of the Hubble constant in the local universe.
On the error in crop acreage estimation using satellite (LANDSAT) data
NASA Technical Reports Server (NTRS)
Chhikara, R. (Principal Investigator)
1983-01-01
The problem of crop acreage estimation using satellite data is discussed. Bias and variance of a crop proportion estimate in an area segment obtained from the classification of its multispectral sensor data are derived as functions of the means, variances, and covariance of error rates. The linear discriminant analysis and the class proportion estimation for the two class case are extended to include a third class of measurement units, where these units are mixed on ground. Special attention is given to the investigation of mislabeling in training samples and its effect on crop proportion estimation. It is shown that the bias and variance of the estimate of a specific crop acreage proportion increase as the disparity in mislabeling rates between two classes increases. Some interaction is shown to take place, causing the bias and the variance to decrease at first and then to increase, as the mixed unit class varies in size from 0 to 50 percent of the total area segment.
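How classification error rates translate into bias of a crop proportion estimate can be shown with the standard two-class identity. The sketch treats the error rates as fixed and known, which is a simplification: the report works with their means, variances, and covariance. The definitions used here are assumptions for illustration: omission is the probability that a crop unit is classified as non-crop, and commission is the probability that a non-crop unit is classified as crop.

```python
def expected_classified_proportion(p, omission, commission):
    """Expected fraction of units labelled as crop when the true fraction is p."""
    return p * (1.0 - omission) + (1.0 - p) * commission

def corrected_proportion(p_hat, omission, commission):
    """Invert the identity above to recover p from the classified fraction."""
    denom = 1.0 - omission - commission
    if denom <= 0.0:
        raise ValueError("error rates too large to invert")
    return (p_hat - commission) / denom

# Hypothetical segment: 30% crop, 10% omission, 5% commission.
p_true = 0.30
p_classified = expected_classified_proportion(p_true, 0.10, 0.05)
bias = p_classified - p_true
p_recovered = corrected_proportion(p_classified, 0.10, 0.05)
print(p_classified, bias, p_recovered)
```

The bias vanishes only when the omitted crop area exactly offsets the committed non-crop area, that is, when p·omission equals (1 - p)·commission.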
Jackknife Variance Estimator for Two Sample Linear Rank Statistics
1988-11-01
Keywords: strong consistency; linear rank test; influence function.
Estimation variance bounds of importance sampling simulations in digital communication systems
NASA Technical Reports Server (NTRS)
Lu, D.; Yao, K.
1991-01-01
In practical applications of importance sampling (IS) simulation, two basic problems are encountered, that of determining the estimation variance and that of evaluating the proper IS parameters needed in the simulations. The authors derive new upper and lower bounds on the estimation variance which are applicable to IS techniques. The upper bound is simple to evaluate and may be minimized by the proper selection of the IS parameter. Thus, lower and upper bounds on the improvement ratio of various IS techniques relative to the direct Monte Carlo simulation are also available. These bounds are shown to be useful and computationally simple to obtain. Based on the proposed technique, one can readily find practical suboptimum IS parameters. Numerical results indicate that these bounding techniques are useful for IS simulations of linear and nonlinear communication systems with intersymbol interference in which bit error rate and IS estimation variances cannot be obtained readily using prior techniques.
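A minimal importance sampling example shows the two quantities the abstract is concerned with: the estimate itself and its estimation variance as a function of the IS parameter. The sketch uses a mean-shifted Gaussian biasing density to estimate a Gaussian tail probability, a textbook stand-in for a bit error rate; it does not implement the bounds derived in the paper, and the threshold and sample size are arbitrary.

```python
import math
import numpy as np

def tail_prob_importance_sampling(c, n, rng, shift=None):
    """Estimate P(Z > c) for Z ~ N(0, 1) by sampling from N(shift, 1).

    Returns the estimate and the estimated variance of that estimate.
    """
    if shift is None:
        shift = c                                   # IS parameter: mean shift
    x = rng.normal(loc=shift, scale=1.0, size=n)
    # Likelihood ratio of N(0, 1) to N(shift, 1) evaluated at x.
    weights = np.exp(-shift * x + 0.5 * shift * shift)
    terms = (x > c) * weights
    estimate = float(terms.mean())
    est_variance = float(terms.var(ddof=1) / n)
    return estimate, est_variance

c, n = 4.0, 100_000
rng = np.random.default_rng(3)
p_is, var_is = tail_prob_importance_sampling(c, n, rng)

p_exact = 0.5 * math.erfc(c / math.sqrt(2.0))
var_direct = p_exact * (1.0 - p_exact) / n          # variance of plain Monte Carlo
print(p_is, p_exact, var_direct / var_is)           # last value: improvement ratio
```

Varying `shift` changes `var_is`, which is the sense in which an IS parameter can be tuned; the paper's contribution is bounds on this variance that avoid estimating it by simulation.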
USDA-ARS's Scientific Manuscript database
Transformations to multiple trait mixed model equations (MME) which are intended to improve computational efficiency in best linear unbiased prediction (BLUP) and restricted maximum likelihood (REML) are described. It is shown that traits that are expected or estimated to have zero residual variance...
Young Adults Seeking Medical Care: Do Race and Ethnicity Matter?
... services were sought or received. Data source and methods Data from the 2008 and 2009 NHIS were ... design of the NHIS. The Taylor series linearization method was chosen for variance estimation. All estimates shown ...
Trends in Allergic Conditions among Children: United States, 1997-2011
... and imputed family income ( 13 ). Data source and methods Prevalence estimates for allergic conditions were obtained from ... sample design of NHIS. The Taylor series linearization method was chosen for variance estimation. Differences between percentages ...
Strategies Used by Adults to Reduce Their Prescription Drug Costs
... on their 2010 income ( 5 ). Data source and methods Data from the 2011 NHIS were used for ... sample design of NHIS. The Taylor series linearization method was chosen for variance estimation. All estimates shown ...
Estimation of group means when adjusting for covariates in generalized linear models.
Qu, Yongming; Luo, Junxiang
2015-01-01
Generalized linear models are commonly used to analyze categorical data such as binary, count, and ordinal outcomes. Adjusting for important prognostic factors or baseline covariates in generalized linear models may improve the estimation efficiency. The model-based mean for a treatment group produced by most software packages estimates the response at the mean covariate, not the mean response for this treatment group for the studied population. Although this is not an issue for linear models, the model-based group mean estimates in generalized linear models could be seriously biased for the true group means. We propose a new method to estimate the group mean consistently with the corresponding variance estimation. Simulation showed the proposed method produces an unbiased estimator for the group means and provided the correct coverage probability. The proposed method was applied to analyze hypoglycemia data from clinical trials in diabetes. Copyright © 2014 John Wiley & Sons, Ltd.
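The distinction between the response at the mean covariate and the mean response is easy to see in a logistic model. The coefficients and covariate values below are hypothetical, and the sketch only illustrates the gap between the two quantities; it is not the estimator or the variance formula proposed in the paper.

```python
import numpy as np

def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-z))

def response_at_mean_covariate(b0, b1, x):
    """What many packages report: the fitted response at the average covariate."""
    return float(expit(b0 + b1 * np.mean(x)))

def mean_response(b0, b1, x):
    """Average of the fitted responses over the observed covariate values."""
    return float(np.mean(expit(b0 + b1 * np.asarray(x, dtype=float))))

# Hypothetical fitted logistic model and baseline covariate values for one group.
b0, b1 = -2.0, 1.5
x = [-2.0, -1.0, 0.0, 1.0, 2.0]

at_mean = response_at_mean_covariate(b0, b1, x)
averaged = mean_response(b0, b1, x)
print(at_mean, averaged)
```

With an identity link the two functions return the same value, which matches the abstract's remark that the issue does not arise for linear models.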
[Analysis of variance of repeated data measured by water maze with SPSS].
Qiu, Hong; Jin, Guo-qin; Jin, Ru-feng; Zhao, Wei-kang
2007-01-01
To introduce a method for analyzing repeated-measures water maze data with SPSS 11.0, and to offer a reference statistical method to clinical and basic medical researchers who use repeated-measures designs. The repeated measures and multivariate analysis of variance (ANOVA) procedures of the general linear model in SPSS were used, with pairwise comparisons among groups and among measurement times. First, Mauchly's test of sphericity should be used to judge whether the repeatedly measured data are correlated. If any (P
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.
Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen
2015-05-01
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Estimating acreage by double sampling using LANDSAT data
NASA Technical Reports Server (NTRS)
Pont, F.; Horwitz, H.; Kauth, R. (Principal Investigator)
1982-01-01
Double sampling techniques employing LANDSAT data for estimating the acreage of corn and soybeans were investigated and evaluated. The evaluation was based on estimated costs and correlations between two existing procedures having differing cost/variance characteristics, and included consideration of their individual merits when coupled with a fictional 'perfect' procedure of zero bias and variance. Two features of the analysis are: (1) the simultaneous estimation of two or more crops; and (2) the imposition of linear cost constraints among two or more types of resource. A reasonably realistic operational scenario was postulated. The costs were estimated from current experience with the measurement procedures involved, and the correlations were estimated from a set of 39 LACIE-type sample segments located in the U.S. Corn Belt. For a fixed variance of the estimate, double sampling with the two existing LANDSAT measurement procedures can result in a 25% or 50% cost reduction. Double sampling which included the fictional perfect procedure results in a more cost-effective combination when it is used with the lower cost/higher variance representative of the existing procedures.
Gap-filling methods to impute eddy covariance flux data by preserving variance.
NASA Astrophysics Data System (ADS)
Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.
2015-12-01
To represent carbon dynamics, in terms of exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data have been collected using eddy flux towers from various sites across the globe for more than two decades. However, measurements from EC data are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to have estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60 days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center. We used as our base the Michaelis-Menten and Van't Hoff functions. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This gives us an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC), into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by lower root mean square error (RMSE) of the predicted values of NEE).
Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables of the parameters traditionally used. Then, comparisons of the predicted values from these methods and 'traditional' gap-filling methods (using 12 fixed monthly windows) will be assessed to show the scale of preserving variance. Further, this method will be applied to impute artificially created gaps for analyzing if variance is preserved.
Estimating linear effects in ANOVA designs: the easy way.
Pinhas, Michal; Tzelgov, Joseph; Ganor-Stern, Dana
2012-09-01
Research in cognitive science has documented numerous phenomena that are approximated by linear relationships. In the domain of numerical cognition, the use of linear regression for estimating linear effects (e.g., distance and SNARC effects) became common following Fias, Brysbaert, Geypens, and d'Ydewalle's (1996) study on the SNARC effect. While their work has become the model for analyzing linear effects in the field, it requires statistical analysis of individual participants and does not provide measures of the proportions of variability accounted for (cf. Lorch & Myers, 1990). In the present methodological note, using both the distance and SNARC effects as examples, we demonstrate how linear effects can be estimated in a simple way within the framework of repeated measures analysis of variance. This method allows for estimating effect sizes in terms of both slope and proportions of variability accounted for. Finally, we show that our method can easily be extended to estimate linear interaction effects, not just linear effects calculated as main effects.
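As a rough illustration of the idea in this abstract, the sketch below computes a linear trend contrast on the condition means of a repeated-measures design and reports both the slope and the share of between-level variability that the linear trend accounts for. This is a standard textbook contrast, not necessarily the authors' exact procedure; the function name `linear_trend`, the equally spaced levels, and the toy data are assumptions made for the example.

```python
import numpy as np

def linear_trend(data):
    """Linear trend across equally spaced within-subject levels.

    data: array of shape (participants, levels).
    Returns (slope, proportion of level variability explained by the trend).
    """
    n, k = data.shape
    c = np.arange(k) - (k - 1) / 2.0          # centered linear contrast weights
    means = data.mean(axis=0)                 # condition means
    slope = (c @ means) / (c @ c)             # least-squares slope of means on level index
    ss_linear = n * (c @ means) ** 2 / (c @ c)
    ss_levels = n * np.sum((means - means.mean()) ** 2)
    return slope, ss_linear / ss_levels

# Hypothetical data: 5 participants, 4 levels, perfectly linear effect of 3 per level.
base = np.array([400.0, 420.0, 390.0, 450.0, 410.0])
data = base[:, None] + 3.0 * np.arange(4)
slope, prop = linear_trend(data)
print(slope, prop)
```

The contrast works on condition means, so it avoids fitting a separate regression per participant while still giving a slope and a proportion-of-variability measure.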
... Park, NC) to account for the complex sample design of NHIS, taking into account stratum and primary sampling unit (PSU) identifiers. The Taylor series linearization method was chosen for variance estimation. Trends ...
Analysis and application of minimum variance discrete time system identification
NASA Technical Reports Server (NTRS)
Kaufman, H.; Kotob, S.
1975-01-01
An on-line minimum variance parameter identifier is developed which embodies both accuracy and computational efficiency. The formulation results in a linear estimation problem with both additive and multiplicative noise. The resulting filter which utilizes both the covariance of the parameter vector itself and the covariance of the error in identification is proven to be mean square convergent and mean square consistent. The MV parameter identification scheme is then used to construct a stable state and parameter estimation algorithm.
... and imputed family income ( 10 ). Data source and methods All ADHD prevalence estimates were obtained from the ... sample design of NHIS. The Taylor series linearization method was chosen for variance estimation. Differences between percentages ...
Streamflow record extension using power transformations and application to sediment transport
NASA Astrophysics Data System (ADS)
Moog, Douglas B.; Whiting, Peter J.; Thomas, Robert B.
1999-01-01
To obtain a representative set of flow rates for a stream, it is often desirable to fill in missing data or extend measurements to a longer time period by correlation to a nearby gage with a longer record. Linear least squares regression of the logarithms of the flows is a traditional and still common technique. However, its purpose is to generate optimal estimates of each day's discharge, rather than the population of discharges, for which it tends to underestimate variance. Maintenance-of-variance-extension (MOVE) equations [Hirsch, 1982] were developed to correct this bias. This study replaces the logarithmic transformation by the more general Box-Cox scaled power transformation, generating a more linear, constant-variance relationship for the MOVE extension. Combining the Box-Cox transformation with the MOVE extension is shown to improve accuracy in estimating order statistics of flow rate, particularly for the nonextreme discharges which generally govern cumulative transport over time. This advantage is illustrated by prediction of cumulative fractions of total bed load transport.
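The variance issue described above can be sketched numerically. The example below compares ordinary least squares extension with the MOVE.1 form of maintenance-of-variance extension, in which the slope is the ratio of standard deviations (signed by the correlation) rather than the regression slope. It works on already transformed flows; the abstract's Box-Cox step would replace the transformation and is not reproduced here. The function names and synthetic data are assumptions for illustration only.

```python
import numpy as np

def ols_extend(x_short, y_short, x_long):
    """Extend y using the OLS line fitted on the concurrent record."""
    r = np.corrcoef(x_short, y_short)[0, 1]
    slope = r * y_short.std(ddof=1) / x_short.std(ddof=1)
    return y_short.mean() + slope * (x_long - x_short.mean())

def move1_extend(x_short, y_short, x_long):
    """Extend y with MOVE.1: slope = sign(r) * s_y / s_x, which preserves s_y."""
    r = np.corrcoef(x_short, y_short)[0, 1]
    slope = np.sign(r) * y_short.std(ddof=1) / x_short.std(ddof=1)
    return y_short.mean() + slope * (x_long - x_short.mean())

# Synthetic transformed flows (hypothetical): x = index gage, y = short-record gage.
rng = np.random.default_rng(1)
x_short = rng.normal(3.0, 1.0, 50)
y_short = 0.5 + 0.8 * x_short + rng.normal(0.0, 0.5, 50)

var_y = y_short.var(ddof=1)
var_ols = ols_extend(x_short, y_short, x_short).var(ddof=1)
var_move = move1_extend(x_short, y_short, x_short).var(ddof=1)
print(var_y, var_ols, var_move)
```

On the concurrent record, the OLS predictions have variance r² times the observed variance, while the MOVE.1 predictions reproduce the observed variance exactly.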
Scale of association: hierarchical linear models and the measurement of ecological systems
Sean M. McMahon; Jeffrey M. Diez
2007-01-01
A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance-covariance parameters in hierarchically structured...
A Statistical Approach to Passive Target Tracking.
1981-04-01
a fixed heading of 90 degrees. For ... 7. F. A. Graybill, An Introduction to Linear Statistical Models, Vol. 1, New York: John Wiley & Sons, Inc. (1961). ... likelihood estimators. ... (NCSC TM 311-81) The adjustment for a changing error variance is easy using the linear model approach; i.e., use weighted
Vitezica, Zulma G; Varona, Luis; Legarra, Andres
2013-12-01
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the "genotypic" value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
... to exercise or physical activity. Data source and methods National Health Interview Survey (NHIS) data are collected ... sample design of NHIS. The Taylor series linearization method was used for variance estimation. All estimates shown ...
... were considered to be uninsured. Data source and methods NHIS data were used to estimate the percentage ... sample design of NHIS. The Taylor series linearization method was chosen for variance estimation. Differences between percentages ...
... of center); or g) other.” Data sources and methods This analysis used data from the 2010–2012 ... sample design of NHIS. The Taylor series linearization method was chosen for variance estimation. All estimates shown ...
Gonçalves, M A D; Bello, N M; Dritz, S S; Tokach, M D; DeRouchey, J M; Woodworth, J C; Goodband, R D
2016-05-01
Advanced methods for dose-response assessments are used to estimate the minimum concentrations of a nutrient that maximizes a given outcome of interest, thereby determining nutritional requirements for optimal performance. Contrary to standard modeling assumptions, experimental data often present a design structure that includes correlations between observations (i.e., blocking, nesting, etc.) as well as heterogeneity of error variances; either can mislead inference if disregarded. Our objective is to demonstrate practical implementation of linear and nonlinear mixed models for dose-response relationships accounting for correlated data structure and heterogeneous error variances. To illustrate, we modeled data from a randomized complete block design study to evaluate the standardized ileal digestible (SID) Trp:Lys ratio dose-response on G:F of nursery pigs. A base linear mixed model was fitted to explore the functional form of G:F relative to Trp:Lys ratios and assess model assumptions. Next, we fitted 3 competing dose-response mixed models to G:F, namely a quadratic polynomial (QP) model, a broken-line linear (BLL) ascending model, and a broken-line quadratic (BLQ) ascending model, all of which included heteroskedastic specifications, as dictated by the base model. The GLIMMIX procedure of SAS (version 9.4) was used to fit the base and QP models and the NLMIXED procedure was used to fit the BLL and BLQ models. We further illustrated the use of a grid search of initial parameter values to facilitate convergence and parameter estimation in nonlinear mixed models. Fit between competing dose-response models was compared using a maximum likelihood-based Bayesian information criterion (BIC). The QP, BLL, and BLQ models fitted on G:F of nursery pigs yielded BIC values of 353.7, 343.4, and 345.2, respectively, thus indicating a better fit of the BLL model. The BLL breakpoint estimate of the SID Trp:Lys ratio was 16.5% (95% confidence interval [16.1, 17.0]). 
Problems with the estimation process rendered results from the BLQ model questionable. Importantly, accounting for heterogeneous variance enhanced inferential precision as the breadth of the confidence interval for the mean breakpoint decreased by approximately 44%. In summary, the article illustrates the use of linear and nonlinear mixed models for dose-response relationships accounting for heterogeneous residual variances, discusses important diagnostics and their implications for inference, and provides practical recommendations for computational troubleshooting.
Bouvet, J-M; Makouanzi, G; Cros, D; Vigneron, Ph
2016-01-01
Hybrids are broadly used in plant breeding and accurate estimation of variance components is crucial for optimizing genetic gain. Genome-wide information may be used to explore models designed to assess the extent of additive and non-additive variance and test their prediction accuracy for the genomic selection. Ten linear mixed models, involving pedigree- and marker-based relationship matrices among parents, were developed to estimate additive (A), dominance (D) and epistatic (AA, AD and DD) effects. Five complementary models, involving the gametic phase to estimate marker-based relationships among hybrid progenies, were developed to assess the same effects. The models were compared using tree height and 3303 single-nucleotide polymorphism markers from 1130 cloned individuals obtained via controlled crosses of 13 Eucalyptus urophylla females with 9 Eucalyptus grandis males. Akaike information criterion (AIC), variance ratios, asymptotic correlation matrices of estimates, goodness-of-fit, prediction accuracy and mean square error (MSE) were used for the comparisons. The variance components and variance ratios differed according to the model. Models with a parent marker-based relationship matrix performed better than those that were pedigree-based, that is, an absence of singularities, lower AIC, higher goodness-of-fit and accuracy and smaller MSE. However, AD and DD variances were estimated with high s.es. Using the same criteria, progeny gametic phase-based models performed better in fitting the observations and predicting genetic values. However, DD variance could not be separated from the dominance variance and null estimates were obtained for AA and AD effects. This study highlighted the advantages of progeny models using genome-wide information. PMID:26328760
NASA Astrophysics Data System (ADS)
Musa, Rosliza; Ali, Zalila; Baharum, Adam; Nor, Norlida Mohd
2017-08-01
The linear regression model assumes that all random error components are identically and independently distributed with constant variance. Hence, each data point provides equally precise information about the deterministic part of the total variation. In other words, the standard deviations of the error terms are constant over all values of the predictor variables. When the assumption of constant variance is violated, the ordinary least squares estimator of the regression coefficients loses its property of minimum variance in the class of linear and unbiased estimators. Weighted least squares estimation is often used to maximize the efficiency of parameter estimation. A procedure that treats all of the data equally would give less precisely measured points more influence than they should have and would give highly precise points too little influence. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates. This study used a polynomial model with weighted least squares estimation to investigate paddy production of different paddy lots based on paddy cultivation characteristics and environmental characteristics in the area of Kedah and Perlis. The results indicated that the factors affecting paddy production are mixture fertilizer application cycle, average temperature, the squared effect of average rainfall, the squared effect of pest and disease, the interaction between acreage and amount of mixture fertilizer, the interaction between paddy variety and NPK fertilizer application cycle, and the interaction between pest and disease and NPK fertilizer application cycle.
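The weighting argument above can be made concrete with a minimal sketch. It assumes the error standard deviation is known up to a constant and uses weights equal to the inverse variance; the helper name `weighted_least_squares` and the synthetic data are hypothetical and unrelated to the paddy data in the study.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Minimize sum_i w_i * (y_i - x_i @ b)**2 by rescaling rows with sqrt(w_i)."""
    sqrt_w = np.sqrt(weights)
    # After scaling, the problem is an ordinary least squares problem.
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
sigma = 0.2 * x                                  # assumed: error s.d. grows with x
y = 2.0 + 0.5 * x + rng.normal(0.0, sigma)
X = np.column_stack([np.ones_like(x), x])

beta_ols = weighted_least_squares(X, y, np.ones_like(x))   # equal weights = OLS
beta_wls = weighted_least_squares(X, y, 1.0 / sigma**2)    # inverse-variance weights
print(beta_ols, beta_wls)
```

With inverse-variance weights, the imprecise observations at large `x` contribute less to the fit, which is the efficiency gain the abstract refers to.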
Estimation of genetic parameters for milk yield in Murrah buffaloes by Bayesian inference.
Breda, F C; Albuquerque, L G; Euclydes, R F; Bignardi, A B; Baldi, F; Torres, R A; Barbosa, L; Tonhati, H
2010-02-01
Random regression models were used to estimate genetic parameters for test-day milk yield in Murrah buffaloes using Bayesian inference. Data comprised 17,935 test-day milk records from 1,433 buffaloes. Twelve models were tested using different combinations of third-, fourth-, fifth-, sixth-, and seventh-order orthogonal polynomials of weeks of lactation for additive genetic and permanent environmental effects. All models included the fixed effects of contemporary group, number of daily milkings and age of cow at calving as covariate (linear and quadratic effect). In addition, residual variances were considered to be heterogeneous with 6 classes of variance. Models were selected based on the residual mean square error, weighted average of residual variance estimates, and estimates of variance components, heritabilities, correlations, eigenvalues, and eigenfunctions. Results indicated that changes in the order of fit for additive genetic and permanent environmental random effects influenced the estimation of genetic parameters. Heritability estimates ranged from 0.19 to 0.31. Genetic correlation estimates were close to unity between adjacent test-day records, but decreased gradually as the interval between test-days increased. Results from mean squared error and weighted averages of residual variance estimates suggested that a model considering sixth- and seventh-order Legendre polynomials for additive and permanent environmental effects, respectively, and 6 classes for residual variances, provided the best fit. Nevertheless, this model presented the largest degree of complexity. A more parsimonious model, with fourth- and sixth-order polynomials, respectively, for these same effects, yielded very similar genetic parameter estimates. Therefore, this last model is recommended for routine applications. Copyright 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Missing Data Treatments at the Second Level of Hierarchical Linear Models
ERIC Educational Resources Information Center
St. Clair, Suzanne W.
2011-01-01
The current study evaluated the performance of traditional versus modern MDTs in the estimation of fixed effects and variance components for data missing at the second level of a hierarchical linear model (HLM) across 24 different study conditions. Variables manipulated in the analysis included (a) number of Level-2 variables with missing…
Age-specific survival of male golden-cheeked warblers on the Fort Hood Military Reservation, Texas
Duarte, Adam; Hines, James E.; Nichols, James D.; Hatfield, Jeffrey S.; Weckerly, Floyd W.
2014-01-01
Population models are essential components of large-scale conservation and management plans for the federally endangered Golden-cheeked Warbler (Setophaga chrysoparia; hereafter GCWA). However, existing models are based on vital rate estimates calculated using relatively small data sets that are now more than a decade old. We estimated more current, precise adult and juvenile apparent survival (Φ) probabilities and their associated variances for male GCWAs. In addition to providing estimates for use in population modeling, we tested hypotheses about spatial and temporal variation in Φ. We assessed whether a linear trend in Φ or a change in the overall mean Φ corresponded to an observed increase in GCWA abundance during 1992-2000 and if Φ varied among study plots. To accomplish these objectives, we analyzed long-term GCWA capture-resight data from 1992 through 2011, collected across seven study plots on the Fort Hood Military Reservation using a Cormack-Jolly-Seber model structure within program MARK. We also estimated Φ process and sampling variances using a variance-components approach. Our results did not provide evidence of site-specific variation in adult Φ on the installation. Because of a lack of data, we could not assess whether juvenile Φ varied spatially. We did not detect a strong temporal association between GCWA abundance and Φ. Mean estimates of Φ for adult and juvenile male GCWAs for all years analyzed were 0.47 with a process variance of 0.0120 and a sampling variance of 0.0113 and 0.28 with a process variance of 0.0076 and a sampling variance of 0.0149, respectively. Although juvenile Φ did not differ greatly from previous estimates, our adult Φ estimate suggests previous GCWA population models were overly optimistic with respect to adult survival. These updated Φ probabilities and their associated variances will be incorporated into new population models to assist with GCWA conservation decision making.
Estimation of Additive, Dominance, and Imprinting Genetic Variance Using Genomic Data
Lopes, Marcos S.; Bastiaansen, John W. M.; Janss, Luc; Knol, Egbert F.; Bovenhuis, Henk
2015-01-01
Traditionally, exploration of genetic variance in humans, plants, and livestock species has been limited mostly to the use of additive effects estimated using pedigree data. However, with the development of dense panels of single-nucleotide polymorphisms (SNPs), the exploration of genetic variation of complex traits is moving from quantifying the resemblance between family members to the dissection of genetic variation at individual loci. With SNPs, we were able to quantify the contribution of additive, dominance, and imprinting variance to the total genetic variance by using a SNP regression method. The method was validated in simulated data and applied to three traits (number of teats, backfat, and lifetime daily gain) in three purebred pig populations. In simulated data, the estimates of additive, dominance, and imprinting variance were very close to the simulated values. In real data, dominance effects account for a substantial proportion of the total genetic variance (up to 44%) for these traits in these populations. The contribution of imprinting to the total phenotypic variance of the evaluated traits was relatively small (1–3%). Our results indicate a strong relationship between additive variance explained per chromosome and chromosome length, which has been described previously for other traits in other species. We also show that a similar linear relationship exists for dominance and imprinting variance. These novel results improve our understanding of the genetic architecture of the evaluated traits and show promise for applying the SNP regression method to other traits and species, including human diseases. PMID:26438289
NASA Astrophysics Data System (ADS)
Wayan Mangku, I.
2017-10-01
In this paper we survey some results on estimation of the intensity function of a cyclic Poisson process in the presence of additive and multiplicative linear trend. We do not assume any parametric form for the cyclic component of the intensity function, except that it is periodic. Moreover, we consider the case when only a single realization of the Poisson process is observed in a bounded interval. The considered estimators are weakly and strongly consistent when the size of the observation interval expands indefinitely. Asymptotic approximations to the bias and variance of those estimators are presented.
Moghaddar, N; van der Werf, J H J
2017-12-01
The objectives of this study were to estimate the additive and dominance variance component of several weight and ultrasound scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes and then to investigate the effect of fitting additive and dominance effects on accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on "average information restricted maximum likelihood" using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitting model in the combined cross-bred population. This study showed a substantial dominance genetic variance for weight and ultrasound scanned body composition traits particularly in cross-bred population; however, improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.
Kriging analysis of mean annual precipitation, Powder River Basin, Montana and Wyoming
Karlinger, M.R.; Skrivan, James A.
1981-01-01
Kriging is a statistical estimation technique for regionalized variables which exhibit an autocorrelation structure. Such structure can be described by a semi-variogram of the observed data. The kriging estimate at any point is a weighted average of the data, where the weights are determined using the semi-variogram and an assumed drift, or lack of drift, in the data. Block, or areal, estimates can also be calculated. The kriging algorithm, based on unbiased and minimum-variance estimates, involves a linear system of equations to calculate the weights. Kriging variances can then be used to give confidence intervals of the resulting estimates. Mean annual precipitation in the Powder River basin, Montana and Wyoming, is an important variable when considering restoration of coal-strip-mining lands of the region. Two kriging analyses involving data at 60 stations were made--one assuming no drift in precipitation, and one a partial quadratic drift simulating orographic effects. Contour maps of estimates of mean annual precipitation were similar for both analyses, as were the corresponding contours of kriging variances. Block estimates of mean annual precipitation were made for two subbasins. Runoff estimates were 1-2 percent of the kriged block estimates. (USGS)
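The linear system mentioned in this abstract can be sketched for the no-drift (ordinary kriging) case. The example assumes an exponential semi-variogram with no nugget; the station coordinates, precipitation values, sill, and range are invented for illustration and are not the Powder River data.

```python
import numpy as np

def exp_semivariogram(h, sill=1.0, corr_range=50.0):
    """Exponential semi-variogram without nugget (assumed model)."""
    return sill * (1.0 - np.exp(-h / corr_range))

def ordinary_kriging(coords, values, target, gamma=exp_semivariogram):
    """Return (estimate, kriging variance, weights) at a single target point."""
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Augmented system: semi-variogram matrix bordered by the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dists)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]                  # weights and Lagrange multiplier
    return w @ values, w @ b[:n] + mu, w

# Hypothetical stations (km) and mean annual precipitation (mm).
coords = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0],
                   [100.0, 100.0], [50.0, 50.0], [20.0, 80.0]])
values = np.array([310.0, 355.0, 420.0, 390.0, 365.0, 405.0])

est, var, w = ordinary_kriging(coords, values, np.array([40.0, 60.0]))
print(est, var, w.sum())
```

The weights sum to one, which makes the estimate unbiased under the no-drift assumption, and the kriging variance returned alongside the estimate is what would feed the confidence intervals described above.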
Koay, Cheng Guan; Chang, Lin-Ching; Carew, John D; Pierpaoli, Carlo; Basser, Peter J
2006-09-01
A unifying theoretical and algorithmic framework for diffusion tensor estimation is presented. Theoretical connections among the least squares (LS) methods (linear least squares (LLS), weighted linear least squares (WLLS), nonlinear least squares (NLS), and their constrained counterparts) are established through their respective objective functions and the higher order derivatives of these objective functions, i.e., Hessian matrices. These theoretical connections provide new insights in designing efficient algorithms for NLS and constrained NLS (CNLS) estimation. Here, we propose novel algorithms of full Newton-type for the NLS and CNLS estimations, which are evaluated with Monte Carlo simulations and compared with the commonly used Levenberg-Marquardt method. The proposed methods have a lower percent of relative error in estimating the trace and a lower reduced χ² value than those of the Levenberg-Marquardt method. These results also demonstrate that the accuracy of an estimate, particularly in a nonlinear estimation problem, is greatly affected by the Hessian matrix. In other words, the accuracy of a nonlinear estimation is algorithm-dependent. Further, this study shows that the noise variance in diffusion weighted signals is orientation dependent when signal-to-noise ratio (SNR) is low (
Estimation of transformation parameters for microarray data.
Durbin, Blythe; Rocke, David M
2003-07-22
Durbin et al. (2002), Huber et al. (2002) and Munson (2001) independently introduced a family of transformations (the generalized-log family) which stabilizes the variance of microarray data up to the first order. We introduce a method for estimating the transformation parameter in tandem with a linear model based on the procedure outlined in Box and Cox (1964). We also discuss means of finding transformations within the generalized-log family which are optimal under other criteria, such as minimum residual skewness and minimum mean-variance dependency. R and Matlab code and test data are available from the authors on request.
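A small sketch may help show what a generalized-log transformation does. It uses one commonly quoted form, glog(y) = ln(y + sqrt(y² + c)), and a two-component error model (multiplicative plus additive noise) as assumptions; the parameter values are hypothetical, and the paper's estimation procedure for the transformation parameter is not reproduced.

```python
import numpy as np

def glog(y, c):
    """Generalized log: ln(y + sqrt(y**2 + c)); defined for negative y when c > 0."""
    return np.log(y + np.sqrt(y**2 + c))

rng = np.random.default_rng(0)
sigma_eps, s_eta = 20.0, 0.2          # hypothetical additive and multiplicative noise levels
c = (sigma_eps / s_eta) ** 2          # first-order choice of c (an assumption here)

# Compare the spread of raw and transformed intensities across expression levels.
for mu in (0.0, 100.0, 1000.0, 10000.0):
    y = mu * np.exp(rng.normal(0.0, s_eta, 5000)) + rng.normal(0.0, sigma_eps, 5000)
    print(mu, np.std(y), np.std(glog(y, c)))
```

The transformation behaves like a shifted logarithm for large intensities and is roughly linear near zero, which is why it can stabilize the variance across the whole intensity range where a plain logarithm cannot.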
Estimating stochastic noise using in situ measurements from a linear wavefront slope sensor.
Bharmal, Nazim Ali; Reeves, Andrew P
2016-01-15
It is shown how the solenoidal component of noise from the measurements of a wavefront slope sensor can be utilized to estimate the total noise: specifically, the ensemble noise variance. It is well known that solenoidal noise is orthogonal to the reconstruction of the wavefront under conditions of low scintillation (absence of wavefront vortices). Therefore, it can be retrieved even with a nonzero slope signal present. By explicitly estimating the solenoidal noise from an ensemble of slopes, it can be retrieved for any wavefront sensor configuration. Furthermore, the ensemble variance is demonstrated to be related to the total noise variance via a straightforward relationship. This relationship is revealed via the method of the explicit estimation: it consists of a small, heuristic set of four constants that do not depend on the underlying statistics of the incoming wavefront. These constants seem to apply to all situations (data from a laboratory experiment as well as many configurations of numerical simulation), so the method is concluded to be generic.
Fischer, A; Friggens, N C; Berry, D P; Faverdin, P
2018-07-01
The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both model fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted. In one, the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation average net energy intake (NEI) on lactation average milk energy output, average metabolic BW, and lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow levels on the biological traits and intercept using fortnightly repeated measures for the variables. This method split the predicted NEI into two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows, all fed a constant energy-rich diet. Mixed models fitting cow-specific intercept and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared.
The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
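The residual-intake idea described above can be illustrated with a minimal sketch. It is not the authors' model: the data are synthetic, the three predictor columns are hypothetical stand-ins for milk energy output, metabolic BW, and body condition change, and only the lactation-average (ordinary least squares) variant is shown.

```python
import numpy as np

def residual_intake(intake, predictors):
    """Residuals of an OLS regression of intake on the predictors (with intercept)."""
    X = np.column_stack([np.ones(len(intake)), predictors])
    beta, *_ = np.linalg.lstsq(X, intake, rcond=None)
    return intake - X @ beta

# Synthetic data only; column meanings and coefficients are assumptions.
rng = np.random.default_rng(0)
n_cows = 119
predictors = rng.normal(size=(n_cows, 3))
intake = 100.0 + predictors @ np.array([5.0, 3.0, 1.0]) + rng.normal(scale=2.0, size=n_cows)

rei = residual_intake(intake, predictors)
# Share of the variance of measured intake that remains in the residual.
residual_share = rei.var() / intake.var()
```

The residual share plays the role of the "8% of the variance of measured NEI" figure quoted in the abstract; the mixed-model decomposition into cow-specific deviations is not reproduced here.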
Use of the Internet for Health Information: United States, 2009
... as accidents or dental care. Data source and methods Data from the 2009 NHIS were used for ... sample design of NHIS. The Taylor series linearization method was chosen for variance estimation. Differences between percentages ...
Wright, George W; Simon, Richard M
2003-12-12
Microarray techniques provide a valuable way of characterizing the molecular nature of disease. Unfortunately expense and limited specimen availability often lead to studies with small sample sizes. This makes accurate estimation of variability difficult, since variance estimates made on a gene by gene basis will have few degrees of freedom, and the assumption that all genes share equal variance is unlikely to be true. We propose a model by which the within gene variances are drawn from an inverse gamma distribution, whose parameters are estimated across all genes. This results in a test statistic that is a minor variation of those used in standard linear models. We demonstrate that the model assumptions are valid on experimental data, and that the model has more power than standard tests to pick up large changes in expression, while not increasing the rate of false positives. This method is incorporated into BRB-ArrayTools version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html). ftp://linus.nci.nih.gov/pub/techreport/RVM_supplement.pdf
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression
2006-03-01
Breusch-Pagan test, in which the null hypothesis states that the residuals have constant variance. The alternate hypothesis is that the residuals do not...variance, the Breusch-Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173...entire test sample. v Acknowledgments First, I would like to acknowledge the influence and help of Greg Hoffman. His work served as the
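As a rough illustration of the constant-variance test mentioned in this excerpt, the sketch below computes the studentized (Koenker) form of the Breusch-Pagan statistic for a single regressor. The function name and the synthetic data are assumptions, and the report itself may have used a different variant or software. With one regressor the statistic is compared with a chi-square distribution with 1 degree of freedom, whose tail probability can be written with the complementary error function.

```python
import math
import numpy as np

def breusch_pagan_one_regressor(x, y):
    """Studentized Breusch-Pagan test for y ~ a + b*x.

    Returns (LM, p_value). LM = n * R^2 of the auxiliary regression of the
    squared residuals on x; under H0 (constant variance) LM ~ chi-square(1).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                      # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
    ss_res = np.sum((u2 - X @ gamma) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    lm = max(n * (1.0 - ss_res / ss_tot), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))      # chi-square(1) upper tail
    return lm, p_value

# Synthetic example: the noise standard deviation grows with x.
rng = np.random.default_rng(42)
x = rng.uniform(1.0, 10.0, size=500)
y_het = 2.0 + 3.0 * x + rng.normal(size=500) * x
lm_het, p_het = breusch_pagan_one_regressor(x, y_het)
```

A large p-value (such as the 0.173 quoted above) means the test gives no evidence against constant residual variance.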
Heritability of Performance Deficit Accumulation During Acute Sleep Deprivation in Twins
Kuna, Samuel T.; Maislin, Greg; Pack, Frances M.; Staley, Bethany; Hachadoorian, Robert; Coccaro, Emil F.; Pack, Allan I.
2012-01-01
Study Objectives: To determine if the large and highly reproducible interindividual differences in rates of performance deficit accumulation during sleep deprivation, as determined by the number of lapses on a sustained reaction time test, the Psychomotor Vigilance Task (PVT), arise from a heritable trait. Design: Prospective, observational cohort study. Setting: Academic medical center. Participants: There were 59 monozygotic (mean age 29.2 ± 6.8 [SD] yr; 15 male and 44 female pairs) and 41 dizygotic (mean age 26.6 ± 7.6 yr; 15 male and 26 female pairs) same-sex twin pairs with a normal polysomnogram. Interventions: Thirty-eight hr of monitored, continuous sleep deprivation. Measurements and Results: Patients performed the 10-min PVT every 2 hr during the sleep deprivation protocol. The primary outcome was change from baseline in square root transformed total lapses (response time ≥ 500 ms) per trial. Patient-specific linear rates of performance deficit accumulation were separated from circadian effects using multiple linear regression. Using the classic approach to assess heritability, the intraclass correlation coefficients for accumulating deficits resulted in a broad sense heritability (h2) estimate of 0.834. The mean within-pair and among-pair heritability estimates determined by analysis of variance-based methods was 0.715. When variance components of mixed-effect multilevel models were estimated by maximum likelihood estimation and used to determine the proportions of phenotypic variance explained by genetic and nongenetic factors, 51.1% (standard error = 8.4%, P < 0.0001) of twin variance was attributed to combined additive and dominance genetic effects. Conclusion: Genetic factors explain a large fraction of interindividual variance among rates of performance deficit accumulations on PVT during sleep deprivation. Citation: Kuna ST; Maislin G; Pack FM; Staley B; Hachadoorian R; Coccaro EF; Pack AI. 
Heritability of performance deficit accumulation during acute sleep deprivation in twins. SLEEP 2012;35(9):1223-1233. PMID:22942500
Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal
2009-01-01
The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators. PMID:20174455
Prevalence of Obesity Among Adults and Youth: United States, 2011-2014
... sample is selected through a complex, multistage probability design. In 2011–2012 and 2013–2014, non-Hispanic ... All variance estimates accounted for the complex survey design by using Taylor series linearization. Pregnant females were ...
On estimation of linear transformation models with nested case–control sampling
Liu, Mengling
2011-01-01
Nested case–control (NCC) sampling is widely used in large epidemiological cohort studies for its cost effectiveness, but its data analysis primarily relies on the Cox proportional hazards model. In this paper, we consider a family of linear transformation models for analyzing NCC data and propose an inverse selection probability weighted estimating equation method for inference. Consistency and asymptotic normality of our estimators for regression coefficients are established. We show that the asymptotic variance has a closed analytic form and can be easily estimated. Numerical studies are conducted to support the theory and an application to the Wilms’ Tumor Study is also given to illustrate the methodology. PMID:21912975
NASA Astrophysics Data System (ADS)
Rusakov, Oleg; Laskin, Michael
2017-06-01
We consider a stochastic model of price changes in real estate markets. We suppose that, in a book of prices, changes occur at the jump points of a Poisson process with a random intensity, i.e., the moments of change follow a random process of the Cox process type. We calculate cumulative mathematical expectations and variances for the random intensity of this point process. When the random-intensity process is a martingale, the cumulative variance grows linearly. We statistically process a number of observations of real estate prices and accept the hypotheses of linear growth for the estimates of both the cumulative average and the cumulative variance, for both the input and output prices recorded in the book of prices.
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can fit overdispersed data sets as well as the commonly used existing models while outperforming these commonly used models for underdispersed data sets.
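To make the under- and overdispersion point concrete, here is a small sketch of the Conway-Maxwell Poisson probability mass function in its standard parameterization, P(Y = y) proportional to lam^y / (y!)^nu. This is an assumption for illustration: the article uses a reformulated parameterization and a full GLM, neither of which is reproduced here, and the truncation point `max_count` is a hypothetical choice.

```python
import math
import numpy as np

def com_poisson_pmf(lam, nu, max_count=200):
    """Standard COM-Poisson pmf on 0..max_count, normalized by a truncated sum."""
    y = np.arange(max_count + 1)
    log_factorial = np.array([math.lgamma(k + 1) for k in y])
    log_w = y * math.log(lam) - nu * log_factorial
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    return y, w / w.sum()

def com_poisson_mean_var(lam, nu):
    y, p = com_poisson_pmf(lam, nu)
    mean = np.sum(y * p)
    return mean, np.sum((y - mean) ** 2 * p)

mean_p, var_p = com_poisson_mean_var(4.0, 1.0)   # nu = 1: ordinary Poisson
mean_u, var_u = com_poisson_mean_var(4.0, 2.0)   # nu > 1: underdispersed
mean_o, var_o = com_poisson_mean_var(2.0, 0.5)   # nu < 1: overdispersed
```

A single extra parameter `nu` therefore moves the variance below or above the mean, which is the property the abstract relies on.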
Genetic control of residual variance of yearling weight in Nellore beef cattle.
Iung, L H S; Neves, H H R; Mulder, H A; Carvalheiro, R
2017-04-01
There is evidence for genetic variability in residual variance of livestock traits, which offers the potential for selection for increased uniformity of production. Different statistical approaches have been employed to study this topic; however, little is known about the concordance between them. The aim of our study was to investigate the genetic heterogeneity of residual variance on yearling weight (YW; 291.15 ± 46.67) in a Nellore beef cattle population; to compare the results of the statistical approaches, the two-step approach and the double hierarchical generalized linear model (DHGLM); and to evaluate the effectiveness of power transformation to accommodate scale differences. The comparison was based on genetic parameters, accuracy of EBV for residual variance, and cross-validation to assess predictive performance of both approaches. A total of 194,628 yearling weight records from 625 sires were used in the analysis. The results supported the hypothesis of genetic heterogeneity of residual variance on YW in Nellore beef cattle and the opportunity of selection, measured through the genetic coefficient of variation of residual variance (0.10 to 0.12 for the two-step approach and 0.17 for DHGLM, using an untransformed data set). However, low estimates of genetic variance associated with positive genetic correlations between mean and residual variance (about 0.20 for two-step and 0.76 for DHGLM for an untransformed data set) limit the genetic response to selection for uniformity of production while simultaneously increasing YW itself. Moreover, large sire families are needed to obtain accurate estimates of genetic merit for residual variance, as indicated by the low heritability estimates (<0.007). Box-Cox transformation was able to decrease the dependence of the variance on the mean and decreased the estimates of genetic parameters for residual variance. 
The transformation reduced but did not eliminate all the genetic heterogeneity of residual variance, highlighting its presence beyond the scale effect. The DHGLM showed higher predictive ability of EBV for residual variance and therefore should be preferred over the two-step approach.
Relating the Hadamard Variance to MCS Kalman Filter Clock Estimation
NASA Technical Reports Server (NTRS)
Hutsell, Steven T.
1996-01-01
The Global Positioning System (GPS) Master Control Station (MCS) currently makes significant use of the Allan Variance. This two-sample variance equation has proven excellent as a handy, understandable tool, both for time domain analysis of GPS cesium frequency standards, and for fine tuning the MCS's state estimation of these atomic clocks. The Allan Variance does not explicitly converge for the noise types of alpha less than or equal to minus 3 and can be greatly affected by frequency drift. Because GPS rubidium frequency standards exhibit non-trivial aging and aging noise characteristics, the basic Allan Variance analysis must be augmented in order to (a) compensate for a dynamic frequency drift, and (b) characterize two additional noise types, specifically alpha = minus 3, and alpha = minus 4. As the GPS program progresses, we will utilize a larger percentage of rubidium frequency standards than ever before. Hence, GPS rubidium clock characterization will require more attention than ever before. The three-sample variance, commonly referred to as a renormalized Hadamard Variance, is unaffected by linear frequency drift, converges for alpha greater than minus 5, and thus has utility for modeling noise in GPS rubidium frequency standards. This paper demonstrates the potential of Hadamard Variance analysis in GPS operations, and presents an equation that relates the Hadamard Variance to the MCS's Kalman filter process noises.
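The drift insensitivity of the three-sample variance can be checked numerically. The sketch below uses the textbook non-overlapping estimators computed from fractional-frequency samples, with the averaging time assumed equal to the sampling interval; the synthetic white noise and the drift rate are assumptions, and the paper's relation to the Kalman filter process noises is not reproduced.

```python
import numpy as np

def allan_variance(y):
    """Two-sample variance: mean of squared first differences, divided by 2."""
    d1 = np.diff(y)
    return np.sum(d1 ** 2) / (2.0 * len(d1))

def hadamard_variance(y):
    """Three-sample variance: mean of squared second differences, divided by 6."""
    d2 = np.diff(y, n=2)
    return np.sum(d2 ** 2) / (6.0 * len(d2))

rng = np.random.default_rng(7)
m = 5000
noise = rng.normal(size=m)            # white frequency noise, unit variance
drift = 0.5 * np.arange(m)            # linear frequency drift (assumed rate)

avar_noise, hvar_noise = allan_variance(noise), hadamard_variance(noise)
avar_drift, hvar_drift = allan_variance(noise + drift), hadamard_variance(noise + drift)
```

The second difference of a linear ramp is zero, so `hvar_drift` matches `hvar_noise`, while the Allan estimate picks up an extra term of about (0.5)^2 / 2 from the drift.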
Multi-Sensor Optimal Data Fusion Based on the Adaptive Fading Unscented Kalman Filter
Gao, Bingbing; Hu, Gaoge; Gao, Shesheng; Gu, Chengfan
2018-01-01
This paper presents a new optimal data fusion methodology based on the adaptive fading unscented Kalman filter for multi-sensor nonlinear stochastic systems. This methodology has a two-level fusion structure: at the bottom level, an adaptive fading unscented Kalman filter based on the Mahalanobis distance is developed and serves as local filters to improve the adaptability and robustness of local state estimations against process-modeling error; at the top level, an unscented transformation-based multi-sensor optimal data fusion for the case of N local filters is established according to the principle of linear minimum variance to calculate globally optimal state estimation by fusion of local estimations. The proposed methodology effectively suppresses the influence of process-modeling error on the fusion solution, leading to improved adaptability and robustness of data fusion for multi-sensor nonlinear stochastic systems. It also achieves globally optimal fusion results based on the principle of linear minimum variance. Simulation and experimental results demonstrate the efficacy of the proposed methodology for INS/GNSS/CNS (inertial navigation system/global navigation satellite system/celestial navigation system) integrated navigation. PMID:29415509
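The top-level fusion step can be sketched in its simplest textbook form. This is a minimal illustration that assumes the N local estimates are unbiased with mutually uncorrelated errors, in which case the linear minimum variance combination weights each estimate by its inverse covariance. The function name is hypothetical, and the paper's unscented-transformation-based scheme is not reproduced.

```python
import numpy as np

def fuse_estimates(estimates, covariances):
    """Inverse-covariance weighted fusion of uncorrelated local estimates.

    P_fused = (sum_i P_i^-1)^-1,  x_fused = P_fused @ sum_i (P_i^-1 @ x_i)
    """
    infos = [np.linalg.inv(P) for P in covariances]
    P_fused = np.linalg.inv(sum(infos))
    x_fused = P_fused @ sum(info @ x for info, x in zip(infos, estimates))
    return x_fused, P_fused

# Two hypothetical local filters; the second is four times noisier.
x1, P1 = np.array([1.0, 2.0]), np.eye(2)
x2, P2 = np.array([3.0, 2.0]), 4.0 * np.eye(2)
x_fused, P_fused = fuse_estimates([x1, x2], [P1, P2])
```

The fused covariance is never larger than the best local covariance, which is the sense in which the combination is minimum variance among linear unbiased fusions under the stated assumption.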
Multi-Sensor Optimal Data Fusion Based on the Adaptive Fading Unscented Kalman Filter.
Gao, Bingbing; Hu, Gaoge; Gao, Shesheng; Zhong, Yongmin; Gu, Chengfan
2018-02-06
This paper presents a new optimal data fusion methodology based on the adaptive fading unscented Kalman filter for multi-sensor nonlinear stochastic systems. This methodology has a two-level fusion structure: at the bottom level, an adaptive fading unscented Kalman filter based on the Mahalanobis distance is developed and serves as local filters to improve the adaptability and robustness of local state estimations against process-modeling error; at the top level, an unscented transformation-based multi-sensor optimal data fusion for the case of N local filters is established according to the principle of linear minimum variance to calculate globally optimal state estimation by fusion of local estimations. The proposed methodology effectively suppresses the influence of process-modeling error on the fusion solution, leading to improved adaptability and robustness of data fusion for multi-sensor nonlinear stochastic systems. It also achieves globally optimal fusion results based on the principle of linear minimum variance. Simulation and experimental results demonstrate the efficacy of the proposed methodology for INS/GNSS/CNS (inertial navigation system/global navigation satellite system/celestial navigation system) integrated navigation.
NASA Astrophysics Data System (ADS)
Kar, Soummya; Moura, José M. F.
2011-08-01
The paper considers gossip distributed estimation of a (static) distributed random field (a.k.a. large scale unknown parameter vector) observed by sparsely interconnected sensors, each of which only observes a small fraction of the field. We consider linear distributed estimators whose structure combines the information flow among sensors (the consensus term resulting from the local gossiping exchange among sensors when they are able to communicate) and the information gathering measured by the sensors (the sensing or innovations term). This leads to mixed time scale algorithms, with one time scale associated with the consensus and the other with the innovations. The paper establishes a distributed observability condition (global observability plus mean connectedness) under which the distributed estimates are consistent and asymptotically normal. We introduce the distributed notion equivalent to the (centralized) Fisher information rate, which is a bound on the mean square error reduction rate of any distributed estimator; we show that under the appropriate modeling and structural network communication conditions (gossip protocol) the distributed gossip estimator attains this distributed Fisher information rate, asymptotically achieving the performance of the optimal centralized estimator. Finally, we study the behavior of the distributed gossip estimator when the measurements fade (noise variance grows) with time; in particular, we consider the maximum rate at which the noise variance can grow with the distributed estimator still remaining consistent, by showing that, as long as the centralized estimator is consistent, the distributed estimator remains consistent.
Multi-objective Optimization of Solar Irradiance and Variance at Pertinent Inclination Angles
NASA Astrophysics Data System (ADS)
Jain, Dhanesh; Lalwani, Mahendra
2018-05-01
The performance of a photovoltaic panel is highly affected by changes in atmospheric conditions and the angle of inclination. This article evaluates the optimum tilt angle and orientation angle (surface azimuth angle) for a solar photovoltaic array in order to get maximum solar irradiance and to reduce the variance of radiation at different sets or subsets of time periods. Non-linear regression and adaptive neuro-fuzzy inference system (ANFIS) methods are used for predicting the solar radiation. The results of ANFIS are more accurate in comparison to non-linear regression. These results are further used for evaluating the correlation and applied for estimating the optimum combination of tilt angle and orientation angle with the help of the general algebraic modelling system and a multi-objective genetic algorithm. The hourly average solar irradiation is calculated at different combinations of tilt angle and orientation angle with the help of horizontal surface radiation data of Jodhpur (Rajasthan, India). The hourly average solar irradiance is calculated for three cases: zero variance, actual variance, and double variance at different time scenarios. It is concluded that monthly collected solar radiation produces better results than bimonthly, seasonally, half-yearly, and yearly collected solar radiation. The profit obtained for the monthly varying angle is 4.6% higher with zero variance and 3.8% higher with actual variance than for the annually fixed angle.
NASA Astrophysics Data System (ADS)
Sikora, Grzegorz; Teuerle, Marek; Wyłomańska, Agnieszka; Grebenkov, Denis
2017-08-01
The most common way of estimating the anomalous scaling exponent from single-particle trajectories consists of a linear fit of the dependence of the time-averaged mean-square displacement on the lag time at the log-log scale. We investigate the statistical properties of this estimator in the case of fractional Brownian motion (FBM). We determine the mean value, the variance, and the distribution of the estimator. Our theoretical results are confirmed by Monte Carlo simulations. In the limit of long trajectories, the estimator is shown to be asymptotically unbiased, consistent, and with vanishing variance. These properties ensure an accurate estimation of the scaling exponent even from a single (long enough) trajectory. As a consequence, we prove that the usual way to estimate the diffusion exponent of FBM is correct from the statistical point of view. Moreover, the knowledge of the estimator distribution is the first step toward new statistical tests of FBM and toward a more reliable interpretation of the experimental histograms of scaling exponents in microbiology.
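The estimator studied above (a linear fit of the time-averaged mean-square displacement against lag time on a log-log scale) can be sketched in a few lines. As a simplifying assumption, the example trajectory is ordinary Brownian motion, the special case of FBM with scaling exponent 1; the number of lags used in the fit (`max_lag`) is also an assumed choice.

```python
import numpy as np

def tamsd(x, lags):
    """Time-averaged mean-square displacement of a 1D trajectory at each lag."""
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])

def scaling_exponent(x, max_lag=10):
    """Slope of log(TAMSD) versus log(lag) over lags 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    slope, _intercept = np.polyfit(np.log(lags), np.log(tamsd(x, lags)), 1)
    return slope

rng = np.random.default_rng(1)
brownian = np.cumsum(rng.normal(size=10_000))   # ordinary Brownian motion
alpha_hat = scaling_exponent(brownian)          # should be close to 1
```

For a single long trajectory the estimate concentrates near the true exponent, in line with the consistency result stated in the abstract.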
ERIC Educational Resources Information Center
Carlson, James E.
2014-01-01
Many aspects of the geometry of linear statistical models and least squares estimation are well known. Discussions of the geometry may be found in many sources. Some aspects of the geometry relating to the partitioning of variation that can be explained using a little-known theorem of Pappus and have not been discussed previously are the topic of…
Likelihood Ratio Tests for Relationships between Two Covariance Matrices.
1982-11-01
mk+l+...+mp)/(p-k). Then, using the results on the asymptotic distribution of the functions of the roots mk+l,...,m p (see Fang and Krishnaiah , 1982...variances and co- variances of the variables in (6.1) is k , :i, 12 E( i- l,...,k (6.2) A B... B D (X) B A ... B (6.3) Krishnaiah and Lee (1974) and...P,R. Krishnaiah for reading the manuscript and making useful comments. 7. REFERENCES [1] Anderson, T.W. (1951). Estimating linear restrictions on
Predicting vertical jump height from bar velocity.
García-Ramos, Amador; Štirn, Igor; Padial, Paulino; Argüelles-Cienfuegos, Javier; De la Fuente, Blanca; Strojnik, Vojko; Feriche, Belén
2015-06-01
The objective of the study was to assess the use of maximum (Vmax) and final propulsive phase (FPV) bar velocity to predict jump height in the weighted jump squat. FPV was defined as the velocity reached just before bar acceleration was lower than gravity (-9.81 m·s⁻²). Vertical jump height was calculated from the take-off velocity (Vtake-off) provided by a force platform. Thirty swimmers belonging to the National Slovenian swimming team performed a jump squat incremental loading test, lifting 25%, 50%, 75% and 100% of body weight in a Smith machine. Jump performance was simultaneously monitored using an AMTI portable force platform and a linear velocity transducer attached to the barbell. Simple linear regression was used to estimate jump height from the Vmax and FPV recorded by the linear velocity transducer. Vmax (y = 16.577x - 16.384) was able to explain 93% of jump height variance with a standard error of the estimate of 1.47 cm. FPV (y = 12.828x - 6.504) was able to explain 91% of jump height variance with a standard error of the estimate of 1.66 cm. Although both variables proved to be good predictors, heteroscedasticity in the differences between FPV and Vtake-off was observed (r² = 0.307), while the differences between Vmax and Vtake-off were homogeneously distributed (r² = 0.071). These results suggest that Vmax is a valid tool for estimating vertical jump height in a loaded jump squat test performed in a Smith machine. Key points: Vertical jump height in the loaded jump squat can be estimated with acceptable precision from the maximum bar velocity recorded by a linear velocity transducer. The relationship between the point at which bar acceleration is less than -9.81 m·s⁻² and the real take-off is affected by the velocity of movement. Mean propulsive velocity recorded by a linear velocity transducer does not appear to be optimal to monitor ballistic exercise performance.
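The two reported regression lines, together with the usual projectile relation between take-off velocity and jump height, can be written out as a short sketch. The units are an assumption inferred from the abstract (velocity in m/s, predicted height in cm, since the standard errors are given in cm), and the function names are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2

def height_from_takeoff_velocity(v_takeoff):
    """Jump height in metres from take-off velocity: h = v^2 / (2 g)."""
    return v_takeoff ** 2 / (2.0 * G)

def height_from_vmax(vmax):
    """Reported regression y = 16.577 x - 16.384 (assumed: x in m/s, y in cm)."""
    return 16.577 * vmax - 16.384

def height_from_fpv(fpv):
    """Reported regression y = 12.828 x - 6.504 (assumed: x in m/s, y in cm)."""
    return 12.828 * fpv - 6.504

h_flight_m = height_from_takeoff_velocity(2.0)
h_vmax_cm = height_from_vmax(2.0)
h_fpv_cm = height_from_fpv(2.0)
```

These lines only restate the fitted equations; they should not be extrapolated beyond the loads and population in the study.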
Predicting Vertical Jump Height from Bar Velocity
García-Ramos, Amador; Štirn, Igor; Padial, Paulino; Argüelles-Cienfuegos, Javier; De la Fuente, Blanca; Strojnik, Vojko; Feriche, Belén
2015-01-01
The objective of the study was to assess the use of maximum (Vmax) and final propulsive phase (FPV) bar velocity to predict jump height in the weighted jump squat. FPV was defined as the velocity reached just before bar acceleration was lower than gravity (-9.81 m·s⁻²). Vertical jump height was calculated from the take-off velocity (Vtake-off) provided by a force platform. Thirty swimmers belonging to the National Slovenian swimming team performed a jump squat incremental loading test, lifting 25%, 50%, 75% and 100% of body weight in a Smith machine. Jump performance was simultaneously monitored using an AMTI portable force platform and a linear velocity transducer attached to the barbell. Simple linear regression was used to estimate jump height from the Vmax and FPV recorded by the linear velocity transducer. Vmax (y = 16.577x - 16.384) was able to explain 93% of jump height variance with a standard error of the estimate of 1.47 cm. FPV (y = 12.828x - 6.504) was able to explain 91% of jump height variance with a standard error of the estimate of 1.66 cm. Although both variables proved to be good predictors, heteroscedasticity in the differences between FPV and Vtake-off was observed (r² = 0.307), while the differences between Vmax and Vtake-off were homogeneously distributed (r² = 0.071). These results suggest that Vmax is a valid tool for estimating vertical jump height in a loaded jump squat test performed in a Smith machine. Key points: Vertical jump height in the loaded jump squat can be estimated with acceptable precision from the maximum bar velocity recorded by a linear velocity transducer. The relationship between the point at which bar acceleration is less than -9.81 m·s⁻² and the real take-off is affected by the velocity of movement. Mean propulsive velocity recorded by a linear velocity transducer does not appear to be optimal to monitor ballistic exercise performance. PMID:25983572
Does Marriage Moderate Genetic Effects on Delinquency and Violence?
Li, Yi; Liu, Hexuan; Guo, Guang
2015-01-01
Using data from the National Longitudinal Study of Adolescent to Adult Health (N = 1,254), the authors investigated whether marriage can foster desistance from delinquency and violence by moderating genetic effects. In contrast to existing gene–environment research that typically focuses on one or a few genetic polymorphisms, they extended a recently developed mixed linear model to consider the collective influence of 580 single nucleotide polymorphisms in 64 genes related to aggression and risky behavior. The mixed linear model estimates the proportion of variance in the phenotype that is explained by the single nucleotide polymorphisms. The authors found that the proportion of variance in delinquency/violence explained was smaller among married individuals than unmarried individuals. Because selection, confounding, and heterogeneity may bias the estimate of the Gene × Marriage interaction, they conducted a series of analyses to address these issues. The findings suggest that the Gene × Marriage interaction results were not seriously affected by these issues. PMID:26549892
Enhanced algorithms for stochastic programming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishna, Alamuru S.
1993-09-01
In this dissertation, we present some of the recent advances made in solving two-stage stochastic linear programming problems of large size and complexity. Decomposition and sampling are two fundamental components of techniques to solve stochastic optimization problems. We describe improvements to the current techniques in both these areas. We studied different ways of using importance sampling techniques in the context of stochastic programming, by varying the choice of approximation functions used in this method. We have concluded that approximating the recourse function by a computationally inexpensive piecewise-linear function is highly efficient. This reduced the problem from finding the mean of a computationally expensive function to finding that of a computationally inexpensive function. Then we implemented various variance reduction techniques to estimate the mean of a piecewise-linear function. This method achieved similar variance reductions in orders of magnitude less time than applying variance-reduction techniques directly to the given problem. In solving a stochastic linear program, the expected value problem is usually solved before a stochastic solution, and the information obtained from its solution is also used to speed up the algorithm. We have devised a new decomposition scheme to improve the convergence of this algorithm.
NASA Astrophysics Data System (ADS)
Amiri-Simkooei, A. R.
2018-01-01
Three-dimensional (3D) coordinate transformations, generally consisting of origin shifts, axes rotations, scale changes, and skew parameters, are widely used in many geomatics applications. Although in some geodetic applications simplified transformation models are used based on the assumption of small transformation parameters, in other fields of application such parameters are indeed large. The algorithms of two recent papers on the weighted total least-squares (WTLS) problem are used for the 3D coordinate transformation. The methodology can be applied when the transformation parameters are large, and it requires no approximate values of the parameters. Direct linearization of the rotation and scale parameters is thus not required. The WTLS formulation is employed to take into consideration errors in both the start and target systems in the estimation of the transformation parameters. Two of the well-known 3D transformation methods, namely affine (12, 9, and 8 parameters) and similarity (7 and 6 parameters) transformations, can be handled using the WTLS theory subject to hard constraints. Because the method can be formulated by the standard least-squares theory with constraints, the covariance matrix of the transformation parameters can directly be provided. The above characteristics of the 3D coordinate transformation are implemented in the presence of different variance components, which are estimated using the least squares variance component estimation. In particular, the estimability of the variance components is investigated. The efficacy of the proposed formulation is verified on two real data sets.
NASA Astrophysics Data System (ADS)
Mel, Riccardo; Viero, Daniele Pietro; Carniello, Luca; Defina, Andrea; D'Alpaos, Luigi
2014-09-01
Providing reliable and accurate storm surge forecasts is important for a wide range of problems related to coastal environments. In order to adequately support decision-making processes, it has also become increasingly important to be able to estimate the uncertainty associated with the storm surge forecast. The procedure commonly adopted to do this uses the results of a hydrodynamic model forced by a set of different meteorological forecasts; however, this approach requires a considerable, if not prohibitive, computational cost for real-time application. In the present paper we present two simplified methods for estimating the uncertainty affecting storm surge prediction with moderate computational effort. In the first approach we use a computationally fast, statistical tidal model instead of a hydrodynamic numerical model to estimate storm surge uncertainty. The second approach is based on the observation that the uncertainty in the sea level forecast mainly stems from the uncertainty affecting the meteorological fields; this has led to the idea of estimating forecast uncertainty via a linear combination of suitable meteorological variances, directly extracted from the meteorological fields. The proposed methods were applied to estimate the uncertainty in the storm surge forecast in the Venice Lagoon. The results clearly show that the uncertainty estimated through a linear combination of suitable meteorological variances nicely matches the one obtained using the deterministic approach and overcomes some intrinsic limitations in the use of a statistical tidal model.
Adaptive local linear regression with application to printer color management.
Gupta, Maya R; Garcia, Eric K; Chin, Erika
2008-06-01
Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples, neighbors. Usually, the number of neighbors used in estimation is fixed to be a global "optimal" value, chosen by cross validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without need for cross validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in error metrics are shown, indicating that enclosing neighborhoods may be a promising adaptive neighborhood definition for other local learning tasks as well, depending on the density of training samples.
Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G
2009-09-01
The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
NASA Technical Reports Server (NTRS)
Nelson, Ross; Margolis, Hank; Montesano, Paul; Sun, Guoqing; Cook, Bruce; Corp, Larry; Andersen, Hans-Erik; DeJong, Ben; Pellat, Fernando Paz; Fickel, Thaddeus;
2016-01-01
Existing national forest inventory plots, an airborne lidar scanning (ALS) system, and a space profiling lidar system (ICESat-GLAS) are used to generate circa 2005 estimates of total aboveground dry biomass (AGB) in forest strata, by state, in the continental United States (CONUS) and Mexico. The airborne lidar is used to link ground observations of AGB to space lidar measurements. Two sets of models are generated, the first relating ground estimates of AGB to airborne laser scanning (ALS) measurements and the second set relating ALS estimates of AGB (generated using the first model set) to GLAS measurements. GLAS then, is used as a sampling tool within a hybrid estimation framework to generate stratum-, state-, and national-level AGB estimates. A two-phase variance estimator is employed to quantify GLAS sampling variability and, additively, ALS-GLAS model variability in this current, three-phase (ground-ALS-space lidar) study. The model variance component characterizes the variability of the regression coefficients used to predict ALS-based estimates of biomass as a function of GLAS measurements. Three different types of predictive models are considered in CONUS to determine which produced biomass totals closest to ground-based national forest inventory estimates - (1) linear (LIN), (2) linear-no-intercept (LNI), and (3) log-linear. For CONUS at the national level, the GLAS LNI model estimate (23.95 +/- 0.45 Gt AGB), agreed most closely with the US national forest inventory ground estimate, 24.17 +/- 0.06 Gt, i.e., within 1%. The national biomass total based on linear ground-ALS and ALS-GLAS models (25.87 +/- 0.49 Gt) overestimated the national ground-based estimate by 7.5%. The comparable log-linear model result (63.29 +/-1.36 Gt) overestimated ground results by 261%. All three national biomass GLAS estimates, LIN, LNI, and log-linear, are based on 241,718 pulses collected on 230 orbits. 
The US national forest inventory (ground) estimates are based on 119,414 ground plots. At the US state level, the average absolute value of the deviation of LNI GLAS estimates from the comparable ground estimate of total biomass was 18.8% (range: Oregon, -40.8% to North Dakota, 128.6%). Log-linear models produced gross overestimates in the continental US, i.e., >2.6x, and the use of this model to predict regional biomass using GLAS data in temperate, western hemisphere forests is not appropriate. The best model form, LNI, is used to produce biomass estimates in Mexico. The average biomass density in Mexican forests is 53.10 +/- 0.88 t/ha, and the total biomass for the country, given a total forest area of 688,096 sq km, is 3.65 +/- 0.06 Gt. In Mexico, our GLAS biomass total underestimated a 2005 FAO estimate (4.152 Gt) by 12% and overestimated a 2007/8 radar study's figure (3.06 Gt) by 19%.
Hou, Qingjiang; Mahnken, Jonathan D; Gajewski, Byron J; Dunton, Nancy
2011-08-19
Background Many nursing and health related research studies have continuous outcome measures that are inherently non-normal in distribution. The Box-Cox transformation provides a powerful tool for developing a parsimonious model for data representation and interpretation when the distribution of the dependent variable, or outcome measure, of interest deviates from the normal distribution. The objective of this study was to contrast the effect of obtaining the Box-Cox power transformation parameter and subsequent analysis of variance with or without a priori knowledge of predictor variables under the classic linear or linear mixed model settings. Methods Simulation data from a 3 × 4 factorial treatments design, along with the Patient Falls and Patient Injury Falls from the National Database of Nursing Quality Indicators (NDNQI®) for the 3rd quarter of 2007 from a convenience sample of over one thousand US hospitals were analyzed. The effect of the nonlinear monotonic transformation was contrasted in two ways: a) estimating the transformation parameter along with factors with potential structural effects, and b) estimating the transformation parameter first and then conducting analysis of variance for the structural effect. Results Linear model ANOVA with Monte Carlo simulation and mixed models with correlated error terms with NDNQI examples showed no substantial differences on statistical tests for structural effects if the factors with structural effects were omitted during the estimation of the transformation parameter. Conclusions The Box-Cox power transformation can still be an effective tool for validating statistical inferences with large observational, cross-sectional, and hierarchical or repeated measure studies under the linear or the mixed model settings without prior knowledge of all the factors with potential structural effects. PMID:21854614
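For readers unfamiliar with the transformation discussed in this abstract, the following is a minimal sketch of the standard one-parameter Box-Cox power transformation. The function name `box_cox` and the choice to pass the parameter `lam` explicitly are assumptions made for illustration; the abstract does not describe how the authors estimated the parameter, and no estimation step is shown here.

```python
import math


def box_cox(values, lam):
    """Apply the one-parameter Box-Cox transformation to positive values.

    For lam != 0 the transform is (y**lam - 1) / lam; for lam == 0 it is
    the natural logarithm, which is the limit of the first form as lam -> 0.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical skewed outcome values, transformed with lam = 0.5.
transformed = box_cox([1.0, 4.0, 9.0], 0.5)
```

In practice the parameter would be chosen from the data, for example by maximizing a profile likelihood, either jointly with the model factors or beforehand, which is the contrast the study examines.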
Kriss, A B; Paul, P A; Madden, L V
2012-09-01
A multilevel analysis of heterogeneity of disease incidence was conducted based on observations of Fusarium head blight (caused by Fusarium graminearum) in Ohio during the 2002-11 growing seasons. Sampling consisted of counting the number of diseased and healthy wheat spikes per 0.3 m of row at 10 sites (about 30 m apart) in a total of 67 to 159 sampled fields in 12 to 32 sampled counties per year. Incidence was then determined as the proportion of diseased spikes at each site. Spatial heterogeneity of incidence among counties, fields within counties, and sites within fields and counties was characterized by fitting a generalized linear mixed model to the data, using a complementary log-log link function, with the assumption that the disease status of spikes was binomially distributed conditional on the effects of county, field, and site. Based on the estimated variance terms, there was highly significant spatial heterogeneity among counties and among fields within counties each year; magnitude of the estimated variances was similar for counties and fields. The lowest level of heterogeneity was among sites within fields, and the site variance was either 0 or not significantly greater than 0 in 3 of the 10 years. Based on the variances, the intracluster correlation of disease status of spikes within sites indicated that spikes from the same site were somewhat more likely to share the same disease status relative to spikes from other sites, fields, or counties. The estimated best linear unbiased predictor (EBLUP) for each county was determined, showing large differences across the state in disease incidence (as represented by the link function of the estimated probability that a spike was diseased) but no consistency between years for the different counties. The effects of geographical location, corn and wheat acreage per county, and environmental conditions on the EBLUP for each county were not significant in the majority of years.
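The complementary log-log link mentioned in this abstract can be illustrated with a short sketch. The function names are assumptions for illustration only; the code shows the link and its inverse, not the authors' mixed-model fit.

```python
import math


def cloglog(p):
    """Complementary log-log link: maps a probability in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(-math.log(1.0 - p))


def cloglog_inverse(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical incidence of 0.3 on the link scale and back again.
eta = cloglog(0.3)
p_back = cloglog_inverse(eta)
```

Unlike the logit, this link is asymmetric around 0.5, so a predictor such as an EBLUP reported on the link scale has to be passed through the inverse to be read as a probability.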
Arnason, T; Albertsdóttir, E; Fikse, W F; Eriksson, S; Sigurdsson, A
2012-02-01
The consequences of assuming a zero environmental covariance between a binary trait 'test-status' and a continuous trait on the estimates of genetic parameters by restricted maximum likelihood and Gibbs sampling and on response from genetic selection when the true environmental covariance deviates from zero were studied. Data were simulated for two traits (one that culling was based on and a continuous trait) using the following true parameters, on the underlying scale: h² = 0.4; r(A) = 0.5; r(E) = 0.5, 0.0 or -0.5. The selection on the continuous trait was applied to five subsequent generations where 25 sires and 500 dams produced 1500 offspring per generation. Mass selection was applied in the analysis of the effect on estimation of genetic parameters. Estimated breeding values were used in the study of the effect of genetic selection on response and accuracy. The culling frequency was either 0.5 or 0.8 within each generation. Each of 10 replicates included 7500 records on 'test-status' and 9600 animals in the pedigree file. Results from bivariate analysis showed unbiased estimates of variance components and genetic parameters when true r(E) = 0.0. For r(E) = 0.5, variance components (13-19% bias) and especially (50-80%) were underestimated for the continuous trait, while heritability estimates were unbiased. For r(E) = -0.5, heritability estimates of test-status were unbiased, while genetic variance and heritability of the continuous trait together with were overestimated (25-50%). The bias was larger for the higher culling frequency. Culling always reduced genetic progress from selection, but the genetic progress was found to be robust to the use of wrong parameter values of the true environmental correlation between test-status and the continuous trait. Use of a bivariate linear-linear model reduced bias in genetic evaluations, when data were subject to culling. © 2011 Blackwell Verlag GmbH.
White, Sonia L J; Szűcs, Dénes
2012-01-04
Background The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also providing a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Methods Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Results Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. Conclusion In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice. PMID:22217191
Harris, Alexandre M.; DeGiorgio, Michael
2016-01-01
Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H∼BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H∼BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H∼BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
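As background to this abstract, the sketch below computes the classical sample-size-corrected estimator of expected heterozygosity at one locus, which assumes unrelated, non-inbred individuals. It is not the H∼BLUE estimator and not the authors' BestHet script; the function name and the list-of-allele-copies input are assumptions for illustration.

```python
from collections import Counter


def expected_heterozygosity(allele_copies):
    """Sample-size-corrected gene diversity at a single locus.

    allele_copies: one entry per sampled allele copy (two per diploid
    individual). Returns n / (n - 1) * (1 - sum of squared sample allele
    frequencies), where n is the number of allele copies.
    """
    n = len(allele_copies)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(allele_copies)
    sum_squared_freqs = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# Two hypothetical diploid individuals carrying alleles A/A and B/B.
h = expected_heterozygosity(["A", "A", "B", "B"])
```

The correction factor depends only on the number of allele copies, which is why this estimator cannot account for dependence among copies from related or inbred individuals.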
A Technique for Estimating the Surface Conductivity of Single Molecules
NASA Astrophysics Data System (ADS)
Bau, Haim; Arsenault, Mark; Zhao, Hui; Purohit, Prashant; Goldman, Yale
2007-11-01
When an AC electric field at 2MHz was applied across a small gap between two metal electrodes elevated above a surface, rhodamine-phalloidin-labeled actin filaments were attracted to the gap and became suspended between the two electrodes. The variance of each filament's horizontal, lateral displacement was measured as a function of electric field intensity and position along the filament. The variance significantly decreased as the electric field intensity increased. Hypothesizing that the electric field induces electroosmotic flow around the filament that, in turn, induces drag on the filament, which appears as effective tension, we estimated the tension using a linear, Brownian dynamic model. Based on the tension, we estimated the filament's surface conductivity. Our experimental method provides a novel means for trapping and manipulating biological filaments and for probing the surface conductance and mechanical properties of single polymers.
ERIC Educational Resources Information Center
Linacre, John Michael
Various methods of estimating main effects from ordinal data are presented and contrasted. Problems discussed include: (1) at what level to accumulate ordinal data into linear measures; (2) how to maintain scaling across analyses; and (3) the inevitable confounding of within cell variance with measurement error. An example shows three methods of…
On Latent Change Model Choice in Longitudinal Studies
ERIC Educational Resources Information Center
Raykov, Tenko; Zajacova, Anna
2012-01-01
An interval estimation procedure for proportion of explained observed variance in latent curve analysis is discussed, which can be used as an aid in the process of choosing between linear and nonlinear models. The method allows obtaining confidence intervals for the R[squared] indexes associated with repeatedly followed measures in longitudinal…
Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.
2005-01-01
Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes.
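The comparison of uncertainties by coefficient of variation can be sketched for two of the distributions named above. This is an illustrative calculation only: the function names, the mean-and-dispersion parameterization of the negative binomial (variance = mean + mean²/dispersion), and the numbers are assumptions, and the MARK3 empirical distribution is not covered.

```python
import math


def poisson_cv(mean):
    """Coefficient of variation of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean)


def negative_binomial_cv(mean, dispersion):
    """Coefficient of variation of a negative binomial count.

    Assumes variance = mean + mean**2 / dispersion, so the distribution is
    always more dispersed than a Poisson with the same mean.
    """
    variance = mean + mean ** 2 / dispersion
    return math.sqrt(variance) / mean


# Hypothetical tract with an expected number of 4 undiscovered deposits.
cv_poisson = poisson_cv(4.0)
cv_negbin = negative_binomial_cv(4.0, 4.0)
```

For the same expected number of deposits, the Poisson value is the smaller of the two, which matches the abstract's remark that a Poisson distribution gives lower relative variances.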
NASA Astrophysics Data System (ADS)
Schaffrin, Burkhard
2008-02-01
In a linear Gauss-Markov model, the parameter estimates from BLUUE (Best Linear Uniformly Unbiased Estimate) are not robust against possible outliers in the observations. Moreover, by giving up the unbiasedness constraint, the mean squared error (MSE) risk may be further reduced, in particular when the problem is ill-posed. In this paper, the α-weighted S-homBLE (Best homogeneously Linear Estimate) is derived via formulas originally used for variance component estimation on the basis of the repro-BIQUUE (reproducing Best Invariant Quadratic Uniformly Unbiased Estimate) principle in a model with stochastic prior information. In the present model, however, such prior information is not included, which allows the comparison of the stochastic approach (α-weighted S-homBLE) with the well-established algebraic approach of Tykhonov-Phillips regularization, also known as R-HAPS (Hybrid APproximation Solution), whenever the inverse of the “substitute matrix” S exists and is chosen as the R matrix that defines the relative impact of the regularizing term on the final result.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Yuyang; Zhang, Qichun; Wang, Hong
To enhance the performance of the tracking property, this paper presents a novel control algorithm for a class of linear dynamic stochastic systems with unmeasurable states, where the performance enhancement loop is established based on Kalman filter. Without changing the existing closed loop with the PI controller, the compensative controller is designed to minimize the variances of the tracking errors using the estimated states and the propagation of state variances. Moreover, the stability of the closed-loop systems has been analyzed in the mean-square sense. A simulated example is included to show the effectiveness of the presented control algorithm, where encouraging results have been obtained.
Estimation of the linear mixed integrated Ornstein–Uhlenbeck model
Hughes, Rachael A.; Kenward, Michael G.; Sterne, Jonathan A. C.; Tilling, Kate
2017-01-01
The linear mixed model with an added integrated Ornstein–Uhlenbeck (IOU) process (linear mixed IOU model) allows for serial correlation and estimation of the degree of derivative tracking. It is rarely used, partly due to the lack of available software. We implemented the linear mixed IOU model in Stata and using simulations we assessed the feasibility of fitting the model by restricted maximum likelihood when applied to balanced and unbalanced data. We compared different (1) optimization algorithms, (2) parameterizations of the IOU process, (3) data structures and (4) random-effects structures. Fitting the model was practical and feasible when applied to large and moderately sized balanced datasets (20,000 and 500 observations), and large unbalanced datasets with (non-informative) dropout and intermittent missingness. Analysis of a real dataset showed that the linear mixed IOU model was a better fit to the data than the standard linear mixed model (i.e. independent within-subject errors with constant variance). PMID:28515536
Berglund, Lars; Garmo, Hans; Lindbäck, Johan; Svärdsudd, Kurt; Zethelius, Björn
2008-09-30
The least-squares estimator of the slope in a simple linear regression model is biased towards zero when the predictor is measured with random error. A corrected slope may be estimated by adding data from a reliability study, which comprises a subset of subjects from the main study. The precision of this corrected slope depends on the design of the reliability study and estimator choice. Previous work has assumed that the reliability study constitutes a random sample from the main study. A more efficient design is to use subjects with extreme values on their first measurement. Previously, we published a variance formula for the corrected slope, when the correction factor is the slope in the regression of the second measurement on the first. In this paper we show that both designs are improved by maximum likelihood estimation (MLE). The precision gain is explained by the inclusion of data from all subjects for estimation of the predictor's variance and by the use of the second measurement for estimation of the covariance between response and predictor. The gain from MLE increases with a stronger true relationship between response and predictor and with lower precision in the predictor measurements. We present a real data example on the relationship between fasting insulin, a surrogate marker, and true insulin sensitivity measured by a gold-standard euglycaemic insulin clamp, and simulations, where the behavior of profile-likelihood-based confidence intervals is examined. MLE was shown to be a robust estimator for non-normal distributions and efficient for small sample situations. Copyright (c) 2008 John Wiley & Sons, Ltd.
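The simple (non-MLE) correction described in this abstract, dividing the naive slope by the slope of the second measurement regressed on the first, can be sketched as follows. The function names and the toy data are assumptions for illustration; the maximum likelihood estimator studied in the paper is not implemented here.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def corrected_slope(first_main, response_main, first_rel, second_rel):
    """Correct a naive slope for random measurement error in the predictor.

    The correction factor is the slope of the second measurement on the
    first, estimated in the reliability study.
    """
    naive = ols_slope(first_main, response_main)
    correction_factor = ols_slope(first_rel, second_rel)
    return naive / correction_factor


# Hypothetical data: naive slope 0.5, correction factor 0.5.
slope = corrected_slope(
    [0.0, 1.0, 2.0, 3.0], [0.0, 0.5, 1.0, 1.5],
    [0.0, 2.0, 4.0], [0.0, 1.0, 2.0],
)
```

Because the correction factor is below one whenever the predictor is measured with error, the corrected slope is larger in magnitude than the naive one, which offsets the bias towards zero.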
NASA Technical Reports Server (NTRS)
Chelton, Dudley B.; Schlax, Michael G.
1991-01-01
The sampling error of an arbitrary linear estimate of a time-averaged quantity constructed from a time series of irregularly spaced observations at a fixed location is quantified through a formalism. The method is applied to satellite observations of chlorophyll from the coastal zone color scanner. The two specific linear estimates under consideration are the composite average formed from the simple average of all observations within the averaging period and the optimal estimate formed by minimizing the mean squared error of the temporal average based on all the observations in the time series. The resulting suboptimal estimates are shown to be more accurate than composite averages. Suboptimal estimates are also found to be nearly as accurate as optimal estimates using the correct signal and measurement error variances and correlation functions for realistic ranges of these parameters, which makes them a viable practical alternative to the composite average method generally employed at present.
Alternatives to Multilevel Modeling for the Analysis of Clustered Data
ERIC Educational Resources Information Center
Huang, Francis L.
2016-01-01
Multilevel modeling has grown in use over the years as a way to deal with the nonindependent nature of observations found in clustered data. However, other alternatives to multilevel modeling are available that can account for observations nested within clusters, including the use of Taylor series linearization for variance estimation, the design…
Post-Modeling Histogram Matching of Maps Produced Using Regression Trees
Andrew J. Lister; Tonya W. Lister
2006-01-01
Spatial predictive models often use statistical techniques that in some way rely on averaging of values. Estimates from linear modeling are known to be susceptible to truncation of variance when the independent (predictor) variables are measured with error. A straightforward post-processing technique (histogram matching) for attempting to mitigate this effect is...
Estimation of value at risk in currency exchange rate portfolio using asymmetric GJR-GARCH Copula
NASA Astrophysics Data System (ADS)
Nurrahmat, Mohamad Husein; Noviyanti, Lienda; Bachrudin, Achmad
2017-03-01
In this study, we discuss the problem of measuring the risk in a portfolio based on value at risk (VaR) using an asymmetric GJR-GARCH copula. The approach is based on the consideration that the assumption of normality of returns over time cannot be fulfilled, and that there is non-linear correlation in the dependence structure among the variables, which leads to inaccurate VaR estimates. Moreover, the leverage effect also causes an asymmetric effect in the dynamic variance and exposes the weakness of GARCH models, whose effect on the conditional variance is symmetric. Asymmetric GJR-GARCH models are used to filter the margins, while the copulas are used to link them together into a multivariate distribution. We then use copulas to construct flexible multivariate distributions with different marginal and dependence structures, so that the portfolio joint distribution does not depend on the assumptions of normality and linear correlation. The VaR obtained by the analysis at the 95% confidence level is 0.005586. This VaR is derived from the best copula model, the Student's t copula with t-distributed margins.
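The asymmetry that distinguishes GJR-GARCH from a symmetric GARCH model can be seen in the standard GJR-GARCH(1,1) conditional variance recursion, sketched below. The function name, the parameter values, and the choice of starting variance are assumptions for illustration; the abstract does not report the fitted parameters, and the copula step is not shown.

```python
def gjr_garch_variance(residuals, omega, alpha, gamma, beta, initial_variance):
    """Conditional variances from a GJR-GARCH(1,1) recursion.

    var[t] = omega + (alpha + gamma * I[e[t-1] < 0]) * e[t-1]**2 + beta * var[t-1]

    Returns len(residuals) + 1 values: the starting variance followed by one
    updated variance per residual (the last is a one-step-ahead forecast).
    """
    variances = [initial_variance]
    for e in residuals:
        leverage = gamma if e < 0 else 0.0  # extra weight on negative shocks
        variances.append(omega + (alpha + leverage) * e ** 2 + beta * variances[-1])
    return variances


# Hypothetical parameters: shocks of equal size but opposite sign.
path = gjr_garch_variance([0.1, -0.1], omega=0.01, alpha=0.1, gamma=0.2,
                          beta=0.5, initial_variance=0.02)
```

With a positive `gamma`, a negative shock raises the next conditional variance more than a positive shock of the same size, which is the leverage effect that a symmetric GARCH specification cannot represent.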
Kliegl, Reinhold; Wei, Ping; Dambacher, Michael; Yan, Ming; Zhou, Xiaolin
2011-01-01
Linear mixed models (LMMs) provide a still underused methodological perspective on combining experimental and individual-differences research. Here we illustrate this approach with two-rectangle cueing in visual attention (Egly et al., 1994). We replicated previous experimental cue-validity effects relating to a spatial shift of attention within an object (spatial effect), to attention switch between objects (object effect), and to the attraction of attention toward the display centroid (attraction effect), also taking into account the design-inherent imbalance of valid and other trials. We simultaneously estimated variance/covariance components of subject-related random effects for these spatial, object, and attraction effects in addition to their mean reaction times (RTs). The spatial effect showed a strong positive correlation with mean RT and a strong negative correlation with the attraction effect. The analysis of individual differences suggests that slow subjects engage attention more strongly at the cued location than fast subjects. We compare this joint LMM analysis of experimental effects and associated subject-related variances and correlations with two frequently used alternative statistical procedures. PMID:21833292
Calus, Mario PL; Bijma, Piter; Veerkamp, Roel F
2004-01-01
Covariance functions have been proposed to predict breeding values and genetic (co)variances as a function of phenotypic within herd-year averages (environmental parameters) to include genotype by environment interaction. The objective of this paper was to investigate the influence of definition of environmental parameters and non-random use of sires on expected breeding values and estimated genetic variances across environments. Breeding values were simulated as a linear function of simulated herd effects. The definition of environmental parameters hardly influenced the results. In situations with random use of sires, estimated genetic correlations between the trait expressed in different environments were 0.93, 0.93 and 0.97 while simulated at 0.89 and estimated genetic variances deviated up to 30% from the simulated values. Non random use of sires, poor genetic connectedness and small herd size had a large impact on the estimated covariance functions, expected breeding values and calculated environmental parameters. Estimated genetic correlations between a trait expressed in different environments were biased upwards and breeding values were more biased when genetic connectedness became poorer and herd composition more diverse. The best possible solution at this stage is to use environmental parameters combining large numbers of animals per herd, while losing some information on genotype by environment interaction in the data. PMID:15339629
Derivation of an analytic expression for the error associated with the noise reduction rating
NASA Astrophysics Data System (ADS)
Murphy, William J.
2005-04-01
Hearing protection devices are assessed using the Real Ear Attenuation at Threshold (REAT) measurement procedure for the purpose of estimating the amount of noise reduction provided when worn by a subject. The rating number provided on the protector label is a function of the mean and standard deviation of the REAT results achieved by the test subjects. If a group of subjects have a large variance, then it follows that the certainty of the rating should be correspondingly lower. No estimate of the error of a protector's rating is given by existing standards or regulations. Propagation of errors was applied to the Noise Reduction Rating to develop an analytic expression for the hearing protector rating error term. Comparison of the analytic expression for the error to the standard deviation estimated from Monte Carlo simulation of subject attenuations yielded a linear relationship across several protector types and assumptions for the variance of the attenuations.
Landsman, V; Lou, W Y W; Graubard, B I
2015-05-20
We present a two-step approach for estimating hazard rates and, consequently, survival probabilities, by levels of general categorical exposure. The resulting estimator utilizes three sources of data: vital statistics data and census data are used at the first step to estimate the overall hazard rate for a given combination of gender and age group, and cohort data constructed from a nationally representative complex survey with linked mortality records, are used at the second step to divide the overall hazard rate by exposure levels. We present an explicit expression for the resulting estimator and consider two methods for variance estimation that account for complex multistage sample design: (1) the leaving-one-out jackknife method, and (2) the Taylor linearization method, which provides an analytic formula for the variance estimator. The methods are illustrated with smoking and all-cause mortality data from the US National Health Interview Survey Linked Mortality Files, and the proposed estimator is compared with a previously studied crude hazard rate estimator that uses survey data only. The advantages of a two-step approach and possible extensions of the proposed estimator are discussed. Copyright © 2015 John Wiley & Sons, Ltd.
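The leave-one-out jackknife named in the abstract above can be illustrated with a deliberately simplified sketch. It is not the estimator from the paper: it ignores strata and survey weights, treats each sampled cluster as one unit, and centers the replicates on the full-sample estimate (one common convention). The names `jackknife_variance` and `crude_rate` and the cluster data are hypothetical.

```python
def jackknife_variance(units, estimator):
    """Delete-one jackknife: re-estimate with each unit left out in turn.

    Returns (full-sample estimate, jackknife variance estimate).
    Assumption: units are exchangeable clusters with no stratification.
    """
    n = len(units)
    full = estimator(units)
    replicates = [estimator(units[:i] + units[i + 1:]) for i in range(n)]
    variance = (n - 1) / n * sum((r - full) ** 2 for r in replicates)
    return full, variance


def crude_rate(clusters):
    """Ratio estimator: total deaths divided by total person-years."""
    deaths = sum(d for d, _ in clusters)
    person_years = sum(py for _, py in clusters)
    return deaths / person_years


# Hypothetical clusters given as (deaths, person-years).
clusters = [(3, 410.0), (5, 520.0), (2, 380.0), (6, 600.0), (4, 450.0)]
rate, rate_var = jackknife_variance(clusters, crude_rate)
print(rate, rate_var ** 0.5)
```

For the sample mean, this formula reduces exactly to the familiar s²/n, which gives a quick sanity check on the implementation.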
Power spectrum estimation from peculiar velocity catalogues
NASA Astrophysics Data System (ADS)
Macaulay, E.; Feldman, H. A.; Ferreira, P. G.; Jaffe, A. H.; Agarwal, S.; Hudson, M. J.; Watkins, R.
2012-09-01
The peculiar velocities of galaxies are an inherently valuable cosmological probe, providing an unbiased estimate of the distribution of matter on scales much larger than the depth of the survey. Much research interest has been motivated by the high dipole moment of our local peculiar velocity field, which suggests a large-scale excess in the matter power spectrum and can appear to be in some tension with the Λ cold dark matter (ΛCDM) model. We use a composite catalogue of 4537 peculiar velocity measurements with a characteristic depth of 33 h^-1 Mpc to estimate the matter power spectrum. We compare the constraints with this method, directly studying the full peculiar velocity catalogue, to results by Macaulay et al., studying minimum variance moments of the velocity field, as calculated by Feldman, Watkins & Hudson. We find good agreement with the ΛCDM model on scales of k > 0.01 h Mpc^-1. We find an excess of power on scales of k < 0.01 h Mpc^-1 with a 1σ uncertainty which includes the ΛCDM model. We find that the uncertainty in excess at these scales is larger than an alternative result studying only moments of the velocity field, which is due to the minimum variance weights used to calculate the moments. At small scales, we are able to clearly discriminate between linear and non-linear clustering in simulated peculiar velocity catalogues and find some evidence (although less clear) for linear clustering in the real peculiar velocity data.
Herrera, Carlos M
2012-01-01
Methods for estimating quantitative trait heritability in wild populations have been developed in recent years which take advantage of the increased availability of genetic markers to reconstruct pedigrees or estimate relatedness between individuals, but their application to real-world data is not exempt from difficulties. This chapter describes a recent marker-based technique which, by adopting a genomic scan approach and focusing on the relationship between phenotypes and genotypes at the individual level, avoids the problems inherent to marker-based estimators of relatedness. This method allows the quantification of the genetic component of phenotypic variance ("degree of genetic determination" or "heritability in the broad sense") in wild populations and is applicable whenever phenotypic trait values and multilocus data for a large number of genetic markers (e.g., amplified fragment length polymorphisms, AFLPs) are simultaneously available for a sample of individuals from the same population. The method proceeds by first identifying those markers whose variation across individuals is significantly correlated with individual phenotypic differences ("adaptive loci"). The proportion of phenotypic variance in the sample that is statistically accounted for by individual differences in adaptive loci is then estimated by fitting a linear model to the data, with trait value as the dependent variable and scores of adaptive loci as independent ones. The method can be easily extended to accommodate quantitative or qualitative information on biologically relevant features of the environment experienced by each sampled individual, in which case estimates of the environmental and genotype × environment components of phenotypic variance can also be obtained.
Robust Estimation Based on Walsh Averages for the General Linear Model.
1983-11-01
estimate of I minimizing Ip(Z ) has an influence function proportional to p(y) and its asymptotic variance-covariance matrix is E(* )/(E...in particular, on the influence function h(y) and quantities appearing in the asymptotic variance. Some comments are made on the one- and two-...for signed rank estimates. The function P2(t) of (1.4) has derivative equal to -1 if t < -c, 0 if |t| < c, and +1 if t > c. Then the influence function is h(t
Theoretical and simulated performance for a novel frequency estimation technique
NASA Technical Reports Server (NTRS)
Crozier, Stewart N.
1993-01-01
A low complexity, open-loop, discrete-time, delay-multiply-average (DMA) technique for estimating the frequency offset for digitally modulated MPSK signals is investigated. A nonlinearity is used to remove the MPSK modulation and generate the carrier component to be extracted. Theoretical and simulated performance results are presented and compared to the Cramer-Rao lower bound (CRLB) for the variance of the frequency estimation error. For all signal-to-noise ratios (SNR's) above threshold, it is shown that the CRLB can essentially be achieved with linear complexity.
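A delay-multiply-average estimator of the kind described above can be sketched as follows. The abstract does not specify the nonlinearity or the ordering of the steps, so this sketch assumes an M-th power nonlinearity (a standard way to strip M-PSK modulation), one sample per symbol, and a frequency offset small enough that M times the per-delay phase advance stays within ±π. The function name `dma_frequency_estimate` and all signal parameters are hypothetical.

```python
import cmath
import math
import random


def dma_frequency_estimate(samples, m, delay=1):
    """Estimate a frequency offset in cycles per sample.

    Assumed steps: (1) raise each sample to the m-th power to remove
    M-PSK modulation, (2) multiply by the conjugate of the delayed
    sample, (3) average, (4) convert the phase of the average to frequency.
    """
    powered = [s ** m for s in samples]
    products = [powered[n] * powered[n - delay].conjugate()
                for n in range(delay, len(powered))]
    average = sum(products) / len(products)
    # The phase advance per delay is 2*pi*m*f*delay after the m-th power.
    return cmath.phase(average) / (2 * math.pi * m * delay)


# Simulated QPSK (m = 4) with a hypothetical offset, carrier phase, and noise level.
rng = random.Random(0)
m = 4
true_offset = 0.01  # cycles per sample
samples = []
for n in range(2000):
    symbol_phase = 2 * math.pi * rng.randrange(m) / m
    noise = complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05))
    carrier = cmath.exp(1j * (2 * math.pi * true_offset * n + symbol_phase + 0.3))
    samples.append(carrier + noise)

estimate = dma_frequency_estimate(samples, m)
print(estimate)
```

The cost is one pass over the samples, which matches the linear complexity the abstract refers to; the unambiguous range shrinks as `m` or `delay` grows.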
Fernández, E N; Legarra, A; Martínez, R; Sánchez, J P; Baselga, M
2017-06-01
Inbreeding generates covariances between additive and dominance effects (breeding values and dominance deviations). In this work, we developed and applied models for estimation of dominance and additive genetic variances and their covariance, a model that we call "full dominance," from pedigree and phenotypic data. Estimates with this model such as presented here are very scarce both in livestock and in wild genetics. First, we estimated pedigree-based condensed probabilities of identity using recursion. Second, we developed an equivalent linear model in which variance components can be estimated using closed-form algorithms such as REML or Gibbs sampling and existing software. Third, we present a new method to refer the estimated variance components to meaningful parameters in a particular population, i.e., final partially inbred generations as opposed to outbred base populations. We applied these developments to three closed rabbit lines (A, V and H) selected for number of weaned at the Polytechnic University of Valencia. Pedigree and phenotypes are complete and span 43, 39 and 14 generations, respectively. Estimates of broad-sense heritability are 0.07, 0.07 and 0.05 at the base versus 0.07, 0.07 and 0.09 in the final generations. Narrow-sense heritability estimates are 0.06, 0.06 and 0.02 at the base versus 0.04, 0.04 and 0.01 at the final generations. There is also a reduction in the genotypic variance due to the negative additive-dominance correlation. Thus, the contribution of dominance variation is fairly large and increases with inbreeding and (over)compensates for the loss in additive variation. In addition, estimates of the additive-dominance correlation are -0.37, -0.31 and 0.00, in agreement with the few published estimates and theoretical considerations. © 2017 Blackwell Verlag GmbH.
Theory-Based Parameterization of Semiotics for Measuring Pre-literacy Development
NASA Astrophysics Data System (ADS)
Bezruczko, N.
2013-09-01
A probabilistic model was applied to the problem of measuring pre-literacy in young children. First, semiotic philosophy and contemporary cognition research were conceptually integrated to establish theoretical foundations for rating 14 characteristics of children's drawings and narratives (N = 120). Then ratings were transformed with a Rasch model, which estimated linear item parameter values that accounted for 79 percent of rater variance. Principal Components Analysis of the item residual matrix confirmed that the variance remaining after item calibration was largely unsystematic. Validation analyses found positive correlations between semiotic measures and preschool literacy outcomes. Practical implications of a semiotics dimension for preschool practice were discussed.
Is my study system good enough? A case study for identifying maternal effects.
Holand, Anna Marie; Steinsland, Ingelin
2016-06-01
In this paper, we demonstrate how simulation studies can be used to answer questions about identifiability and consequences of omitting effects from a model. The methodology is presented through a case study where identifiability of genetic and/or individual (environmental) maternal effects is explored. Our study system is a wild house sparrow (Passer domesticus) population with known pedigree. We fit pedigree-based (generalized) linear mixed models (animal models), with and without additive genetic and individual maternal effects, and use deviance information criterion (DIC) for choosing between these models. Pedigree and R-code for simulations are available. For this study system, the simulation studies show that only large maternal effects can be identified. The genetic maternal effect (and similarly the individual maternal effect) has to be at least half of the total genetic variance to be identified. The consequences of omitting a maternal effect when it is present are explored. Our results indicate that the total (genetic and individual) variance is accounted for. When an individual (environmental) maternal effect is omitted from the model, this only influences the estimated (direct) individual (environmental) variance. When a genetic maternal effect is omitted from the model, both (direct) genetic and (direct) individual variance estimates are overestimated.
Adjusting for Health Status in Non-Linear Models of Health Care Disparities
Cook, Benjamin L.; McGuire, Thomas G.; Meara, Ellen; Zaslavsky, Alan M.
2009-01-01
This article compared conceptual and empirical strengths of alternative methods for estimating racial disparities using non-linear models of health care access. Three methods were presented (propensity score, rank and replace, and a combined method) that adjust for health status while allowing SES variables to mediate the relationship between race and access to care. Applying these methods to a nationally representative sample of blacks and non-Hispanic whites surveyed in the 2003 and 2004 Medical Expenditure Panel Surveys (MEPS), we assessed the concordance of each of these methods with the Institute of Medicine (IOM) definition of racial disparities, and empirically compared the methods' predicted disparity estimates, the variance of the estimates, and the sensitivity of the estimates to limitations of available data. The rank and replace and combined methods (but not the propensity score method) are concordant with the IOM definition of racial disparities in that each creates a comparison group with the appropriate marginal distributions of health status and SES variables. Predicted disparities and prediction variances were similar for the rank and replace and combined methods, but the rank and replace method was sensitive to limitations on SES information. For all methods, limiting health status information significantly reduced estimates of disparities compared to a more comprehensive dataset. We conclude that the two IOM-concordant methods were similar enough that either could be considered in disparity predictions. In datasets with limited SES information, the combined method is the better choice. PMID:20352070
Robust linear discriminant analysis with distance based estimators
NASA Astrophysics Data System (ADS)
Lim, Yai-Fung; Yahaya, Sharipah Soaad Syed; Ali, Hazlina
2017-11-01
Linear discriminant analysis (LDA) is a supervised classification technique concerned with the relationship between a categorical variable and a set of continuous variables. The main objective of LDA is to create a function that distinguishes between populations and allocates future observations to previously defined populations. Under the assumptions of normality and homoscedasticity, LDA yields the optimal linear discriminant rule (LDR) between two or more groups. However, the optimality of LDA relies heavily on the sample mean and pooled sample covariance matrix, which are known to be sensitive to outliers. To alleviate this problem, a new robust LDA using distance-based estimators known as minimum variance vector (MVV) is proposed in this study. The MVV estimators were substituted for the classical sample mean and classical sample covariance to form a robust linear discriminant rule (RLDR). Simulation and real-data studies were conducted to examine the performance of the proposed RLDR, measured in terms of misclassification error rates. The computational results showed that the proposed RLDR is better than the classical LDR and comparable with the existing robust LDR.
Two biased estimation techniques in linear regression: Application to aircraft
NASA Technical Reports Server (NTRS)
Klein, Vladislav
1988-01-01
Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
Structured penalties for functional linear models-partially empirical eigenvectors for regression.
Randolph, Timothy W; Harezlak, Jaroslaw; Feng, Ziding
2012-01-01
One of the challenges with functional data is incorporating geometric structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates geometric structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are 'partially empirical' and the framework is provided by the generalized singular value decomposition (GSVD). The form of the penalized estimation is not new, but the GSVD clarifies the process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts.
NASA Astrophysics Data System (ADS)
Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua
2018-03-01
The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.
Chiu, Mei Choi; Pun, Chi Seng; Wong, Hoi Ying
2017-08-01
Investors interested in the global financial market must analyze financial securities internationally. Making an optimal global investment decision involves processing a huge amount of data for a high-dimensional portfolio. This article investigates the big data challenges of two mean-variance optimal portfolios: continuous-time precommitment and constant-rebalancing strategies. We show that both optimized portfolios implemented with the traditional sample estimates converge to the worst performing portfolio when the portfolio size becomes large. The crux of the problem is the estimation error accumulated from the huge dimension of stock data. We then propose a linear programming optimal (LPO) portfolio framework, which applies a constrained ℓ1 minimization to the theoretical optimal control to mitigate the risk associated with the dimensionality issue. The resulting portfolio becomes a sparse portfolio that selects stocks with a data-driven procedure and hence offers a stable mean-variance portfolio in practice. When the number of observations becomes large, the LPO portfolio converges to the oracle optimal portfolio, which is free of estimation error, even though the number of stocks grows faster than the number of observations. Our numerical and empirical studies demonstrate the superiority of the proposed approach. © 2017 Society for Risk Analysis.
Modeling Outcomes with Floor or Ceiling Effects: An Introduction to the Tobit Model
ERIC Educational Resources Information Center
McBee, Matthew
2010-01-01
In gifted education research, it is common for outcome variables to exhibit strong floor or ceiling effects due to insufficient range of measurement of many instruments when used with gifted populations. Common statistical methods (e.g., analysis of variance, linear regression) produce biased estimates when such effects are present. In practice,…
NASA Astrophysics Data System (ADS)
Sorini, D.
2017-04-01
Measuring the clustering of galaxies from surveys allows us to estimate the power spectrum of matter density fluctuations, thus constraining cosmological models. This requires careful modelling of observational effects to avoid misinterpretation of data. In particular, signals coming from different distances encode information from different epochs. This is known as the "light-cone effect" and is going to have a higher impact as upcoming galaxy surveys probe larger redshift ranges. Generalising the method by Feldman, Kaiser and Peacock (1994) [1], I define a minimum-variance estimator of the linear power spectrum at a fixed time, properly taking into account the light-cone effect. An analytic expression for the estimator is provided, which is consistent with the findings of previous works in the literature. I test the method within the context of the Halofit model, assuming Planck 2014 cosmological parameters [2]. I show that the estimator presented recovers the fiducial linear power spectrum at present time within 5% accuracy up to k ~ 0.80 h Mpc^-1 and within 10% up to k ~ 0.94 h Mpc^-1, well into the non-linear regime of the growth of density perturbations. As such, the method could be useful in the analysis of the data from future large-scale surveys, like Euclid.
Effects of linear trends on estimation of noise in GNSS position time-series
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dmitrieva, K.; Segall, P.; Bradley, A. M.
A thorough understanding of time-dependent noise in Global Navigation Satellite System (GNSS) position time-series is necessary for computing uncertainties in any signals found in the data. However, estimation of time-correlated noise is a challenging task and is complicated by the difficulty in separating noise from signal, the features of greatest interest in the time-series. In this study, we investigate how linear trends affect the estimation of noise in daily GNSS position time-series. We use synthetic time-series to study the relationship between linear trends and estimates of time-correlated noise for the six most commonly cited noise models. We find that the effects of added linear trends, or conversely de-trending, vary depending on the noise model. The commonly adopted model of random walk (RW), flicker noise (FN) and white noise (WN) is the most severely affected by de-trending, with estimates of low-amplitude RW most severely biased. FN plus WN is least affected by adding or removing trends. Non-integer power-law noise estimates are also less affected by de-trending, but are very sensitive to the addition of trend when the spectral index is less than one. We derive an analytical relationship between linear trends and the estimated RW variance for the special case of pure RW noise. Finally, overall, we find that to ascertain the correct noise model for GNSS position time-series and to estimate the correct noise parameters, it is important to have independent constraints on the actual trends in the data.
Effects of linear trends on estimation of noise in GNSS position time-series
NASA Astrophysics Data System (ADS)
Dmitrieva, K.; Segall, P.; Bradley, A. M.
2017-01-01
A thorough understanding of time-dependent noise in Global Navigation Satellite System (GNSS) position time-series is necessary for computing uncertainties in any signals found in the data. However, estimation of time-correlated noise is a challenging task and is complicated by the difficulty in separating noise from signal, the features of greatest interest in the time-series. In this paper, we investigate how linear trends affect the estimation of noise in daily GNSS position time-series. We use synthetic time-series to study the relationship between linear trends and estimates of time-correlated noise for the six most commonly cited noise models. We find that the effects of added linear trends, or conversely de-trending, vary depending on the noise model. The commonly adopted model of random walk (RW), flicker noise (FN) and white noise (WN) is the most severely affected by de-trending, with estimates of low-amplitude RW most severely biased. FN plus WN is least affected by adding or removing trends. Non-integer power-law noise estimates are also less affected by de-trending, but are very sensitive to the addition of trend when the spectral index is less than one. We derive an analytical relationship between linear trends and the estimated RW variance for the special case of pure RW noise. Overall, we find that to ascertain the correct noise model for GNSS position time-series and to estimate the correct noise parameters, it is important to have independent constraints on the actual trends in the data.
Effects of linear trends on estimation of noise in GNSS position time-series
Dmitrieva, K.; Segall, P.; Bradley, A. M.
2016-10-20
A thorough understanding of time-dependent noise in Global Navigation Satellite System (GNSS) position time-series is necessary for computing uncertainties in any signals found in the data. However, estimation of time-correlated noise is a challenging task and is complicated by the difficulty in separating noise from signal, the features of greatest interest in the time-series. In this study, we investigate how linear trends affect the estimation of noise in daily GNSS position time-series. We use synthetic time-series to study the relationship between linear trends and estimates of time-correlated noise for the six most commonly cited noise models. We find that the effects of added linear trends, or conversely de-trending, vary depending on the noise model. The commonly adopted model of random walk (RW), flicker noise (FN) and white noise (WN) is the most severely affected by de-trending, with estimates of low-amplitude RW most severely biased. FN plus WN is least affected by adding or removing trends. Non-integer power-law noise estimates are also less affected by de-trending, but are very sensitive to the addition of trend when the spectral index is less than one. We derive an analytical relationship between linear trends and the estimated RW variance for the special case of pure RW noise. Finally, overall, we find that to ascertain the correct noise model for GNSS position time-series and to estimate the correct noise parameters, it is important to have independent constraints on the actual trends in the data.
voom: precision weights unlock linear model analysis tools for RNA-seq read counts
2014-01-01
New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods. PMID:24485249
voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.
Law, Charity W; Chen, Yunshun; Shi, Wei; Smyth, Gordon K
2014-02-03
New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dong, X; Petrongolo, M; Wang, T
Purpose: A general problem of dual-energy CT (DECT) is that the decomposition is sensitive to noise in the two sets of dual-energy projection data, resulting in severely degraded qualities of decomposed images. We have previously proposed an iterative denoising method for DECT. Using a linear decomposition function, the method does not gain the full benefits of DECT on beam-hardening correction. In this work, we expand the framework of our iterative method to include non-linear decomposition models for noise suppression in DECT. Methods: We first obtain decomposed projections, which are free of beam-hardening artifacts, using a lookup table pre-measured on a calibration phantom. First-pass material images with high noise are reconstructed from the decomposed projections using standard filtered-backprojection reconstruction. Noise on the decomposed images is then suppressed by an iterative method, which is formulated in the form of least-square estimation with smoothness regularization. Based on the design principles of a best linear unbiased estimator, we include the inverse of the estimated variance-covariance matrix of the decomposed images as the penalty weight in the least-square term. Analytical formulae are derived to compute the variance-covariance matrix from the measured decomposition lookup table. Results: We have evaluated the proposed method via phantom studies. Using non-linear decomposition, our method effectively suppresses the streaking artifacts of beam-hardening and obtains more uniform images than our previous approach based on a linear model. The proposed method reduces the average noise standard deviation of two basis materials by one order of magnitude without sacrificing the spatial resolution. Conclusion: We propose a general framework of iterative denoising for material decomposition of DECT. Preliminary phantom studies have shown the proposed method improves the image uniformity and reduces noise level without resolution loss. In the future, we will perform more phantom studies to further validate the performance of the proposed method. This work is supported by a Varian MRA grant.
NASA Astrophysics Data System (ADS)
Indarsih, Indrati, Ch. Rini
2016-02-01
In this paper, we define the variance of fuzzy random variables through alpha levels. We present a theorem showing that the variance of a fuzzy random variable is a fuzzy number. We consider a multi-objective linear programming (MOLP) problem with fuzzy random objective function coefficients and solve it by a variance approach. The approach transforms the MOLP with fuzzy random objective function coefficients into an MOLP with fuzzy objective function coefficients. Using the weighted method, we obtain a linear program with fuzzy coefficients, which we solve by the simplex method for fuzzy linear programming.
Rast, Philippe; Hofer, Scott M.
2014-01-01
We investigated the power to detect variances and covariances in rates of change in the context of existing longitudinal studies using linear bivariate growth curve models. Power was estimated by means of Monte Carlo simulations. Our findings show that typical longitudinal study designs have substantial power to detect both variances and covariances among rates of change in a variety of cognitive, physical functioning, and mental health outcomes. We performed simulations to investigate the interplay among number and spacing of occasions, total duration of the study, effect size, and error variance on power and required sample size. The relation between growth rate reliability (GRR) and effect size to the sample size required to detect power ≥ .80 was non-linear, with rapidly decreasing sample sizes needed as GRR increases. The results presented here stand in contrast to previous simulation results and recommendations (Hertzog, Lindenberger, Ghisletta, & von Oertzen, 2006; Hertzog, von Oertzen, Ghisletta, & Lindenberger, 2008; von Oertzen, Ghisletta, & Lindenberger, 2010), which are limited due to confounds between study length and number of waves, error variance with GCR, and parameter values which are largely out of bounds of actual study values. Power to detect change is generally low in the early phases (i.e. first years) of longitudinal studies but can substantially increase if the design is optimized. We recommend additional assessments, including embedded intensive measurement designs, to improve power in the early phases of long-term longitudinal studies. PMID:24219544
A boosted optimal linear learner for retinal vessel segmentation
NASA Astrophysics Data System (ADS)
Poletti, E.; Grisan, E.
2014-03-01
Ocular fundus images provide important information about retinal degeneration, which may be related to acute pathologies or to early signs of systemic diseases. An automatic and quantitative assessment of vessel morphological features, such as diameters and tortuosity, can improve clinical diagnosis and evaluation of retinopathy. At variance with available methods, we propose a data-driven approach, in which the system learns a set of optimal discriminative convolution kernels (linear learner). The set is progressively built based on an ADA-boost sample weighting scheme, providing seamless integration between linear learner estimation and classification. In order to capture the vessel appearance changes at different scales, the kernels are estimated on a pyramidal decomposition of the training samples. The set is employed as a rotating bank of matched filters, whose response is used by the boosted linear classifier to provide a classification of each image pixel into the two classes of interest (vessel/background). We tested the approach on fundus images available from the DRIVE dataset. We show that the segmentation achieves an accuracy of 0.94.
Global estimation of long-term persistence in annual river runoff
NASA Astrophysics Data System (ADS)
Markonis, Y.; Moustakis, Y.; Nasika, C.; Sychova, P.; Dimitriadis, P.; Hanel, M.; Máca, P.; Papalexiou, S. M.
2018-03-01
Long-term persistence (LTP) of annual river runoff is a topic of ongoing hydrological research, due to its implications for water resources management. Here, we estimate its strength, measured by the Hurst coefficient H, in 696 annual, globally distributed, streamflow records with at least 80 years of data. We use three estimation methods (maximum likelihood estimator, Whittle estimator and least squares variance) resulting in similar mean values of H close to 0.65. Subsequently, we explore potential factors influencing H by two linear (Spearman's rank correlation, multiple linear regression) and two non-linear (self-organizing maps, random forests) techniques. Catchment area is found to be crucial for medium to larger watersheds, while climatic controls, such as aridity index, have higher impact on smaller ones. Our findings indicate that long-term persistence is weaker than found in other studies, suggesting that enhanced LTP is encountered in large-catchment rivers, where the effect of spatial aggregation is more intense. However, we also show that the estimated values of H can be reproduced by a short-term persistence stochastic model such as an auto-regressive AR(1) process. A direct consequence is that some of the most common methods for the estimation of the H coefficient might not be suitable for discriminating short- and long-term persistence even in long observational records.
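The variance-scaling idea behind Hurst coefficient estimation can be illustrated with a short sketch. The Python code below is a simplified aggregated-variance estimate of H, not the least squares variance estimator used in the study. It assumes that the variance of block means at aggregation scale k behaves like k^(2H-2) and fits that slope by ordinary least squares in log-log space. The function name, the choice of scales, and the synthetic white-noise series are all hypothetical.

```python
import math
import random
from statistics import variance


def hurst_aggregated_variance(x, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log Var(block mean) against log(scale)."""
    log_k, log_v = [], []
    for k in scales:
        n_blocks = len(x) // k
        if n_blocks < 2:
            continue  # not enough blocks to compute a variance at this scale
        means = [sum(x[i * k:(i + 1) * k]) / k for i in range(n_blocks)]
        v = variance(means)
        if v > 0:
            log_k.append(math.log(k))
            log_v.append(math.log(v))
    if len(log_k) < 2:
        raise ValueError("need at least two usable scales")
    mk = sum(log_k) / len(log_k)
    mv = sum(log_v) / len(log_v)
    slope = (sum((a - mk) * (b - mv) for a, b in zip(log_k, log_v))
             / sum((a - mk) ** 2 for a in log_k))
    # Var(mean at scale k) ~ k^(2H - 2), so H = 1 + slope / 2.
    return 1.0 + slope / 2.0


# Hypothetical check: uncorrelated noise should give H near 0.5.
random.seed(1)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
h_white = hurst_aggregated_variance(white_noise)
```

With only about 80 annual values per record, as in the study, far fewer scales are usable and the estimate becomes much noisier, which is in line with the abstract's caution about separating short- and long-term persistence.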
Non-additive genetic variation in growth, carcass and fertility traits of beef cattle.
Bolormaa, Sunduimijid; Pryce, Jennie E; Zhang, Yuandan; Reverter, Antonio; Barendse, William; Hayes, Ben J; Goddard, Michael E
2015-04-02
A better understanding of non-additive variance could lead to increased knowledge on the genetic control and physiology of quantitative traits, and to improved prediction of the genetic value and phenotype of individuals. Genome-wide panels of single nucleotide polymorphisms (SNPs) have been mainly used to map additive effects for quantitative traits, but they can also be used to investigate non-additive effects. We estimated dominance and epistatic effects of SNPs on various traits in beef cattle and the variance explained by dominance, and quantified the increase in accuracy of phenotype prediction by including dominance deviations in its estimation. Genotype data (729 068 real or imputed SNPs) and phenotypes on up to 16 traits of 10 191 individuals from Bos taurus, Bos indicus and composite breeds were used. A genome-wide association study was performed by fitting the additive and dominance effects of single SNPs. The dominance variance was estimated by fitting a dominance relationship matrix constructed from the 729 068 SNPs. The accuracy of predicted phenotypic values was evaluated by best linear unbiased prediction using the additive and dominance relationship matrices. Epistatic interactions (additive × additive) were tested between each of the 28 SNPs that are known to have additive effects on multiple traits, and each of the other remaining 729 067 SNPs. The number of significant dominance effects was greater than expected by chance and most of them were in the direction that is presumed to increase fitness and in the opposite direction to inbreeding depression. Estimates of dominance variance explained by SNPs varied widely between traits, but had large standard errors. The median dominance variance across the 16 traits was equal to 5% of the phenotypic variance. Including a dominance deviation in the prediction did not significantly increase its accuracy for any of the phenotypes. 
The number of additive × additive epistatic effects that were statistically significant was greater than expected by chance. Significant dominance and epistatic effects occur for growth, carcass and fertility traits in beef cattle but they are difficult to estimate precisely and including them in phenotype prediction does not increase its accuracy.
Reliable two-dimensional phase unwrapping method using region growing and local linear estimation.
Zhou, Kun; Zaitsev, Maxim; Bao, Shanglian
2009-10-01
In MRI, phase maps can provide useful information about parameters such as field inhomogeneity, velocity of blood flow, and the chemical shift between water and fat. As phase is defined in the (-pi,pi] range, however, phase wraps often occur, which complicates image analysis and interpretation. This work presents a two-dimensional phase unwrapping algorithm that uses quality-guided region growing and local linear estimation. The quality map employs the variance of the second-order partial derivatives of the phase as the quality criterion. Phase information from unwrapped neighboring pixels is used to predict the correct phase of the current pixel using a linear regression method. The algorithm was tested on both simulated and real data, and is shown to successfully unwrap phase images that are corrupted by noise and have rapidly changing phase. (c) 2009 Wiley-Liss, Inc.
A Sparse Matrix Approach for Simultaneous Quantification of Nystagmus and Saccade
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.; Stone, Lee; Boyle, Richard D.
2012-01-01
The vestibulo-ocular reflex (VOR) consists of two intermingled non-linear subsystems; namely, nystagmus and saccade. Typically, nystagmus is analysed using a single sufficiently long signal or a concatenation of them. Saccade information is not analysed and discarded due to insufficient data length to provide consistent and minimum variance estimates. This paper presents a novel sparse matrix approach to system identification of the VOR. It allows for the simultaneous estimation of both nystagmus and saccade signals. We show via simulation of the VOR that our technique provides consistent and unbiased estimates in the presence of output additive noise.
González-López, Antonio; Vera-Sánchez, Juan Antonio; Ruiz-Morales, Carmen
2016-05-01
This note studies the statistical relationships between color channels in radiochromic film readings with flatbed scanners. The same relationships are studied for noise. Finally, their implications for multichannel film dosimetry are discussed. Radiochromic films exposed to wedged fields of 6 MV energy were read in a flatbed scanner. The joint histograms of pairs of color channels were used to obtain the joint and conditional probability density functions between channels. Then, the conditional expectations and variances of one channel given another channel were obtained. Noise was extracted from film readings by means of a multiresolution analysis. Two different dose ranges were analyzed, the first one ranging from 112 to 473 cGy and the second one from 52 to 1290 cGy. For the smallest dose range, the conditional expectations of one channel given another channel can be approximated by linear functions, while the conditional variances are fairly constant. The slopes of the linear relationships between channels can be used to simplify the expression that estimates the dose by means of the multichannel method. The slopes of the linear relationships between each channel and the red one can also be interpreted as weights in the final contribution to dose estimation. However, for the largest dose range, the conditional expectations of one channel given another channel are no longer linear functions. Finally, noises in different channels were found to correlate weakly. Signals present in different channels of radiochromic film readings show a strong statistical dependence. By contrast, noise correlates weakly between channels. For the smallest dose range analyzed, the linear behavior between the conditional expectation of one channel given another channel can be used to simplify calculations in multichannel film dosimetry.
DOE Office of Scientific and Technical Information (OSTI.GOV)
González-López, Antonio, E-mail: antonio.gonzalez7@carm.es; Vera-Sánchez, Juan Antonio; Ruiz-Morales, Carmen
Purpose: This note studies the statistical relationships between color channels in radiochromic film readings with flatbed scanners. The same relationships are studied for noise. Finally, their implications for multichannel film dosimetry are discussed. Methods: Radiochromic films exposed to wedged fields of 6 MV energy were read in a flatbed scanner. The joint histograms of pairs of color channels were used to obtain the joint and conditional probability density functions between channels. Then, the conditional expectations and variances of one channel given another channel were obtained. Noise was extracted from film readings by means of a multiresolution analysis. Two different dose ranges were analyzed, the first one ranging from 112 to 473 cGy and the second one from 52 to 1290 cGy. Results: For the smallest dose range, the conditional expectations of one channel given another channel can be approximated by linear functions, while the conditional variances are fairly constant. The slopes of the linear relationships between channels can be used to simplify the expression that estimates the dose by means of the multichannel method. The slopes of the linear relationships between each channel and the red one can also be interpreted as weights in the final contribution to dose estimation. However, for the largest dose range, the conditional expectations of one channel given another channel are no longer linear functions. Finally, noises in different channels were found to correlate weakly. Conclusions: Signals present in different channels of radiochromic film readings show a strong statistical dependence. By contrast, noise correlates weakly between channels. For the smallest dose range analyzed, the linear behavior between the conditional expectation of one channel given another channel can be used to simplify calculations in multichannel film dosimetry.
Noise and drift analysis of non-equally spaced timing data
NASA Technical Reports Server (NTRS)
Vernotte, F.; Zalamansky, G.; Lantz, E.
1994-01-01
Generally, it is possible to obtain equally spaced timing data from oscillators. The measurement of the drifts and noises affecting oscillators is then performed by using a variance (Allan variance, modified Allan variance, or time variance) or a system of several variances (multivariance method). However, in some cases, several samples, or even several sets of samples, are missing. In the case of millisecond pulsar timing data, for instance, observations are quite irregularly spaced in time. Nevertheless, since some observations are very close together (one minute) and since the timing data sequence is very long (more than ten years), information on both short-term and long-term stability is available. Unfortunately, a direct variance analysis is not possible without interpolating missing data. Different interpolation algorithms (linear interpolation, cubic spline) are used to calculate variances in order to verify that they neither lose information nor add erroneous information. A comparison of the results of the different algorithms is given. Finally, the multivariance method was adapted to the measurement sequence of the millisecond pulsar timing data: the responses of each variance of the system are calculated for each type of noise and drift, with the same missing samples as in the pulsar timing sequence. An estimation of precision, dynamics, and separability of this method is given.
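The interpolate-then-compute-variance workflow described above can be sketched in Python. This is a minimal illustration, not the authors' multivariance method. It assumes the timing data are phase (time-residual) samples, fills gaps by linear interpolation onto a regular grid, and then applies the standard overlapping Allan variance estimator based on second differences of phase. The function names and the grid spacing `tau0` are hypothetical.

```python
from bisect import bisect_right


def interpolate_linear(t, x, grid):
    """Linearly interpolate samples (t, x) onto the points in grid.

    t must be strictly increasing; grid points outside [t[0], t[-1]]
    are linearly extrapolated from the nearest segment.
    """
    out = []
    for g in grid:
        j = bisect_right(t, g)
        j = min(max(j, 1), len(t) - 1)  # clamp to a valid segment index
        w = (g - t[j - 1]) / (t[j] - t[j - 1])
        out.append(x[j - 1] + w * (x[j] - x[j - 1]))
    return out


def allan_variance_phase(x, tau0, m=1):
    """Overlapping Allan variance at tau = m * tau0 from equally spaced phase data."""
    tau = m * tau0
    second_diffs = [x[i + 2 * m] - 2 * x[i + m] + x[i]
                    for i in range(len(x) - 2 * m)]
    if not second_diffs:
        raise ValueError("series too short for this averaging factor")
    return sum(d * d for d in second_diffs) / (2.0 * tau * tau * len(second_diffs))


# Hypothetical usage: irregular samples resampled to a 1-second grid.
t_obs = [0.0, 1.0, 3.0, 4.0, 7.0]
x_obs = [0.0, 0.2, 0.1, 0.4, 0.3]
grid = [float(k) for k in range(8)]
x_regular = interpolate_linear(t_obs, x_obs, grid)
avar = allan_variance_phase(x_regular, tau0=1.0, m=1)
```

Linear interpolation makes the second differences vanish inside each filled gap, so the interpolated series tends to understate short-term noise. This is the kind of distortion the abstract says the different interpolation algorithms are checked for.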
Ocean mixing beneath Pine Island Glacier ice shelf, West Antarctica
NASA Astrophysics Data System (ADS)
Kimura, Satoshi; Jenkins, Adrian; Dutrieux, Pierre; Forryan, Alexander; Naveira Garabato, Alberto C.; Firing, Yvonne
2016-12-01
Ice shelves around Antarctica are vulnerable to an increase in ocean-driven melting, with the melt rate depending on ocean temperature and the strength of flow inside the ice-shelf cavities. We present measurements of velocity, temperature, salinity, turbulent kinetic energy dissipation rate, and thermal variance dissipation rate beneath Pine Island Glacier ice shelf, West Antarctica. These measurements were obtained by CTD, ADCP, and turbulence sensors mounted on an Autonomous Underwater Vehicle (AUV). The highest turbulent kinetic energy dissipation rate is found near the grounding line. The thermal variance dissipation rate increases closer to the ice-shelf base, with a maximum value found ~0.5 m away from the ice. The measurements of turbulent kinetic energy dissipation rate near the ice are used to estimate basal melting of the ice shelf. The dissipation-rate-based melt rate estimate is sensitive to the stability correction parameter in the linear approximation of the universal function of the Monin-Obukhov similarity theory for stratified boundary layers. We argue that our estimates of basal melting from dissipation rates are within the range of previous estimates of basal melting.
Multicollinearity in hierarchical linear models.
Yu, Han; Jiang, Shanhe; Land, Kenneth C
2015-09-01
This study investigates an ill-posed problem (multicollinearity) in Hierarchical Linear Models from both the data and the model perspectives. We propose an intuitive, effective approach to diagnosing the presence of multicollinearity and its remedies in this class of models. A simulation study demonstrates the impacts of multicollinearity on coefficient estimates, associated standard errors, and variance components at various levels of multicollinearity for finite sample sizes typical in social science studies. We further investigate the role multicollinearity plays at each level for estimation of coefficient parameters in terms of shrinkage. Based on these analyses, we recommend a top-down method for assessing multicollinearity in HLMs that first examines the contextual predictors (Level-2 in a two-level model) and then the individual predictors (Level-1) and uses the results for data collection, research problem redefinition, model re-specification, variable selection and estimation of a final model. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sorini, D., E-mail: sorini@mpia-hd.mpg.de
2017-04-01
Measuring the clustering of galaxies from surveys allows us to estimate the power spectrum of matter density fluctuations, thus constraining cosmological models. This requires careful modelling of observational effects to avoid misinterpretation of data. In particular, signals coming from different distances encode information from different epochs. This is known as the "light-cone effect" and is going to have a higher impact as upcoming galaxy surveys probe larger redshift ranges. Generalising the method by Feldman, Kaiser and Peacock (1994) [1], I define a minimum-variance estimator of the linear power spectrum at a fixed time, properly taking into account the light-cone effect. An analytic expression for the estimator is provided, and it is consistent with the findings of previous works in the literature. I test the method within the context of the Halofit model, assuming Planck 2014 cosmological parameters [2]. I show that the estimator presented recovers the fiducial linear power spectrum at present time within 5% accuracy up to k ∼ 0.80 h Mpc^−1 and within 10% up to k ∼ 0.94 h Mpc^−1, well into the non-linear regime of the growth of density perturbations. As such, the method could be useful in the analysis of the data from future large-scale surveys, like Euclid.
Encircling the dark: constraining dark energy via cosmic density in spheres
NASA Astrophysics Data System (ADS)
Codis, S.; Pichon, C.; Bernardeau, F.; Uhlemann, C.; Prunet, S.
2016-08-01
The recently published analytic probability density function for the mildly non-linear cosmic density field within spherical cells is used to build a simple but accurate maximum likelihood estimate for the redshift evolution of the variance of the density, which, as expected, is shown to have smaller relative error than the sample variance. This estimator provides a competitive probe for the equation of state of dark energy, reaching a few per cent accuracy on wp and wa for a Euclid-like survey. The corresponding likelihood function can take into account the configuration of the cells via their relative separations. A code to compute one-cell-density probability density functions for arbitrary initial power spectrum, top-hat smoothing and various spherical-collapse dynamics is made available online, so as to provide straightforward means of testing the effect of alternative dark energy models and initial power spectra on the low-redshift matter distribution.
A note on variance estimation in random effects meta-regression.
Sidik, Kurex; Jonkman, Jeffrey N
2005-01-01
For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
Digital Biomass Accumulation Using High-Throughput Plant Phenotype Data Analysis.
Rahaman, Md Matiur; Ahsan, Md Asif; Gillani, Zeeshan; Chen, Ming
2017-09-01
Biomass is an important phenotypic trait in functional ecology and growth analysis. The typical methods for measuring biomass are destructive, and they require numerous individuals to be cultivated for repeated measurements. With the advent of image-based high-throughput plant phenotyping facilities, non-destructive biomass measuring methods have attempted to overcome this problem. Thus, the estimation of plant biomass of individual plants from their digital images is becoming more important. In this paper, we propose an approach to biomass estimation based on image derived phenotypic traits. Several image-based biomass studies state that the estimation of plant biomass is only a linear function of the projected plant area in images. However, we modeled the plant volume as a function of plant area, plant compactness, and plant age to generalize the linear biomass model. The obtained results confirm the proposed model and can explain most of the observed variance during image-derived biomass estimation. Moreover, a small difference was observed between actual and estimated digital biomass, which indicates that our proposed approach can be used to estimate digital biomass accurately.
NASA Astrophysics Data System (ADS)
Moster, Benjamin P.; Somerville, Rachel S.; Newman, Jeffrey A.; Rix, Hans-Walter
2011-04-01
Deep pencil beam surveys (<1 deg^2) are of fundamental importance for studying the high-redshift universe. However, inferences about galaxy population properties (e.g., the abundance of objects) are in practice limited by "cosmic variance." This is the uncertainty in observational estimates of the number density of galaxies arising from the underlying large-scale density fluctuations. This source of uncertainty can be significant, especially for surveys which cover only small areas and for massive high-redshift galaxies. Cosmic variance for a given galaxy population can be determined using predictions from cold dark matter theory and the galaxy bias. In this paper, we provide tools for experiment design and interpretation. For a given survey geometry, we present the cosmic variance of dark matter as a function of mean redshift z̄ and redshift bin size Δz. Using a halo occupation model to predict galaxy clustering, we derive the galaxy bias as a function of mean redshift for galaxy samples of a given stellar mass range. In the linear regime, the cosmic variance of these galaxy samples is the product of the galaxy bias and the dark matter cosmic variance. We present a simple recipe using a fitting function to compute cosmic variance as a function of the angular dimensions of the field, z̄, Δz, and stellar mass m*. We also provide tabulated values and a software tool. The accuracy of the resulting cosmic variance estimates (δσ_v/σ_v) is shown to be better than 20%. We find that for GOODS at z̄ = 2 and with Δz = 0.5, the relative cosmic variance of galaxies with m* > 10^11 M_sun is ~38%, while it is ~27% for GEMS and ~12% for COSMOS. For galaxies of m* ~ 10^10 M_sun, the relative cosmic variance is ~19% for GOODS, ~13% for GEMS, and ~6% for COSMOS. 
This implies that cosmic variance is a significant source of uncertainty at z̄ = 2 for small fields and massive galaxies, while for larger fields and intermediate mass galaxies, cosmic variance is less serious.
A semiempirical linear model of indirect, flat-panel x-ray detectors.
Huang, Shih-Ying; Yang, Kai; Abbey, Craig K; Boone, John M
2012-04-01
It is important to understand signal and noise transfer in the indirect, flat-panel x-ray detector when developing and optimizing imaging systems. For optimization where simulating images is necessary, this study introduces a semiempirical model to simulate projection images with user-defined x-ray fluence interaction. The signal and noise transfer in the indirect, flat-panel x-ray detectors is characterized by statistics consistent with energy-integration of x-ray photons. For an incident x-ray spectrum, x-ray photons are attenuated and absorbed in the x-ray scintillator to produce light photons, which are coupled to photodiodes for signal readout. The signal mean and variance are linearly related to the energy-integrated x-ray spectrum by empirically determined factors. With the known first- and second-order statistics, images can be simulated by incorporating multipixel signal statistics and the modulation transfer function of the imaging system. To estimate the semiempirical input to this model, 500 projection images (using an indirect, flat-panel x-ray detector in the breast CT system) were acquired with 50-100 kilovolt (kV) x-ray spectra filtered with 0.1-mm tin (Sn), 0.2-mm copper (Cu), 1.5-mm aluminum (Al), or 0.05-mm silver (Ag). The signal mean and variance of each detector element and the noise power spectra (NPS) were calculated and incorporated into this model for accuracy. Additionally, the modulation transfer function of the detector system was physically measured and incorporated in the image simulation steps. For validation purposes, simulated and measured projection images of air scans were compared using 40 kV∕0.1-mm Sn, 65 kV∕0.2-mm Cu, 85 kV∕1.5-mm Al, and 95 kV∕0.05-mm Ag. The linear relationship between the measured signal statistics and the energy-integrated x-ray spectrum was confirmed and incorporated into the model. 
The signal mean and variance factors were linearly related to kV for each filter material (r^2 of signal mean to kV: 0.91, 0.93, 0.86, and 0.99 for 0.1-mm Sn, 0.2-mm Cu, 1.5-mm Al, and 0.05-mm Ag, respectively; r^2 of signal variance to kV: 0.99 for all four filters). The comparison of the signal and noise (mean, variance, and NPS) between the simulated and measured air scan images suggested that this model was reasonable in predicting accurate signal statistics of air scan images using absolute percent error. Overall, the model was found to be accurate in estimating signal statistics and spatial correlation between the detector elements of the images acquired with indirect, flat-panel x-ray detectors. The semiempirical linear model of the indirect, flat-panel x-ray detectors was described and validated with images of air scans. The model was found to be a useful tool in understanding the signal and noise transfer within indirect, flat-panel x-ray detector systems.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
NASA Astrophysics Data System (ADS)
Kryanev, A. V.; Ivanov, V. V.; Romanova, A. O.; Sevastyanov, L. A.; Udumyan, D. K.
2018-03-01
This paper considers the problem of separating the trend and the chaotic component of chaotic time series in the absence of information on the characteristics of the chaotic component. Such a problem arises in nuclear physics, biomedicine, and many other applied fields. The scheme has two stages. At the first stage, smoothing linear splines with different values of the smoothing parameter are used to separate the "trend component." At the second stage, the method of least squares is used to find the unknown variance σ^2 of the noise component.
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Kumar, Satish; Molloy, Claire; Muñoz, Patricio; Daetwyler, Hans; Chagné, David; Volz, Richard
2015-01-01
The nonadditive genetic effects may have an important contribution to total genetic variation of phenotypes, so estimates of both the additive and nonadditive effects are desirable for breeding and selection purposes. Our main objectives were to: estimate additive, dominance and epistatic variances of apple (Malus × domestica Borkh.) phenotypes using relationship matrices constructed from genome-wide dense single nucleotide polymorphism (SNP) markers; and compare the accuracy of genomic predictions using genomic best linear unbiased prediction models with or without including nonadditive genetic effects. A set of 247 clonally replicated individuals was assessed for six fruit quality traits at two sites, and also genotyped using an Illumina 8K SNP array. Across several fruit quality traits, the additive, dominance, and epistatic effects contributed about 30%, 16%, and 19%, respectively, to the total phenotypic variance. Models ignoring nonadditive components yielded upwardly biased estimates of additive variance (heritability) for all traits in this study. The accuracy of genomic predicted genetic values (GEGV) varied from about 0.15 to 0.35 for various traits, and these were almost identical for models with or without including nonadditive effects. However, models including nonadditive genetic effects further reduced the bias of GEGV. Between-site genotypic correlations were high (>0.85) for all traits, and genotype-site interaction accounted for <10% of the phenotypic variability. The accuracy of prediction, when the validation set was present only at one site, was generally similar for both sites, and varied from about 0.50 to 0.85. The prediction accuracies were strongly influenced by trait heritability, and genetic relatedness between the training and validation families. PMID:26497141
Estimation of the simple correlation coefficient.
Shieh, Gwowen
2010-11-01
This article investigates some unfamiliar properties of the Pearson product-moment correlation coefficient for the estimation of the simple correlation coefficient. Although Pearson's r is biased, except for limited situations, and the minimum variance unbiased estimator has been proposed in the literature, researchers routinely employ the sample correlation coefficient in their practical applications, because of its simplicity and popularity. In order to support such practice, this study examines the mean squared errors of r and several prominent formulas. The results reveal specific situations in which the sample correlation coefficient performs better than the unbiased and nearly unbiased estimators, facilitating recommendation of r as an effect size index for the strength of linear association between two variables. In addition, related issues of estimating the squared simple correlation coefficient are also considered.
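As a small illustration of the bias mentioned in this abstract, the sketch below computes Pearson's r and applies a first-order bias correction, r[1 + (1 - r^2)/(2n)]. This correction is a common textbook approximation and is an assumption here; it is not necessarily one of the formulas examined in the article, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bias_corrected_r(r, n):
    """First-order correction for the bias of r toward zero (assumed form)."""
    return r * (1.0 + (1.0 - r * r) / (2.0 * n))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_adj = bias_corrected_r(r, len(x))
print(r, r_adj)
```

The correction always moves r away from zero, which lowers bias but can raise variance; that tradeoff is why a mean-squared-error comparison such as the one described above is informative.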
Family size and effective population size in a hatchery stock of coho salmon (Oncorhynchus kisutch)
Simon, R.C.; McIntyre, J.D.; Hemmingsen, A.R.
1986-01-01
Means and variances of family size measured in five year-classes of wire-tagged coho salmon (Oncorhynchus kisutch) were linearly related. Population effective size was calculated by using estimated means and variances of family size in a 25-yr data set. Although numbers of age 3 adults returning to the hatchery appeared to be large enough to avoid inbreeding problems (the 25-yr mean exceeded 4500), the numbers actually contributing to the hatchery production may be too low. Several strategies are proposed to correct the problem perceived. Argument is given to support the contention that the problem of effective size is fairly general and is not confined to the present study population.
NASA Astrophysics Data System (ADS)
Khaleghi, Mohammad Reza; Varvani, Javad
2018-02-01
The complex and variable nature of river sediment yield causes many problems in estimating the long-term sediment yield and the sediment input into reservoirs. Sediment Rating Curves (SRCs) are generally used to estimate the suspended sediment load of rivers and drainage watersheds. Because the regression equations of SRCs are obtained by logarithmic retransformation and contain few independent variables, they can overestimate or underestimate the true sediment load of the rivers. To evaluate the bias correction factors in the Kalshor and Kashafroud watersheds, seven hydrometric stations of this region with suitable upstream watershed and spatial distribution were selected. Investigation of the accuracy index (ratio of estimated sediment yield to observed sediment yield) and the precision index of different bias correction factors, namely FAO, Quasi-Maximum Likelihood Estimator (QMLE), Smearing, and Minimum-Variance Unbiased Estimator (MVUE), with the LSD test showed that the FAO coefficient increases the estimation error at all of the stations. Application of MVUE to the linear and mean load rating curves has no statistically meaningful effect. The QMLE and smearing factors increased the estimation error in the mean load rating curve but had no effect on the linear rating curve estimation.
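The retransformation bias discussed in this abstract can be illustrated with a minimal sketch. It fits a log-log rating curve by ordinary least squares and computes two of the named correction factors in their commonly cited forms: the smearing factor as the mean of the exponentiated residuals, and the QMLE factor as exp(s^2/2), where s^2 is the residual variance in natural-log units. The function names and the sample numbers are hypothetical, and the article's exact definitions may differ.

```python
import math

def fit_log_rating_curve(discharge, load):
    """OLS fit of ln(load) = a + b * ln(discharge); returns a, b, residuals."""
    x = [math.log(v) for v in discharge]
    y = [math.log(v) for v in load]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid

def smearing_factor(resid):
    """Nonparametric smearing factor: mean of exp(residual)."""
    return sum(math.exp(e) for e in resid) / len(resid)

def qmle_factor(resid, n_params=2):
    """Parametric factor exp(s^2 / 2), assuming normal log-space residuals."""
    s2 = sum(e * e for e in resid) / (len(resid) - n_params)
    return math.exp(s2 / 2.0)

def predict_load(q, a, b, correction=1.0):
    """Back-transformed prediction, optionally multiplied by a correction factor."""
    return correction * math.exp(a) * q ** b

# Made-up data for illustration only.
a, b, resid = fit_log_rating_curve([1.0, 2.0, 4.0, 8.0], [3.0, 5.0, 20.0, 40.0])
cf_smear = smearing_factor(resid)
cf_qmle = qmle_factor(resid)
print(cf_smear, cf_qmle, predict_load(5.0, a, b, cf_smear))
```

Both factors are at least 1 whenever the fit is not perfect, so they can only raise the back-transformed estimate; whether that helps or hurts depends on the data, as the station results above indicate.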
Cai, C; Rodet, T; Legoupil, S; Mohammad-Djafari, A
2013-11-01
Dual-energy computed tomography (DECT) makes it possible to get two fractions of basis materials without segmentation. One is the soft-tissue equivalent water fraction and the other is the hard-matter equivalent bone fraction. Practical DECT measurements are usually obtained with polychromatic x-ray beams. Existing reconstruction approaches based on linear forward models without counting the beam polychromaticity fail to estimate the correct decomposition fractions and result in beam-hardening artifacts (BHA). The existing BHA correction approaches either need to refer to calibration measurements or suffer from the noise amplification caused by the negative-log preprocessing and the ill-conditioned water and bone separation problem. To overcome these problems, statistical DECT reconstruction approaches based on nonlinear forward models counting the beam polychromaticity show great potential for giving accurate fraction images. This work proposes a full-spectral Bayesian reconstruction approach which allows the reconstruction of high quality fraction images from ordinary polychromatic measurements. This approach is based on a Gaussian noise model with unknown variance assigned directly to the projections without taking negative-log. Referring to Bayesian inferences, the decomposition fractions and observation variance are estimated by using the joint maximum a posteriori (MAP) estimation method. Subject to an adaptive prior model assigned to the variance, the joint estimation problem is then simplified into a single estimation problem. It transforms the joint MAP estimation problem into a minimization problem with a nonquadratic cost function. To solve it, the use of a monotone conjugate gradient algorithm with suboptimal descent steps is proposed. The performance of the proposed approach is analyzed with both simulated and experimental data. The results show that the proposed Bayesian approach is robust to noise and materials. 
It is also necessary to have the accurate spectrum information about the source-detector system. When dealing with experimental data, the spectrum can be predicted by a Monte Carlo simulator. For the materials between water and bone, less than 5% separation errors are observed on the estimated decomposition fractions. The proposed approach is a statistical reconstruction approach based on a nonlinear forward model counting the full beam polychromaticity and applied directly to the projections without taking negative-log. Compared to the approaches based on linear forward models and the BHA correction approaches, it has advantages in noise robustness and reconstruction accuracy.
Zhou, Xiang
2017-12-01
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
NASA Astrophysics Data System (ADS)
Hsieh, Scott S.; Pelc, Norbert J.
2014-06-01
Photon counting x-ray detectors (PCXDs) offer several advantages compared to standard energy-integrating x-ray detectors, but also face significant challenges. One key challenge is the high count rates required in CT. At high count rates, PCXDs exhibit count rate loss and show reduced detective quantum efficiency in signal-rich (or high flux) measurements. In order to reduce count rate requirements, a dynamic beam-shaping filter can be used to redistribute flux incident on the patient. We study the piecewise-linear attenuator in conjunction with PCXDs without energy discrimination capabilities. We examined three detector models: the classic nonparalyzable and paralyzable detector models, and a ‘hybrid’ detector model which is a weighted average of the two which approximates an existing, real detector (Taguchi et al 2011 Med. Phys. 38 1089-102). We derive analytic expressions for the variance of the CT measurements for these detectors. These expressions are used with raw data estimated from DICOM image files of an abdomen and a thorax to estimate variance in reconstructed images for both the dynamic attenuator and a static beam-shaping (‘bowtie’) filter. By redistributing flux, the dynamic attenuator reduces dose by 40% without increasing peak variance for the ideal detector. For non-ideal PCXDs, the impact of count rate loss is also reduced. The nonparalyzable detector shows little impact from count rate loss, but with the paralyzable model, count rate loss leads to noise streaks that can be controlled with the dynamic attenuator. With the hybrid model, the characteristic count rates required before noise streaks dominate the reconstruction are reduced by a factor of 2 to 3. We conclude that the piecewise-linear attenuator can reduce the count rate requirements of the PCXD in addition to improving dose efficiency. The magnitude of this reduction depends on the detector, with paralyzable detectors showing much greater benefit than nonparalyzable detectors.
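The three detector models named in this abstract can be sketched with the classic textbook dead-time formulas: a nonparalyzable detector records n / (1 + n·tau) counts per second for a true rate n and dead time tau, and a paralyzable detector records n·exp(-n·tau). The hybrid is written here as a plain weighted average of the two, as the abstract describes; the weight value, the dead time, and the function names are assumptions for illustration and are not taken from the article.

```python
import math

def nonparalyzable(true_rate, dead_time):
    """Recorded count rate for a nonparalyzable detector."""
    return true_rate / (1.0 + true_rate * dead_time)

def paralyzable(true_rate, dead_time):
    """Recorded count rate for a paralyzable detector."""
    return true_rate * math.exp(-true_rate * dead_time)

def hybrid(true_rate, dead_time, weight):
    """Weighted average of the two models; weight is a hypothetical parameter in [0, 1]."""
    return (weight * nonparalyzable(true_rate, dead_time)
            + (1.0 - weight) * paralyzable(true_rate, dead_time))

tau = 1e-6  # assumed dead time in seconds
for n in (1e5, 1e6, 5e6):
    print(n, nonparalyzable(n, tau), paralyzable(n, tau), hybrid(n, tau, 0.5))
```

The paralyzable curve peaks at a true rate of 1/tau and then falls, whereas the nonparalyzable curve saturates; this difference is consistent with the abstract's finding that paralyzable detectors benefit more from reduced flux.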
Vanderick, S; Troch, T; Gillon, A; Glorieux, G; Gengler, N
2014-12-01
Calving ease scores from Holstein dairy cattle in the Walloon Region of Belgium were analysed using univariate linear and threshold animal models. Variance components and derived genetic parameters were estimated from a data set including 33,155 calving records. Included in the models were season, herd and sex of calf × age of dam classes × group of calvings interaction as fixed effects, herd × year of calving, maternal permanent environment and animal direct and maternal additive genetic as random effects. Models were fitted with the genetic correlation between direct and maternal additive genetic effects either estimated or constrained to zero. Direct heritability for calving ease was approximately 8% with linear models and approximately 12% with threshold models. Maternal heritabilities were approximately 2 and 4%, respectively. Genetic correlation between direct and maternal additive effects was found to be not significantly different from zero. Models were compared in terms of goodness of fit and predictive ability. Criteria of comparison such as mean squared error, correlation between observed and predicted calving ease scores as well as between estimated breeding values were estimated from 85,118 calving records. The results provided few differences between linear and threshold models even though correlations between estimated breeding values from subsets of data for sires with progeny from linear model were 17 and 23% greater for direct and maternal genetic effects, respectively, than from threshold model. For the purpose of genetic evaluation for calving ease in Walloon Holstein dairy cattle, the linear animal model without covariance between direct and maternal additive effects was found to be the best choice. © 2014 Blackwell Verlag GmbH.
The effects of heat stress in Italian Holstein dairy cattle.
Bernabucci, U; Biffani, S; Buggiotti, L; Vitali, A; Lacetera, N; Nardone, A
2014-01-01
The data set for this study comprised 1,488,474 test-day records for milk, fat, and protein yields and fat and protein percentages from 191,012 first-, second-, and third-parity Holstein cows from 484 farms. Data were collected from 2001 through 2007 and merged with meteorological data from 35 weather stations. A linear model (M1) was used to estimate the effects of the temperature-humidity index (THI) on production traits. Least squares means from M1 were used to detect the THI thresholds for milk production in all parities by using a 2-phase linear regression procedure (M2). A multiple-trait repeatability test-model (M3) was used to estimate variance components for all traits and a dummy regression variable (t) was defined to estimate the production decline caused by heat stress. Additionally, the estimated variance components and M3 were used to estimate traditional and heat-tolerance breeding values (estimated breeding values, EBV) for milk yield and protein percentages at parity 1. An analysis of data (M2) indicated that the daily THI at which milk production started to decline for the 3 parities and traits ranged from 65 to 76. These THI values can be achieved with different temperature/humidity combinations with a range of temperatures from 21 to 36°C and relative humidity values from 5 to 95%. The highest negative effect of THI was observed 4 d before test day over the 3 parities for all traits. The negative effect of THI on production traits indicates that first-parity cows are less sensitive to heat stress than multiparous cows. Over the parities, the general additive genetic variance decreased for protein content and increased for milk yield and fat and protein yield. Additive genetic variance for heat tolerance showed an increase from the first to third parity for milk, protein, and fat yield, and for protein percentage. Genetic correlations between general and heat stress effects were all unfavorable (from -0.24 to -0.56). 
Three EBV per trait were calculated for each cow and bull (traditional EBV, traditional EBV estimated with the inclusion of THI covariate effect, and heat tolerance EBV) and the rankings of EBV for 283 bulls born after 1985 with at least 50 daughters were compared. When THI was included in the model, the ranking for 17 and 32 bulls changed for milk yield and protein percentage, respectively. The heat tolerance genetic component is not negligible, suggesting that heat tolerance selection should be included in the selection objectives. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
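The threshold detection step (M2) can be illustrated with a minimal two-phase ("broken-stick") regression sketch. It assumes a flat plateau below the threshold and a linear decline above it, and it finds the threshold by a grid search over candidate values; both the model form and the search strategy are assumptions here, since the abstract does not specify the procedure. All names and data are hypothetical.

```python
def fit_two_phase(x, y, candidates):
    """Fit y = plateau + slope * max(0, x - c) for each candidate threshold c.

    Returns the candidate with the smallest residual sum of squares.
    """
    n = len(x)
    best = None
    for c in candidates:
        h = [max(0.0, xi - c) for xi in x]  # hinge term: zero below the threshold
        mh = sum(h) / n
        my = sum(y) / n
        shh = sum((hi - mh) ** 2 for hi in h)
        if shh == 0.0:
            continue  # no observations above this threshold; slope not identifiable
        slope = sum((hi - mh) * (yi - my) for hi, yi in zip(h, y)) / shh
        plateau = my - slope * mh
        sse = sum((yi - (plateau + slope * hi)) ** 2 for hi, yi in zip(h, y))
        if best is None or sse < best["sse"]:
            best = {"threshold": c, "plateau": plateau, "slope": slope, "sse": sse}
    return best

# Synthetic example: yield is flat up to THI 70, then drops 0.5 kg per THI unit.
thi = list(range(60, 81))
milk = [30.0 if t <= 70 else 30.0 - 0.5 * (t - 70) for t in thi]
fit = fit_two_phase(thi, milk, range(61, 80))
print(fit)
```

With real least squares means the residual sum of squares is usually flat near the optimum, so a finer grid or a confidence interval for the threshold would be a sensible extension.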
Network Structure and Biased Variance Estimation in Respondent Driven Sampling
Verdery, Ashton M.; Mouw, Ted; Bauldry, Shawn; Mucha, Peter J.
2015-01-01
This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network. PMID:26679927
Identification of transmissivity fields using a Bayesian strategy and perturbative approach
NASA Astrophysics Data System (ADS)
Zanini, Andrea; Tanda, Maria Giovanna; Woodbury, Allan D.
2017-10-01
The paper deals with the crucial problem of groundwater parameter estimation, which is the basis for efficient modeling and reclamation activities. A hierarchical Bayesian approach is developed: it uses Akaike's Bayesian Information Criterion to estimate the hyperparameters (related to the chosen covariance model) and to quantify the unknown noise variance. The transmissivity identification proceeds in two steps: the first, called empirical Bayesian interpolation, uses Y* (Y = lnT) observations to interpolate Y values on a specified grid; the second, called empirical Bayesian update, improves the previous Y estimate through the addition of hydraulic head observations. The relationship between the head and lnT has been linearized through a perturbative solution of the flow equation. In order to test the proposed approach, synthetic aquifers from the literature have been considered. The aquifers in question contain a variety of boundary conditions (both Dirichlet and Neumann type) and scales of heterogeneity (σ_Y^2 = 1.0 and σ_Y^2 = 5.3). The estimated transmissivity fields were compared to the true ones. The joint use of Y* and head measurements improves the estimation of Y for both degrees of heterogeneity. Even if the variance of the strongly heterogeneous transmissivity field can be considered high for the application of the perturbative approach, the results show the same order of approximation as the non-linear methods proposed in the literature. The procedure makes it possible to compute the posterior probability distribution of the target quantities and to quantify the uncertainty in the model prediction. Bayesian updating has advantages related to both Monte-Carlo (MC) and non-MC approaches. Like MC methods, Bayesian updating allows the posterior probability distribution of the target quantities to be computed directly, and like non-MC methods, it has computational times on the order of seconds.
Makeyev, Oleksandr; Joe, Cody; Lee, Colin; Besio, Walter G
2017-07-01
Concentric ring electrodes have shown promise in non-invasive electrophysiological measurement demonstrating their superiority to conventional disc electrodes, in particular, in accuracy of Laplacian estimation. Recently, we have proposed novel variable inter-ring distances concentric ring electrodes. Analytic and finite element method modeling results for linearly increasing distances electrode configurations suggested they may decrease the truncation error resulting in more accurate Laplacian estimates compared to currently used constant inter-ring distances configurations. This study assesses statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes. Full factorial design of analysis of variance was used with one categorical and two numerical factors: the inter-ring distances, the electrode diameter, and the number of concentric rings in the electrode. The response variables were the Relative Error and the Maximum Error of Laplacian estimation computed using a finite element method model for each of the combinations of levels of three factors. Effects of the main factors and their interactions on Relative Error and Maximum Error were assessed and the obtained results suggest that all three factors have statistically significant effects in the model confirming the potential of using inter-ring distances as a means of improving accuracy of Laplacian estimation.
Vandenplas, J; Bastin, C; Gengler, N; Mulder, H A
2013-09-01
Animals that are robust to environmental changes are desirable in the current dairy industry. Genetic differences in micro-environmental sensitivity can be studied through heterogeneity of residual variance between animals. However, residual variance between animals is usually assumed to be homogeneous in traditional genetic evaluations. The aim of this study was to investigate genetic heterogeneity of residual variance by estimating variance components in residual variance for milk yield, somatic cell score, contents in milk (g/dL) of 2 groups of milk fatty acids (i.e., saturated and unsaturated fatty acids), and the content in milk of one individual fatty acid (i.e., oleic acid, C18:1 cis-9), for first-parity Holstein cows in the Walloon Region of Belgium. A total of 146,027 test-day records from 26,887 cows in 747 herds were available. All cows had at least 3 records and a known sire. These sires had at least 10 cows with records and each herd × test-day had at least 5 cows. The 5 traits were analyzed separately based on fixed lactation curve and random regression test-day models for the mean. Estimation of variance components was performed by iteratively running an expectation maximization-REML algorithm through the implementation of double hierarchical generalized linear models. Based on fixed lactation curve test-day mean models, heritability for residual variances ranged between 1.01×10^-3 and 4.17×10^-3 for all traits. The genetic standard deviation in residual variance (i.e., approximately the genetic coefficient of variation of residual variance) ranged between 0.12 and 0.17. Therefore, some genetic variance in micro-environmental sensitivity existed in the Walloon Holstein dairy cattle for the 5 studied traits. The standard deviations due to herd × test-day and permanent environment in residual variance ranged between 0.36 and 0.45 for herd × test-day effect and between 0.55 and 0.97 for permanent environmental effect. 
Therefore, nongenetic effects also contributed substantially to micro-environmental sensitivity. Addition of random regressions to the mean model did not reduce heterogeneity in residual variance, indicating that genetic heterogeneity of residual variance was not simply an effect of an incomplete mean model. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Kalman filter data assimilation: targeting observations and parameter estimation.
Bellsky, Thomas; Kostelich, Eric J; Mahalov, Alex
2014-06-01
This paper studies the effect of targeted observations on state and parameter estimates determined with Kalman filter data assimilation (DA) techniques. We first provide an analytical result demonstrating that targeting observations within the Kalman filter for a linear model can significantly reduce state estimation error as opposed to fixed or randomly located observations. We next conduct observing system simulation experiments for a chaotic model of meteorological interest, where we demonstrate that the local ensemble transform Kalman filter (LETKF) with targeted observations based on largest ensemble variance is skillful in providing more accurate state estimates than the LETKF with randomly located observations. Additionally, we find that a hybrid ensemble Kalman filter parameter estimation method accurately updates model parameters within the targeted observation context to further improve state estimation.
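The idea of targeting an observation at the location of largest variance can be sketched with a standard Kalman filter update for a single scalar observation of one state component. This is a plain linear Kalman update, not the LETKF used in the paper, and the covariance matrix, observation error variance, and function names below are made-up assumptions for illustration.

```python
def kalman_update_single(x, P, i, y, r):
    """Kalman update for one scalar observation y of state component i.

    x: state mean (list), P: covariance (list of lists), r: observation error variance.
    """
    n = len(x)
    s = P[i][i] + r                       # innovation variance
    gain = [P[j][i] / s for j in range(n)]
    innovation = y - x[i]
    x_new = [x[j] + gain[j] * innovation for j in range(n)]
    P_new = [[P[j][k] - gain[j] * P[i][k] for k in range(n)] for j in range(n)]
    return x_new, P_new

def target_index(P):
    """Pick the component with the largest prior variance."""
    return max(range(len(P)), key=lambda j: P[j][j])

def trace(P):
    return sum(P[j][j] for j in range(len(P)))

x0 = [0.0, 0.0, 0.0]
P0 = [[4.0, 1.0, 0.0],
      [1.0, 1.0, 0.0],
      [0.0, 0.0, 2.0]]
r_obs = 1.0

i_target = target_index(P0)
x_t, P_t = kalman_update_single(x0, P0, i_target, 1.0, r_obs)
traces = [trace(kalman_update_single(x0, P0, i, 1.0, r_obs)[1]) for i in range(3)]
print(i_target, traces)
```

In this example, observing the highest-variance component leaves the smallest total posterior variance. With strong cross-correlations that need not always hold, so the largest-variance rule is best read as a heuristic.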
Kalman filter data assimilation: Targeting observations and parameter estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bellsky, Thomas, E-mail: bellskyt@asu.edu; Kostelich, Eric J.; Mahalov, Alex
2014-06-15
This paper studies the effect of targeted observations on state and parameter estimates determined with Kalman filter data assimilation (DA) techniques. We first provide an analytical result demonstrating that targeting observations within the Kalman filter for a linear model can significantly reduce state estimation error as opposed to fixed or randomly located observations. We next conduct observing system simulation experiments for a chaotic model of meteorological interest, where we demonstrate that the local ensemble transform Kalman filter (LETKF) with targeted observations based on largest ensemble variance is skillful in providing more accurate state estimates than the LETKF with randomly located observations. Additionally, we find that a hybrid ensemble Kalman filter parameter estimation method accurately updates model parameters within the targeted observation context to further improve state estimation.
Estimating individual glomerular volume in the human kidney: clinical perspectives.
Puelles, Victor G; Zimanyi, Monika A; Samuel, Terence; Hughson, Michael D; Douglas-Denton, Rebecca N; Bertram, John F; Armitage, James A
2012-05-01
Measurement of individual glomerular volumes (IGV) has allowed the identification of drivers of glomerular hypertrophy in subjects without overt renal pathology. This study aims to highlight the relevance of IGV measurements with possible clinical implications and determine how many profiles must be measured in order to achieve stable size distribution estimates. We re-analysed 2250 IGV estimates obtained using the disector/Cavalieri method in 41 African and 34 Caucasian Americans. Pooled IGV analysis of mean and variance was conducted. Monte-Carlo (Jackknife) simulations determined the effect of the number of sampled glomeruli on mean IGV. Lin's concordance coefficient (R(C)), coefficient of variation (CV) and coefficient of error (CE) measured reliability. IGV mean and variance increased with overweight and hypertensive status. Superficial glomeruli were significantly smaller than juxtamedullary glomeruli in all subjects (P < 0.01), by race (P < 0.05) and in obese individuals (P < 0.01). Subjects with multiple chronic kidney disease (CKD) comorbidities showed significant increases in IGV mean and variability. Overall, mean IGV was particularly reliable with nine or more sampled glomeruli (R(C) > 0.95, <5% difference in CV and CE). These observations were not affected by a reduced sample size and did not disrupt the inverse linear correlation between mean IGV and estimated total glomerular number. Multiple comorbidities for CKD are associated with increased IGV mean and variance within subjects, including overweight, obesity and hypertension. Zonal selection and the number of sampled glomeruli do not represent drawbacks for future longitudinal biopsy-based studies of glomerular size and distribution.
Covariance functions for body weight from birth to maturity in Nellore cows.
Boligon, A A; Mercadante, M E Z; Forni, S; Lôbo, R B; Albuquerque, L G
2010-03-01
The objective of this study was to estimate (co)variance functions using random regression models on Legendre polynomials for the analysis of repeated measures of BW from birth to adult age. A total of 82,064 records from 8,145 females were analyzed. Different models were compared. The models included additive direct and maternal effects, and animal and maternal permanent environmental effects as random terms. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of animal age (cubic regression) were considered as random covariables. Eight models with polynomials of third to sixth order were used to describe additive direct and maternal effects, and animal and maternal permanent environmental effects. Residual effects were modeled using 1 (i.e., assuming homogeneity of variances across all ages) or 5 age classes. The model with 5 classes was the best to describe the trajectory of residuals along the growth curve. The model including fourth- and sixth-order polynomials for additive direct and animal permanent environmental effects, respectively, and third-order polynomials for maternal genetic and maternal permanent environmental effects were the best. Estimates of (co)variance obtained with the multi-trait and random regression models were similar. Direct heritability estimates obtained with the random regression models followed a trend similar to that obtained with the multi-trait model. The largest estimates of maternal heritability were those of BW taken close to 240 d of age. In general, estimates of correlation between BW from birth to 8 yr of age decreased with increasing distance between ages.
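A covariance function of the kind described in this abstract can be sketched as follows: ages are standardized to the interval [-1, 1], normalized Legendre polynomials are evaluated at the standardized ages, and the covariance between two ages is phi(a1)' K phi(a2), where K is the covariance matrix of the random regression coefficients. The normalization sqrt((2k+1)/2) is the one commonly used in animal breeding and is an assumption here, as are the function names and the example K.

```python
import math

def standardize_age(age, age_min, age_max):
    """Map an age onto [-1, 1], the domain of the Legendre polynomials."""
    return -1.0 + 2.0 * (age - age_min) / (age_max - age_min)

def legendre_basis(t, n_coef):
    """Normalized Legendre polynomials of degree 0 .. n_coef - 1 evaluated at t."""
    p = [1.0, t]
    for k in range(1, n_coef - 1):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) t P_k - k P_{k-1}
        p.append(((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(n_coef)]

def covariance_between_ages(age1, age2, K, age_min, age_max):
    """Covariance implied by coefficient covariance matrix K at two ages."""
    n_coef = len(K)
    phi1 = legendre_basis(standardize_age(age1, age_min, age_max), n_coef)
    phi2 = legendre_basis(standardize_age(age2, age_min, age_max), n_coef)
    return sum(phi1[i] * K[i][j] * phi2[j] for i in range(n_coef) for j in range(n_coef))

# Hypothetical 2 x 2 coefficient covariance matrix (intercept and linear term).
K_example = [[1.0, 0.0],
             [0.0, 1.0]]
var_birth = covariance_between_ages(0.0, 0.0, K_example, 0.0, 10.0)
var_mid = covariance_between_ages(5.0, 5.0, K_example, 0.0, 10.0)
print(var_birth, var_mid)
```

Dividing a genetic variance obtained this way by the sum of all variance components at the same age would give a heritability trajectory along the growth curve.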
Brooks, M.H.; Schroder, L.J.; Malo, B.A.
1985-01-01
Four laboratories were evaluated in their analysis of identical natural and simulated precipitation water samples. Interlaboratory comparability was evaluated using analysis of variance coupled with Duncan's multiple range test, and linear-regression models describing the relations between individual laboratory analytical results for natural precipitation samples. Results of the statistical analyses indicate that certain pairs of laboratories produce different results when analyzing identical samples. Analyte bias for each laboratory was examined using analysis of variance coupled with Duncan's multiple range test on data produced by the laboratories from the analysis of identical simulated precipitation samples. Bias for a given analyte produced by a single laboratory has been indicated when the laboratory mean for that analyte is shown to be significantly different from the mean for the most-probable analyte concentrations in the simulated precipitation samples. Ion-chromatographic methods for the determination of chloride, nitrate, and sulfate have been compared with the colorimetric methods that were also in use during the study period. Comparisons were made using analysis of variance coupled with Duncan's multiple range test for means produced by the two methods. Analyte precision for each laboratory has been estimated by calculating a pooled variance for each analyte. Analyte estimated precisions have been compared using F-tests and differences in analyte precisions for laboratory pairs have been reported. (USGS)
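The precision estimate described in this abstract can be sketched with the standard pooled-variance formula, sum((n_i - 1) s_i^2) / sum(n_i - 1), computed over groups of replicate analyses, followed by the variance ratio that an F-test would use. The grouping of replicates, the sample values, and the function names are hypothetical, and the p-value lookup is omitted because it would need an F-distribution table or a statistics library.

```python
import statistics

def pooled_variance(groups):
    """Pooled variance across groups of replicate measurements."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    degrees_of_freedom = sum(len(g) - 1 for g in groups)
    return numerator / degrees_of_freedom

def f_ratio(var_a, var_b):
    """Ratio of the larger to the smaller variance, as used in an F-test."""
    return max(var_a, var_b) / min(var_a, var_b)

# Made-up replicate results for one analyte at two laboratories.
lab_a = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
lab_b = [[1.0, 1.5, 2.0], [4.0, 5.0, 6.0, 7.0]]
var_a = pooled_variance(lab_a)
var_b = pooled_variance(lab_b)
print(var_a, var_b, f_ratio(var_a, var_b))
```

Each laboratory's pooled variance carries sum(n_i - 1) degrees of freedom, which are the degrees of freedom to use when looking up the critical F value.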
Variance adaptation in navigational decision making
NASA Astrophysics Data System (ADS)
Gershow, Marc; Gepner, Ruben; Wolk, Jason; Wadekar, Digvijay
Drosophila larvae navigate their environments using a biased random walk strategy. A key component of this strategy is the decision to initiate a turn (change direction) in response to declining conditions. We modeled this decision as the output of a Linear-Nonlinear-Poisson cascade and used reverse correlation with visual and fictive olfactory stimuli to find the parameters of this model. Because the larva responds to changes in stimulus intensity, we used stimuli with uncorrelated normally distributed intensity derivatives, i.e. Brownian processes, and took the stimulus derivative as the input to our LNP cascade. In this way, we were able to present stimuli with 0 mean and controlled variance. We found that the nonlinear rate function depended on the variance in the stimulus input, allowing larvae to respond more strongly to small changes in low-noise compared to high-noise environments. We measured the rate at which the larva adapted its behavior following changes in stimulus variance, and found that larvae adapted more quickly to increases in variance than to decreases, consistent with the behavior of an optimal Bayes estimator. Supported by NIH Grant 1DP2EB022359 and NSF Grant PHY-1455015.
Least-squares dual characterization for ROI assessment in emission tomography
NASA Astrophysics Data System (ADS)
Ben Bouallègue, F.; Crouzet, J. F.; Dubois, A.; Buvat, I.; Mariano-Goulart, D.
2013-06-01
Our aim is to describe an original method for estimating the statistical properties of regions of interest (ROIs) in emission tomography. Drawing upon the work of Louis on the approximate inverse, we propose a dual formulation of the ROI estimation problem to derive the ROI activity and variance directly from the measured data without any image reconstruction. The method requires the definition of an ROI characteristic function that can be extracted from a co-registered morphological image. This characteristic function can be smoothed to optimize the resolution-variance tradeoff. An iterative procedure is detailed for the solution of the dual problem in the least-squares sense (least-squares dual (LSD) characterization), and a linear extrapolation scheme is described to compensate for sampling partial volume effect and reduce the estimation bias (LSD-ex). LSD and LSD-ex are compared with classical ROI estimation using pixel summation after image reconstruction and with Huesman's method. For this comparison, we used Monte Carlo simulations (GATE simulation tool) of 2D PET data of a Hoffman brain phantom containing three small uniform high-contrast ROIs and a large non-uniform low-contrast ROI. Our results show that the performances of LSD characterization are at least as good as those of the classical methods in terms of root mean square (RMS) error. For the three small tumor regions, LSD-ex allows a reduction in the estimation bias by up to 14%, resulting in a reduction in the RMS error of up to 8.5%, compared with the optimal classical estimation. For the large non-specific region, LSD using appropriate smoothing could intuitively and efficiently handle the resolution-variance tradeoff.
Estimating the encounter rate variance in distance sampling
Fewster, R.M.; Buckland, S.T.; Burnham, K.P.; Borchers, D.L.; Jupp, P.E.; Laake, J.L.; Thomas, L.
2009-01-01
The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias. © 2008, The International Biometric Society.
Quantitative PET Imaging in Drug Development: Estimation of Target Occupancy.
Naganawa, Mika; Gallezot, Jean-Dominique; Rossano, Samantha; Carson, Richard E
2017-12-11
Positron emission tomography, an imaging tool using radiolabeled tracers in humans and preclinical species, has been widely used in recent years in drug development, particularly in the central nervous system. One important goal of PET in drug development is assessing the occupancy of various molecular targets (e.g., receptors, transporters, enzymes) by exogenous drugs. The current linear mathematical approaches used to determine occupancy using PET imaging experiments are presented. These algorithms use results from multiple regions with different target content in two scans, a baseline (pre-drug) scan and a post-drug scan. New mathematical estimation approaches to determine target occupancy, using maximum likelihood, are presented. A major challenge in these methods is the proper definition of the covariance matrix of the regional binding measures, accounting for different variance of the individual regional measures and their nonzero covariance, factors that have been ignored by conventional methods. The novel methods are compared to standard methods using simulation and real human occupancy data. The simulation data showed the expected reduction in variance and bias using the proper maximum likelihood methods, when the assumptions of the estimation method matched those in simulation. Between-method differences for data from human occupancy studies were less obvious, in part due to small dataset sizes. These maximum likelihood methods form the basis for development of improved PET covariance models, in order to minimize bias and variance in PET occupancy studies.
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.
2013-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, in particular in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.
Framework to trade optimality for local processing in large-scale wavefront reconstruction problems.
Haber, Aleksandar; Verhaegen, Michel
2016-11-15
We show that the minimum variance wavefront estimation problems permit localized approximate solutions, in the sense that the wavefront value at a point (excluding unobservable modes, such as the piston mode) can be approximated by a linear combination of the wavefront slope measurements in the point's neighborhood. This enables us to efficiently compute a wavefront estimate by performing a single sparse matrix-vector multiplication. Moreover, our results open the possibility for the development of wavefront estimators that can be easily implemented in a decentralized/distributed manner, and in which the estimate optimality can be easily traded for computational efficiency. We numerically validate our approach on Hudgin wavefront sensor geometries, and the results can be easily generalized to Fried geometries.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Dale N; Bonner, Jessie L; Stroujkova, Anastasia
Our objective is to improve seismic event screening using the properties of surface waves. We are accomplishing this through (1) the development of a Love-wave magnitude formula that is complementary to the Russell (2006) formula for Rayleigh waves and (2) quantifying differences in complexities and magnitude variances for earthquake and explosion-generated surface waves. We have applied the M_s (VMAX) analysis (Bonner et al., 2006) using both Love and Rayleigh waves to events in the Middle East and Korean Peninsula. For the Middle East dataset consisting of approximately 100 events, the Love M_s (VMAX) is greater than the Rayleigh M_s (VMAX) estimated for individual stations for the majority of the events and azimuths, with the exception of the measurements for the smaller events from European stations to the northeast. It is unclear whether these smaller events suffer from magnitude bias for the Love waves or whether the paths, which include the Caspian and Mediterranean, have variable attenuation for Love and Rayleigh waves. For the Korean Peninsula, we have estimated Rayleigh- and Love-wave magnitudes for 31 earthquakes and two nuclear explosions, including the 25 May 2009 event. For 25 of the earthquakes, the network-averaged Love-wave magnitude is larger than the Rayleigh-wave estimate. For the 2009 nuclear explosion, the Love-wave M_s (VMAX) was 3.1 while the Rayleigh-wave magnitude was 3.6. We are also utilizing the potential of observed variances in M_s estimates that differ significantly in earthquake and explosion populations. We have considered two possible methods for incorporating unequal variances into the discrimination problem and compared the performance of various approaches on a population of 73 western United States earthquakes and 131 Nevada Test Site explosions. The first approach proposes replacing the M_s component by M_s + a*σ, where σ denotes the interstation standard deviation obtained from the stations in the sample that produced the M_s value. We replace the usual linear discriminant a*M_s + b*m_b with a*M_s + b*m_b + c*σ. In the second approach, we estimate the optimum hybrid linear-quadratic discriminant function resulting from the unequal variance assumption. We observed slight improvement for the discriminant functions resulting from the theoretical interpretations of the unequal variance function. We have also studied the complexity of the "magnitude spectra" at each station. Our hypothesis is that explosion spectra should have fewer focal mechanism-produced complexities in the magnitude spectra than earthquakes. We have developed an intrastation "complexity" metric ΔM_s, where ΔM_s = M_s(i) - M_s(i+1) at periods, i, which are between 9 and 25 seconds. The complexity by itself has discriminating power but does not add substantially to the conditional hybrid discriminant that incorporates the differing spreads of the earthquake and explosion standard deviations.
Statistical power for detecting trends with applications to seabird monitoring
Hatch, Shyla A.
2003-01-01
Power analysis is helpful in defining goals for ecological monitoring and evaluating the performance of ongoing efforts. I examined detection standards proposed for population monitoring of seabirds using two programs (MONITOR and TRENDS) specially designed for power analysis of trend data. Neither program models within- and among-years components of variance explicitly and independently, thus an error term that incorporates both components is an essential input. Residual variation in seabird counts consisted of day-to-day variation within years and unexplained variation among years in approximately equal parts. The appropriate measure of error for power analysis is the standard error of estimation (S.E.est) from a regression of annual means against year. Replicate counts within years are helpful in minimizing S.E.est but should not be treated as independent samples for estimating power to detect trends. Other issues include a choice of assumptions about variance structure and selection of an exponential or linear model of population change. Seabird count data are characterized by strong correlations between S.D. and mean, thus a constant CV model is appropriate for power calculations. Time series were fit about equally well with exponential or linear models, but log transformation ensures equal variances over time, a basic assumption of regression analysis. Using sample data from seabird monitoring in Alaska, I computed the number of years required (with annual censusing) to detect trends of -1.4% per year (50% decline in 50 years) and -2.7% per year (50% decline in 25 years). At α=0.05 and a desired power of 0.9, estimated study intervals ranged from 11 to 69 years depending on species, trend, software, and study design. Power to detect a negative trend of 6.7% per year (50% decline in 10 years) is suggested as an alternative standard for seabird monitoring that achieves a reasonable match between statistical and biological significance.
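The annual trend rates quoted in this abstract follow from a constant proportional (exponential) decline. The short sketch below is only an illustrative check of that arithmetic; the function name `annual_rate_for_decline` is hypothetical, and nothing here reproduces the MONITOR or TRENDS programs.

```python
def annual_rate_for_decline(fraction_remaining, years):
    """Constant annual proportional change that leaves `fraction_remaining`
    of the starting population after `years` (exponential model)."""
    return fraction_remaining ** (1.0 / years) - 1.0


# A 50% decline spread over 50, 25 and 10 years.
for years in (50, 25, 10):
    rate = annual_rate_for_decline(0.5, years)
    print(f"50% decline in {years} years: {100 * rate:.1f}% per year")
```

Under this assumption the three horizons correspond to roughly -1.4%, -2.7% and -6.7% per year, matching the figures in the abstract.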
Evidence of directional and stabilizing selection in contemporary humans.
Sanjak, Jaleal S; Sidorenko, Julia; Robinson, Matthew R; Thornton, Kevin R; Visscher, Peter M
2018-01-02
Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.
Efficiently estimating salmon escapement uncertainty using systematically sampled data
Reynolds, Joel H.; Woody, Carol Ann; Gove, Nancy E.; Fair, Lowell F.
2007-01-01
Fish escapement is generally monitored using nonreplicated systematic sampling designs (e.g., via visual counts from towers or hydroacoustic counts). These sampling designs support a variety of methods for estimating the variance of the total escapement. Unfortunately, all the methods give biased results, with the magnitude of the bias being determined by the underlying process patterns. Fish escapement commonly exhibits positive autocorrelation and nonlinear patterns, such as diurnal and seasonal patterns. For these patterns, poor choice of variance estimator can needlessly increase the uncertainty managers have to deal with in sustaining fish populations. We illustrate the effect of sampling design and variance estimator choice on variance estimates of total escapement for anadromous salmonids from systematic samples of fish passage. Using simulated tower counts of sockeye salmon Oncorhynchus nerka escapement on the Kvichak River, Alaska, five variance estimators for nonreplicated systematic samples were compared to determine the least biased. Using the least biased variance estimator, four confidence interval estimators were compared for expected coverage and mean interval width. Finally, five systematic sampling designs were compared to determine the design giving the smallest average variance estimate for total annual escapement. For nonreplicated systematic samples of fish escapement, all variance estimators were positively biased. Compared to the other estimators, the least biased estimator reduced bias by, on average, from 12% to 98%. All confidence intervals gave effectively identical results. Replicated systematic sampling designs consistently provided the smallest average estimated variance among those compared.
ERIC Educational Resources Information Center
Shieh, Gwowen; Jan, Show-Li
2015-01-01
The general formulation of a linear combination of population means permits a wide range of research questions to be tested within the context of ANOVA. However, it has been stressed in many research areas that the homogeneous variances assumption is frequently violated. To accommodate the heterogeneity of variance structure, the…
Order-constrained linear optimization.
Tidwell, Joe W; Dougherty, Michael R; Chrabaszcz, Jeffrey S; Thomas, Rick P
2017-11-01
Despite the fact that data and theories in the social, behavioural, and health sciences are often represented on an ordinal scale, there has been relatively little emphasis on modelling ordinal properties. The most common analytic framework used in psychological science is the general linear model, whose variants include ANOVA, MANOVA, and ordinary linear regression. While these methods are designed to provide the best fit to the metric properties of the data, they are not designed to maximally model ordinal properties. In this paper, we develop an order-constrained linear least-squares (OCLO) optimization algorithm that maximizes the linear least-squares fit to the data conditional on maximizing the ordinal fit based on Kendall's τ. The algorithm builds on the maximum rank correlation estimator (Han, 1987, Journal of Econometrics, 35, 303) and the general monotone model (Dougherty & Thomas, 2012, Psychological Review, 119, 321). Analyses of simulated data indicate that when modelling data that adhere to the assumptions of ordinary least squares, OCLO shows minimal bias, little increase in variance, and almost no loss in out-of-sample predictive accuracy. In contrast, under conditions in which data include a small number of extreme scores (fat-tailed distributions), OCLO shows less bias and variance, and substantially better out-of-sample predictive accuracy, even when the outliers are removed. We show that the advantages of OCLO over ordinary least squares in predicting new observations hold across a variety of scenarios in which researchers must decide to retain or eliminate extreme scores when fitting data. © 2017 The British Psychological Society.
Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza
2017-09-27
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma ( PPARG ) gene associated with diabetes.
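The BLUP and ridge equivalence mentioned in this abstract can be illustrated in the simplest setting: a linear (not generalized) mixed model y = Zu + e with no fixed effects, u ~ N(0, var_u I) and e ~ N(0, var_e I), where both variances are treated as known. This is a minimal sketch under those assumptions, not the authors' GLMM procedure; the toy data and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(Z, y, lam):
    """Ridge estimator (Z'Z + lam I)^{-1} Z'y."""
    p = len(Z[0])
    ZtZ = [[sum(row[i] * row[j] for row in Z) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Zty = [sum(row[i] * yi for row, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def blup(Z, y, var_u, var_e):
    """BLUP of the random effects: var_u Z' V^{-1} y with V = var_u ZZ' + var_e I."""
    n, p = len(Z), len(Z[0])
    V = [[var_u * sum(a * b for a, b in zip(Z[i], Z[j])) + (var_e if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    w = solve(V, y)  # w = V^{-1} y
    return [var_u * sum(Z[i][k] * w[i] for i in range(n)) for k in range(p)]


# Toy data: 4 subjects, 2 SNPs coded 0/1/2 (made up for illustration only).
Z = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [1.0, 2.0]]
y = [-0.5, 0.2, 0.9, -0.6]
var_u, var_e = 0.5, 1.0

u_blup = blup(Z, y, var_u, var_e)
u_ridge = ridge(Z, y, lam=var_e / var_u)  # ridge penalty implied by the variance ratio
print(u_blup)
print(u_ridge)
```

In this linear case the two vectors agree up to floating-point error, which shows why the ridge penalty can be read off as the ratio var_e / var_u instead of being tuned by cross-validation.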
Robust versus consistent variance estimators in marginal structural Cox models.
Enders, Dirk; Engel, Susanne; Linder, Roland; Pigeot, Iris
2018-06-11
In survival analyses, inverse-probability-of-treatment (IPT) and inverse-probability-of-censoring (IPC) weighted estimators of parameters in marginal structural Cox models are often used to estimate treatment effects in the presence of time-dependent confounding and censoring. In most applications, a robust variance estimator of the IPT and IPC weighted estimator is calculated leading to conservative confidence intervals. This estimator assumes that the weights are known rather than estimated from the data. Although a consistent estimator of the asymptotic variance of the IPT and IPC weighted estimator is generally available, applications and thus information on the performance of the consistent estimator are lacking. Reasons might be a cumbersome implementation in statistical software, which is further complicated by missing details on the variance formula. In this paper, we therefore provide a detailed derivation of the variance of the asymptotic distribution of the IPT and IPC weighted estimator and explicitly state the necessary terms to calculate a consistent estimator of this variance. We compare the performance of the robust and consistent variance estimators in an application based on routine health care data and in a simulation study. The simulation reveals no substantial differences between the 2 estimators in medium and large data sets with no unmeasured confounding, but the consistent variance estimator performs poorly in small samples or under unmeasured confounding, if the number of confounders is large. We thus conclude that the robust estimator is more appropriate for all practical purposes. Copyright © 2018 John Wiley & Sons, Ltd.
Torija, Antonio J; Ruiz, Diego P
2012-10-01
Road traffic has a heavy impact on the urban sound environment, constituting the main source of noise and widely dominating its spectral composition. In this context, our research investigates the use of recorded sound spectra as input data for the development of real-time short-term road traffic flow estimation models. For this, a series of models based on the use of Multilayer Perceptron Neural Networks, multiple linear regression, and the Fisher linear discriminant were implemented to estimate road traffic flow as well as to classify it according to the composition of heavy vehicles and motorcycles/mopeds. In view of the results, the use of the 50-400 Hz and 1-2.5 kHz frequency ranges as input variables in multilayer perceptron-based models successfully estimated urban road traffic flow with an average percentage of explained variance equal to 86%, while the classification of the urban road traffic flow gave an average success rate of 96.1%. Copyright © 2012 Elsevier B.V. All rights reserved.
A Variance Distribution Model of Surface EMG Signals Based on Inverse Gamma Distribution.
Hayashi, Hideaki; Furui, Akira; Kurita, Yuichi; Tsuji, Toshio
2017-11-01
Objective: This paper describes the formulation of a surface electromyogram (EMG) model capable of representing the variance distribution of EMG signals. Methods: In the model, EMG signals are handled based on a Gaussian white noise process with a mean of zero for each variance value. EMG signal variance is taken as a random variable that follows inverse gamma distribution, allowing the representation of noise superimposed onto this variance. Variance distribution estimation based on marginal likelihood maximization is also outlined in this paper. The procedure can be approximated using rectified and smoothed EMG signals, thereby allowing the determination of distribution parameters in real time at low computational cost. Results: A simulation experiment was performed to evaluate the accuracy of distribution estimation using artificially generated EMG signals, with results demonstrating that the proposed model's accuracy is higher than that of maximum-likelihood-based estimation. Analysis of variance distribution using real EMG data also suggested a relationship between variance distribution and signal-dependent noise. Conclusion: The study reported here was conducted to examine the performance of a proposed surface EMG model capable of representing variance distribution and a related distribution parameter estimation method. Experiments using artificial and real EMG data demonstrated the validity of the model. Significance: Variance distribution estimated using the proposed model exhibits potential in the estimation of muscle force.
Khan, I.; Hawlader, Sophie Mohammad Delwer Hossain; Arifeen, Shams El; Moore, Sophie; Hills, Andrew P.; Wells, Jonathan C.; Persson, Lars-Åke; Kabir, Iqbal
2012-01-01
The aim of this study was to investigate the validity of the Tanita TBF 300A leg-to-leg bioimpedance analyzer for estimating fat-free mass (FFM) in Bangladeshi children aged 4-10 years and to develop novel prediction equations for use in this population, using deuterium dilution as the reference method. Two hundred Bangladeshi children were enrolled. The isotope dilution technique with deuterium oxide was used for estimation of total body water (TBW). FFM estimated by Tanita was compared with results of deuterium oxide dilution technique. Novel prediction equations were created for estimating FFM, using linear regression models, fitting child's height and impedance as predictors. There was a significant difference in FFM and percentage of body fat (BF%) between methods (p<0.01), Tanita underestimating TBW in boys (p=0.001) and underestimating BF% in girls (p<0.001). A basic linear regression model with height and impedance explained 83% of the variance in FFM estimated by deuterium oxide dilution technique. The best-fit equation to predict FFM from linear regression modelling was achieved by adding weight, sex, and age to the basic model, bringing the adjusted R2 to 89% (standard error=0.90, p<0.001). These data suggest Tanita analyzer may be a valid field-assessment technique in Bangladeshi children when using population-specific prediction equations, such as the ones developed here. PMID:23082630
Thorlund, Kristian; Thabane, Lehana; Mills, Edward J
2013-01-11
Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variances for all involved treatment comparisons are equal (i.e., the 'common variance' assumption). This approach 'borrows strength' for heterogeneity estimation across treatment comparisons, and thus adds valuable precision when data are sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary. In this paper we describe four novel approaches to modeling heterogeneity variance - two novel model structures, and two approaches for use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. We particularly compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities. In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances. Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach. MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice.
Analytic variance estimates of Swank and Fano factors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gutierrez, Benjamin; Badano, Aldo; Samuelson, Frank
Purpose: Variance estimates for detector energy resolution metrics can be used as stopping criteria in Monte Carlo simulations for the purpose of ensuring a small uncertainty of those metrics and for the design of variance reduction techniques. Methods: The authors derive an estimate for the variance of two energy resolution metrics, the Swank factor and the Fano factor, in terms of statistical moments that can be accumulated without significant computational overhead. The authors examine the accuracy of these two estimators and demonstrate how the estimates of the coefficient of variation of the Swank and Fano factors behave with data from a Monte Carlo simulation of an indirect x-ray imaging detector. Results: The authors' analyses suggest that the accuracy of their variance estimators is appropriate for estimating the actual variances of the Swank and Fano factors for a variety of distributions of detector outputs. Conclusions: The variance estimators derived in this work provide a computationally convenient way to estimate the error or coefficient of variation of the Swank and Fano factors during Monte Carlo simulations of radiation imaging systems.
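As background for the two metrics named in this abstract, the sketch below computes point estimates of the Swank and Fano factors from running moment sums, using the commonly cited definitions (Swank factor = M1^2 / (M0 M2) for moments of the output distribution, Fano factor = variance / mean). It assumes those textbook definitions and does not reproduce the variance estimators derived by the authors; the function name and sample values are hypothetical.

```python
def swank_and_fano(outputs):
    """Return (swank, fano) from accumulated moment sums of detector outputs.

    Only running sums are kept, so the moments can be accumulated
    event by event during a simulation.
    """
    n = 0
    s1 = 0.0  # running sum of x
    s2 = 0.0  # running sum of x squared
    for x in outputs:
        n += 1
        s1 += x
        s2 += x * x
    m1 = s1 / n  # first moment (mean) of the normalized distribution
    m2 = s2 / n  # second moment of the normalized distribution
    swank = m1 * m1 / m2         # M1^2 / (M0 * M2) with M0 = 1 after normalization
    fano = (m2 - m1 * m1) / m1   # population variance divided by mean
    return swank, fano


# Hypothetical per-event detector outputs (e.g., detected optical quanta).
swank, fano = swank_and_fano([2, 4, 4, 6])
print(swank, fano)
```

For this toy sample the mean is 4 and the second moment is 18, so the Swank factor is 16/18 and the Fano factor is 0.5.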
Hu, Wen
2017-06-01
In November 2010 and October 2013, Utah increased speed limits on sections of rural interstates from 75 to 80mph. Effects on vehicle speeds and speed variance were examined. Speeds were measured in May 2010 and May 2014 within the new 80mph zones, and at a nearby spillover site and at more distant control sites where speed limits remained 75mph. Log-linear regression models estimated percentage changes in speed variance and mean speeds for passenger vehicles and large trucks associated with the speed limit increase. Logistic regression models estimated effects on the probability of passenger vehicles exceeding 80, 85, or 90mph and large trucks exceeding 80mph. Within the 80mph zones and at the spillover location in 2014, mean passenger vehicle speeds were significantly higher (4.1% and 3.5%, respectively), as were the probabilities that passenger vehicles exceeded 80mph (122.3% and 88.5%, respectively), than would have been expected without the speed limit increase. Probabilities that passenger vehicles exceeded 85 and 90mph were non-significantly higher than expected within the 80mph zones. For large trucks, the mean speed and probability of exceeding 80mph were higher than expected within the 80mph zones. Only the increase in mean speed was significant. Raising the speed limit was associated with non-significant increases in speed variance. The study adds to the wealth of evidence that increasing speed limits leads to higher travel speeds and an increased probability of exceeding the new speed limit. Results moreover contradict the claim that increasing speed limits reduces speed variance. Although the estimated increases in mean vehicle speeds may appear modest, prior research suggests such increases would be associated with substantial increases in fatal or injury crashes. This should be considered by lawmakers considering increasing speed limits. Copyright © 2017 Elsevier Ltd and National Safety Council. All rights reserved.
Estimating individual glomerular volume in the human kidney: clinical perspectives
Puelles, Victor G.; Zimanyi, Monika A.; Samuel, Terence; Hughson, Michael D.; Douglas-Denton, Rebecca N.; Bertram, John F.
2012-01-01
Background. Measurement of individual glomerular volumes (IGV) has allowed the identification of drivers of glomerular hypertrophy in subjects without overt renal pathology. This study aims to highlight the relevance of IGV measurements with possible clinical implications and determine how many profiles must be measured in order to achieve stable size distribution estimates. Methods. We re-analysed 2250 IGV estimates obtained using the disector/Cavalieri method in 41 African and 34 Caucasian Americans. Pooled IGV analysis of mean and variance was conducted. Monte-Carlo (Jackknife) simulations determined the effect of the number of sampled glomeruli on mean IGV. Lin’s concordance coefficient (RC), coefficient of variation (CV) and coefficient of error (CE) measured reliability. Results. IGV mean and variance increased with overweight and hypertensive status. Superficial glomeruli were significantly smaller than juxtamedullary glomeruli in all subjects (P < 0.01), by race (P < 0.05) and in obese individuals (P < 0.01). Subjects with multiple chronic kidney disease (CKD) comorbidities showed significant increases in IGV mean and variability. Overall, mean IGV was particularly reliable with nine or more sampled glomeruli (RC > 0.95, <5% difference in CV and CE). These observations were not affected by a reduced sample size and did not disrupt the inverse linear correlation between mean IGV and estimated total glomerular number. Conclusions. Multiple comorbidities for CKD are associated with increased IGV mean and variance within subjects, including overweight, obesity and hypertension. Zonal selection and the number of sampled glomeruli do not represent drawbacks for future longitudinal biopsy-based studies of glomerular size and distribution. PMID:21984554
Hierarchical Bayes approach for subgroup analysis.
Hsu, Yu-Yi; Zalkikar, Jyoti; Tiwari, Ram C
2017-01-01
In clinical data analysis, both treatment effect estimation and consistency assessment are important for a better understanding of the drug efficacy for the benefit of subjects in individual subgroups. The linear mixed-effects model has been used for subgroup analysis to describe treatment differences among subgroups with great flexibility. The hierarchical Bayes approach has been applied to linear mixed-effects model to derive the posterior distributions of overall and subgroup treatment effects. In this article, we discuss the prior selection for variance components in hierarchical Bayes, estimation and decision making of the overall treatment effect, as well as consistency assessment of the treatment effects across the subgroups based on the posterior predictive p-value. Decision procedures are suggested using either the posterior probability or the Bayes factor. These decision procedures and their properties are illustrated using a simulated example with normally distributed response and repeated measurements.
Comparing Mapped Plot Estimators
Paul C. Van Deusen
2006-01-01
Two alternative derivations of estimators for mean and variance from mapped plots are compared by considering the models that support the estimators and by simulation. It turns out that both models lead to the same estimator for the mean but lead to very different variance estimators. The variance estimators based on the least valid model assumptions are shown to...
Segal, N L; Feng, R; McGuire, S A; Allison, D B; Miller, S
2009-01-01
Earlier studies have established that a substantial percentage of variance in obesity-related phenotypes is explained by genetic components. However, only one study has used both virtual twins (VTs) and biological twins and was able to simultaneously estimate additive genetic, non-additive genetic, shared environmental and unshared environmental components in body mass index (BMI). Our current goal was to re-estimate four components of variance in BMI, applying a more rigorous model to biological and virtual multiples with additional data. Virtual multiples share the same family environment, offering unique opportunities to estimate common environmental influence on phenotypes that cannot be separated from the non-additive genetic component using only biological multiples. Data included 929 individuals from 164 monozygotic twin pairs, 156 dizygotic twin pairs, five triplet sets, one quadruplet set, 128 VT pairs, two virtual triplet sets and two virtual quadruplet sets. Virtual multiples consist of one biological child (or twins or triplets) plus one same-aged adoptee who are all raised together since infancy. We estimated the additive genetic, non-additive genetic, shared environmental and unshared random components in BMI using a linear mixed model. The analysis was adjusted for age, age², age³, height, height², height³, gender and race. Both non-additive genetic and common environmental contributions were significant in our model (P-values<0.0001). No significant additive genetic contribution was found. In all, 63.6% (95% confidence interval (CI) 51.8-75.3%) of the total variance of BMI was explained by a non-additive genetic component, 25.7% (95% CI 13.8-37.5%) by a common environmental component and the remaining 10.7% by an unshared component. Our results suggest that genetic components play an essential role in BMI and that common environmental factors such as diet or exercise also affect BMI. 
This conclusion is consistent with our earlier study using a smaller sample and shows the utility of virtual multiples for separating non-additive genetic variance from common environmental variance.
M-estimator for the 3D symmetric Helmert coordinate transformation
NASA Astrophysics Data System (ADS)
Chang, Guobin; Xu, Tianhe; Wang, Qianxin
2018-01-01
The M-estimator for the 3D symmetric Helmert coordinate transformation problem is developed. The small-angle rotation assumption is abandoned. The direction cosine matrix or the quaternion is used to represent the rotation. The 3 × 1 multiplicative error vector is defined to represent the rotation estimation error. An analytical solution can be employed to provide the initial approximation for iteration, if the outliers are not large. The iteration is carried out using the iteratively reweighted least-squares scheme. In each iteration after the first one, the measurement equation is linearized using the available parameter estimates, the reweighting matrix is constructed using the residuals obtained in the previous iteration, and then the parameter estimates with their variance-covariance matrix are calculated. The influence functions of a single pseudo-measurement on the least-squares estimator and on the M-estimator are derived to theoretically show the robustness. In the solution process, the parameter is rescaled in order to improve the numerical stability. Monte Carlo experiments are conducted to check the developed method. Different cases to investigate whether the assumed stochastic model is correct are considered. The results with the simulated data slightly deviating from the true model are used to show the developed method's statistical efficacy at the assumed stochastic model, its robustness against the deviations from the assumed stochastic model, and the validity of the estimated variance-covariance matrix no matter whether the assumed stochastic model is correct or not.
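The reweighting idea described above (weights rebuilt from the previous iteration's residuals, then a weighted least-squares update) can be sketched on a much simpler problem than the Helmert transformation. The example below is an assumption-laden toy: it estimates a single location parameter with Huber weights and a fixed MAD-based scale. The function name, the tuning constant k = 1.345, and the choice of the Huber weight function are illustrative choices, not taken from the paper.

```python
from statistics import median


def huber_irls_location(values, k=1.345, tol=1e-10, max_iter=100):
    """Robust location estimate by iteratively reweighted least squares.

    Toy illustration only: one parameter, Huber weights, fixed scale.
    """
    values = list(values)
    mu = median(values)  # robust starting value
    mad = median([abs(v - mu) for v in values])
    scale = 1.4826 * mad if mad > 0 else 1.0  # MAD scaled for normal data
    for _ in range(max_iter):
        # Weights come from the residuals of the previous iterate.
        weights = []
        for v in values:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        # Weighted least-squares update of the single parameter.
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

On data with one gross outlier the estimate stays near the bulk of the observations, whereas the ordinary mean is pulled far away; with no outliers all weights equal 1 and the result is the ordinary mean.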
Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger
2017-09-01
The coefficient of determination R² quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R² for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R² that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).
Quadratic semiparametric Von Mises calculus
Robins, James; Li, Lingling; Tchetgen, Eric
2009-01-01
We discuss a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on U-statistics constructed from quadratic influence functions. The latter extend ordinary linear influence functions of the parameter of interest as defined in semiparametric theory, and represent second order derivatives of this parameter. For parameters for which the matching cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than n^(-1/2) rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at n^(-1/2) rate. PMID:23087487
Fast computation of an optimal controller for large-scale adaptive optics.
Massioni, Paolo; Kulcsár, Caroline; Raynaud, Henri-François; Conan, Jean-Marc
2011-11-01
The linear quadratic Gaussian regulator provides the minimum-variance control solution for a linear time-invariant system. For adaptive optics (AO) applications, under the hypothesis of a deformable mirror with instantaneous response, such a controller boils down to a minimum-variance phase estimator (a Kalman filter) and a projection onto the mirror space. The Kalman filter gain can be computed by solving an algebraic Riccati matrix equation, whose computational complexity grows very quickly with the size of the telescope aperture. This "curse of dimensionality" makes the standard solvers for Riccati equations very slow in the case of extremely large telescopes. In this article, we propose a way of computing the Kalman gain for AO systems by means of an approximation that considers the turbulence phase screen as the cropped version of an infinite-size screen. We demonstrate the advantages of the methods for both off- and on-line computational time, and we evaluate its performance for classical AO as well as for wide-field tomographic AO with multiple natural guide stars. Simulation results are reported.
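To make the link between the algebraic Riccati equation and the Kalman gain concrete, here is a minimal scalar sketch. It assumes a one-state model x[k+1] = a*x[k] + w, y[k] = c*x[k] + v with noise variances q and r (all symbols are assumptions for illustration), and it reaches the steady state by plain fixed-point iteration. This is neither the approximation proposed in the article nor a production Riccati solver.

```python
def steady_state_kalman_gain(a, c, q, r, tol=1e-12, max_iter=10_000):
    """Scalar steady-state Kalman gain by iterating the Riccati recursion.

    Returns (p, gain): the prediction error variance p solving
    p = a^2 * p * r / (c^2 * p + r) + q, and the gain p*c / (c^2*p + r).
    """
    p = q  # any non-negative start works for this toy case
    for _ in range(max_iter):
        p_next = a * a * p * r / (c * c * p + r) + q
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    gain = p * c / (c * c * p + r)
    return p, gain
```

For a = c = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, so p is the golden ratio and the gain is its reciprocal. In the matrix case the same recursion involves matrix products and an inverse at every step, which is where the cost growth with aperture size noted above comes from.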
Arribas-Gil, Ana; De la Cruz, Rolando; Lebarbier, Emilie; Meza, Cristian
2015-06-01
We propose a classification method for longitudinal data. The Bayes classifier is classically used to determine a classification rule where the underlying density in each class needs to be well modeled and estimated. This work is motivated by a real dataset of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. The proposed model, which is a semiparametric linear mixed-effects model (SLMM), is a particular case of the semiparametric nonlinear mixed-effects class of models (SNMM) in which finite dimensional (fixed effects and variance components) and infinite dimensional (an unknown function) parameters have to be estimated. In SNMMs, maximum likelihood estimation is performed iteratively, alternating parametric and nonparametric procedures. However, if one can make the assumption that the random effects and the unknown function interact in a linear way, more efficient estimation methods can be used. Our contribution is the proposal of a unified estimation procedure based on a penalized EM-type algorithm. The Expectation and Maximization steps are explicit. In this latter step, the unknown function is estimated in a nonparametric fashion using a lasso-type procedure. A simulation study and an application on real data are performed. © 2015, The International Biometric Society.
Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R
2014-01-01
Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis does estimate the true shelf life more consistently and precisely than the analysis per storage condition, but it did not outperform the separate analysis in all circumstances.
NASA Astrophysics Data System (ADS)
Rock, N. M. S.; Duffy, T. R.
REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x), including: (1) major axis (λ = 1); (2) reduced major axis (λ = variance of y/variance of x); (3) Y on X (λ = infinity); or (4) X on Y (λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
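The two families of fits described above can be sketched in a few lines. This is not the REGRES program: the function names are hypothetical, the nonparametric fit takes plain medians of the pairwise slopes and pairwise intercepts (one reading of the description), and the parametric slope uses the standard errors-in-variables (Deming-type) formula for an error-variance ratio λ, which is assumed here to be the form intended.

```python
from itertools import combinations
from math import isinf, sqrt
from statistics import fmean, median


def pairwise_median_line(x, y):
    """Median of all pairwise slopes and of the matching intercepts."""
    slopes, intercepts = [], []
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # vertical pair: slope undefined, skip it
        b = (yj - yi) / (xj - xi)
        slopes.append(b)
        intercepts.append(yi - b * xi)
    return median(slopes), median(intercepts)


def errors_in_variables_slope(x, y, lam):
    """Slope for error-variance ratio lam = var(err_y) / var(err_x).

    lam = 1 gives the major axis, lam = syy/sxx the reduced major axis,
    lam = inf the Y-on-X slope, lam = 0 the X-on-Y slope.
    Assumes the sample covariance sxy is non-zero.
    """
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if isinf(lam):
        return sxy / sxx  # ordinary least squares of y on x
    d = syy - lam * sxx
    return (d + sqrt(d * d + 4 * lam * sxy * sxy)) / (2 * sxy)
```

For example, with x = [0, 1, 2, 3] and y = [0, 2, 1, 3] the Y-on-X slope is 0.8, the X-on-Y slope is 1.25, and the major-axis slope (λ = 1) falls between them at 1.0, which shows how the choice of λ moves the fitted line when both variables carry error.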
Whose IQ is it?--Assessor bias variance in high-stakes psychological assessment.
McDermott, Paul A; Watkins, Marley W; Rhoad, Anna M
2014-03-01
Assessor bias variance exists for a psychological measure when some appreciable portion of the score variation that is assumed to reflect examinees' individual differences (i.e., the relevant phenomena in most psychological assessments) instead reflects differences among the examiners who perform the assessment. Ordinary test reliability estimates and standard errors of measurement do not inherently encompass assessor bias variance. This article reports on the application of multilevel linear modeling to examine the presence and extent of assessor bias in the administration of the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) for a sample of 2,783 children evaluated by 448 regional school psychologists for high-stakes special education classification purposes. It was found that nearly all WISC-IV scores conveyed significant and nontrivial amounts of variation that had nothing to do with children's actual individual differences and that the Full Scale IQ and Verbal Comprehension Index scores evidenced quite substantial assessor bias. Implications are explored. 2014 APA
NASA Astrophysics Data System (ADS)
Codis, Sandrine; Bernardeau, Francis; Pichon, Christophe
2016-08-01
In order to quantify the error budget in the measured probability distribution functions of cell densities, the two-point statistics of cosmic densities in concentric spheres is investigated. Bias functions are introduced as the ratio of their two-point correlation function to the two-point correlation of the underlying dark matter distribution. They describe how cell densities are spatially correlated. They are computed here via the so-called large deviation principle in the quasi-linear regime. Their large-separation limit is presented and successfully compared to simulations for density and density slopes: this regime is shown to be rapidly reached, allowing sub-percent precision to be obtained for a wide range of densities and variances. The corresponding asymptotic limit provides an estimate of the cosmic variance of standard concentric cell statistics applied to finite surveys. More generally, no assumption on the separation is required for some specific moments of the two-point statistics, for instance when predicting the generating function of cumulants containing any powers of concentric densities in one location and one power of density at some arbitrary distance from the rest. This exact 'one external leg' cumulant generating function is used in particular to probe the rate of convergence of the large-separation approximation.
Su, Guosheng; Christensen, Ole F.; Ostersen, Tage; Henryon, Mark; Lund, Mogens S.
2012-01-01
Non-additive genetic variation is usually ignored when genome-wide markers are used to study the genetic architecture and genomic prediction of complex traits in humans, wildlife, model organisms or farm animals. However, non-additive genetic effects may have an important contribution to total genetic variation of complex traits. This study presented a genomic BLUP model including additive and non-additive genetic effects, in which additive and non-additive genetic relation matrices were constructed from information of genome-wide dense single nucleotide polymorphism (SNP) markers. In addition, this study for the first time proposed a method to construct a dominance relationship matrix using SNP markers and demonstrated it in detail. The proposed model was implemented to investigate the amounts of additive genetic, dominance and epistatic variations, and assessed the accuracy and unbiasedness of genomic predictions for daily gain in pigs. In the analysis of daily gain, four linear models were used: 1) a simple additive genetic model (MA), 2) a model including both additive and additive by additive epistatic genetic effects (MAE), 3) a model including both additive and dominance genetic effects (MAD), and 4) a full model including all three genetic components (MAED). Estimates of narrow-sense heritability were 0.397, 0.373, 0.379 and 0.357 for models MA, MAE, MAD and MAED, respectively. Estimated dominance variance and additive by additive epistatic variance accounted for 5.6% and 9.5% of the total phenotypic variance, respectively. Based on model MAED, the estimate of broad-sense heritability was 0.506. Reliabilities of genomic predicted breeding values for the animals without performance records were 28.5%, 28.8%, 29.2% and 29.5% for models MA, MAE, MAD and MAED, respectively. In addition, models including non-additive genetic effects improved unbiasedness of genomic predictions. PMID:23028912
NASA Astrophysics Data System (ADS)
Shimizu, K.; von Storch, J. S.; Haak, H.; Nakayama, K.; Marotzke, J.
2014-12-01
Surface wind stress is considered to be an important forcing of the seasonal and interannual variability of Atlantic Meridional Overturning Circulation (AMOC) volume transports. A recent study showed that even linear response to wind forcing captures observed features of the mean seasonal cycle. However, the study did not assess the contribution of wind-driven linear response in realistic conditions against the RAPID/MOCHA array observation or Ocean General Circulation Model (OGCM) simulations, because it applied a linear two-layer model to the Atlantic assuming constant upper layer thickness and density difference across the interface. Here, we quantify the contribution of wind-driven linear response to the seasonal and interannual variability of AMOC transports by comparing wind-driven linear simulations under realistic continuous stratification against the RAPID observation and OGCM (MPI-OM) simulations with 0.4° resolution (TP04) and 0.1° resolution (STORM). All the linear and MPI-OM simulations capture more than 60% of the variance in the observed mean seasonal cycle of the Upper Mid-Ocean (UMO) and Florida Strait (FS) transports, two components of the upper branch of the AMOC. The linear and TP04 simulations also capture 25-40% of the variance in the observed transport time series between Apr 2004 and Oct 2012; the STORM simulation does not capture the observed variance because of the stochastic signal in both datasets. Comparison of half-overlapping 12-month-long segments reveals some periods when the linear and TP04 simulations capture 40-60% of the observed variance, as well as other periods when the simulations capture only 0-20% of the variance. These results show that wind-driven linear response is a major contributor to the seasonal and interannual variability of the UMO and FS transports, and that its contribution varies on an interannual timescale, probably due to the variability of stochastic processes.
1984-05-01
By means of the concept of change-of-variance function we investigate the stability properties of the asymptotic variance of R-estimators. This allows us to construct the optimal V-robust R-estimator that minimizes the asymptotic variance at the model, under the side condition of a bounded change-of-variance function. Finally, we discuss the connection between this function and an influence function for two-sample rank tests introduced by Eplett (1980). (Author)
Collinearity and Causal Diagrams: A Lesson on the Importance of Model Specification.
Schisterman, Enrique F; Perkins, Neil J; Mumford, Sunni L; Ahrens, Katherine A; Mitchell, Emily M
2017-01-01
Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data. We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased. Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
Reference tissue modeling with parameter coupling: application to a study of SERT binding in HIV
NASA Astrophysics Data System (ADS)
Endres, Christopher J.; Hammoud, Dima A.; Pomper, Martin G.
2011-04-01
When applicable, it is generally preferred to evaluate positron emission tomography (PET) studies using a reference tissue-based approach as that avoids the need for invasive arterial blood sampling. However, most reference tissue methods have been shown to have a bias that is dependent on the level of tracer binding, and the variability of parameter estimates may be substantially affected by noise level. In a study of serotonin transporter (SERT) binding in HIV dementia, it was determined that applying parameter coupling to the simplified reference tissue model (SRTM) reduced the variability of parameter estimates and yielded the strongest between-group significant differences in SERT binding. The use of parameter coupling makes the application of SRTM more consistent with conventional blood input models and reduces the total number of fitted parameters, thus should yield more robust parameter estimates. Here, we provide a detailed evaluation of the application of parameter constraint and parameter coupling to [11C]DASB PET studies. Five quantitative methods, including three methods that constrain the reference tissue clearance (kr2) to a common value across regions were applied to the clinical and simulated data to compare measurement of the tracer binding potential (BPND). Compared with standard SRTM, either coupling of kr2 across regions or constraining kr2 to a first-pass estimate improved the sensitivity of SRTM to measuring a significant difference in BPND between patients and controls. Parameter coupling was particularly effective in reducing the variance of parameter estimates, which was less than 50% of the variance obtained with standard SRTM. A linear approach was also improved when constraining kr2 to a first-pass estimate, although the SRTM-based methods yielded stronger significant differences when applied to the clinical study. 
This work shows that parameter coupling reduces the variance of parameter estimates and may better discriminate between-group differences in specific binding.
A flexible model for the mean and variance functions, with application to medical cost data.
Chen, Jinsong; Liu, Lei; Zhang, Daowen; Shih, Ya-Chen T
2013-10-30
Medical cost data are often skewed to the right and heteroscedastic, having a nonlinear relation with covariates. To tackle these issues, we consider an extension to generalized linear models by assuming nonlinear associations of covariates in the mean function and allowing the variance to be an unknown but smooth function of the mean. We make no further assumption on the distributional form. The unknown functions are described by penalized splines, and the estimation is carried out using nonparametric quasi-likelihood. Simulation studies show the flexibility and advantages of our approach. We apply the model to the annual medical costs of heart failure patients in the clinical data repository at the University of Virginia Hospital System. Copyright © 2013 John Wiley & Sons, Ltd.
Mixed model approaches for diallel analysis based on a bio-model.
Zhu, J; Weir, B S
1996-12-01
A MINQUE(1) procedure, which is the minimum norm quadratic unbiased estimation (MINQUE) method with 1 for all the prior values, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML) and MINQUE theta, which has parameter values for the prior values. MINQUE(1) is almost as efficient as MINQUE theta for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jack-knife procedure is suggested for estimation of sampling variances of estimated variance and covariance components and of predicted genetic effects. Worked examples are given for estimation of variance and covariance components and for prediction of genetic merits.
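The jack-knife step mentioned above can be illustrated generically. The sketch below is the textbook delete-one jack-knife for the sampling variance of an arbitrary statistic; the function name is hypothetical, and how the paper forms its resampling units (for example, which genotypes or blocks are left out) is not specified here and is not reproduced.

```python
from statistics import fmean


def jackknife_variance(data, statistic):
    """Delete-one jack-knife estimate of the sampling variance of statistic(data)."""
    data = list(data)
    n = len(data)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = fmean(leave_one_out)
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
```

For the sample mean this reproduces the familiar s^2 / n: with data [1, 2, 3, 4, 5] the sample variance is 2.5 and the jack-knife variance of the mean is 0.5. The same function can be pointed at any estimator that takes a list, which is what makes the approach convenient when no closed-form sampling variance is available.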
Inference on periodicity of circadian time series.
Costa, Maria J; Finkenstädt, Bärbel; Roche, Véronique; Lévi, Francis; Gould, Peter D; Foreman, Julia; Halliday, Karen; Hall, Anthony; Rand, David A
2013-09-01
Estimation of the period length of time-course data from cyclical biological processes, such as those driven by the circadian pacemaker, is crucial for inferring the properties of the biological clock found in many living organisms. We propose a methodology for period estimation based on spectrum resampling (SR) techniques. Simulation studies show that SR is superior and more robust to non-sinusoidal and noisy cycles than a currently used routine based on Fourier approximations. In addition, a simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice. The proposed methods are motivated by and applied to various data examples from chronobiology.
NASA Astrophysics Data System (ADS)
Brown, James; Seo, Dong-Jun
2010-05-01
Operational forecasts of hydrometeorological and hydrologic variables often contain large uncertainties, for which ensemble techniques are increasingly used. However, the utility of ensemble forecasts depends on the unbiasedness of the forecast probabilities. We describe a technique for quantifying and removing biases from ensemble forecasts of hydrometeorological and hydrologic variables, intended for use in operational forecasting. The technique makes no a priori assumptions about the distributional form of the variables, which is often unknown or difficult to model parametrically. The aim is to estimate the conditional cumulative distribution function (ccdf) of the observed variable given a (possibly biased) real-time ensemble forecast from one or several forecasting systems (multi-model ensembles). The technique is based on Bayesian optimal linear estimation of indicator variables, and is analogous to indicator cokriging (ICK) in geostatistics. By developing linear estimators for the conditional expectation of the observed variable at many thresholds, ICK provides a discrete approximation of the full ccdf. Since ICK minimizes the conditional error variance of the indicator expectation at each threshold, it effectively minimizes the Continuous Ranked Probability Score (CRPS) when infinitely many thresholds are employed. However, the ensemble members used as predictors in ICK, and other bias-correction techniques, are often highly cross-correlated, both within and between models. Thus, we propose an orthogonal transform of the predictors used in ICK, which is analogous to using their principal components in the linear system of equations. This leads to a well-posed problem in which a minimum number of predictors are used to provide maximum information content in terms of the total variance explained. 
The technique is used to bias-correct precipitation ensemble forecasts from the NCEP Global Ensemble Forecast System (GEFS), for which independent validation results are presented. Extension to multimodel ensembles from the NCEP GFS and Short Range Ensemble Forecast (SREF) systems is also proposed.
Linear mixed-effects modeling approach to FMRI group analysis.
Chen, Gang; Saad, Ziad S; Britton, Jennifer C; Pine, Daniel S; Cox, Robert W
2013-06-01
Conventional group analysis is usually performed with Student-type t-test, regression, or standard AN(C)OVA in which the variance-covariance matrix is presumed to have a simple structure. Some correction approaches are adopted when assumptions about the covariance structure are violated. However, as experiments are designed with different degrees of sophistication, these traditional methods can become cumbersome, or even be unable to handle the situation at hand. For example, most current FMRI software packages have difficulty analyzing the following scenarios at group level: (1) taking within-subject variability into account when there are effect estimates from multiple runs or sessions; (2) continuous explanatory variables (covariates) modeling in the presence of a within-subject (repeated measures) factor, multiple subject-grouping (between-subjects) factors, or the mixture of both; (3) subject-specific adjustments in covariate modeling; (4) group analysis with estimation of hemodynamic response (HDR) function by multiple basis functions; (5) various cases of missing data in longitudinal studies; and (6) group studies involving family members or twins. Here we present a linear mixed-effects modeling (LME) methodology that extends the conventional group analysis approach to analyze many complicated cases, including the six prototypes delineated above, whose analyses would be otherwise either difficult or unfeasible under traditional frameworks such as AN(C)OVA and general linear model (GLM). In addition, the strength of the LME framework lies in its flexibility to model and estimate the variance-covariance structures for both random effects and residuals. The intraclass correlation (ICC) values can be easily obtained with an LME model with crossed random effects, even in the presence of confounding fixed effects. 
The simulations of one prototypical scenario indicate that the LME modeling keeps a balance between the control for false positives and the sensitivity for activation detection. The importance of hypothesis formulation is also illustrated in the simulations. Comparisons with alternative group analysis approaches and the limitations of LME are discussed in detail. Published by Elsevier Inc.
Marjanovic, Jovana; Mulder, Han A; Khaw, Hooi L; Bijma, Piter
2016-06-10
Animal breeding programs have been very successful in improving the mean levels of traits through selection. However, in recent decades, reducing the variability of trait levels between individuals has become a highly desirable objective. Reaching this objective through genetic selection requires that there is genetic variation in the variability of trait levels, a phenomenon known as genetic heterogeneity of environmental (residual) variance. The aim of our study was to investigate the potential for genetic improvement of uniformity of harvest weight and body size traits (length, depth, and width) in the genetically improved farmed tilapia (GIFT) strain. In order to quantify the genetic variation in uniformity of traits and estimate the genetic correlations between level and variance of the traits, double hierarchical generalized linear models were applied to individual trait values. Our results showed substantial genetic variation in uniformity of all analyzed traits, with genetic coefficients of variation for residual variance ranging from 39 to 58 %. Genetic correlation between trait level and variance was strongly positive for harvest weight (0.60 ± 0.09), moderate and positive for body depth (0.37 ± 0.13), but not significantly different from 0 for body length and width. Our results on the genetic variation in uniformity of harvest weight and body size traits show good prospects for the genetic improvement of uniformity in the GIFT strain. A high and positive genetic correlation was estimated between level and variance of harvest weight, which suggests that selection for heavier fish will also result in more variation in harvest weight. Simultaneous improvement of harvest weight and its uniformity will thus require index selection.
Using variance structure to quantify responses to perturbation in fish catches
Vidal, Tiffany E.; Irwin, Brian J.; Wagner, Tyler; Rudstam, Lars G.; Jackson, James R.; Bence, James R.
2017-01-01
We present a case study evaluation of gill-net catches of Walleye Sander vitreus to assess potential effects of large-scale changes in Oneida Lake, New York, including the disruption of trophic interactions by double-crested cormorants Phalacrocorax auritus and invasive dreissenid mussels. We used the empirical long-term gill-net time series and a negative binomial linear mixed model to partition the variability in catches into spatial and coherent temporal variance components, hypothesizing that variance partitioning can help quantify spatiotemporal variability and determine whether variance structure differs before and after large-scale perturbations. We found that the mean catch and the total variability of catches decreased following perturbation but that not all sampling locations responded in a consistent manner. There was also evidence of some spatial homogenization concurrent with a restructuring of the relative productivity of individual sites. Specifically, offshore sites generally became more productive following the estimated break point in the gill-net time series. These results provide support for the idea that variance structure is responsive to large-scale perturbations; therefore, variance components have potential utility as statistical indicators of response to a changing environment more broadly. The modeling approach described herein is flexible and would be transferable to other systems and metrics. For example, variance partitioning could be used to examine responses to alternative management regimes, to compare variability across physiographic regions, and to describe differences among climate zones. Understanding how individual variance components respond to perturbation may yield finer-scale insights into ecological shifts than focusing on patterns in the mean responses or total variability alone.
2013-01-01
Background Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data-driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variances for all involved treatment comparisons are equal (i.e., the ‘common variance’ assumption). This approach ‘borrows strength’ for heterogeneity estimation across treatment comparisons, and thus adds valuable precision when data are sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary. Methods In this paper we describe four novel approaches to modeling heterogeneity variance - two novel model structures, and two approaches for use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. We particularly compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities. Results In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances. 
Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach. Conclusions MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice. PMID:23311298
Chaudhuri, Shomesh E; Merfeld, Daniel M
2013-03-01
Psychophysics generally relies on estimating a subject's ability to perform a specific task as a function of an observed stimulus. For threshold studies, the fitted functions are called psychometric functions. While fitting psychometric functions to data acquired using adaptive sampling procedures (e.g., "staircase" procedures), investigators have encountered a bias in the spread ("slope" or "threshold") parameter that has been attributed to the serial dependency of the adaptive data. Using simulations, we confirm this bias for cumulative Gaussian parametric maximum likelihood fits on data collected via adaptive sampling procedures, and then present a bias-reduced maximum likelihood fit that substantially reduces the bias without reducing the precision of the spread parameter estimate and without reducing the accuracy or precision of the other fit parameters. As a separate topic, we explain how to implement this bias reduction technique using generalized linear model fits as well as other numeric maximum likelihood techniques such as the Nelder-Mead simplex. We then provide a comparison of the iterative bootstrap and observed information matrix techniques for estimating parameter fit variance from adaptive sampling procedure data sets. The iterative bootstrap technique is shown to be slightly more accurate; however, the observed information technique executes in a small fraction (0.005 %) of the time required by the iterative bootstrap technique, which is an advantage when a real-time estimate of parameter fit variance is required.
Westine, Carl D; Spybrook, Jessaca; Taylor, Joseph A
2013-12-01
Prior research has focused primarily on empirically estimating design parameters for cluster-randomized trials (CRTs) of mathematics and reading achievement. Little is known about how design parameters compare across other educational outcomes. This article presents empirical estimates of design parameters that can be used to appropriately power CRTs in science education and compares them to estimates using mathematics and reading. Estimates of intraclass correlations (ICCs) are computed for unconditional two-level (students in schools) and three-level (students in schools in districts) hierarchical linear models of science achievement. Relevant student- and school-level pretest and demographic covariates are then considered, and estimates of variance explained are computed. Subjects: Five consecutive years of Texas student-level data for Grades 5, 8, 10, and 11. Science, mathematics, and reading achievement raw scores as measured by the Texas Assessment of Knowledge and Skills. Results: Findings show that ICCs in science range from .172 to .196 across grades and are generally higher than comparable statistics in mathematics, .163-.172, and reading, .099-.156. When available, a 1-year lagged student-level science pretest explains the most variability in the outcome. The 1-year lagged school-level science pretest is the best alternative in the absence of a 1-year lagged student-level science pretest. Science educational researchers should utilize design parameters derived from science achievement outcomes. © The Author(s) 2014.
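The ICCs reported above are ratios of variance components from unconditional hierarchical models. A minimal sketch of that ratio, assuming the variance components have already been estimated elsewhere; the function names and the numeric inputs are hypothetical and are not taken from the study:

```python
def icc_two_level(var_school, var_student):
    """Share of total variance lying between schools (students in schools)."""
    return var_school / (var_school + var_student)


def icc_three_level(var_district, var_school, var_student):
    """District-level and school-level ICCs (students in schools in districts)."""
    total = var_district + var_school + var_student
    return var_district / total, var_school / total


# Hypothetical variance components, for illustration only.
school_icc = icc_two_level(var_school=0.18, var_student=0.82)
district_icc, school_within_district_icc = icc_three_level(0.05, 0.15, 0.80)
```

A larger ICC means that students within the same school are more alike, so a cluster-randomized trial needs more schools to reach the same power.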
Murad, Havi; Kipnis, Victor; Freedman, Laurence S
2016-10-01
Assessing interactions in linear regression models when covariates have measurement error (ME) is complex. We previously described regression calibration (RC) methods that yield consistent estimators and standard errors for interaction coefficients of normally distributed covariates having classical ME. Here we extend normal based RC (NBRC) and linear RC (LRC) methods to a non-classical ME model, and describe more efficient versions that combine estimates from the main study and internal sub-study. We apply these methods to data from the Observing Protein and Energy Nutrition (OPEN) study. Using simulations we show that (i) for normally distributed covariates efficient NBRC and LRC were nearly unbiased and performed well with sub-study size ≥200; (ii) efficient NBRC had lower MSE than efficient LRC; (iii) the naïve test for a single interaction had type I error probability close to the nominal significance level, whereas efficient NBRC and LRC were slightly anti-conservative but more powerful; (iv) for markedly non-normal covariates, efficient LRC yielded less biased estimators with smaller variance than efficient NBRC. Our simulations suggest that it is preferable to use: (i) efficient NBRC for estimating and testing interaction effects of normally distributed covariates and (ii) efficient LRC for estimating and testing interactions for markedly non-normal covariates. © The Author(s) 2013.
Switzer, P.; Harden, J.W.; Mark, R.K.
1988-01-01
A statistical method for estimating rates of soil development in a given region based on calibration from a series of dated soils is used to estimate ages of soils in the same region that are not dated directly. The method is designed specifically to account for sampling procedures and uncertainties that are inherent in soil studies. Soil variation and measurement error, uncertainties in calibration dates and their relation to the age of the soil, and the limited number of dated soils are all considered. Maximum likelihood (ML) is employed to estimate a parametric linear calibration curve, relating soil development to time or age on suitably transformed scales. Soil variation on a geomorphic surface of a certain age is characterized by replicate sampling of soils on each surface; such variation is assumed to have a Gaussian distribution. The age of a geomorphic surface is described by older and younger bounds. This technique allows age uncertainty to be characterized by either a Gaussian distribution or by a triangular distribution using minimum, best-estimate, and maximum ages. The calibration curve is taken to be linear after suitable (in certain cases logarithmic) transformations, if required, of the soil parameter and age variables. Soil variability, measurement error, and departures from linearity are described in a combined fashion using Gaussian distributions with variances particular to each sampled geomorphic surface and the number of sample replicates. Uncertainty in age of a geomorphic surface used for calibration is described using three parameters by one of two methods. In the first method, upper and lower ages are specified together with a coverage probability; this specification is converted to a Gaussian distribution with the appropriate mean and variance. 
In the second method, "absolute" older and younger ages are specified together with a most probable age; this specification is converted to an asymmetric triangular distribution with mode at the most probable age. The statistical variability of the ML-estimated calibration curve is assessed by a Monte Carlo method in which simulated data sets repeatedly are drawn from the distributional specification; calibration parameters are reestimated for each such simulation in order to assess their statistical variability. Several examples are used for illustration. The age of undated soils in a related setting may be estimated from the soil data using the fitted calibration curve. A second simulation to assess age estimate variability is described and applied to the examples. © 1988 International Association for Mathematical Geology.
The measurement of linear frequency drift in oscillators
NASA Astrophysics Data System (ADS)
Barnes, J. A.
1985-04-01
A linear drift in frequency is an important element in most stochastic models of oscillator performance. Quartz crystal oscillators often have drifts in excess of a part in ten to the tenth power per day. Even commercial cesium beam devices often show drifts of a few parts in ten to the thirteenth per year. There are many ways to estimate the drift rates from data samples (e.g., regress the phase on a quadratic; regress the frequency on a linear; compute the simple mean of the first difference of frequency; use Kalman filters with a drift term as one element in the state vector; and others). Although most of these estimators are unbiased, they vary in efficiency (i.e., confidence intervals). Further, the estimation of confidence intervals using the standard analysis of variance (typically associated with the specific estimating technique) can give amazingly optimistic results. The source of these problems is not an error in, say, the regression techniques; rather, the problems arise from correlations within the residuals. That is, the oscillator model is often not consistent with constraints on the analysis technique or, in other words, some specific analysis techniques are often inappropriate for the task at hand. The appropriateness of a specific analysis technique is critically dependent on the oscillator model and can often be checked with a simple whiteness test on the residuals.
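Two of the drift estimators listed above (a linear regression of frequency on time, and the mean of the first differences of frequency) can be sketched in a few lines. This is an illustration under assumptions: the sampling interval `tau`, the function names, and the use of the lag-1 autocorrelation of the residuals as a crude whiteness check are choices made here, not details taken from the report.

```python
def drift_by_regression(freq, tau=1.0):
    """Ordinary least-squares slope of frequency versus time, plus residuals."""
    n = len(freq)
    t = [i * tau for i in range(n)]
    t_mean = sum(t) / n
    y_mean = sum(freq) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, freq))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, freq)]
    return slope, residuals


def drift_by_first_difference(freq, tau=1.0):
    """Mean of the first differences of frequency, per unit time."""
    diffs = [b - a for a, b in zip(freq, freq[1:])]
    return sum(diffs) / (len(diffs) * tau)


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation; values far from 0 suggest non-white residuals."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0
    numer = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return numer / denom
```

On noise-free linear data both estimators return the same slope. They differ in efficiency once noise is present, and a residual autocorrelation far from zero is a warning that the regression's formal confidence interval may be too optimistic.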
Direct and indirect genetic and fine-scale location effects on breeding date in song sparrows.
Germain, Ryan R; Wolak, Matthew E; Arcese, Peter; Losdat, Sylvain; Reid, Jane M
2016-11-01
Quantifying direct and indirect genetic effects of interacting females and males on variation in jointly expressed life-history traits is central to predicting microevolutionary dynamics. However, accurately estimating sex-specific additive genetic variances in such traits remains difficult in wild populations, especially if related individuals inhabit similar fine-scale environments. Breeding date is a key life-history trait that responds to environmental phenology and mediates individual and population responses to environmental change. However, no studies have estimated female (direct) and male (indirect) additive genetic and inbreeding effects on breeding date, and estimated the cross-sex genetic correlation, while simultaneously accounting for fine-scale environmental effects of breeding locations, impeding prediction of microevolutionary dynamics. We fitted animal models to 38 years of song sparrow (Melospiza melodia) phenology and pedigree data to estimate sex-specific additive genetic variances in breeding date, and the cross-sex genetic correlation, thereby estimating the total additive genetic variance while simultaneously estimating sex-specific inbreeding depression. We further fitted three forms of spatial animal model to explicitly estimate variance in breeding date attributable to breeding location, overlap among breeding locations and spatial autocorrelation. We thereby quantified fine-scale location variances in breeding date and quantified the degree to which estimating such variances affected the estimated additive genetic variances. The non-spatial animal model estimated nonzero female and male additive genetic variances in breeding date (sex-specific heritabilities: 0·07 and 0·02, respectively) and a strong, positive cross-sex genetic correlation (0·99), creating substantial total additive genetic variance (0·18). Breeding date varied with female, but not male inbreeding coefficient, revealing direct, but not indirect, inbreeding depression. 
All three spatial animal models estimated small location variance in breeding date, but because relatedness and breeding location were virtually uncorrelated, modelling location variance did not alter the estimated additive genetic variances. Our results show that sex-specific additive genetic effects on breeding date can be strongly positively correlated, which would affect any predicted rates of microevolutionary change in response to sexually antagonistic or congruent selection. Further, we show that inbreeding effects on breeding date can also be sex specific and that genetic effects can exceed phenotypic variation stemming from fine-scale location-based variation within a wild population. © 2016 The Authors. Journal of Animal Ecology © 2016 British Ecological Society.
Comparing estimates of genetic variance across different relationship models.
Legarra, Andres
2016-02-01
Use of relationships between individuals to estimate genetic variances and heritabilities via mixed models is standard practice in human, plant and livestock genetics. Different models or information for relationships may give different estimates of genetic variances. However, comparing these estimates across different relationship models is not straightforward as the implied base populations differ between relationship models. In this work, I present a method to compare estimates of variance components across different relationship models. I suggest referring genetic variances obtained using different relationship models to the same reference population, usually a set of individuals in the population. Expected genetic variance of this population is the estimated variance component from the mixed model times a statistic, Dk, which is the average self-relationship minus the average (self- and across-) relationship. For most typical models of relationships, Dk is close to 1. However, this is not true for very deep pedigrees, for identity-by-state relationships, or for non-parametric kernels, which tend to overestimate the genetic variance and the heritability. Using mice data, I show that heritabilities from identity-by-state and kernel-based relationships are overestimated. Weighting these estimates by Dk scales them to a base comparable to genomic or pedigree relationships, avoiding wrong comparisons, for instance, "missing heritabilities". Copyright © 2015 Elsevier Inc. All rights reserved.
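The statistic Dk described above is the mean of the diagonal of the relationship matrix minus the mean of all of its elements. A minimal sketch, assuming the matrix is given for the chosen reference individuals as a list of lists; the function names and the toy matrix are hypothetical:

```python
def dk_statistic(K):
    """Average self-relationship minus average of all (self and across) relationships."""
    n = len(K)
    mean_self = sum(K[i][i] for i in range(n)) / n
    mean_all = sum(sum(row) for row in K) / (n * n)
    return mean_self - mean_all


def reference_genetic_variance(variance_component, K):
    """Scale the mixed-model variance component to the reference population."""
    return variance_component * dk_statistic(K)


# Toy 3 x 3 relationship matrix, for illustration only.
K = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]
dk = dk_statistic(K)
scaled = reference_genetic_variance(2.0, K)
```

The toy matrix is far too small and too closely related to be realistic; it only shows the arithmetic.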
Camarinha-Silva, Amelia; Maushammer, Maria; Wellmann, Robin; Vital, Marius; Preuss, Siegfried; Bennewitz, Jörn
2017-07-01
The aim of the present study was to analyze the interplay between gastrointestinal tract (GIT) microbiota, host genetics, and complex traits in pigs using extended quantitative-genetic methods. The study design consisted of 207 pigs that were housed and slaughtered under standardized conditions, and phenotyped for daily gain, feed intake, and feed conversion rate. The pigs were genotyped with a standard 60 K SNP chip. The GIT microbiota composition was analyzed by 16S rRNA gene amplicon sequencing technology. Eight from 49 investigated bacteria genera showed a significant narrow sense host heritability, ranging from 0.32 to 0.57. Microbial mixed linear models were applied to estimate the microbiota variance for each complex trait. The fraction of phenotypic variance explained by the microbial variance was 0.28, 0.21, and 0.16 for daily gain, feed conversion, and feed intake, respectively. The SNP data and the microbiota composition were used to predict the complex traits using genomic best linear unbiased prediction (G-BLUP) and microbial best linear unbiased prediction (M-BLUP) methods, respectively. The prediction accuracies of G-BLUP were 0.35, 0.23, and 0.20 for daily gain, feed conversion, and feed intake, respectively. The corresponding prediction accuracies of M-BLUP were 0.41, 0.33, and 0.33. Thus, in addition to SNP data, microbiota abundances are an informative source of complex trait predictions. Since the pig is a well-suited animal for modeling the human digestive tract, M-BLUP, in addition to G-BLUP, might be beneficial for predicting human predispositions to some diseases, and, consequently, for preventative and personalized medicine. Copyright © 2017 by the Genetics Society of America.
Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. 
Prior fledgling production explained as much of the variance in the fledgling counts as climate, parent age class, and landscape habitat predictors. Our logistic quantile regression model can be used for any discrete response variables with fixed upper and lower bounds.
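The jitter, logit-transform, and back-transform steps described above can be sketched without the quantile regression itself. In this illustration the bounds (-0.001 and 4.001 for counts 0 to 3), the uniform jitter on [0, 1), and all function names are assumptions, and a plain sample median stands in for a fitted conditional quantile.

```python
import math
import random


def jitter(counts, rng):
    """Add uniform [0, 1) noise so the discrete counts become continuous."""
    return [y + rng.random() for y in counts]


def logit_bounded(z, lower, upper):
    """Logit transform of a value strictly between lower and upper."""
    return math.log((z - lower) / (upper - z))


def inv_logit_bounded(h, lower, upper):
    """Inverse of logit_bounded: maps the real line back into (lower, upper)."""
    return (lower + upper * math.exp(h)) / (1.0 + math.exp(h))


def median_odd(values):
    """Median of an odd-length sample (the middle order statistic)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# Hypothetical fledgling counts (0 to 3 per territory).
counts = [0, 1, 3, 2, 0, 1, 2]
lower, upper = -0.001, 4.001  # assumed bounds just outside the jittered range

z = jitter(counts, random.Random(1))
h = [logit_bounded(v, lower, upper) for v in z]

# Equivariance: back-transforming the median on the logit scale
# recovers the median on the jittered scale.
recovered = inv_logit_bounded(median_odd(h), lower, upper)
```

Because the logit is monotonic, a quantile estimated on the transformed scale maps back to the same quantile on the original scale, which is the property the back-transformation relies on.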
Nonexercise Equations to Estimate Fitness in White European and South Asian Men.
O'Donovan, Gary; Bakrania, Kishan; Ghouri, Nazim; Yates, Thomas; Gray, Laura J; Hamer, Mark; Stamatakis, Emmanuel; Khunti, Kamlesh; Davies, Melanie; Sattar, Naveed; Gill, Jason M R
2016-05-01
Cardiorespiratory fitness is a strong, independent predictor of health, whether it is measured in an exercise test or estimated in an equation. The purpose of this study was to develop and validate equations to estimate fitness in middle-age white European and South Asian men. Multiple linear regression models (n = 168, including 83 white European and 85 South Asian men) were created using variables that are thought to be important in predicting fitness (V˙O2max, mL·kg⁻¹·min⁻¹): age (yr), body mass index (kg·m⁻²), resting HR (bpm); smoking status (0, never smoked; 1, ex or current smoker), physical activity expressed as quintiles (0, quintile 1; 1, quintile 2; 2, quintile 3; 3, quintile 4; 4, quintile 5), categories of moderate- to-vigorous intensity physical activity (MVPA) (0, <75 min·wk⁻¹; 1, 75-150 min·wk⁻¹; 2, >150-225 min·wk⁻¹; 3, >225-300 min·wk⁻¹; 4, >300 min·wk⁻¹), or minutes of MVPA (min·wk⁻¹); and, ethnicity (0, South Asian; 1, white). The leave-one-out cross-validation procedure was used to assess the generalizability, and the bootstrap and jackknife resampling techniques were used to estimate the variance and bias of the models. Around 70% of the variance in fitness was explained in models with an ethnicity variable, such as: V˙O2max = 77.409 - (age × 0.374) - (body mass index × 0.906) - (ex or current smoker × 1.976) + (physical activity quintile coefficient) - (resting HR × 0.066) + (white ethnicity × 8.032), where physical activity quintile 1 is 0, 2 is 1.127, 3 is 1.869, 4 is 3.793, and 5 is 3.029. Only around 50% of the variance was explained in models without an ethnicity variable. All models with an ethnicity variable were generalizable and had low variance and bias. These data demonstrate the importance of incorporating ethnicity in nonexercise equations to estimate cardiorespiratory fitness in multiethnic populations.
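The equation quoted above translates directly into a small function. The coefficients are those given in the abstract; the function and argument names, and the example participant, are hypothetical.

```python
# Physical activity quintile coefficients as quoted in the abstract.
PA_QUINTILE_COEF = {1: 0.0, 2: 1.127, 3: 1.869, 4: 3.793, 5: 3.029}


def estimate_vo2max(age_yr, bmi, ex_or_current_smoker, pa_quintile,
                    resting_hr_bpm, white_ethnicity):
    """Estimated VO2max (mL/kg/min) from the model with an ethnicity term."""
    return (
        77.409
        - 0.374 * age_yr
        - 0.906 * bmi
        - 1.976 * int(ex_or_current_smoker)
        + PA_QUINTILE_COEF[pa_quintile]
        - 0.066 * resting_hr_bpm
        + 8.032 * int(white_ethnicity)
    )


# Hypothetical participant: 50 yr, BMI 25, never smoked, quintile 3,
# resting HR 60 bpm, white European.
example = estimate_vo2max(50, 25.0, False, 3, 60, True)  # approximately 42.0
```

The equation was developed in middle-aged white European and South Asian men, so applying it outside that population would be an extrapolation.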
ERIC Educational Resources Information Center
Oranje, Andreas
2006-01-01
A multitude of methods has been proposed to estimate the sampling variance of ratio estimates in complex samples (Wolter, 1985). Hansen and Tepping (1985) studied some of those variance estimators and found that a high coefficient of variation (CV) of the denominator of a ratio estimate is indicative of a biased estimate of the standard error of a…
USDA-ARS's Scientific Manuscript database
We proposed a method to estimate the error variance among non-replicated genotypes, thus to estimate the genetic parameters by using replicated controls. We derived formulas to estimate sampling variances of the genetic parameters. Computer simulation indicated that the proposed methods of estimatin...
Tanner-Smith, Emily E; Tipton, Elizabeth
2014-03-01
Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding the practical application and implementation of those macros. This paper provides a brief tutorial on the implementation of the Stata and SPSS macros and discusses practical issues meta-analysts should consider when estimating meta-regression models with robust variance estimates. Two example databases are used in the tutorial to illustrate the use of meta-analysis with robust variance estimates. Copyright © 2013 John Wiley & Sons, Ltd.
Generalized Linear Covariance Analysis
NASA Technical Reports Server (NTRS)
Carpenter, James R.; Markley, F. Landis
2014-01-01
This talk presents a comprehensive approach to filter modeling for generalized covariance analysis of both batch least-squares and sequential estimators. We review and extend in two directions the results of prior work that allowed for partitioning of the state space into "solve-for" and "consider" parameters, accounted for differences between the formal values and the true values of the measurement noise, process noise, and a priori solve-for and consider covariances, and explicitly partitioned the errors into subspaces containing only the influence of the measurement noise, process noise, and solve-for and consider covariances. In this work, we explicitly add sensitivity analysis to this prior work, and relax an implicit assumption that the batch estimator's epoch time occurs prior to the definitive span. We also apply the method to an integrated orbit and attitude problem, in which gyro and accelerometer errors, though not estimated, influence the orbit determination performance. We illustrate our results using two graphical presentations, which we call the "variance sandpile" and the "sensitivity mosaic," and we compare the linear covariance results to confidence intervals associated with ensemble statistics from a Monte Carlo analysis.
NASA Technical Reports Server (NTRS)
Lugo, Rafael A.; Tolson, Robert H.; Schoenenberger, Mark
2013-01-01
As part of the Mars Science Laboratory (MSL) trajectory reconstruction effort at NASA Langley Research Center, free-flight aeroballistic experiments of instrumented MSL scale models were conducted at Aberdeen Proving Ground in Maryland. The models carried an inertial measurement unit (IMU) and a flush air data system (FADS) similar to the MSL Entry Atmospheric Data System (MEADS) that provided data types similar to those from the MSL entry. Multiple sources of redundant data were available, including tracking radar and on-board magnetometers. These experimental data enabled the testing and validation of the various tools and methodologies that will be used for MSL trajectory reconstruction. The aerodynamic parameters Mach number, angle of attack, and sideslip angle were estimated using minimum variance with a priori to combine the pressure data and pre-flight computational fluid dynamics (CFD) data. Both linear and non-linear pressure model terms were also estimated for each pressure transducer as a measure of the errors introduced by CFD and transducer calibration. Parameter uncertainties were estimated using a "consider parameters" approach.
Austin, Peter C
2016-12-30
Propensity score methods are used to reduce the effects of observed confounding when using observational data to estimate the effects of treatments or exposures. A popular method of using the propensity score is inverse probability of treatment weighting (IPTW). When using this method, a weight is calculated for each subject that is equal to the inverse of the probability of receiving the treatment that was actually received. These weights are then incorporated into the analyses to minimize the effects of observed confounding. Previous research has found that these methods result in unbiased estimation when estimating the effect of treatment on survival outcomes. However, conventional methods of variance estimation were shown to result in biased estimates of standard error. In this study, we conducted an extensive set of Monte Carlo simulations to examine different methods of variance estimation when using a weighted Cox proportional hazards model to estimate the effect of treatment. We considered three variance estimation methods: (i) a naïve model-based variance estimator; (ii) a robust sandwich-type variance estimator; and (iii) a bootstrap variance estimator. We considered estimation of both the average treatment effect and the average treatment effect in the treated. We found that the use of a bootstrap estimator resulted in approximately correct estimates of standard errors and confidence intervals with the correct coverage rates. The other estimators resulted in biased estimates of standard errors and confidence intervals with incorrect coverage rates. Our simulations were informed by a case study examining the effect of statin prescribing on mortality. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
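The weight described above (the inverse of the probability of the treatment actually received) can be sketched in a few lines. This is a minimal illustration assuming the propensity scores have already been estimated; the function name is hypothetical, and the ATT weights use the commonly cited form (1 for treated subjects, e/(1-e) for controls), which the abstract itself does not spell out.

```python
def iptw_weights(treated, propensity, estimand="ATE"):
    """Return one weight per subject.

    treated:    iterable of 0/1 treatment indicators
    propensity: iterable of estimated P(treated = 1 | covariates), strictly in (0, 1)
    """
    weights = []
    for z, e in zip(treated, propensity):
        if estimand == "ATE":
            # Inverse probability of the treatment actually received.
            w = 1.0 / e if z == 1 else 1.0 / (1.0 - e)
        elif estimand == "ATT":
            # Treated subjects keep weight 1; controls are reweighted to resemble them.
            w = 1.0 if z == 1 else e / (1.0 - e)
        else:
            raise ValueError("estimand must be 'ATE' or 'ATT'")
        weights.append(w)
    return weights


# Hypothetical subjects: one treated (e = 0.25) and one control (e = 0.2).
ate_w = iptw_weights([1, 0], [0.25, 0.2], estimand="ATE")
att_w = iptw_weights([1, 0], [0.25, 0.2], estimand="ATT")
print(ate_w, att_w)
```

These weights would then be passed to a weighted outcome model; the variance estimators compared in the study (model-based, sandwich, bootstrap) are not shown here.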
Xu, Chonggang; Gertner, George
2013-01-01
Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements. PMID:24143037
Xu, Chonggang; Gertner, George
2011-01-01
Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements.
Saville, Benjamin R.; Herring, Amy H.; Kaufman, Jay S.
2013-01-01
Racial/ethnic disparities in birthweight are a large source of differential morbidity and mortality worldwide and have remained largely unexplained in epidemiologic models. We assess the impact of maternal ancestry and census tract residence on infant birth weights in New York City and the modifying effects of race and nativity by incorporating random effects in a multilevel linear model. Evaluating the significance of these predictors involves the test of whether the variances of the random effects are equal to zero. This is problematic because the null hypothesis lies on the boundary of the parameter space. We generalize an approach for assessing random effects in the two-level linear model to a broader class of multilevel linear models by scaling the random effects to the residual variance and introducing parameters that control the relative contribution of the random effects. After integrating over the random effects and variance components, the resulting integrals needed to calculate the Bayes factor can be efficiently approximated with Laplace’s method. PMID:24082430
NASA Astrophysics Data System (ADS)
Dai, Xiaoqian; Tian, Jie; Chen, Zhe
2010-03-01
Parametric images can represent both spatial distribution and quantification of the biological and physiological parameters of tracer kinetics. The linear least square (LLS) method is a well-estimated linear regression method for generating parametric images by fitting compartment models with good computational efficiency. However, bias exists in LLS-based parameter estimates, owing to the noise present in tissue time activity curves (TTACs) that propagates as correlated error in the LLS linearized equations. To address this problem, a volume-wise principal component analysis (PCA) based method is proposed. In this method, firstly, dynamic PET data are properly pre-transformed to standardize noise variance, as PCA is a data-driven technique and cannot itself separate signals from noise. Secondly, the volume-wise PCA is applied to the PET data. The signals can be mostly represented by the first few principal components (PCs) and the noise is left in the subsequent PCs. Then the noise-reduced data are obtained using the first few PCs by applying 'inverse PCA'. The data should also be transformed back according to the pre-transformation method used in the first step to maintain the scale of the original data set. Finally, the obtained new data set is used to generate parametric images using the linear least squares (LLS) estimation method. Compared with other noise-removal methods, the proposed method can achieve high statistical reliability in the generated parametric images. The effectiveness of the method is demonstrated both with computer simulation and with a clinical dynamic FDG PET study.
Hays, Ron D; Revicki, Dennis A; Feeny, David; Fayers, Peter; Spritzer, Karen L; Cella, David
2016-10-01
Preference-based health-related quality of life (HR-QOL) scores are useful as outcome measures in clinical studies, for monitoring the health of populations, and for estimating quality-adjusted life-years. This was a secondary analysis of data collected in an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS(®)) project. To estimate Health Utilities Index Mark 3 (HUI-3) preference scores, we used the ten PROMIS(®) global health items, the PROMIS-29 V2.0 single pain intensity item and seven multi-item scales (physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, sleep disturbance), and the PROMIS-29 V2.0 items. Linear regression analyses were used to identify significant predictors, followed by simple linear equating to avoid regression to the mean. The regression models explained 48 % (global health items), 61 % (PROMIS-29 V2.0 scales), and 64 % (PROMIS-29 V2.0 items) of the variance in the HUI-3 preference score. Linear equated scores were similar to observed scores, although differences tended to be larger for older study participants. HUI-3 preference scores can be estimated from the PROMIS(®) global health items or PROMIS-29 V2.0. The estimated HUI-3 scores from the PROMIS(®) health measures can be used for economic applications and as a measure of overall HR-QOL in research.
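Simple linear equating, mentioned above as the step that avoids regression to the mean, rescales one set of scores so that its mean and standard deviation match those of a target set. The sketch below is a generic illustration with made-up numbers; the function name is hypothetical, and the abstract does not state exactly which score pairs the authors equated.

```python
from statistics import mean, stdev


def linear_equate(x, source_scores, target_scores):
    """Map score x from the source scale onto the target scale so that
    the equated scores share the target's mean and standard deviation."""
    slope = stdev(target_scores) / stdev(source_scores)
    return mean(target_scores) + slope * (x - mean(source_scores))


# Made-up scores: regression predictions are typically less spread out than observed values.
predicted = [0.50, 0.60, 0.70]
observed = [0.30, 0.60, 0.90]
equated = [linear_equate(p, predicted, observed) for p in predicted]
print(equated)
```

Unlike a least-squares fit, whose slope shrinks by the correlation coefficient, the equating slope is the plain ratio of standard deviations, so the equated scores keep the full spread of the target scale.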
Radar modulation classification using time-frequency representation and nonlinear regression
NASA Astrophysics Data System (ADS)
De Luigi, Christophe; Arques, Pierre-Yves; Lopez, Jean-Marc; Moreau, Eric
1999-09-01
In a naval electronic environment, pulses emitted by radars are collected by ESM receivers. For most of them, the intrapulse signal is modulated by a particular law. To help the classical identification process, a classification and estimation of this modulation law is applied to the intrapulse signal measurements. To estimate with good accuracy the time-varying frequency of a signal corrupted by additive noise, one method has been chosen. This method consists of calculating the Wigner distribution; the instantaneous frequency is then estimated by the peak location of the distribution. The bias and variance of the estimator are evaluated by computer simulations. In an estimated sequence of frequencies, we assume the presence of both false and good estimates, and a Gaussian distribution is assumed for the errors. A robust nonlinear regression method, based on the Levenberg-Marquardt algorithm, is thus applied to these estimated frequencies using a maximum likelihood estimator. The performance of the method is tested using varied modulation laws and different signal-to-noise ratios.
Diallel analysis for sex-linked and maternal effects.
Zhu, J; Weir, B S
1996-01-01
Genetic models including sex-linked and maternal effects as well as autosomal gene effects are described. Monte Carlo simulations were conducted to compare efficiencies of estimation by minimum norm quadratic unbiased estimation (MINQUE) and restricted maximum likelihood (REML) methods. MINQUE(1), which has 1 for all prior values, has a similar efficiency to MINQUE(θ), which requires prior estimates of parameter values. MINQUE(1) has the advantage over REML of unbiased estimation and convenient computation. An adjusted unbiased prediction (AUP) method is developed for predicting random genetic effects. AUP is desirable for its easy computation and unbiasedness of both mean and variance of predictors. The jackknife procedure is appropriate for estimating the sampling variances of estimated variances (or covariances) and of predicted genetic effects. A t-test based on jackknife variances is applicable for detecting significance of variation. Worked examples from mice and silkworm data are given in order to demonstrate variance and covariance estimation and genetic effect prediction.
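The jackknife procedure named above estimates the sampling variance of a statistic by recomputing it with one unit left out at a time. The following is a generic delete-one sketch, not the authors' diallel-specific procedure (which may leave out whole groups rather than single observations); the function name and data are hypothetical.

```python
from statistics import mean


def jackknife_variance(data, estimator):
    """Delete-one jackknife estimate of the sampling variance of estimator(data)."""
    n = len(data)
    # Recompute the statistic n times, each time leaving out one observation.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
var_of_mean = jackknife_variance(sample, mean)
print(var_of_mean)
```

For the sample mean the jackknife reproduces the familiar s²/n, which makes it a convenient sanity check before applying the same routine to variance components or predicted effects. A t-statistic can then be formed as the estimate divided by the square root of its jackknife variance.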
Alghanim, Hussain; Antunes, Joana; Silva, Deborah Soares Bispo Santos; Alho, Clarice Sampaio; Balamurugan, Kuppareddi; McCord, Bruce
2017-11-01
Recent developments in the analysis of epigenetic DNA methylation patterns have demonstrated that certain genetic loci show a linear correlation with chronological age. It is the goal of this study to identify a new set of epigenetic methylation markers for the forensic estimation of human age. A total of 27 CpG sites at three genetic loci, SCGN, DLX5 and KLF14, were examined to evaluate the correlation of their methylation status with age. These sites were evaluated using 72 blood samples and 91 saliva samples collected from volunteers with ages ranging from 5 to 73 years. DNA was bisulfite modified followed by PCR amplification and pyrosequencing to determine the level of DNA methylation at each CpG site. In this study, certain CpG sites in the SCGN and KLF14 loci showed methylation levels that were correlated with chronological age; however, the tested CpG sites in DLX5 did not show a correlation with age. Using a 52-saliva sample training set, two age-predictor models were developed by means of a multivariate linear regression analysis for age prediction. The two models performed similarly with a single-locus model explaining 85% of the age variance at a mean absolute deviation of 5.8 years and a dual-locus model explaining 84% of the age variance with a mean absolute deviation of 6.2 years. In the validation set, the mean absolute deviation was measured to be 8.0 years and 7.1 years for the single- and dual-locus model, respectively. Another age predictor model was also developed using a 40-blood sample training set that accounted for 71% of the age variance. This model gave a mean absolute deviation of 6.6 years for the training set and 10.3 years for the validation set. The results indicate that specific CpGs in SCGN and KLF14 can be used as potential epigenetic markers to estimate age using saliva and blood specimens. 
These epigenetic markers could provide important information in cases where the determination of a suspect's age is critical in developing investigative leads. Copyright © 2017. Published by Elsevier B.V.
Random effects coefficient of determination for mixed and meta-analysis models
Demidenko, Eugene; Sargent, James; Onega, Tracy
2011-01-01
The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, R_r^2, that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If R_r^2 is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. A value of R_r^2 away from 0 indicates evidence of variance reduction in support of the mixed model. If the random effects coefficient of determination is close to 1, the variance of random effects is very large and random effects turn into free fixed effects: the model can be estimated using the dummy variable approach. We derive explicit formulas for R_r^2 in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine. PMID:23750070
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
Methods to estimate the between‐study variance and its uncertainty in meta‐analysis†
Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian PT; Langan, Dean; Salanti, Georgia
2015-01-01
Meta‐analyses are typically used to estimate the overall/mean of an outcome of interest. However, inference about between‐study variability, which is typically modelled using a between‐study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between‐study variance, has been long challenged. Our aim is to identify known methods for estimation of the between‐study variance and its corresponding uncertainty, and to summarise the simulation and empirical evidence that compares them. We identified 16 estimators for the between‐study variance, seven methods to calculate confidence intervals, and several comparative studies. Simulation studies suggest that for both dichotomous and continuous data the estimator proposed by Paule and Mandel and for continuous data the restricted maximum likelihood estimator are better alternatives to estimate the between‐study variance. Based on the scenarios and results presented in the published studies, we recommend the Q‐profile method and the alternative approach based on a ‘generalised Cochran between‐study variance statistic’ to compute corresponding confidence intervals around the resulting estimates. Our recommendations are based on a qualitative evaluation of the existing literature and expert consensus. Evidence‐based recommendations require an extensive simulation study where all methods would be compared under the same scenarios. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd. PMID:26332144
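For orientation, the DerSimonian and Laird estimator that the review above takes as its baseline is a method-of-moments formula built on Cochran's Q statistic. The sketch below is a minimal version assuming each study supplies an effect estimate and its within-study variance; the function name and the numbers are hypothetical. It does not implement the Paule and Mandel or REML alternatives that the review favours.

```python
def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments (DerSimonian-Laird) estimate of the between-study variance."""
    weights = [1.0 / v for v in variances]          # inverse-variance (fixed-effect) weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))  # Cochran's Q
    k = len(effects)
    scale = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - (k - 1)) / scale)          # truncated at zero


# Hypothetical meta-analysis of three studies with equal within-study variance.
tau2 = dersimonian_laird_tau2([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(tau2)
```

With equal within-study variances the estimate reduces to the sample variance of the effects minus the common within-study variance (here 4 - 1 = 3), which is an easy way to check the arithmetic.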
Log-normal frailty models fitted as Poisson generalized linear mixed models.
Hirsch, Katharina; Wienke, Andreas; Kuss, Oliver
2016-12-01
The equivalence of a survival model with a piecewise constant baseline hazard function and a Poisson regression model has been known for decades. As shown in recent studies, this equivalence carries over to clustered survival data: A frailty model with a log-normal frailty term can be interpreted and estimated as a generalized linear mixed model with a binary response, a Poisson likelihood, and a specific offset. Proceeding this way, statistical theory and software for generalized linear mixed models are readily available for fitting frailty models. This gain in flexibility comes at the small price of (1) having to fix the number of pieces for the baseline hazard in advance and (2) having to "explode" the data set by the number of pieces. In this paper we extend the simulations of former studies by using a more realistic baseline hazard (Gompertz) and by comparing the model under consideration with competing models. Furthermore, the SAS macro %PCFrailty is introduced to apply the Poisson generalized linear mixed approach to frailty models. The simulations show good results for the shared frailty model. Our new %PCFrailty macro provides proper estimates, especially in case of 4 events per piece. The suggested Poisson generalized linear mixed approach for log-normal frailty models based on the %PCFrailty macro provides several advantages in the analysis of clustered survival data with respect to more flexible modelling of fixed and random effects, exact (in the sense of non-approximate) maximum likelihood estimation, and standard errors and different types of confidence intervals for all variance parameters. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
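The "explode" step described above can be illustrated in Python rather than SAS. The sketch below splits one subject's follow-up at fixed cut points into one row per piece, with a binary event indicator and the log of the time at risk as the Poisson offset. It is an assumption-laden illustration, not the %PCFrailty macro: the function name, the row layout, and the cut points are all hypothetical, and follow-up time is assumed to be strictly positive.

```python
import math


def explode(time, event, cuts):
    """Split one subject's follow-up into rows, one per baseline-hazard piece.

    time:  observed follow-up time (> 0)
    event: 1 if the subject had the event at `time`, 0 if censored
    cuts:  increasing interior cut points; pieces are (0, c1], (c1, c2], ..., (c_last, inf)
    """
    starts = [0.0] + list(cuts)
    ends = list(cuts) + [float("inf")]
    rows = []
    for piece, (a, b) in enumerate(zip(starts, ends)):
        if time <= a:
            break                                   # subject never entered this piece
        exposure = min(time, b) - a                 # time at risk within the piece
        rows.append({
            "piece": piece,
            "event": int(event == 1 and time <= b),  # binary response
            "exposure": exposure,
            "offset": math.log(exposure),            # offset for the Poisson model
        })
    return rows


# Hypothetical subject with an event at t = 2.5 and cut points at 1 and 2.
rows = explode(time=2.5, event=1, cuts=[1.0, 2.0])
for r in rows:
    print(r)
```

Each subject contributes as many rows as pieces survived, which is why the data set grows with the number of pieces fixed in advance; the exploded rows would then be fitted with a Poisson mixed model that includes a cluster-level random intercept.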
Blinded sample size re-estimation in three-arm trials with 'gold standard' design.
Mütze, Tobias; Friede, Tim
2017-10-15
In this article, we study blinded sample size re-estimation in the 'gold standard' design with internal pilot study for normally distributed outcomes. The 'gold standard' design is a three-arm clinical trial design that includes an active and a placebo control in addition to an experimental treatment. We focus on the absolute margin approach to hypothesis testing in three-arm trials at which the non-inferiority of the experimental treatment and the assay sensitivity are assessed by pairwise comparisons. We compare several blinded sample size re-estimation procedures in a simulation study assessing operating characteristics including power and type I error. We find that sample size re-estimation based on the popular one-sample variance estimator results in overpowered trials. Moreover, sample size re-estimation based on unbiased variance estimators such as the Xing-Ganju variance estimator results in underpowered trials, as it is expected because an overestimation of the variance and thus the sample size is in general required for the re-estimation procedure to eventually meet the target power. To overcome this problem, we propose an inflation factor for the sample size re-estimation with the Xing-Ganju variance estimator and show that this approach results in adequately powered trials. Because of favorable features of the Xing-Ganju variance estimator such as unbiasedness and a distribution independent of the group means, the inflation factor does not depend on the nuisance parameter and, therefore, can be calculated prior to a trial. Moreover, we prove that the sample size re-estimation based on the Xing-Ganju variance estimator does not bias the effect estimate. Copyright © 2017 John Wiley & Sons, Ltd.
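To see why the one-sample variance estimator tends to overstate the variance, and hence the sample size, the sketch below compares it with the unblinded pooled within-group variance on made-up three-arm data. The one-sample estimator ignores group labels, so differences between group means leak into the variance. The function names and data are hypothetical, and the Xing-Ganju estimator itself is not implemented here.

```python
from statistics import variance


def one_sample_variance(groups):
    """Blinded estimator: sample variance of all observations, ignoring group labels."""
    pooled = [x for g in groups for x in g]
    return variance(pooled)


def pooled_within_variance(groups):
    """Unblinded reference: within-group variances pooled over groups."""
    n_total = sum(len(g) for g in groups)
    return sum((len(g) - 1) * variance(g) for g in groups) / (n_total - len(groups))


# Made-up arms (experimental, active control, placebo) with clearly different means.
arms = [[0.0, 2.0], [3.0, 5.0], [6.0, 8.0]]
blinded = one_sample_variance(arms)
within = pooled_within_variance(arms)
print(blinded, within)
```

When all group means coincide, the two estimators target the same quantity; the gap grows with the spread of the group means, which matches the overpowering reported in the abstract.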
Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen
2017-12-27
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 - π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. 
For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
Planning additional drilling campaign using two-space genetic algorithm: A game theoretical approach
NASA Astrophysics Data System (ADS)
Kumral, Mustafa; Ozer, Umit
2013-03-01
Grade and tonnage are the most important technical uncertainties in mining ventures because of the use of estimations/simulations, which are mostly generated from drill data. Open pit mines are planned and designed on the basis of the blocks representing the entire orebody. Each block has different estimation/simulation variance reflecting uncertainty to some extent. The estimation/simulation realizations are submitted to mine production scheduling process. However, the use of a block model with varying estimation/simulation variances will lead to serious risk in the scheduling. In the medium of multiple simulations, the dispersion variances of blocks can be thought to regard technical uncertainties. However, the dispersion variance cannot handle uncertainty associated with varying estimation/simulation variances of blocks. This paper proposes an approach that generates the configuration of the best additional drilling campaign to generate more homogenous estimation/simulation variances of blocks. In other words, the objective is to find the best drilling configuration in such a way as to minimize grade uncertainty under budget constraint. Uncertainty measure of the optimization process in this paper is interpolation variance, which considers data locations and grades. The problem is expressed as a minmax problem, which focuses on finding the best worst-case performance i.e., minimizing interpolation variance of the block generating maximum interpolation variance. Since the optimization model requires computing the interpolation variances of blocks being simulated/estimated in each iteration, the problem cannot be solved by standard optimization tools. This motivates to use two-space genetic algorithm (GA) approach to solve the problem. The technique has two spaces: feasible drill hole configuration with minimization of interpolation variance and drill hole simulations with maximization of interpolation variance. 
The two spaces interact to find a minmax solution iteratively. A case study was conducted to demonstrate the performance of the approach. The findings showed that the approach could be used to plan a new drilling campaign.
Measuring the Power Spectrum with Peculiar Velocities
NASA Astrophysics Data System (ADS)
Macaulay, Edward; Feldman, H. A.; Ferreira, P. G.; Jaffe, A. H.; Agarwal, S.; Hudson, M. J.; Watkins, R.
2012-01-01
The peculiar velocities of galaxies are an inherently valuable cosmological probe, providing an unbiased estimate of the distribution of matter on scales much larger than the depth of the survey. Much research interest has been motivated by the high dipole moment of our local peculiar velocity field, which suggests a large scale excess in the matter power spectrum, and can appear to be in some tension with the LCDM model. We use a composite catalogue of 4,537 peculiar velocity measurements with a characteristic depth of 33 h^-1 Mpc to estimate the matter power spectrum. We compare the constraints with this method, directly studying the full peculiar velocity catalogue, to results from Macaulay et al. (2011), studying minimum variance moments of the velocity field, as calculated by Watkins, Feldman & Hudson (2009) and Feldman, Watkins & Hudson (2010). We find good agreement with the LCDM model on scales of k > 0.01 h Mpc^-1. We find an excess of power on scales of k < 0.01 h Mpc^-1, although with a 1 sigma uncertainty which includes the LCDM model. We find that the uncertainty in the excess at these scales is larger than an alternative result studying only moments of the velocity field, which is due to the minimum variance weights used to calculate the moments. At small scales, we are able to clearly discriminate between linear and nonlinear clustering in simulated peculiar velocity catalogues, and find some evidence (although less clear) for linear clustering in the real peculiar velocity data.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.
Dhar, Amrit; Minin, Vladimir N
2017-05-01
Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time
Dhar, Amrit
2017-01-01
Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences. PMID:28177780
The effect of dropout on the efficiency of D-optimal designs of linear mixed models.
Ortega-Azurduy, S A; Tan, F E S; Berger, M P F
2008-06-30
Dropout is often encountered in longitudinal data. Optimal designs will usually not remain optimal in the presence of dropout. In this paper, we study D-optimal designs for linear mixed models where dropout is encountered. Moreover, we estimate the efficiency loss in cases where a D-optimal design for complete data is chosen instead of that for data with dropout. Two types of monotonically decreasing response probability functions are investigated to describe dropout. Our results show that the location of D-optimal design points for the dropout case will shift with respect to that for the complete and uncorrelated data case. Owing to this shift, the information collected at the D-optimal design points for the complete data case does not correspond to the smallest variance. We show that the size of the displacement of the time points depends on the linear mixed model and that the efficiency loss is moderate.
Jastreboff, P W
1979-06-01
Time histograms of neural responses evoked by sinusoidal stimulation often contain a slow drift and irregular noise, which disturb Fourier analysis of these responses. Section 2 of this paper evaluates the extent to which a linear drift influences the Fourier analysis, and develops a combined Fourier and linear regression analysis for detecting and correcting for such a linear drift. The usefulness of this correction method is demonstrated for the time histograms of actual eye movements and Purkinje cell discharges evoked by sinusoidal rotation of rabbits in the horizontal plane. In Sect. 3, the analysis of variance is adopted for estimating the probability of the random occurrence of the response curve extracted by Fourier analysis from noise. This method proved to be useful for avoiding false judgements as to whether the response curve was meaningful, particularly when the response was small relative to the contaminating noise.
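One way to picture a combined Fourier and linear regression analysis is a single least-squares fit of an intercept, a linear drift term, and one cosine/sine pair at the stimulus frequency. The sketch below is only an illustration under that assumption and is not the procedure from the paper; the function names, the single-harmonic model, and the synthetic data are all hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_drift_and_sinusoid(times, values, freq):
    """Least-squares fit of y = a + b*t + c*cos(w t) + d*sin(w t)."""
    w = 2 * math.pi * freq
    rows = [[1.0, t, math.cos(w * t), math.sin(w * t)] for t in times]
    k = 4
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    a, b, c, d = solve_linear_system(xtx, xty)
    amplitude = math.hypot(c, d)
    phase = math.atan2(d, c)  # fitted response is amplitude*cos(w t - phase)
    return a, b, amplitude, phase

# Synthetic, noise-free histogram: baseline 5, drift 3 per unit time,
# response amplitude 2 with phase lag 0.5 rad at 1 cycle per unit time.
times = [i * 0.01 for i in range(200)]
values = [5 + 3 * t + 2 * math.cos(2 * math.pi * t - 0.5) for t in times]
baseline, drift, amplitude, phase = fit_drift_and_sinusoid(times, values, 1.0)
```

Fitting the drift jointly with the sinusoid, rather than running a plain Fourier analysis on the raw histogram, keeps the slope from leaking into the amplitude and phase estimates.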
Methods to Estimate the Variance of Some Indices of the Signal Detection Theory: A Simulation Study
ERIC Educational Resources Information Center
Suero, Manuel; Privado, Jesús; Botella, Juan
2017-01-01
A simulation study is presented to evaluate and compare three methods to estimate the variance of the estimates of the parameters d and "C" of the signal detection theory (SDT). Several methods have been proposed to calculate the variance of their estimators, "d'" and "c." Those methods have been mostly assessed by…
Variance computations for functionals of absolute risk estimates.
Pfeiffer, R M; Petracci, E
2011-07-01
We present a simple influence function based approach to compute the variances of estimates of absolute risk and functions of absolute risk. We apply this approach to criteria that assess the impact of changes in the risk factor distribution on absolute risk for an individual and at the population level. As an illustration we use an absolute risk prediction model for breast cancer that includes modifiable risk factors in addition to standard breast cancer risk factors. Influence function based variance estimates for absolute risk and the criteria are compared to bootstrap variance estimates.
Variance computations for functionals of absolute risk estimates
Pfeiffer, R.M.; Petracci, E.
2011-01-01
We present a simple influence function based approach to compute the variances of estimates of absolute risk and functions of absolute risk. We apply this approach to criteria that assess the impact of changes in the risk factor distribution on absolute risk for an individual and at the population level. As an illustration we use an absolute risk prediction model for breast cancer that includes modifiable risk factors in addition to standard breast cancer risk factors. Influence function based variance estimates for absolute risk and the criteria are compared to bootstrap variance estimates. PMID:21643476
McGarvey, Richard; Burch, Paul; Matthews, Janet M
2016-01-01
Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, together and in combination. The standard one-start aligned systematic survey design, a uniform 10 x 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (v) that corrected for inter-transect correlation (ν₈ and ν(W)) were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (ν₂ and ν₃) that are now more commonly applied in systematic surveys. 
Similar variance estimator performance rankings were found with a second differently generated set of spatial point populations, ν₈ and ν(W) again being the best performers in the longer-range autocorrelated populations. However, no systematic variance estimators tested were free from bias. On balance, systematic designs bring narrower confidence intervals in clustered populations, while random designs permit unbiased estimates of (often wider) confidence intervals. The search continues for better estimators of sampling variance for the systematic survey mean.
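The precision advantage of a systematic design in a patchy population can be illustrated with a small one-dimensional analogue. The sketch below is a hypothetical toy example, not the spatial point populations or the 13 variance estimators from the study: it assumes a line of 1000 units with organisms confined to one contiguous habitat patch, and computes the exact design variance of the sample mean for a one-start systematic sample and for simple random sampling without replacement.

```python
def systematic_design_variance(population, n):
    """Exact variance of the sample mean over all one-start systematic samples.

    Assumes len(population) is divisible by n.
    """
    step = len(population) // n
    means = [sum(population[start::step]) / n for start in range(step)]
    grand = sum(means) / step
    return sum((m - grand) ** 2 for m in means) / step

def srs_design_variance(population, n):
    """Exact variance of the sample mean under simple random sampling
    without replacement: S^2 / n * (1 - n / N)."""
    big_n = len(population)
    mean = sum(population) / big_n
    s2 = sum((y - mean) ** 2 for y in population) / (big_n - 1)
    return s2 / n * (1 - n / big_n)

# Hypothetical patchy population: presence (1) only inside one habitat patch.
population = [1 if 200 <= i < 530 else 0 for i in range(1000)]
sys_var = systematic_design_variance(population, 10)
srs_var = srs_design_variance(population, 10)
```

In this toy case the systematic variance is roughly one-tenth of the random-sampling variance, because evenly spaced units cannot all fall inside or all fall outside the patch.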
NASA Astrophysics Data System (ADS)
Uhlemann, C.; Feix, M.; Codis, S.; Pichon, C.; Bernardeau, F.; L'Huillier, B.; Kim, J.; Hong, S. E.; Laigle, C.; Park, C.; Shin, J.; Pogosyan, D.
2018-02-01
Starting from a very accurate model for density-in-cells statistics of dark matter based on large deviation theory, a bias model for the tracer density in spheres is formulated. It adopts a mean bias relation based on a quadratic bias model to relate the log-densities of dark matter to those of mass-weighted dark haloes in real and redshift space. The validity of the parametrized bias model is established using a parametrization-independent extraction of the bias function. This average bias model is then combined with the dark matter PDF, neglecting any scatter around it: it nevertheless yields an excellent model for densities-in-cells statistics of mass tracers that is parametrized in terms of the underlying dark matter variance and three bias parameters. The procedure is validated on measurements of both the one- and two-point statistics of subhalo densities in the state-of-the-art Horizon Run 4 simulation showing excellent agreement for measured dark matter variance and bias parameters. Finally, it is demonstrated that this formalism allows for a joint estimation of the non-linear dark matter variance and the bias parameters using solely the statistics of subhaloes. Having verified that galaxy counts in hydrodynamical simulations sampled on a scale of 10 Mpc h⁻¹ closely resemble those of subhaloes, this work provides important steps towards making theoretical predictions for density-in-cells statistics applicable to upcoming galaxy surveys like Euclid or WFIRST.
Honest Importance Sampling with Multiple Markov Chains
Tan, Aixin; Doss, Hani; Hobert, James P.
2017-01-01
Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. 
The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
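For the basic iid case described at the start of this abstract, a self-normalized importance sampling estimator and its delta-method standard error can be sketched as follows. This is a minimal illustration only: the target (standard normal), the proposal (normal with standard deviation 2), the test function, and all function names are assumptions chosen for the example, and the regenerative multiple-chain MCMC construction of the paper is not attempted here.

```python
import math
import random

def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def importance_sampling(f, target_pdf, proposal_pdf, draws):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    Returns the estimate and a delta-method standard error, valid for
    iid draws from the proposal under suitable moment conditions.
    """
    weights = [target_pdf(x) / proposal_pdf(x) for x in draws]
    values = [f(x) for x in draws]
    total_w = sum(weights)
    estimate = sum(w * v for w, v in zip(weights, values)) / total_w
    se = math.sqrt(sum((w * (v - estimate)) ** 2
                       for w, v in zip(weights, values))) / total_w
    return estimate, se

rng = random.Random(0)
draws = [rng.gauss(0.0, 2.0) for _ in range(20000)]  # iid sample from the proposal
estimate, se = importance_sampling(
    lambda x: x * x,               # f(x) = x^2, whose target expectation is 1
    lambda x: normal_pdf(x, 1.0),  # target density
    lambda x: normal_pdf(x, 2.0),  # proposal density
    draws,
)
```

The proposal is deliberately wider than the target so that the importance weights stay bounded; when the draws come from a Markov chain instead, this simple standard error is no longer justified, which is the problem the abstract addresses.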
Honest Importance Sampling with Multiple Markov Chains.
Tan, Aixin; Doss, Hani; Hobert, James P
2015-01-01
Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors.
The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection.
Chen, Fang; He, Jing; Zhang, Jianqi; Chen, Gary K.; Thomas, Venetta; Ambrosone, Christine B.; Bandera, Elisa V.; Berndt, Sonja I.; Bernstein, Leslie; Blot, William J.; Cai, Qiuyin; Carpten, John; Casey, Graham; Chanock, Stephen J.; Cheng, Iona; Chu, Lisa; Deming, Sandra L.; Driver, W. Ryan; Goodman, Phyllis; Hayes, Richard B.; Hennis, Anselm J. M.; Hsing, Ann W.; Hu, Jennifer J.; Ingles, Sue A.; John, Esther M.; Kittles, Rick A.; Kolb, Suzanne; Leske, M. Cristina; Monroe, Kristine R.; Murphy, Adam; Nemesure, Barbara; Neslund-Dudas, Christine; Nyante, Sarah; Ostrander, Elaine A; Press, Michael F.; Rodriguez-Gil, Jorge L.; Rybicki, Ben A.; Schumacher, Fredrick; Stanford, Janet L.; Signorello, Lisa B.; Strom, Sara S.; Stevens, Victoria; Van Den Berg, David; Wang, Zhaoming; Witte, John S.; Wu, Suh-Yuh; Yamamura, Yuko; Zheng, Wei; Ziegler, Regina G.; Stram, Alexander H.; Kolonel, Laurence N.; Marchand, Loïc Le; Henderson, Brian E.; Haiman, Christopher A.; Stram, Daniel O.
2015-01-01
Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. However, only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In a large study of African-American men and women (n = 14,419), we genotyped and analyzed 966,578 autosomal SNPs across the entire genome using a linear mixed model variance components approach implemented in the program GCTA (Yang et al Nat Genet 2010), and estimated an additive heritability of 44.7% (se: 3.7%) for this phenotype in a sample of evidently unrelated individuals. While this estimated value is similar to that given by Yang et al in their analyses, we remain concerned about two related issues: (1) whether in the complete absence of hidden relatedness, variance components methods have adequate power to estimate heritability when a very large number of SNPs are used in the analysis; and (2) whether estimation of heritability may be biased, in real studies, by low levels of residual hidden relatedness. We addressed the first question in a semi-analytic fashion by directly simulating the distribution of the score statistic for a test of zero heritability with and without low levels of relatedness. The second question was addressed by a very careful comparison of the behavior of estimated heritability for both observed (self-reported) height and simulated phenotypes compared to imputation R² as a function of the number of SNPs used in the analysis.
These simulations help to address the important question about whether today's GWAS SNPs will remain useful for imputing causal variants that are discovered using very large sample sizes in future studies of height, or whether the causal variants themselves will need to be genotyped de novo in order to build a prediction model that ultimately captures a large fraction of the variability of height, and by implication other complex phenotypes. Our overall conclusions are that when study sizes are quite large (5,000 or so) the additive heritability estimate for height is not apparently biased upwards using the linear mixed model; however there is evidence in our simulation that a very large number of causal variants (many thousands) each with very small effect on phenotypic variance will need to be discovered to fill the gap between the heritability explained by known versus unknown causal variants. We conclude that today's GWAS data will remain useful in the future for causal variant prediction, but that finding the causal variants that need to be predicted may be extremely laborious. PMID:26125186
Chen, Fang; He, Jing; Zhang, Jianqi; Chen, Gary K; Thomas, Venetta; Ambrosone, Christine B; Bandera, Elisa V; Berndt, Sonja I; Bernstein, Leslie; Blot, William J; Cai, Qiuyin; Carpten, John; Casey, Graham; Chanock, Stephen J; Cheng, Iona; Chu, Lisa; Deming, Sandra L; Driver, W Ryan; Goodman, Phyllis; Hayes, Richard B; Hennis, Anselm J M; Hsing, Ann W; Hu, Jennifer J; Ingles, Sue A; John, Esther M; Kittles, Rick A; Kolb, Suzanne; Leske, M Cristina; Millikan, Robert C; Monroe, Kristine R; Murphy, Adam; Nemesure, Barbara; Neslund-Dudas, Christine; Nyante, Sarah; Ostrander, Elaine A; Press, Michael F; Rodriguez-Gil, Jorge L; Rybicki, Ben A; Schumacher, Fredrick; Stanford, Janet L; Signorello, Lisa B; Strom, Sara S; Stevens, Victoria; Van Den Berg, David; Wang, Zhaoming; Witte, John S; Wu, Suh-Yuh; Yamamura, Yuko; Zheng, Wei; Ziegler, Regina G; Stram, Alexander H; Kolonel, Laurence N; Le Marchand, Loïc; Henderson, Brian E; Haiman, Christopher A; Stram, Daniel O
2015-01-01
Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. However, only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In a large study of African-American men and women (n = 14,419), we genotyped and analyzed 966,578 autosomal SNPs across the entire genome using a linear mixed model variance components approach implemented in the program GCTA (Yang et al Nat Genet 2010), and estimated an additive heritability of 44.7% (se: 3.7%) for this phenotype in a sample of evidently unrelated individuals. While this estimated value is similar to that given by Yang et al in their analyses, we remain concerned about two related issues: (1) whether in the complete absence of hidden relatedness, variance components methods have adequate power to estimate heritability when a very large number of SNPs are used in the analysis; and (2) whether estimation of heritability may be biased, in real studies, by low levels of residual hidden relatedness. We addressed the first question in a semi-analytic fashion by directly simulating the distribution of the score statistic for a test of zero heritability with and without low levels of relatedness. The second question was addressed by a very careful comparison of the behavior of estimated heritability for both observed (self-reported) height and simulated phenotypes compared to imputation R² as a function of the number of SNPs used in the analysis.
These simulations help to address the important question about whether today's GWAS SNPs will remain useful for imputing causal variants that are discovered using very large sample sizes in future studies of height, or whether the causal variants themselves will need to be genotyped de novo in order to build a prediction model that ultimately captures a large fraction of the variability of height, and by implication other complex phenotypes. Our overall conclusions are that when study sizes are quite large (5,000 or so) the additive heritability estimate for height is not apparently biased upwards using the linear mixed model; however there is evidence in our simulation that a very large number of causal variants (many thousands) each with very small effect on phenotypic variance will need to be discovered to fill the gap between the heritability explained by known versus unknown causal variants. We conclude that today's GWAS data will remain useful in the future for causal variant prediction, but that finding the causal variants that need to be predicted may be extremely laborious.
Lantry, B.F.; Rudstam, L. G.; Forney, J.L.; VanDeValk, A.J.; Mills, E.L.; Stewart, D.J.; Adams, J.V.
2008-01-01
Daily consumption was estimated from the stomach contents of walleyes Sander vitreus collected weekly from Oneida Lake, New York, during June-October 1975, 1992, 1993, and 1994 for one to four age-groups per year. Field rations were highly variable between weeks, and trends in ration size varied both seasonally and annually. The coefficient of variation for weekly field rations within years and ages ranged from 45% to 97%. Field estimates were compared with simulated consumption from a bioenergetics model. The simulation averages of daily ration deviated from those of the field estimates by -20.1% to +70.3%, with a mean across all simulations of +14.3%. The deviations for each time step were much greater than those for the simulation averages, ranging from -92.8% to +363.6%. A systematic trend in the deviations was observed, the model producing overpredictions at rations less than 3.7% of body weight. Analysis of variance indicated that the deviations were affected by sample year and week but not age. Multiple linear regression using backwards selection procedures and Akaike's information criterion indicated that walleye weight, walleye growth, lake temperature, prey energy density, and the proportion of gizzard shad Dorosoma cepedianum in the diet significantly affected the deviations between simulated and field rations and explained 32% of the variance. © Copyright by the American Fisheries Society 2008.
Stratum variance estimation for sample allocation in crop surveys. [Great Plains Corridor
NASA Technical Reports Server (NTRS)
Perry, C. R., Jr.; Chhikara, R. S. (Principal Investigator)
1980-01-01
The problem of determining stratum variances needed in achieving an optimum sample allocation for crop surveys by remote sensing is investigated by considering an approach based on the concept of stratum variance as a function of the sampling unit size. A methodology using the existing and easily available information of historical crop statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to estimate stratum variances for wheat in the U.S. Great Plains and is evaluated based on the numerical results thus obtained. It is shown that the proposed technique is viable and performs satisfactorily, with the use of a conservative value for the field size and the crop statistics from the small political subdivision level, when the estimated stratum variances were compared to those obtained using the LANDSAT data.
Wientjes, Yvonne C J; Bijma, Piter; Vandenplas, Jérémie; Calus, Mario P L
2017-10-01
Different methods are available to calculate multi-population genomic relationship matrices. Since those matrices differ in base population, it is anticipated that the method used to calculate genomic relationships affects the estimate of genetic variances, covariances, and correlations. The aim of this article is to define the multi-population genomic relationship matrix to estimate current genetic variances within and genetic correlations between populations. The genomic relationship matrix containing two populations consists of four blocks, one block for population 1, one block for population 2, and two blocks for relationships between the populations. It is known, based on literature, that by using current allele frequencies to calculate genomic relationships within a population, current genetic variances are estimated. In this article, we theoretically derived the properties of the genomic relationship matrix to estimate genetic correlations between populations and validated it using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated unbiasedly even though estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix which directly estimates current genetic variances as well as genetic correlations between populations. Copyright © 2017 by the Genetics Society of America.
ERIC Educational Resources Information Center
Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.
2016-01-01
The aim of this study is to determine the variance difference between maximum likelihood and expected a posteriori estimation methods viewed from the number of test items of an aptitude test. The variance indicates the accuracy generated by both maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…
Ex Post Facto Monte Carlo Variance Reduction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Booth, Thomas E.
The variance in Monte Carlo particle transport calculations is often dominated by a few particles whose importance increases manyfold on a single transport step. This paper describes a novel variance reduction method that uses a large importance change as a trigger to resample the offending transport step. That is, the method is employed only after (ex post facto) a random walk attempts a transport step that would otherwise introduce a large variance in the calculation. Improvements in two Monte Carlo transport calculations are demonstrated empirically using an ex post facto method. First, the method is shown to reduce the variance in a penetration problem with a cross-section window. Second, the method empirically appears to modify a point detector estimator from an infinite variance estimator to a finite variance estimator.
Baeza-Baeza, J J; Pous-Torres, S; Torres-Lapasió, J R; García-Alvarez-Coque, M C
2010-04-02
Peak broadening and skewness are fundamental parameters in chromatography, since they affect the resolution capability of a chromatographic column. A common practice to characterise chromatographic columns is to estimate the efficiency and asymmetry factor for the peaks of one or more solutes eluted at selected experimental conditions. This has the drawback that the extra-column contributions to the peak variance and skewness make the peak shape parameters depend on the retention time. We propose and discuss here the use of several approaches that allow the estimation of global parameters (non-dependent on the retention time) to describe the column performance. The global parameters arise from different linear relationships that can be established between the peak variance, standard deviation, or half-widths with the retention time. Some of them describe exclusively the column contribution to the peak broadening, whereas others consider the extra-column effects also. The estimation of peak skewness was also possible for the approaches based on the half-widths. The proposed approaches were applied to the characterisation of different columns (Spherisorb, Zorbax SB, Zorbax Eclipse, Kromasil, Chromolith, X-Terra and Inertsil), using the chromatographic data obtained for several diuretics and basic drugs (beta-blockers). Copyright (c) 2010 Elsevier B.V. All rights reserved.
Short communication: Effect of heat stress on nonreturn rate of Italian Holstein cows.
Biffani, S; Bernabucci, U; Vitali, A; Lacetera, N; Nardone, A
2016-07-01
The data set consisted of 1,016,856 inseminations of 191,012 first, second, and third parity Holstein cows from 484 farms. Data were collected from year 2001 through 2007 and included meteorological data from 35 weather stations. Nonreturn rate at 56 d after first insemination (NR56) was considered. A logit model was used to estimate the effect of temperature-humidity index (THI) on reproduction across parities. Then, least squares means were used to detect the THI breakpoints using a 2-phase linear regression procedure. Finally, a multiple-trait threshold model was used to estimate variance components for NR56 in first and second parity cows. A dummy regression variable (t) was used to estimate NR56 decline due to heat stress. The NR56, both for first and second parity cows, was significantly (unfavorably) affected by THI from 4 d before to 5 d after the insemination date. Additive genetic variances for NR56 increased from first to second parity both for general and heat stress effects. Genetic correlations between general and heat stress effects were -0.31 for first parity and -0.45 for second parity cows. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, using a flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.
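The idea of checking the Poisson equal mean-variance assumption and applying a quasi-likelihood correction can be illustrated in the simplest setting, an intercept-only model with no covariates. The sketch below is a simplified stand-in, not the regression-based score test or the relative survival models from the paper; the Pearson dispersion statistic is substituted as the diagnostic, and the counts and function names are hypothetical.

```python
import math

def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom for an
    intercept-only Poisson model; values well above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return chi2 / (n - 1)

def rate_standard_errors(counts):
    """Standard error of the fitted mean under the Poisson assumption and
    under a quasi-likelihood correction that inflates the variance by phi."""
    n = len(counts)
    mean = sum(counts) / n
    phi = pearson_dispersion(counts)
    poisson_se = math.sqrt(mean / n)
    quasi_se = math.sqrt(phi * mean / n)
    return poisson_se, quasi_se

# Hypothetical event counts with many zeros and a few large values.
counts = [0, 1, 0, 12, 3, 0, 7, 1, 0, 15]
phi = pearson_dispersion(counts)
poisson_se, quasi_se = rate_standard_errors(counts)
```

For these counts the dispersion estimate is close to 7.9, so the quasi-likelihood standard error is nearly three times the naive Poisson one.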
Accounting for Sampling Error in Genetic Eigenvalues Using Random Matrix Theory.
Sztepanacz, Jacqueline L; Blows, Mark W
2017-07-01
The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, where large eigenvalues are biased upward, and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices has been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not to be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error as genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine if the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix. Copyright © 2017 by the Genetics Society of America.
Survival estimation and the effects of dependency among animals
Schmutz, Joel A.; Ward, David H.; Sedinger, James S.; Rexstad, Eric A.
1995-01-01
Survival models assume that fates of individuals are independent, yet the robustness of this assumption has been poorly quantified. We examine how empirically derived estimates of the variance of survival rates are affected by dependency in survival probability among individuals. We used Monte Carlo simulations to generate known amounts of dependency among pairs of individuals and analyzed these data with Kaplan-Meier and Cormack-Jolly-Seber models. Dependency significantly increased these empirical variances as compared to theoretically derived estimates of variance from the same populations. Using resighting data from 168 pairs of black brant, we used a resampling procedure and program RELEASE to estimate empirical and mean theoretical variances. We estimated that the relationship between paired individuals caused the empirical variance of the survival rate to be 155% larger than the empirical variance for unpaired individuals. Monte Carlo simulations and use of this resampling strategy can provide investigators with information on how robust their data are to this common assumption of independent survival probabilities.
Casero-Alonso, V; López-Fidalgo, J; Torsney, B
2017-01-01
Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Jaeschke, Lina; Steinbrecher, Astrid; Jeran, Stephanie; Konigorski, Stefan; Pischon, Tobias
2018-04-20
24 h-accelerometry is now used to objectively assess physical activity (PA) in many observational studies like the German National Cohort; however, PA variability, observational time needed to estimate habitual PA, and reliability are unclear. We assessed 24 h-PA of 50 participants using triaxial accelerometers (ActiGraph GT3X+) over 2 weeks. Variability of overall PA and different PA intensities (time in inactivity and in low intensity, moderate, vigorous, and very vigorous PA) between days of assessment or days of the week was quantified using linear mixed-effects and random effects models. We calculated the required number of days to estimate PA, and calculated PA reliability using intraclass correlation coefficients. Between- and within-person variance accounted for 34.4-45.5% and 54.5-65.6%, respectively, of total variance in overall PA and PA intensities over the 2 weeks. Overall PA and times in low intensity, moderate, and vigorous PA decreased slightly over the first 3 days of assessment. Overall PA (p = 0.03), time in inactivity (p = 0.003), in low intensity PA (p = 0.001), in moderate PA (p = 0.02), and in vigorous PA (p = 0.04) slightly differed between days of the week, being highest on Wednesday and Friday and lowest on Sunday and Monday, with apparent differences between Saturday and Sunday. In nested random models, the day of the week accounted for < 19% of total variance in the PA parameters. On average, the required number of days to estimate habitual PA was around 1 week, being 7 for overall PA and ranging from 6 to 9 for the PA intensities. Week-to-week reliability was good (intraclass correlation coefficients, range, 0.68-0.82). Individual PA, as assessed using 24 h-accelerometry, is highly variable between days, but the day of assessment or the day of the week explain only small parts of this variance. Our data indicate that 1 week of assessment is necessary for reliable estimation of habitual PA.
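The relationship between variance shares, the intraclass correlation, and the number of monitoring days can be sketched with the Spearman-Brown prophecy formula. The abstract does not state which formula the authors used, so this is only one common approach; the function names and the target reliability of 0.8 are assumptions for illustration.

```python
def intraclass_correlation(between_var, within_var):
    """Single-day ICC: share of total variance that lies between persons."""
    return between_var / (between_var + within_var)


def days_needed(icc, target_reliability=0.8):
    """Spearman-Brown estimate of the days required to reach a target reliability.

    The default target of 0.8 is an assumed, commonly used threshold.
    """
    if not 0.0 < icc < 1.0 or not 0.0 < target_reliability < 1.0:
        raise ValueError("icc and target_reliability must lie strictly between 0 and 1")
    return (target_reliability / (1.0 - target_reliability)) * ((1.0 - icc) / icc)


# Between-person variance shares at the ends of the reported range (34.4% and 45.5%).
low_share = days_needed(intraclass_correlation(34.4, 65.6))
high_share = days_needed(intraclass_correlation(45.5, 54.5))
```

Under these assumptions the lower between-person share calls for more days than the higher one, which is the direction one would expect from the reported variance decomposition.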
Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver
2015-01-01
The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented, with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities, specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all, age-related shared variance, with domain-specific aging mechanisms evident. PMID:26321998
Variance and covariance estimates for weaning weight of Senepol cattle.
Wright, D W; Johnson, Z B; Brown, C J; Wildeus, S
1991-10-01
Variance and covariance components were estimated for weaning weight from Senepol field data for use in the reduced animal model for a maternally influenced trait. The 4,634 weaning records were used to evaluate 113 sires and 1,406 dams on the island of St. Croix. Estimates of direct additive genetic variance (σ²A), maternal additive genetic variance (σ²M), covariance between direct and maternal additive genetic effects (σAM), permanent maternal environmental variance (σ²PE), and residual variance (σ²ε) were calculated by equating variances estimated from a sire-dam model and a sire-maternal grandsire model, with and without the inverse of the numerator relationship matrix (A⁻¹), to their expectations. Estimates were σ²A, 139.05 and 138.14 kg²; σ²M, 307.04 and 288.90 kg²; σAM, -117.57 and -103.76 kg²; σ²PE, -258.35 and -243.40 kg²; and σ²ε, 588.18 and 577.72 kg² with and without A⁻¹, respectively. Heritability estimates for direct additive (h²A) were .211 and .210 with and without A⁻¹, respectively. Heritability estimates for maternal additive (h²M) were .47 and .44 with and without A⁻¹, respectively. Correlations between direct and maternal (rAM) effects were -.57 and -.52 with and without A⁻¹, respectively.
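The reported heritabilities and correlation can be reproduced from the listed components. The sketch below assumes that the phenotypic variance is the plain sum of the five components; the abstract does not state this, but the assumption recovers the published values. The function name is hypothetical.

```python
import math


def genetic_parameters(var_a, var_m, cov_am, var_pe, var_e):
    """Derive heritabilities and the direct-maternal correlation from components.

    Assumes phenotypic variance = var_a + var_m + cov_am + var_pe + var_e.
    """
    var_p = var_a + var_m + cov_am + var_pe + var_e
    return {
        "phenotypic_variance": var_p,
        "h2_direct": var_a / var_p,
        "h2_maternal": var_m / var_p,
        "r_am": cov_am / math.sqrt(var_a * var_m),
    }


# Components in kg^2 as reported, with and without the inverse relationship matrix.
with_inverse = genetic_parameters(139.05, 307.04, -117.57, -258.35, 588.18)
without_inverse = genetic_parameters(138.14, 288.90, -103.76, -243.40, 577.72)
```

Rounded to the precision used in the abstract, these give direct heritabilities of .211 and .210, maternal heritabilities of .47 and .44, and correlations of -.57 and -.52.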
Cross-bispectrum computation and variance estimation
NASA Technical Reports Server (NTRS)
Lii, K. S.; Helland, K. N.
1981-01-01
A method for the estimation of cross-bispectra of discrete real time series is developed. The asymptotic variance properties of the bispectrum are reviewed, and a method for the direct estimation of bispectral variance is given. The symmetry properties are described which minimize the computations necessary to obtain a complete estimate of the cross-bispectrum in the right-half-plane. A procedure is given for computing the cross-bispectrum by subdividing the domain into rectangular averaging regions which help reduce the variance of the estimates and allow easy application of the symmetry relationships to minimize the computational effort. As an example of the procedure, the cross-bispectrum of a numerically generated, exponentially distributed time series is computed and compared with theory.
Moran, John L; Solomon, Patricia J
2012-05-16
For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with log dependent variable. Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008-2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R2, concordance correlation coefficient (CCC), mean absolute error [MAE]) were established for each estimator. The data-set consisted of 111663 patients from 131 ICUs; with mean(SD) age 60.6(18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length-of-stay was 3.4(5.1) (median 1.8, range (0.17-60)) days and demonstrated marked kurtosis and right skew (29.4 and 4.4 respectively). BIC showed considerable spread, from a maximum of 509801 (OLS-raw scale) to a minimum of 210286 (LMM). R2 ranged from 0.22 (LMM) to 0.17 and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. 
There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152(0.019, 0.285), consistent with a fractional-root function. For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.
Zink, V; Štípková, M; Lassen, J
2011-10-01
The aim of this study was to estimate genetic parameters for fertility traits and linear type traits in the Czech Holstein dairy cattle population. Phenotypic data regarding 12 linear type traits, measured in first lactation, and 3 fertility traits, measured in each of first and second lactation, were collected from 2005 to 2009 in the progeny testing program of the Czech-Moravian Breeders Corporation. The number of animals for each linear type trait was 59,467, except for locomotion, where 53,436 animals were recorded. The 3-generation pedigree file included 164,125 animals. (Co)variance components were estimated using AI-REML in a series of bivariate analyses, which were implemented via the DMU package. Fertility traits included days from calving to first service (CF1), days open (DO1), and days from first to last service (FL1) in first lactation, and days from calving to first service (CF2), days open (DO2), and days from first to last service (FL2) in second lactation. The number of animals with fertility data varied between traits and ranged from 18,915 to 58,686. All heritability estimates for reproduction traits were low, ranging from 0.02 to 0.04. Heritability estimates for linear type traits ranged from 0.03 for locomotion to 0.39 for stature. Estimated genetic correlations between fertility traits and linear type traits were generally neutral or positive, whereas genetic correlations between body condition score and CF1, DO1, FL1, CF2 and DO2 were mostly negative, with the greatest correlation between BCS and CF2 (-0.51). Genetic correlations with locomotion were greatest for CF1 and CF2 (-0.34 for both). Results of this study show that cows that are genetically extreme for angularity, stature, and body depth tend to perform poorly for fertility traits. At the same time, cows that are genetically predisposed for low body condition score or high locomotion score are generally inferior in fertility. Copyright © 2011 American Dairy Science Association. 
Published by Elsevier Inc. All rights reserved.
Discriminative Learning of Receptive Fields from Responses to Non-Gaussian Stimulus Ensembles
Meyer, Arne F.; Diepenbrock, Jan-Philipp; Happel, Max F. K.; Ohl, Frank W.; Anemüller, Jörn
2014-01-01
Analysis of sensory neurons' processing characteristics requires simultaneous measurement of presented stimuli and concurrent spike responses. The functional transformation from high-dimensional stimulus space to the binary space of spike and non-spike responses is commonly described with linear-nonlinear models, whose linear filter component describes the neuron's receptive field. From a machine learning perspective, this corresponds to the binary classification problem of discriminating spike-eliciting from non-spike-eliciting stimulus examples. The classification-based receptive field (CbRF) estimation method proposed here adapts a linear large-margin classifier to optimally predict experimental stimulus-response data and subsequently interprets learned classifier weights as the neuron's receptive field filter. Computational learning theory provides a theoretical framework for learning from data and guarantees optimality in the sense that the risk of erroneously assigning a spike-eliciting stimulus example to the non-spike class (and vice versa) is minimized. Efficacy of the CbRF method is validated with simulations and for auditory spectro-temporal receptive field (STRF) estimation from experimental recordings in the auditory midbrain of Mongolian gerbils. Acoustic stimulation is performed with frequency-modulated tone complexes that mimic properties of natural stimuli, specifically non-Gaussian amplitude distribution and higher-order correlations. Results demonstrate that the proposed approach successfully identifies correct underlying STRFs, even in cases where second-order methods based on the spike-triggered average (STA) do not. Applied to small data samples, the method is shown to converge on smaller amounts of experimental recordings and with lower estimation variance than the generalized linear model and recent information theoretic methods. 
Thus, CbRF estimation may prove useful for investigation of neuronal processes in response to natural stimuli and in settings where rapid adaptation is induced by experimental design. PMID:24699631
Methods to Estimate the Between-Study Variance and Its Uncertainty in Meta-Analysis
ERIC Educational Resources Information Center
Veroniki, Areti Angeliki; Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian P. T.; Langan, Dean; Salanti, Georgia
2016-01-01
Meta-analyses are typically used to estimate the overall/mean of an outcome of interest. However, inference about between-study variability, which is typically modelled using a between-study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between-study variance,…
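For readers unfamiliar with the DerSimonian and Laird method named in this abstract, a minimal sketch of the standard moment estimator follows. It uses the usual textbook form (Cochran's Q with inverse-variance weights, truncated at zero); the function name and the example inputs are hypothetical and are not taken from the paper.

```python
def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-study variance, truncated at zero."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    k = len(effects)
    weights = [1.0 / v for v in variances]  # inverse within-study variances
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    scale = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - (k - 1)) / scale)


# Hypothetical study effects with equal within-study variances.
tau2 = dersimonian_laird_tau2([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
```

The truncation at zero is what makes the estimator non-negative; it is also one reason its uncertainty is awkward to quantify.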
An Analysis of Variance Approach for the Estimation of Response Time Distributions in Tests
ERIC Educational Resources Information Center
Attali, Yigal
2010-01-01
Generalizability theory and analysis of variance methods are employed, together with the concept of objective time pressure, to estimate response time distributions and the degree of time pressure in timed tests. By estimating response time variance components due to person, item, and their interaction, and fixed effects due to item types and…
Tsou, Tsung-Shan
2007-03-30
This paper introduces an exploratory way to determine how variance relates to the mean in generalized linear models. This novel method employs the robust likelihood technique introduced by Royall and Tsou. A urinary data set collected by Ginsberg et al. and the fabric data set analysed by Lee and Nelder are considered to demonstrate the applicability and simplicity of the proposed technique. Application of the proposed method could easily reveal a mean-variance relationship that would generally be left unnoticed, or that would require more complex modelling to detect. Copyright (c) 2006 John Wiley & Sons, Ltd.
One-shot estimate of MRMC variance: AUC.
Gallas, Brandon D
2006-03-01
One popular study design for estimating the area under the receiver operating characteristic curve (AUC) is the one in which a set of readers reads a set of cases: a fully crossed design in which every reader reads every case. The variability of the subsequent reader-averaged AUC has two sources: the multiple readers and the multiple cases (MRMC). In this article, we present a nonparametric estimate for the variance of the reader-averaged AUC that is unbiased and does not use resampling tools. The one-shot estimate is based on the MRMC variance derived by the mechanistic approach of Barrett et al. (2005), as well as the nonparametric variance of a single-reader AUC derived in the literature on U statistics. We investigate the bias and variance properties of the one-shot estimate through a set of Monte Carlo simulations with simulated model observers and images. The different simulation configurations vary numbers of readers and cases, amounts of image noise and internal noise, as well as how the readers are constructed. We compare the one-shot estimate to a method that uses the jackknife resampling technique with an analysis of variance model at its foundation (Dorfman et al. 1992). The name one-shot highlights that resampling is not used. The one-shot and jackknife estimators behave similarly, with the one-shot being marginally more efficient when the number of cases is small. We have derived a one-shot estimate of the MRMC variance of AUC that is based on a probabilistic foundation with limited assumptions, is unbiased, and compares favorably to an established estimate.
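The quantity whose variance is being estimated here, the reader-averaged AUC in a fully crossed design, can be sketched as follows. Each single-reader AUC is computed as the Mann-Whitney U statistic (ties counted as one half). This sketch covers only the point estimate, not the one-shot variance estimate; the function names and scores are hypothetical.

```python
def empirical_auc(diseased_scores, nondiseased_scores):
    """Mann-Whitney estimate of AUC for one reader, counting ties as 1/2."""
    total = 0.0
    for d in diseased_scores:
        for n in nondiseased_scores:
            if d > n:
                total += 1.0
            elif d == n:
                total += 0.5
    return total / (len(diseased_scores) * len(nondiseased_scores))


def reader_averaged_auc(readers):
    """Average the single-reader AUCs.

    readers: list of (diseased_scores, nondiseased_scores) pairs, one per reader,
    all scoring the same cases in a fully crossed design.
    """
    return sum(empirical_auc(d, n) for d, n in readers) / len(readers)


# Two hypothetical readers scoring the same two diseased and two non-diseased cases.
mean_auc = reader_averaged_auc([([3, 4], [1, 3]), ([2, 2], [1, 1])])
```

Because every reader scores the same cases, the single-reader AUCs are correlated, which is why the variance of their average has both reader and case components.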
Future mission studies: Preliminary comparisons of solar flux models
NASA Technical Reports Server (NTRS)
Ashrafi, S.
1991-01-01
The results of comparisons of the solar flux models are presented. (The wavelength λ = 10.7 cm radio flux is the best indicator of the strength of the ionizing radiations such as solar ultraviolet and x-ray emissions that directly affect the atmospheric density, thereby changing the orbit lifetime of satellites. Thus, accurate forecasting of solar flux F10.7 is crucial for orbit determination of spacecraft.) The measured solar flux recorded by the National Oceanic and Atmospheric Administration (NOAA) is compared against the forecasts made by Schatten, MSFC, and NOAA itself. The possibility of a combined linear, unbiased minimum-variance estimation that properly combines all three models into one that minimizes the variance is also discussed. All the physics inherent in each model are combined. This is considered to be the dead-end statistical approach to solar flux forecasting before any nonlinear chaotic approach.
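A combined linear, unbiased, minimum-variance estimate of the kind described can be sketched with inverse-variance weighting. This holds only under the added assumption that the forecast errors are unbiased and mutually uncorrelated, which the abstract does not claim for the three models; the function name and all numbers are hypothetical.

```python
def combine_forecasts(forecasts, error_variances):
    """Minimum-variance unbiased linear combination of uncorrelated forecasts.

    Returns the combined forecast, its variance, and the weights used.
    """
    if len(forecasts) != len(error_variances) or not forecasts:
        raise ValueError("need one error variance per forecast")
    precisions = [1.0 / v for v in error_variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]  # weights sum to 1
    combined = sum(w * f for w, f in zip(weights, forecasts))
    return combined, 1.0 / total_precision, weights


# Hypothetical flux forecasts from three models and their assumed error variances.
combined, combined_var, weights = combine_forecasts([100.0, 110.0, 120.0], [4.0, 4.0, 2.0])
```

The combined variance is never larger than the smallest individual variance, which is the sense in which the combination minimizes variance; correlated forecast errors would require the full error covariance matrix instead.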
Visscher, Peter M; Goddard, Michael E
2015-01-01
Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N^2, where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N. Copyright © 2015 by the Genetics Society of America.
Precision of proportion estimation with binary compressed Raman spectrum.
Réfrégier, Philippe; Scotté, Camille; de Aguiar, Hilton B; Rigneault, Hervé; Galland, Frédéric
2018-01-01
The precision of proportion estimation with binary filtering of a Raman spectrum mixture is analyzed when the number of binary filters is equal to the number of present species and when the measurements are corrupted with Poisson photon noise. It is shown that the Cramer-Rao bound provides a useful methodology to analyze the performance of such an approach, in particular when the binary filters are orthogonal. It is demonstrated that a simple linear mean square error estimation method is efficient (i.e., has a variance equal to the Cramer-Rao bound). Evolutions of the Cramer-Rao bound are analyzed when the measuring times are optimized or when the considered proportion for binary filter synthesis is not optimized. Two strategies for the appropriate choice of this considered proportion are also analyzed for the binary filter synthesis.
Unbalanced and Minimal Point Equivalent Estimation Second-Order Split-Plot Designs
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Kowalski, Scott M.; Vining, G. Geoffrey
2007-01-01
Restricting the randomization of hard-to-change factors in industrial experiments is often performed by employing a split-plot design structure. From an economic perspective, these designs minimize the experimental cost by reducing the number of resets of the hard-to-change factors. In this paper, unbalanced designs are considered for cases where the subplots are relatively expensive and the experimental apparatus accommodates an unequal number of runs per whole-plot. We provide construction methods for unbalanced second-order split-plot designs that possess the equivalence estimation optimality property, providing best linear unbiased estimates of the parameters, independent of the variance components. Unbalanced versions of the central composite and Box-Behnken designs are developed. For cases where the subplot cost approaches the whole-plot cost, minimal point designs are proposed and illustrated with a split-plot Notz design.
Determination of the optimal level for combining area and yield estimates
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Hixson, M. M.; Jobusch, C. D.
1981-01-01
Several levels of obtaining both area and yield estimates of corn and soybeans in Iowa were considered: county, refined strata, refined/split strata, crop reporting district, and state. Using the CCEA model form and smoothed weather data, regression coefficients at each level were derived to compute yield and its variance. Variances were also computed with stratum level. The variance of the yield estimates was largest at the state and smallest at the county level for both crops. The refined strata had somewhat larger variances than those associated with the refined/split strata and CRD. For production estimates, the difference in standard deviations among levels was not large for corn, but for soybeans the standard deviation at the state level was more than 50% greater than for the other levels. The refined strata had the smallest standard deviations. The county level was not considered in evaluation of production estimates due to lack of county area variances.
Dorleijn, Desirée M J; Luijsterburg, Pim A J; Burdorf, Alex; Rozendaal, Rianne M; Verhaar, Jan A N; Bos, Pieter K; Bierma-Zeinstra, Sita M A
2014-04-01
The goal of this study was to assess whether there is an association between ambient weather conditions and patients' clinical symptoms in patients with hip osteoarthritis (OA). The design was a cohort study with a 2-year follow-up and 3-monthly measurements and prospectively collected data on weather variables. The study population consisted of 222 primary care patients with hip OA. Weather variables included temperature, wind speed, total amount of sun hours, precipitation, barometric pressure, and relative humidity. The primary outcomes were severity of hip pain and hip disability as measured with the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) pain and function subscales. Associations between hip pain and hip disability and the weather variables were assessed using crude and multivariate adjusted linear mixed-model analysis for repeated measurements. On the day of questionnaire completion, mean relative humidity was associated with WOMAC pain (estimate 0.1; 95% confidence interval=0.0-0.2; P=.02). Relative humidity contributed ≤ 1% to the explained within-patient variance and between-patient variance of the WOMAC pain score. Mean barometric pressure was associated with WOMAC function (estimate 0.1; 95% confidence interval=0.0-0.1; P=.02). Barometric pressure contributed ≤ 1% to the explained within-patient variance and between-patient variance of the WOMAC function score. The other weather variables were not associated with the WOMAC pain or function score. Our results support the general opinion of OA patients that barometric pressure and relative humidity influence perceived OA symptoms. However, the contribution of these weather variables (≤ 1%) to the severity of OA symptoms is not considered to be clinically relevant. Copyright © 2014 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Random effects coefficient of determination for mixed and meta-analysis models.
Demidenko, Eugene; Sargent, James; Onega, Tracy
2012-01-01
The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, [Formula: see text], that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If [Formula: see text] is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. The value of [Formula: see text] apart from 0 indicates the evidence of the variance reduction in support of the mixed model. If the random effects coefficient of determination is close to 1, the variance of random effects is very large and random effects turn into free fixed effects; the model can then be estimated using the dummy variable approach. We derive explicit formulas for [Formula: see text] in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine.
Olivoto, T; Nardino, M; Carvalho, I R; Follmann, D N; Ferrari, M; Szareski, V J; de Pelegrin, A J; de Souza, V Q
2017-03-22
Methodologies using restricted maximum likelihood/best linear unbiased prediction (REML/BLUP) in combination with sequential path analysis in maize are still limited in the literature. Therefore, the aims of this study were: i) to use REML/BLUP-based procedures in order to estimate variance components, genetic parameters, and genotypic values of simple maize hybrids, and ii) to fit stepwise regressions considering genotypic values to form a path diagram with multi-order predictors and minimum multicollinearity that explains the relationships of cause and effect among grain yield-related traits. Fifteen commercial simple maize hybrids were evaluated in multi-environment trials in a randomized complete block design with four replications. The environmental variance (78.80%) and genotype-vs-environment variance (20.83%) accounted for more than 99% of the phenotypic variance of grain yield, which hinders direct selection by breeders for this trait. The sequential path analysis model allowed the selection of traits with high explanatory power and minimum multicollinearity, resulting in models with elevated fit (R² > 0.9 and ε < 0.3). The number of kernels per ear (NKE) and thousand-kernel weight (TKW) are the traits with the largest direct effects on grain yield (r = 0.66 and 0.73, respectively). The high accuracy of selection (0.86 and 0.89) associated with the high heritability of the average (0.732 and 0.794) for NKE and TKW, respectively, indicated good reliability and prospects of success in the indirect selection of hybrids with high-yield potential through these traits. The negative direct effect of NKE on TKW (r = -0.856), however, must be considered. The joint use of mixed models and sequential path analysis is effective in the evaluation of maize-breeding trials.
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. 
As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.
Building pit dewatering: application of transient analytic elements.
Zaadnoordijk, Willem J
2006-01-01
Analytic elements are well suited for the design of building pit dewatering. Wells and drains can be modeled accurately by analytic elements, both nearby to determine the pumping level and at some distance to verify the targeted drawdown at the building site and to estimate the consequences in the vicinity. The ability to shift locations of wells or drains easily makes the design process very flexible. The temporary pumping has transient effects, for which transient analytic elements may be used. This is illustrated using the free, open-source, object-oriented analytic element simulator Tim(SL) for the design of a building pit dewatering near a canal. Steady calculations are complemented with transient calculations. Finally, the bandwidths of the results are estimated using linear variance analysis.
Radial orbit error reduction and sea surface topography determination using satellite altimetry
NASA Technical Reports Server (NTRS)
Engelis, Theodossios
1987-01-01
A method is presented in satellite altimetry that attempts to simultaneously determine the geoid and sea surface topography with minimum wavelengths of about 500 km and to reduce the radial orbit error caused by geopotential errors. The modeling of the radial orbit error is made using the linearized Lagrangian perturbation theory. Secular and second order effects are also included. After a rather extensive validation of the linearized equations, alternative expressions of the radial orbit error are derived. Numerical estimates for the radial orbit error and geoid undulation error are computed using the differences of two geopotential models as potential coefficient errors, for a SEASAT orbit. To provide statistical estimates of the radial distances and the geoid, a covariance propagation is made based on the full geopotential covariance. Accuracy estimates for the SEASAT orbits are given which agree quite well with already published results. Observation equations are developed using sea surface heights and crossover discrepancies as observables. A minimum variance solution with prior information provides estimates of parameters representing the sea surface topography and corrections to the gravity field that is used for the orbit generation. The simulation results show that the method can be used to effectively reduce the radial orbit error and recover the sea surface topography.
Bertrand-Krajewski, J L
2004-01-01
In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers, by means of sensor and site specific empirical equations established by linear regression of on-site turbidity T values with TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24-hour dry weather turbidity data series recorded at 2-min intervals is used, transformed into estimated TSS concentrations, and compared to TSS concentrations measured in samples. The comparison appears satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
Rose, Kevin C.; Winslow, Luke A.; Read, Jordan S.; Read, Emily K.; Solomon, Christopher T.; Adrian, Rita; Hanson, Paul C.
2014-01-01
Diel changes in dissolved oxygen are often used to estimate gross primary production (GPP) and ecosystem respiration (ER) in aquatic ecosystems. Despite the widespread use of this approach to understand ecosystem metabolism, we are only beginning to understand the degree and underlying causes of uncertainty for metabolism model parameter estimates. Here, we present a novel approach to improve the precision and accuracy of ecosystem metabolism estimates by identifying physical metrics that indicate when metabolism estimates are highly uncertain. Using datasets from seventeen instrumented GLEON (Global Lake Ecological Observatory Network) lakes, we discovered that many physical characteristics correlated with uncertainty, including PAR (photosynthetically active radiation, 400-700 nm), daily variance in Schmidt stability, and wind speed. Low PAR was a consistent predictor of high variance in GPP model parameters, but also corresponded with low ER model parameter variance. We identified a threshold (30% of clear sky PAR) below which GPP parameter variance increased rapidly and was significantly greater in nearly all lakes compared with variance on days with PAR levels above this threshold. The relationship between daily variance in Schmidt stability and GPP model parameter variance depended on trophic status, whereas daily variance in Schmidt stability was consistently positively related to ER model parameter variance. Wind speeds in the range of ~0.8-3 m s–1 were consistent predictors of high variance for both GPP and ER model parameters, with greater uncertainty in eutrophic lakes. Our findings can be used to reduce ecosystem metabolism model parameter uncertainty and identify potential sources of that uncertainty.
ERIC Educational Resources Information Center
Tanner-Smith, Emily E.; Tipton, Elizabeth
2014-01-01
Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding…
Correcting for Systematic Bias in Sample Estimates of Population Variances: Why Do We Divide by n-1?
ERIC Educational Resources Information Center
Mittag, Kathleen Cage
An important topic presented in introductory statistics courses is the estimation of population parameters using samples. Students learn that when estimating population variances using sample data, we always get an underestimate of the population variance if we divide by n rather than n-1. One implication of this correction is that the degree of…
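The effect described in this record can be checked numerically. The following is a minimal sketch, not taken from the cited document: it assumes normally distributed data with true variance 4.0, and the names `variance` and `average_estimates` are illustrative. It averages both estimators over many small samples to show that dividing by n underestimates the population variance on average, while dividing by n-1 does not.

```python
import random

def variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)

def average_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average both estimators over many samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    biased = unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased += variance(sample, ddof=0)    # divide by n
        unbiased += variance(sample, ddof=1)  # divide by n - 1
    return biased / trials, unbiased / trials

biased_avg, unbiased_avg = average_estimates()
print(f"true variance 4.0 | divide by n: {biased_avg:.2f} | divide by n-1: {unbiased_avg:.2f}")
```

With n = 5 the divide-by-n average should settle near (n-1)/n = 0.8 of the true variance, which is exactly the factor that the n-1 divisor removes.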
Vanderick, S; Harris, B L; Pryce, J E; Gengler, N
2009-03-01
In New Zealand, a large proportion of cows are currently crossbreds, mostly Holstein-Friesians (HF) x Jersey (JE). The genetic evaluation system for milk yields is considering the same additive genetic effects for all breeds. The objective was to model different additive effects according to parental breeds to obtain first estimates of correlations among breed-specific effects and to study the usefulness of this type of random regression test-day model. Estimates of (co)variance components for purebred HF and JE cattle in purebred herds were computed by using a single-breed model. This analysis showed differences between the 2 breeds, with a greater variability in the HF breed. (Co)variance components for purebred HF and JE and crossbred HF x JE cattle were then estimated by using a complete multibreed model in which computations of complete across-breed (co)variances were simplified by correlating only eigenvectors for HF and JE random regressions of the same order as obtained from the single-breed analysis. Parameter estimates differed more strongly than expected between the single-breed and multibreed analyses, especially for JE. This could be due to differences between animals and management in purebred and non-purebred herds. In addition, the model used only partially accounted for heterosis. The multibreed analysis showed additive genetic differences between the HF and JE breeds, expressed as genetic correlations of additive effects in both breeds, especially in linear and quadratic Legendre polynomials (respectively, 0.807 and 0.604). The differences were small for overall milk production (0.926). Results showed that permanent environmental lactation curves were highly correlated across breeds; however, intraherd lactation curves were also affected by the breed-environment interaction. This result may indicate the existence of breed-specific competition effects that vary through the different lactation stages. 
In conclusion, a multibreed model similar to the one presented could optimally use the environmental and genetic parameters and provide breed-dependent additive breeding values. This model could also be a useful tool to evaluate crossbred dairy cattle populations like those in New Zealand. However, a routine evaluation would still require the development of an improved methodology. It would also be computationally very challenging because of the simultaneous presence of a large number of breeds.
Hu, Pingsha; Maiti, Tapabrata
2011-01-01
Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181
Online Estimation of Allan Variance Coefficients Based on a Neural-Extended Kalman Filter
Miao, Zhiyong; Shen, Feng; Xu, Dingjie; He, Kunpeng; Tian, Chunmiao
2015-01-01
As a noise analysis method for inertial sensors, the traditional Allan variance method requires the storage of a large amount of data and manual analysis for an Allan variance graph. Although the existing online estimation methods avoid the storage of data and the painful procedure of drawing slope lines for estimation, they require complex transformations and even cause errors during the modeling of dynamic Allan variance. To solve these problems, first, a new state-space model that directly models the stochastic errors to obtain a nonlinear state-space model was established for inertial sensors. Then, a neural-extended Kalman filter algorithm was used to estimate the Allan variance coefficients. The real noises of an ADIS16405 IMU and fiber optic gyro-sensors were analyzed by the proposed method and traditional methods. The experimental results show that the proposed method is more suitable to estimate the Allan variance coefficients than the traditional methods. Moreover, the proposed method effectively avoids the storage of data and can be easily implemented using an online processor. PMID:25625903
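For context, the traditional batch computation that this record contrasts with can be sketched as follows. This is the standard non-overlapping Allan variance, not the neural-extended Kalman filter proposed by the authors; the function name `allan_variance` and the synthetic white-noise input are assumptions made for illustration. Note that the whole record must be held in memory, which is the storage cost the abstract mentions.

```python
import random

def allan_variance(rate, m):
    """Non-overlapping Allan variance of `rate` for clusters of m samples."""
    n_clusters = len(rate) // m
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    # Average the rate signal over consecutive clusters of length m.
    means = [sum(rate[k * m:(k + 1) * m]) / m for k in range(n_clusters)]
    # Half the mean squared difference between successive cluster averages.
    diffs = [(means[k + 1] - means[k]) ** 2 for k in range(n_clusters - 1)]
    return 0.5 * sum(diffs) / len(diffs)

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(40000)]  # synthetic unit-variance white noise
avar = {m: allan_variance(white, m) for m in (1, 10, 100)}
for m, value in avar.items():
    print(m, value)
```

For white noise of variance σ² the Allan variance falls off as σ²/m, so the printed values should decrease by roughly a factor of ten per row; on a log-log plot this is the slope that is otherwise read off by hand.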
Optimal Tuner Selection for Kalman Filter-Based Aircraft Engine Performance Estimation
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Garg, Sanjay
2010-01-01
A linear point design methodology for minimizing the error in on-line Kalman filter-based aircraft engine performance estimation applications is presented. This technique specifically addresses the underdetermined estimation problem, where there are more unknown parameters than available sensor measurements. A systematic approach is applied to produce a model tuning parameter vector of appropriate dimension to enable estimation by a Kalman filter, while minimizing the estimation error in the parameters of interest. Tuning parameter selection is performed using a multi-variable iterative search routine which seeks to minimize the theoretical mean-squared estimation error. This paper derives theoretical Kalman filter estimation error bias and variance values at steady-state operating conditions, and presents the tuner selection routine applied to minimize these values. Results from the application of the technique to an aircraft engine simulation are presented and compared to the conventional approach of tuner selection. Experimental simulation results are found to be in agreement with theoretical predictions. The new methodology is shown to yield a significant improvement in on-line engine performance estimation accuracy.
Vargas-Meléndez, Leandro; Boada, Beatriz L.; Boada, María Jesús L.; Gauchía, Antonio; Díaz, Vicente
2016-08-31
This article presents a novel estimator based on sensor fusion, which combines the Neural Network (NN) with a Kalman filter in order to estimate the vehicle roll angle. The NN estimates a "pseudo-roll angle" through variables that are easily measured from Inertial Measurement Unit (IMU) sensors. An IMU is a device that is commonly used for vehicle motion detection, and its cost has decreased during recent years. The pseudo-roll angle is introduced in the Kalman filter in order to filter noise and minimize the variance of the norm and maximum errors' estimation. The NN has been trained for J-turn maneuvers, double lane change maneuvers and lane change maneuvers at different speeds and road friction coefficients. The proposed method takes into account the vehicle non-linearities, thus yielding good roll angle estimation. Finally, the proposed estimator has been compared with one that uses the suspension deflections to obtain the pseudo-roll angle. Experimental results show the effectiveness of the proposed NN and Kalman filter-based estimator. PMID:27589763
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data
Dazard, Jean-Eudes; Rao, J. Sunil
2012-07-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise tests statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that : (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website. PMID:22711950
Ruder, Avima M.; Succop, Paul; Waters, Martha A.
2015-01-01
Although polychlorinated biphenyls (PCBs) have been banned in many countries for more than three decades, exposures to PCBs continue to be of concern due to their long half-lives and carcinogenic effects. In National Institute for Occupational Safety and Health studies, we are using semiquantitative plant-specific job exposure matrices (JEMs) to estimate historical PCB exposures for workers (n=24,865) exposed to PCBs from 1938 to 1978 at three capacitor manufacturing plants. A subcohort of these workers (n=410) employed in two of these plants had serum PCB concentrations measured at up to four times between 1976 and 1989. Our objectives were to evaluate the strength of association between an individual worker’s measured serum PCB levels and the same worker’s cumulative exposure estimated through 1977 with the (1) JEM and (2) duration of employment, and to calculate the explained variance the JEM provides for serum PCB levels using (3) simple linear regression. Consistent strong and statistically significant associations were observed between the cumulative exposures estimated with the JEM and serum PCB concentrations for all years. The strength of association between duration of employment and serum PCBs was good for highly chlorinated (Aroclor 1254/HPCB) but not less chlorinated (Aroclor 1242/LPCB) PCBs. In the simple regression models, cumulative occupational exposure estimated using the JEMs explained 14–24 % of the variance of the Aroclor 1242/LPCB and 22–39 % for Aroclor 1254/HPCB serum concentrations. We regard the cumulative exposure estimated with the JEM as a better estimate of PCB body burdens than serum concentrations quantified as Aroclor 1242/LPCB and Aroclor 1254/HPCB. PMID:23475397
Experimental cosmic statistics - I. Variance
NASA Astrophysics Data System (ADS)
Colombi, Stéphane; Szapudi, István; Jenkins, Adrian; Colberg, Jörg
2000-04-01
Counts-in-cells are measured in the τCDM Virgo Hubble Volume simulation. This large N-body experiment has 10⁹ particles in a cubic box of size 2000 h⁻¹ Mpc. The unprecedented combination of size and resolution allows, for the first time, a realistic numerical analysis of the cosmic errors and cosmic correlations of statistics related to counts-in-cells measurements, such as the probability distribution function P_N itself, its factorial moments F_k and the related cumulants ψ and S_N's. These statistics are extracted from the whole simulation cube, as well as from 4096 subcubes of size 125 h⁻¹ Mpc, each representing a virtual random realization of the local universe. The measurements and their scatter over the subvolumes are compared to the theoretical predictions of Colombi, Bouchet & Schaeffer for P_0, and of Szapudi & Colombi and Szapudi, Colombi & Bernardeau for the factorial moments and the cumulants. The general behaviour of experimental variance and cross-correlations as functions of scale and order is well described by theoretical predictions, with a few per cent accuracy in the weakly non-linear regime for the cosmic error on factorial moments. On highly non-linear scales, however, all variants of the hierarchical model used by SC and SCB to describe clustering appear to become increasingly approximate, which leads to a slight overestimation of the error, by about a factor of two in the worst case. Because of the needed supplementary perturbative approach, the theory is less accurate for non-linear estimators, such as cumulants, than for factorial moments. The cosmic bias is evaluated as well, and, in agreement with SCB, is found to be insignificant compared with the cosmic variance in all regimes investigated. While higher order statistics were previously evaluated in several simulations, this work presents textbook quality measurements of S_N's, 3 ≤ N ≤ 10, in an unprecedented dynamic range of 0.05 ≲ ψ ≲ 50. 
In the weakly non-linear regime the results confirm previous findings and agree remarkably well with perturbation theory predictions including the one-loop corrections based on spherical collapse by Fosalba & Gaztañaga. Extended perturbation theory is confirmed on all scales.
Estimating means and variances: The comparative efficiency of composite and grab samples.
Brumelle, S; Nemetz, P; Casey, D
1984-03-01
This paper compares the efficiencies of two sampling techniques for estimating a population mean and variance. One procedure, called grab sampling, consists of collecting and analyzing one sample per period. The second procedure, called composite sampling, collects n samples per period which are then pooled and analyzed as a single sample. We review the well known fact that composite sampling provides a superior estimate of the mean. However, it is somewhat surprising that composite sampling does not always generate a more efficient estimate of the variance. For populations with platykurtic distributions, grab sampling gives a more efficient estimate of the variance, whereas composite sampling is better for leptokurtic distributions. These conditions on kurtosis can be related to peakedness and skewness. For example, a necessary condition for composite sampling to provide a more efficient estimate of the variance is that the population density function evaluated at the mean (i.e. f(μ)) be greater than [Formula: see text]. If [Formula: see text], then a grab sample is more efficient. In spite of this result, however, composite sampling does provide a smaller estimate of standard error than does grab sampling in the context of estimating population means.
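The kurtosis result in this record can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not the authors' derivation: the population is uniform on (0, 1), which is platykurtic; a composite measurement is taken to be the plain average of n equal-sized samples with no analytical error; and `simulate` and its dictionary keys are hypothetical names. The composite variance estimate is rescaled by n because a composite measurement has variance σ²/n.

```python
import random
import statistics

def simulate(periods=20, n=4, reps=2000, seed=0):
    """Compare grab and composite sampling on a uniform(0, 1) population."""
    rng = random.Random(seed)
    grab_means, comp_means, grab_vars, comp_vars = [], [], [], []
    for _ in range(reps):
        grab, comp = [], []
        for _ in range(periods):
            draws = [rng.random() for _ in range(n)]
            grab.append(draws[0])        # grab: analyze one sample per period
            comp.append(sum(draws) / n)  # composite: pool n samples, analyze once
        grab_means.append(statistics.mean(grab))
        comp_means.append(statistics.mean(comp))
        grab_vars.append(statistics.variance(grab))
        comp_vars.append(n * statistics.variance(comp))  # rescale to sigma^2
    return {
        "mean_est_spread_grab": statistics.variance(grab_means),
        "mean_est_spread_composite": statistics.variance(comp_means),
        "var_est_spread_grab": statistics.variance(grab_vars),
        "var_est_spread_composite": statistics.variance(comp_vars),
    }

result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

Under these assumptions the composite estimate of the mean should scatter less than the grab estimate, while the composite estimate of the variance should scatter more, in line with the platykurtic case described above.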
Heritability of semen traits in German Warmblood stallions.
Gottschalk, M; Sieme, H; Martinsson, G; Distl, O
2016-07-01
The objectives of the present study were to evaluate genetic parameters for semen quality traits of 241 fertile German Warmblood stallions regularly employed in artificial insemination (AI). Stallions were owned by the National Studs Celle and Warendorf in Germany. Semen traits analyzed were gel-free volume, sperm concentration, total number of sperm, progressive motility and total number of progressively motile sperm. Semen protocols from a total of 63,972 ejaculates were collected between the years 2001 and 2014 for the present analysis. A multivariate linear animal model was employed for estimation of additive genetic and permanent environmental variances among stallions and breeding values (EBVs) for semen traits. Heritabilities estimated for all German Warmblood stallions were highest for gel-free volume (h² = 0.28) and lowest for total number of progressively motile sperm (h² = 0.13). The additive genetic correlation among gel-free volume and sperm concentration was highly negative (r_g = -0.76). Average reliabilities of EBVs were at 0.37-0.68 for the 241 stallions with own records. The inter-stallion variance explained between 33 and 61% of the trait variance, underlining the major impact of the individual stallion on semen quality traits analyzed here. Recording of semen traits from stallions employed in AI may be recommended because EBVs achieve sufficient accuracies to improve semen quality in future generations. Due to favorable genetic correlations, sperm concentration, total number of sperm and total number of progressively motile sperm may be increased simultaneously.
Statistical classification techniques for engineering and climatic data samples
NASA Technical Reports Server (NTRS)
Temple, E. C.; Shipman, J. R.
1981-01-01
Fisher's sample linear discriminant function is modified through an appropriate alteration of the common sample variance-covariance matrix. The alteration consists of adding nonnegative values to the eigenvalues of the sample variance-covariance matrix. The desired result of this modification is to increase the number of correct classifications by the new linear discriminant function over Fisher's function. This study is limited to the two-group discriminant problem.
NASA Astrophysics Data System (ADS)
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which were reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators (M, MM, RIDGE) and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (y-direction), the WRMM estimator was the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative compared to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H
2015-11-30
We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.
Cloninger, C R; Rice, J; Reich, T
1979-01-01
A general linear model of combined polygenic-cultural inheritance is described. The model allows for phenotypic assortative mating, common environment, maternal and paternal effects, and genic-cultural correlation. General formulae for phenotypic correlation between family members in extended pedigrees are given for both primary and secondary assortative mating. A FORTRAN program BETA, available upon request, is used to provide maximum likelihood estimates of the parameters from reported correlations. American data about IQ and Burks' culture index are analyzed. Both cultural and genetic components of phenotypic variance are observed to make significant and substantial contributions to familial resemblance in IQ. The correlation between the environments of DZ twins is found to equal that of singleton sibs, not that of MZ twins. Burks' culture index is found to be an imperfect measure of midparent IQ rather than an index of home environment as previously assumed. Conditions under which the parameters of the model may be uniquely and precisely estimated are discussed. Interpretation of variance components in the presence of assortative mating and genic-cultural covariance is reviewed. A conservative, but robust, approach to the use of environmental indices is described. PMID:453202
Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis
ERIC Educational Resources Information Center
Williams, Ryan
2013-01-01
The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…
NASA Astrophysics Data System (ADS)
Benhalouche, Fatima Zohra; Karoui, Moussa Sofiane; Deville, Yannick; Ouamri, Abdelaziz
2015-10-01
In this paper, a new Spectral-Unmixing-based approach, using Nonnegative Matrix Factorization (NMF), is proposed to locally multi-sharpen hyperspectral data by integrating a Digital Surface Model (DSM) obtained from LIDAR data. In this new approach, the nature of the local mixing model is detected by using the local variance of the object elevations. The hyper/multispectral images are explored using small zones. In each zone, the variance of the object elevations is calculated from the DSM data in this zone. This variance is compared to a threshold value and the adequate linear/linear-quadratic spectral unmixing technique is used in the considered zone to independently unmix hyperspectral and multispectral data, using an adequate linear/linear-quadratic NMF-based approach. The obtained spectral and spatial information thus respectively extracted from the hyper/multispectral images are then recombined in the considered zone, according to the selected mixing model. Experiments based on synthetic hyper/multispectral data are carried out to evaluate the performance of the proposed multi-sharpening approach and literature linear/linear-quadratic approaches used on the whole hyper/multispectral data. In these experiments, real DSM data are used to generate synthetic data containing linear and linear-quadratic mixed pixel zones. The DSM data are also used for locally detecting the nature of the mixing model in the proposed approach. Globally, the proposed approach yields good spatial and spectral fidelities for the multi-sharpened data and significantly outperforms the used literature methods.
Using structural equation modeling for network meta-analysis.
Tu, Yu-Kang; Wu, Yun-Chun
2017-07-14
Network meta-analysis overcomes the limitations of traditional pair-wise meta-analysis by incorporating all available evidence into a general statistical framework for simultaneous comparisons of several treatments. Currently, network meta-analyses are undertaken either within the Bayesian hierarchical linear models or frequentist generalized linear mixed models. Structural equation modeling (SEM) is a statistical method originally developed for modeling causal relations among observed and latent variables. As random effect is explicitly modeled as a latent variable in SEM, it is very flexible for analysts to specify complex random effect structure and to make linear and nonlinear constraints on parameters. The aim of this article is to show how to undertake a network meta-analysis within the statistical framework of SEM. We used an example dataset to demonstrate the standard fixed and random effect network meta-analysis models can be easily implemented in SEM. It contains results of 26 studies that directly compared three treatment groups A, B and C for prevention of first bleeding in patients with liver cirrhosis. We also showed that a new approach to network meta-analysis based on the technique of unrestricted weighted least squares (UWLS) method can also be undertaken using SEM. For both the fixed and random effect network meta-analysis, SEM yielded similar coefficients and confidence intervals to those reported in the previous literature. The point estimates of two UWLS models were identical to those in the fixed effect model but the confidence intervals were greater. This is consistent with results from the traditional pairwise meta-analyses. Comparing to UWLS model with common variance adjusted factor, UWLS model with unique variance adjusted factor has greater confidence intervals when the heterogeneity was larger in the pairwise comparison. The UWLS model with unique variance adjusted factor reflects the difference in heterogeneity within each comparison. 
SEM provides a very flexible framework for univariate and multivariate meta-analysis, and its potential as a powerful tool for advanced meta-analysis is still to be explored.
Holmes, John B; Dodds, Ken G; Lee, Michael A
2017-03-02
An important issue in genetic evaluation is the comparability of random effects (breeding values), particularly between pairs of animals in different contemporary groups. This is usually referred to as genetic connectedness. While various measures of connectedness have been proposed in the literature, there is general agreement that the most appropriate measure is some function of the prediction error variance-covariance matrix. However, obtaining the prediction error variance-covariance matrix is computationally demanding for large-scale genetic evaluations. Many alternative statistics have been proposed that avoid the computational cost of obtaining the prediction error variance-covariance matrix, such as counts of genetic links between contemporary groups, gene flow matrices, and functions of the variance-covariance matrix of estimated contemporary group fixed effects. In this paper, we show that a correction to the variance-covariance matrix of estimated contemporary group fixed effects will produce the exact prediction error variance-covariance matrix averaged by contemporary group for univariate models in the presence of single or multiple fixed effects and one random effect. We demonstrate the correction for a series of models and show that approximations to the prediction error matrix based solely on the variance-covariance matrix of estimated contemporary group fixed effects are inappropriate in certain circumstances. Our method allows for the calculation of a connectedness measure based on the prediction error variance-covariance matrix by calculating only the variance-covariance matrix of estimated fixed effects. Since the number of fixed effects in genetic evaluation is usually orders of magnitudes smaller than the number of random effect levels, the computational requirements for our method should be reduced.
Kawalilak, C E; Lanovaz, J L; Johnston, J D; Kontulainen, S A
2014-09-01
To assess the linearity and sex-specificity of damping coefficients used in a single-damper-model (SDM) when predicting impact forces during the worst-case falling scenario from fall heights up to 25 cm. Using 3-dimensional motion tracking and an integrated force plate, impact forces and impact velocities were assessed from 10 young adults (5 males; 5 females), falling from planted knees onto outstretched arms, from a random order of drop heights: 3, 5, 7, 10, 15, 20, and 25 cm. We assessed the linearity and sex-specificity between impact forces and impact velocities across all fall heights using analysis of variance linearity test and linear regression, respectively. Significance was accepted at P<0.05. Association between impact forces and impact velocities up to 25 cm was linear (P=0.02). Damping coefficients appeared sex-specific (males: 627 Ns/m, R²=0.70; females: 421 Ns/m, R²=0.81; sex combined: 532 Ns/m, R²=0.61). A linear damping coefficient used in the SDM proved valid for predicting impact forces from fall heights up to 25 cm. Results suggested the use of sex-specific damping coefficients when estimating impact force using the SDM and calculating the factor-of-risk for wrist fractures.
Influence function based variance estimation and missing data issues in case-cohort studies.
Mark, S D; Katki, H
2001-12-01
Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.
Comment on Hoffman and Rovine (2007): SPSS MIXED can estimate models with heterogeneous variances.
Weaver, Bruce; Black, Ryan A
2015-06-01
Hoffman and Rovine (Behavior Research Methods, 39:101-117, 2007) have provided a very nice overview of how multilevel models can be useful to experimental psychologists. They included two illustrative examples and provided both SAS and SPSS commands for estimating the models they reported. However, upon examining the SPSS syntax for the models reported in their Table 3, we found no syntax for models 2B and 3B, both of which have heterogeneous error variances. Instead, there is syntax that estimates similar models with homogeneous error variances and a comment stating that SPSS does not allow heterogeneous errors. But that is not correct. We provide SPSS MIXED commands to estimate models 2B and 3B with heterogeneous error variances and obtain results nearly identical to those reported by Hoffman and Rovine in their Table 3. Therefore, contrary to the comment in Hoffman and Rovine's syntax file, SPSS MIXED can estimate models with heterogeneous error variances.
Qu, Long; Guennel, Tobias; Marshall, Scott L
2013-12-01
Following the rapid development of genome-scale genotyping technologies, genetic association mapping has become a popular tool to detect genomic regions responsible for certain (disease) phenotypes, especially in early-phase pharmacogenomic studies with limited sample size. In response to such applications, a good association test needs to be (1) applicable to a wide range of possible genetic models, including, but not limited to, the presence of gene-by-environment or gene-by-gene interactions and non-linearity of a group of marker effects, (2) accurate in small samples, fast to compute on the genomic scale, and amenable to large scale multiple testing corrections, and (3) reasonably powerful to locate causal genomic regions. The kernel machine method represented in linear mixed models provides a viable solution by transforming the problem into testing the nullity of variance components. In this study, we consider score-based tests by choosing a statistic linear in the score function. When the model under the null hypothesis has only one error variance parameter, our test is exact in finite samples. When the null model has more than one variance parameter, we develop a new moment-based approximation that performs well in simulations. Through simulations and analysis of real data, we demonstrate that the new test possesses most of the aforementioned characteristics, especially when compared to existing quadratic score tests or restricted likelihood ratio tests. © 2013, The International Biometric Society.
The influence of acceleration loading curve characteristics on traumatic brain injury.
Post, Andrew; Blaine Hoshizaki, T; Gilchrist, Michael D; Brien, Susan; Cusimano, Michael D; Marshall, Shawn
2014-03-21
To prevent brain trauma, understanding the mechanism of injury is essential. Once the mechanism of brain injury has been identified, prevention technologies could then be developed to aid in their prevention. The incidence of brain injury is linked to how the kinematics of a brain injury event affects the internal structures of the brain. As a result it is essential that an attempt be made to describe how the characteristics of the linear and rotational acceleration influence specific traumatic brain injury lesions. As a result, the purpose of this study was to examine the influence of the characteristics of linear and rotational acceleration pulses and how they account for the variance in predicting the outcome of TBI lesions, namely contusion, subdural hematoma (SDH), subarachnoid hemorrhage (SAH), and epidural hematoma (EDH) using a principal components analysis (PCA). Monorail impacts were conducted which simulated falls which caused the TBI lesions. From these reconstructions, the characteristics of the linear and rotational acceleration were determined and used for a PCA analysis. The results indicated that peak resultant acceleration variables did not account for any of the variance in predicting TBI lesions. The majority of the variance was accounted for by duration of the resultant and component linear and rotational acceleration. In addition, the components of linear and rotational acceleration characteristics on the x, y, and z axes accounted for the majority of the remainder of the variance after duration. Copyright © 2014 Elsevier Ltd. All rights reserved.
Generalized linear mixed models with varying coefficients for longitudinal data.
Zhang, Daowen
2004-03-01
The routinely assumed parametric functional form in the linear predictor of a generalized linear mixed model for longitudinal data may be too restrictive to represent true underlying covariate effects. We relax this assumption by representing these covariate effects by smooth but otherwise arbitrary functions of time, with random effects used to model the correlation induced by among-subject and within-subject variation. Due to the usually intractable integration involved in evaluating the quasi-likelihood function, the double penalized quasi-likelihood (DPQL) approach of Lin and Zhang (1999, Journal of the Royal Statistical Society, Series B61, 381-400) is used to estimate the varying coefficients and the variance components simultaneously by representing a nonparametric function by a linear combination of fixed effects and random effects. A scaled chi-squared test based on the mixed model representation of the proposed model is developed to test whether an underlying varying coefficient is a polynomial of certain degree. We evaluate the performance of the procedures through simulation studies and illustrate their application with Indonesian children infectious disease data.
Frequency domain analysis of errors in cross-correlations of ambient seismic noise
NASA Astrophysics Data System (ADS)
Liu, Xin; Ben-Zion, Yehuda; Zigone, Dimitri
2016-12-01
We analyse random errors (variances) in cross-correlations of ambient seismic noise in the frequency domain, which differ from previous time domain methods. Extending previous theoretical results on ensemble averaged cross-spectrum, we estimate confidence interval of stacked cross-spectrum of finite amount of data at each frequency using non-overlapping windows with fixed length. The extended theory also connects amplitude and phase variances with the variance of each complex spectrum value. Analysis of synthetic stationary ambient noise is used to estimate the confidence interval of stacked cross-spectrum obtained with different length of noise data corresponding to different number of evenly spaced windows of the same duration. This method allows estimating Signal/Noise Ratio (SNR) of noise cross-correlation in the frequency domain, without specifying filter bandwidth or signal/noise windows that are needed for time domain SNR estimations. Based on synthetic ambient noise data, we also compare the probability distributions, causal part amplitude and SNR of stacked cross-spectrum function using one-bit normalization or pre-whitening with those obtained without these pre-processing steps. Natural continuous noise records contain both ambient noise and small earthquakes that are inseparable from the noise with the existing pre-processing steps. Using probability distributions of random cross-spectrum values based on the theoretical results provides an effective way to exclude such small earthquakes, and additional data segments (outliers) contaminated by signals of different statistics (e.g. rain, cultural noise), from continuous noise waveforms. This technique is applied to constrain values and uncertainties of amplitude and phase velocity of stacked noise cross-spectrum at different frequencies, using data from southern California at both regional scale (~35 km) and dense linear array (~20 m) across the plate-boundary faults.
A block bootstrap resampling method is used to account for temporal correlation of noise cross-spectrum at low frequencies (0.05-0.2 Hz) near the ocean microseismic peaks.
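The idea behind block bootstrap resampling for temporally correlated data can be sketched as below. This is a generic moving-block bootstrap of a sample mean, shown only as an illustration; the abstract does not specify the block scheme or the statistic the authors resample, and the function name and parameters are hypothetical.

```python
import numpy as np

def moving_block_bootstrap_means(series, block_len, n_boot, seed=0):
    """Bootstrap distribution of the mean, resampling contiguous blocks.

    Resampling whole blocks (rather than single points) preserves the
    short-range correlation inside each block. Requires block_len <= len(series).
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    rng = np.random.default_rng(seed)
    n_blocks = -(-n // block_len)      # ceiling division
    n_starts = n - block_len + 1       # number of admissible block starts
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        resample = np.concatenate([series[s:s + block_len] for s in starts])[:n]
        means[b] = resample.mean()
    return means
```

The spread of the returned means (for example, their standard deviation) gives an uncertainty estimate that accounts for correlation at lags shorter than `block_len`.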
Heidaritabar, M; Wolc, A; Arango, J; Zeng, J; Settar, P; Fulton, J E; O'Sullivan, N P; Bastiaansen, J W M; Fernando, R L; Garrick, D J; Dekkers, J C M
2016-10-01
Most genomic prediction studies fit only additive effects in models to estimate genomic breeding values (GEBV). However, if dominance genetic effects are an important source of variation for complex traits, accounting for them may improve the accuracy of GEBV. We investigated the effect of fitting dominance and additive effects on the accuracy of GEBV for eight egg production and quality traits in a purebred line of brown layers using pedigree or genomic information (42K single-nucleotide polymorphism (SNP) panel). Phenotypes were corrected for the effect of hatch date. Additive and dominance genetic variances were estimated using genomic-based [genomic best linear unbiased prediction (GBLUP)-REML and BayesC] and pedigree-based (PBLUP-REML) methods. Breeding values were predicted using a model that included both additive and dominance effects and a model that included only additive effects. The reference population consisted of approximately 1800 animals hatched between 2004 and 2009, while approximately 300 young animals hatched in 2010 were used for validation. Accuracy of prediction was computed as the correlation between phenotypes and estimated breeding values of the validation animals divided by the square root of the estimate of heritability in the whole population. The proportion of dominance variance to total phenotypic variance ranged from 0.03 to 0.22 with PBLUP-REML across traits, from 0 to 0.03 with GBLUP-REML and from 0.01 to 0.05 with BayesC. Accuracies of GEBV ranged from 0.28 to 0.60 across traits. Inclusion of dominance effects did not improve the accuracy of GEBV, and differences in their accuracies between genomic-based methods were small (0.01-0.05), with GBLUP-REML yielding higher prediction accuracies than BayesC for egg production, egg colour and yolk weight, while BayesC yielded higher accuracies than GBLUP-REML for the other traits. 
In conclusion, fitting dominance effects did not impact accuracy of genomic prediction of breeding values in this population. © 2016 Blackwell Verlag GmbH.
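The accuracy measure defined in this abstract (the correlation between phenotypes and estimated breeding values, divided by the square root of the heritability estimate) can be written as a short sketch. The function and argument names are assumptions made for illustration.

```python
import numpy as np

def prediction_accuracy(phenotypes, breeding_values, heritability):
    """Correlation of phenotype with estimated breeding value, scaled by sqrt(h^2)."""
    r = np.corrcoef(phenotypes, breeding_values)[0, 1]
    return r / np.sqrt(heritability)
```

Dividing by the square root of heritability rescales the correlation with a noisy phenotype towards a correlation with the underlying genetic value.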
NASA Astrophysics Data System (ADS)
Shao, Yuehong; Wu, Junmei; Ye, Jinyin; Liu, Yonghe
2015-08-01
This study investigates frequency analysis and its spatiotemporal characteristics of precipitation extremes based on annual maximum of daily precipitation (AMP) data of 753 observation stations in China during the period 1951-2010. Several statistical methods including L-moments, Mann-Kendall test (MK test), Student's t test (t test) and analysis of variance (F-test) are used to study different statistical properties related to frequency and spatiotemporal characteristics of precipitation extremes. The results indicate that the AMP series of most sites have no linear trends at the 90% confidence level, but there is a distinctive decreasing trend in the Beijing-Tianjin-Tangshan region. The analysis of abrupt changes shows that there are no significant changes in most sites, and no distinctive regional patterns within the mutation sites either. An important innovation different from the previous studies is the shift in the mean and the variance, which are also studied in this paper in order to further analyze the changes of strong and weak precipitation extreme events. The shift analysis shows that we should pay more attention to the drought in North China and to the flood control and drought in South China, especially to those regions that have no clear trend and have a significant shift in the variance. More importantly, this study conducts a comprehensive analysis of a complete set of quantile estimates and its spatiotemporal characteristics in China. Spatial distribution of quantile estimation based on the AMP series demonstrated that the values gradually increased from the Northwest to the Southeast with the increment of duration and return period, while the increasing rate of estimation is smooth in the arid and semiarid region and is rapid in the humid region. Frequency estimates of the 50-year return period are in agreement with the maximum observations of AMP series in most stations, which can provide a more quantitative and scientific basis for decision making.
Heritability of myopia and ocular biometrics in Koreans: the healthy twin study.
Kim, Myung Hun; Zhao, Di; Kim, Woori; Lim, Dong-Hui; Song, Yun-Mi; Guallar, Eliseo; Cho, Juhee; Sung, Joohon; Chung, Eui-Sang; Chung, Tae-Young
2013-05-01
To estimate the heritabilities of myopia and ocular biometrics among different family types in a Korean population. We studied 1508 adults in the Healthy Twin Study. Spherical equivalent, axial length, anterior chamber depth, and corneal astigmatism were measured by refraction, corneal topography, and A-scan ultrasonography. To see the degree of resemblance among different types of family relationships, intraclass correlation coefficients (ICC) were calculated. Variance-component methods were applied to estimate the genetic contributions to eye phenotypes as heritability based on the maximum likelihood estimation. Narrow sense heritability was calculated as the proportion of the total phenotypic variance explained by additive genetic effects, and linear and nonlinear effects of age, sex, and interactions between age and sex were adjusted. A total of 240 monozygotic twin pairs, 45 dizygotic twin pairs, and 938 singleton adult family members who were first-degree relatives of twins in 345 families were included in the study. ICCs for spherical equivalent from monozygotic twins, pooled first-degree pairs, and spouse pairs were 0.83, 0.34, and 0.20, respectively. The ICCs of other ocular biometrics were also significantly higher in monozygotic twins compared with other relative pairs, with greater consistency and conformity. The estimated narrow sense heritability (95% confidence interval) was 0.78 (0.71-0.84) for spherical equivalent; 0.86 (0.82-0.90) for axial length; 0.83 (0.76-0.91) for anterior chamber depth; and 0.70 (0.63-0.77) for corneal astigmatism. The estimated heritability of spherical equivalent and ocular biometrics in the Korean population provides compelling evidence that all traits are highly heritable.
Kalman filter for statistical monitoring of forest cover across sub-continental regions [Symposium
Raymond L. Czaplewski
1991-01-01
The Kalman filter is a generalization of the composite estimator. The univariate composite estimate combines 2 prior estimates of population parameter with a weighted average where the scalar weight is inversely proportional to the variances. The composite estimator is a minimum variance estimator that requires no distributional assumptions other than estimates of the...
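The univariate composite estimate described above can be illustrated with a small sketch. It assumes the two prior estimates are independent and unbiased; the function and variable names are hypothetical and not taken from the report.

```python
def composite_estimate(est_a, var_a, est_b, var_b):
    """Combine two estimates with weights inversely proportional to their variances."""
    # Weight on est_a is (1/var_a) / (1/var_a + 1/var_b) = var_b / (var_a + var_b).
    weight_a = var_b / (var_a + var_b)
    combined = weight_a * est_a + (1.0 - weight_a) * est_b
    # Variance of the weighted average under the independence assumption.
    combined_var = var_a * var_b / (var_a + var_b)
    return combined, combined_var
```

For example, combining an estimate of 10 with variance 4 and an estimate of 20 with variance 1 puts weight 0.2 on the first and 0.8 on the second; the combined variance is never larger than the smaller of the two input variances.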
A de-noising method using the improved wavelet threshold function based on noise variance estimation
NASA Astrophysics Data System (ADS)
Liu, Hui; Wang, Weida; Xiang, Changle; Han, Lijin; Nie, Haizhao
2018-01-01
The precise and efficient noise variance estimation is very important for the processing of all kinds of signals while using the wavelet transform to analyze signals and extract signal features. In view of the problem that the accuracy of traditional noise variance estimation is greatly affected by the fluctuation of noise values, this study puts forward the strategy of using the two-state Gaussian mixture model to classify the high-frequency wavelet coefficients in the minimum scale, which takes both the efficiency and accuracy into account. According to the noise variance estimation, a novel improved wavelet threshold function is proposed by combining the advantages of hard and soft threshold functions, and on the basis of the noise variance estimation algorithm and the improved wavelet threshold function, the research puts forth a novel wavelet threshold de-noising method. The method is tested and validated using random signals and bench test data of an electro-mechanical transmission system. The test results indicate that the wavelet threshold de-noising method based on the noise variance estimation shows preferable performance in processing the testing signals of the electro-mechanical transmission system: it can effectively eliminate the interference of transient signals including voltage, current, and oil pressure and maintain the dynamic characteristics of the signals favorably.
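For reference, the standard hard and soft threshold functions that the improved function is said to combine can be sketched as follows. The improved threshold function itself is not given in the abstract and is not reproduced here; the function names are illustrative.

```python
import numpy as np

def hard_threshold(coeffs, thr):
    """Keep coefficients whose magnitude exceeds thr; set the rest to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > thr, coeffs, 0.0)

def soft_threshold(coeffs, thr):
    """Zero small coefficients and shrink the remaining ones towards zero by thr."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)
```

Hard thresholding leaves large coefficients unchanged but is discontinuous at the threshold, while soft thresholding is continuous but biases large coefficients by `thr`; in practice the threshold is usually tied to an estimate of the noise standard deviation.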
NASA Astrophysics Data System (ADS)
Uhlemann, C.; Codis, S.; Hahn, O.; Pichon, C.; Bernardeau, F.
2017-08-01
The analytical formalism to obtain the probability distribution functions (PDFs) of spherically averaged cosmic densities and velocity divergences in the mildly non-linear regime is presented. A large-deviation principle is applied to those cosmic fields assuming their most likely dynamics in spheres is set by the spherical collapse model. We validate our analytical results using state-of-the-art dark matter simulations with a phase-space resolved velocity field finding a 2 per cent level agreement for a wide range of velocity divergences and densities in the mildly non-linear regime (~10 Mpc h⁻¹ at redshift zero), usually inaccessible to perturbation theory. From the joint PDF of densities and velocity divergences measured in two concentric spheres, we extract with the same accuracy velocity profiles and conditional velocity PDF subject to a given over/underdensity that are of interest to understand the non-linear evolution of velocity flows. Both PDFs are used to build a simple but accurate maximum likelihood estimator for the redshift evolution of the variance of both the density and velocity divergence fields, which have smaller relative errors than their sample variances when non-linearities appear. Given the dependence of the velocity divergence on the growth rate, there is a significant gain in using the full knowledge of both PDFs to derive constraints on the equation of state of dark energy. Thanks to the insensitivity of the velocity divergence to bias, its PDF can be used to obtain unbiased constraints on the growth of structures (σ8, f) or it can be combined with the galaxy density PDF to extract bias parameters.
Baird, Rachel; Maxwell, Scott E
2016-06-01
Time-varying predictors in multilevel models are a useful tool for longitudinal research, whether they are the research variable of interest or they are controlling for variance to allow greater power for other variables. However, standard recommendations to fix the effect of time-varying predictors may make an assumption that is unlikely to hold in reality and may influence results. A simulation study illustrates that treating the time-varying predictor as fixed may allow analyses to converge, but the analyses have poor coverage of the true fixed effect when the time-varying predictor has a random effect in reality. A second simulation study shows that treating the time-varying predictor as random may have poor convergence, except when allowing negative variance estimates. Although negative variance estimates are uninterpretable, results of the simulation show that estimates of the fixed effect of the time-varying predictor are as accurate for these cases as for cases with positive variance estimates, and that treating the time-varying predictor as random and allowing negative variance estimates performs well whether the time-varying predictor is fixed or random in reality. Because of the difficulty of interpreting negative variance estimates, 2 procedures are suggested for selection between fixed-effect and random-effect models: comparing between fixed-effect and constrained random-effect models with a likelihood ratio test or fitting a fixed-effect model when an unconstrained random-effect model produces negative variance estimates. The performance of these 2 procedures is compared. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Bootstrap Estimation and Testing for Variance Equality.
ERIC Educational Resources Information Center
Olejnik, Stephen; Algina, James
The purpose of this study was to develop a single procedure for comparing population variances which could be used for distribution forms. Bootstrap methodology was used to estimate the variability of the sample variance statistic when the population distribution was normal, platykurtic and leptokurtic. The data for the study were generated and…
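As a rough illustration of the bootstrap idea described in this abstract, the sketch below resamples a data set with replacement and uses the spread of the recomputed sample variances as an estimate of the variability of the sample variance. The function name `bootstrap_variance_se`, the number of resamples, and the simulated normal data are assumptions made for illustration; they are not taken from the study.

```python
import random
import statistics


def bootstrap_variance_se(sample, n_boot=2000, seed=0):
    """Bootstrap standard error of the sample variance (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    boot_vars = []
    for _ in range(n_boot):
        # Draw n observations with replacement and recompute the variance.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        boot_vars.append(statistics.variance(resample))
    # The spread of the bootstrap variances estimates the sampling variability.
    return statistics.stdev(boot_vars)


# Hypothetical data: 50 draws from a standard normal population.
_rng = random.Random(1)
data = [_rng.gauss(0.0, 1.0) for _ in range(50)]
se = bootstrap_variance_se(data)
print(f"sample variance = {statistics.variance(data):.3f}, bootstrap SE = {se:.3f}")
```

The same function could be applied to platykurtic or leptokurtic samples to see how the variability of the sample variance changes with distribution shape.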
NASA Astrophysics Data System (ADS)
Pleniou, Magdalini; Koutsias, Nikos
2013-05-01
The aim of our study was to explore the spectral properties of fire-scorched (burned) and non fire-scorched (vegetation) areas, as well as areas with different burn/vegetation ratios, using a multisource multiresolution satellite data set. A case study was undertaken following a very destructive wildfire that occurred in Parnitha, Greece, July 2007, for which we acquired satellite images from LANDSAT, ASTER, and IKONOS. Additionally, we created spatially degraded satellite data over a range of coarser resolutions using resampling techniques. The panchromatic (1 m) and multispectral component (4 m) of IKONOS were merged using the Gram-Schmidt spectral sharpening method. This very high-resolution imagery served as the basis to estimate the cover percentage of burned areas, bare land and vegetation at pixel level, by applying the maximum likelihood classification algorithm. Finally, multiple linear regression models were fit to estimate each land-cover fraction as a function of surface reflectance values of the original and the spatially degraded satellite images. The main findings of our research were: (a) the Near Infrared (NIR) and Short-wave Infrared (SWIR) are the most important channels to estimate the percentage of burned area, whereas the NIR and red channels are the most important to estimate the percentage of vegetation in fire-affected areas; (b) when the bi-spectral space consists only of NIR and SWIR, then the NIR ground reflectance value plays a more significant role in estimating the percent of burned areas, and the SWIR appears to be more important in estimating the percent of vegetation; and (c) semi-burned areas comprising 45-55% burned area and 45-55% vegetation are spectrally closer to burned areas in the NIR channel, whereas those areas are spectrally closer to vegetation in the SWIR channel. 
These findings, at least partially, are attributed to the fact that: (i) completely burned pixels present low variance in the NIR and high variance in the SWIR, whereas the opposite is observed in completely vegetated areas where higher variance is observed in the NIR and lower variance in the SWIR, and (ii) bare land modifies the spectral signal of burned areas more than the spectral signal of vegetated areas in the NIR, while the opposite is observed in SWIR region of the spectrum where the bare land modifies the spectral signal of vegetation more than the burned areas because the bare land and the vegetation are spectrally more similar in the NIR, and the bare land and burned areas are spectrally more similar in the SWIR.
Empirical Bayes estimation of undercount in the decennial census.
Cressie, N
1989-12-01
Empirical Bayes methods are used to estimate the extent of the undercount at the local level in the 1980 U.S. census. "Grouping of like subareas from areas such as states, counties, and so on into strata is a useful way of reducing the variance of undercount estimators. By modeling the subareas within a stratum to have a common mean and variances inversely proportional to their census counts, and by taking into account sampling of the areas (e.g., by dual-system estimation), empirical Bayes estimators that compromise between the (weighted) stratum average and the sample value can be constructed. The amount of compromise is shown to depend on the relative importance of stratum variance to sampling variance. These estimators are evaluated at the state level (51 states, including Washington, D.C.) and stratified on race/ethnicity (3 strata) using data from the 1980 postenumeration survey (PEP 3-8, for the noninstitutional population)." excerpt
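The compromise between the stratum average and the sample value can be sketched with a generic shrinkage formula. This is a simplified, hypothetical version: it takes the stratum variance and the sampling variance as known inputs, whereas the abstract models variances as inversely proportional to census counts and estimates them from data. The function name and the example numbers are assumptions.

```python
def shrinkage_estimate(sample_value, stratum_mean, stratum_var, sampling_var):
    """Shrink a subarea's sample value toward its stratum mean.

    The weight on the sample value grows with the stratum (between-subarea)
    variance relative to the sampling variance.
    """
    weight = stratum_var / (stratum_var + sampling_var)
    return stratum_mean + weight * (sample_value - stratum_mean)


# Hypothetical undercount rates (per cent): noisy sample value 4.0, stratum mean 2.0.
print(shrinkage_estimate(4.0, 2.0, stratum_var=1.0, sampling_var=1.0))  # halfway between
print(shrinkage_estimate(4.0, 2.0, stratum_var=1.0, sampling_var=0.0))  # no shrinkage
```

When the sampling variance dominates, the estimate moves toward the stratum mean; when the stratum variance dominates, it stays close to the sample value.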
Fukayama, Osamu; Taniguchi, Noriyuki; Suzuki, Takafumi; Mabuchi, Kunihiko
2008-01-01
An online brain-machine interface (BMI) in the form of a small vehicle, the 'RatCar,' has been developed. A rat had neural electrodes implanted in its primary motor cortex and basal ganglia regions to continuously record neural signals. Then, a linear state space model represents a correlation between the recorded neural signals and locomotion states (i.e., moving velocity and azimuthal variances) of the rat. The model parameters were set so as to minimize estimation errors, and the locomotion states were estimated from neural firing rates using a Kalman filter algorithm. The results showed a small oscillation to achieve smooth control of the vehicle in spite of fluctuating firing rates with noises applied to the model. Major variation of the model variables converged in the first 30 seconds of the experiments and lasted for the entire one-hour session.
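A minimal scalar Kalman filter gives a feel for how a state can be estimated from noisy observations under a linear state space model. This is a one-dimensional sketch with made-up parameters (`a`, `h`, `q`, `r`); the model in the abstract is multivariate and fitted to neural firing rates, which is not reproduced here.

```python
def kalman_1d(observations, a=1.0, h=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[t] = a*x[t-1] + w, z[t] = h*x[t] + v.

    q and r are the (assumed known) process and observation noise variances.
    Returns the filtered state estimate after each observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict step: propagate the state estimate and its variance.
        x = a * x
        p = a * p * a + q
        # Update step: blend prediction and observation using the Kalman gain.
        k = p * h / (h * p * h + r)
        x = x + k * (z - h * x)
        p = (1.0 - k * h) * p
        estimates.append(x)
    return estimates


# Hypothetical input: a constant signal of 5.0 observed 200 times.
est = kalman_1d([5.0] * 200)
print(est[0], est[-1])
```

With a constant input the estimate settles quickly, which loosely mirrors the convergence behaviour reported in the abstract.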
Komsta, Łukasz; Stępkowska, Barbara; Skibiński, Robert
2017-02-03
The eluotropic strength on thin-layer silica plates was investigated for 20 chromatographic-grade solvents currently available on the market. 35 model compounds were used as test subjects in the investigation. The use of a modern mixture screening design allowed each solvent to be estimated as a separate elution coefficient with an acceptable error of estimation (0.0913 of R_M value). An additional bootstrapping technique was used to check the distribution and uncertainty of the eluotropic estimates, yielding confidence intervals very similar to those from linear regression. Principal component analysis showed that only one parameter (mean eluotropic strength) is sufficient to describe the solvent property, as it explains almost 90% of the variance of retention. The obtained eluotropic data can be a good supplement to earlier published results, and their values can be interpreted in the context of R_M differences. Copyright © 2017 Elsevier B.V. All rights reserved.
Gray, Brian R.; Gitzen, Robert A.; Millspaugh, Joshua J.; Cooper, Andrew B.; Licht, Daniel S.
2012-01-01
Variance components may play multiple roles (cf. Cox and Solomon 2003). First, magnitudes and relative magnitudes of the variances of random factors may have important scientific and management value in their own right. For example, variation in levels of invasive vegetation among and within lakes may suggest causal agents that operate at both spatial scales – a finding that may be important for scientific and management reasons. Second, variance components may also be of interest when they affect precision of means and covariate coefficients. For example, variation in the effect of water depth on the probability of aquatic plant presence in a study of multiple lakes may vary by lake. This variation will affect the precision of the average depth-presence association. Third, variance component estimates may be used when designing studies, including monitoring programs. For example, to estimate the numbers of years and of samples per year required to meet long-term monitoring goals, investigators need estimates of within and among-year variances. Other chapters in this volume (Chapters 7, 8, and 10) as well as extensive external literature outline a framework for applying estimates of variance components to the design of monitoring efforts. For example, a series of papers with an ecological monitoring theme examined the relative importance of multiple sources of variation, including variation in means among sites, years, and site-years, for the purposes of temporal trend detection and estimation (Larsen et al. 2004, and references therein).
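The design use of variance components can be illustrated with a simple two-level calculation. The sketch assumes a balanced design in which a long-term mean is computed from `years` years with `samples_per_year` samples each, and that year effects and within-year errors are independent; the function name and the numbers are hypothetical.

```python
def variance_of_long_term_mean(among_year_var, within_year_var, years, samples_per_year):
    """Variance of the overall mean under a balanced two-level design.

    Var(mean) = among_year_var / years
              + within_year_var / (years * samples_per_year)
    """
    return among_year_var / years + within_year_var / (years * samples_per_year)


# Hypothetical components: adding samples per year only shrinks the second term.
for n in (2, 8, 32):
    print(n, variance_of_long_term_mean(4.0, 8.0, years=4, samples_per_year=n))
```

The loop shows why more samples per year cannot compensate for too few years when the among-year variance is large.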
Estimation of population size using open capture-recapture models
McDonald, T.L.; Amstrup, Steven C.
2001-01-01
One of the most important needs for wildlife managers is an accurate estimate of population size. Yet, for many species, including most marine species and large mammals, accurate and precise estimation of numbers is one of the most difficult of all research challenges. Open-population capture-recapture models have proven useful in many situations to estimate survival probabilities but typically have not been used to estimate population size. We show that open-population models can be used to estimate population size by developing a Horvitz-Thompson-type estimate of population size and an estimator of its variance. Our population size estimate keys on the probability of capture at each trap occasion and therefore is quite general and can be made a function of external covariates measured during the study. Here we define the estimator and investigate its bias, variance, and variance estimator via computer simulation. Computer simulations make extensive use of real data taken from a study of polar bears (Ursus maritimus) in the Beaufort Sea. The population size estimator is shown to be useful because it was negligibly biased in all situations studied. The variance estimator is shown to be useful in all situations, but caution is warranted in cases of extreme capture heterogeneity.
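A Horvitz-Thompson-type population size estimate sums the inverse capture probabilities of the animals actually caught. The sketch below shows only that arithmetic for a single occasion; the capture probabilities are invented, and in the study they would come from a fitted open-population model, possibly as a function of covariates.

```python
def horvitz_thompson_size(capture_probs):
    """Estimate population size as the sum of 1/p over captured animals.

    capture_probs holds one estimated capture probability per captured animal.
    """
    if any(not 0.0 < p <= 1.0 for p in capture_probs):
        raise ValueError("capture probabilities must lie in (0, 1]")
    return sum(1.0 / p for p in capture_probs)


# Hypothetical occasion: 10 animals caught, each with capture probability 0.5.
print(horvitz_thompson_size([0.5] * 10))  # each captured animal represents two
```

Animals with low capture probability carry large weights, which is one way to see why extreme capture heterogeneity calls for caution.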
A python framework for environmental model uncertainty analysis
White, Jeremy; Fienen, Michael N.; Doherty, John E.
2016-01-01
We have developed pyEMU, a python framework for Environmental Modeling Uncertainty analyses, an open-source tool that is non-intrusive, easy-to-use, computationally efficient, and scalable to highly-parameterized inverse problems. The framework implements several types of linear (first-order, second-moment (FOSM)) and non-linear uncertainty analyses. The FOSM-based analyses can also be completed prior to parameter estimation to help inform important modeling decisions, such as parameterization and objective function formulation. Complete workflows for several types of FOSM-based and non-linear analyses are documented in example notebooks implemented using Jupyter that are available in the online pyEMU repository. Example workflows include basic parameter and forecast analyses, data worth analyses, and error-variance analyses, as well as usage of parameter ensemble generation and management capabilities. These workflows document the necessary steps and provide insights into the results, with the goal of educating users not only in how to apply pyEMU, but also in the underlying theory of applied uncertainty quantification.
Low-dimensional Representation of Error Covariance
NASA Technical Reports Server (NTRS)
Tippett, Michael K.; Cohn, Stephen E.; Todling, Ricardo; Marchesin, Dan
2000-01-01
Ensemble and reduced-rank approaches to prediction and assimilation rely on low-dimensional approximations of the estimation error covariances. Here stability properties of the forecast/analysis cycle for linear, time-independent systems are used to identify factors that cause the steady-state analysis error covariance to admit a low-dimensional representation. A useful measure of forecast/analysis cycle stability is the bound matrix, a function of the dynamics, observation operator and assimilation method. Upper and lower estimates for the steady-state analysis error covariance matrix eigenvalues are derived from the bound matrix. The estimates generalize to time-dependent systems. If much of the steady-state analysis error variance is due to a few dominant modes, the leading eigenvectors of the bound matrix approximate those of the steady-state analysis error covariance matrix. The analytical results are illustrated in two numerical examples where the Kalman filter is carried to steady state. The first example uses the dynamics of a generalized advection equation exhibiting nonmodal transient growth. Failure to observe growing modes leads to increased steady-state analysis error variances. Leading eigenvectors of the steady-state analysis error covariance matrix are well approximated by leading eigenvectors of the bound matrix. The second example uses the dynamics of a damped baroclinic wave model. The leading eigenvectors of a lowest-order approximation of the bound matrix are shown to approximate well the leading eigenvectors of the steady-state analysis error covariance matrix.
Hevesi, Joseph A.; Flint, Alan L.; Istok, Jonathan D.
1992-01-01
Values of average annual precipitation (AAP) may be important for hydrologic characterization of a potential high-level nuclear-waste repository site at Yucca Mountain, Nevada. Reliable measurements of AAP are sparse in the vicinity of Yucca Mountain, and estimates of AAP were needed for an isohyetal mapping over a 2600-square-mile watershed containing Yucca Mountain. Estimates were obtained with a multivariate geostatistical model developed using AAP and elevation data from a network of 42 precipitation stations in southern Nevada and southeastern California. An additional 1531 elevations were obtained to improve estimation accuracy. Isohyets representing estimates obtained using univariate geostatistics (kriging) defined a smooth and continuous surface. Isohyets representing estimates obtained using multivariate geostatistics (cokriging) defined an irregular surface that more accurately represented expected local orographic influences on AAP. Cokriging results included a maximum estimate within the study area of 335 mm at an elevation of 7400 ft, an average estimate of 157 mm for the study area, and an average estimate of 172 mm at eight locations in the vicinity of the potential repository site. Kriging estimates tended to be lower in comparison because the increased AAP expected for remote mountainous topography was not adequately represented by the available sample. Regression results between cokriging estimates and elevation were similar to regression results between measured AAP and elevation. The position of the cokriging 250-mm isohyet relative to the boundaries of pinyon pine and juniper woodlands provided indirect evidence of improved estimation accuracy because the cokriging result agreed well with investigations by others concerning the relationship between elevation, vegetation, and climate in the Great Basin. Calculated estimation variances were also mapped and compared to evaluate improvements in estimation accuracy. 
Cokriging estimation variances were reduced by an average of 54% relative to kriging variances within the study area. Cokriging reduced estimation variances at the potential repository site by 55% relative to kriging. The usefulness of an existing network of stations for measuring AAP within the study area was evaluated using cokriging variances, and twenty additional stations were located for the purpose of improving the accuracy of future isohyetal mappings. Using the expanded network of stations, the maximum cokriging estimation variance within the study area was reduced by 78% relative to the existing network, and the average estimation variance was reduced by 52%.
1982-06-01
observation in our framework is the pair (y, x) with x considered given. The influence function for σ² at the Gaussian distribution with mean xβ and variance ... [expression garbled in the source scan] ... This influence function is bounded in the residual y − xβ, and redescends to an asymptote greater than ... version of the influence function for β at the Gaussian distribution, given the x. and x, is defined as the normalized difference (see Barnett and
On estimating the effects of clock instability with flicker noise characteristics
NASA Technical Reports Server (NTRS)
Wu, S. C.
1981-01-01
A scheme for flicker noise generation is given. The second approach is that of successive segmentation: a clock fluctuation is represented by 2N piecewise linear segments and then converted into a summation of N+1 triangular pulse train functions. The statistics of the clock instability are then formulated in terms of two-sample variances at N+1 specified averaging times. The summation converges so rapidly that a value of N > 6 is seldom necessary. An application to radio interferometric geodesy shows excellent agreement between the two approaches. Limitations to and the relative merits of the two approaches are discussed.
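The two-sample (Allan) variance at a given averaging time can be computed from a series of fractional-frequency readings. This is a minimal sketch of the standard non-overlapping estimator, not the segmentation scheme of the report; the function name and the toy data are assumptions.

```python
def two_sample_variance(freq, m):
    """Non-overlapping two-sample (Allan) variance.

    freq: fractional-frequency readings at a fixed base interval.
    m:    averaging factor, so the averaging time is m base intervals.
    """
    n_blocks = len(freq) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average the readings in consecutive blocks of length m.
    means = [sum(freq[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    # Half the mean squared difference between adjacent block averages.
    sq_diffs = [(means[k + 1] - means[k]) ** 2 for k in range(n_blocks - 1)]
    return sum(sq_diffs) / (2.0 * len(sq_diffs))


# Toy series alternating between 0 and 1.
series = [0.0, 1.0] * 4
print(two_sample_variance(series, 1), two_sample_variance(series, 2))
```

Evaluating the function at several values of `m` gives the variance at several averaging times, which is the kind of input the abstract's formulation relies on.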
Simple Parametric Model for Intensity Calibration of Cassini Composite Infrared Spectrometer Data
NASA Technical Reports Server (NTRS)
Brasunas, J.; Mamoutkine, A.; Gorius, N.
2016-01-01
Accurate intensity calibration of a linear Fourier-transform spectrometer typically requires the unknown science target and the two calibration targets to be acquired under identical conditions. We present a simple model suitable for vector calibration that enables accurate calibration via adjustments of measured spectral amplitudes and phases when these three targets are recorded at different detector or optics temperatures. Our model makes calibration more accurate both by minimizing biases due to changing instrument temperatures that are always present at some level and by decreasing estimate variance through incorporating larger averages of science and calibration interferogram scans.
Nejaim, Yuri; Aps, Johan K M; Groppo, Francisco Carlos; Haiter Neto, Francisco
2018-06-01
The purpose of this article was to evaluate the pharyngeal space volume, and the size and shape of the mandible and the hyoid bone, as well as their relationships, in patients with different facial types and skeletal classes. Furthermore, we estimated the volume of the pharyngeal space with a formula using only linear measurements. A total of 161 i-CAT Next Generation (Imaging Sciences International, Hatfield, Pa) cone-beam computed tomography images (80 men, 81 women; ages, 21-58 years; mean age, 27 years) were retrospectively studied. Skeletal class and facial type were determined for each patient from multiplanar reconstructions using the NemoCeph software (Nemotec, Madrid, Spain). Linear and angular measurements were performed using 3D imaging software (version 3.4.3; Carestream Health, Rochester, NY), and volumetric analysis of the pharyngeal space was carried out with ITK-SNAP (version 2.4.0; Cognitica, Philadelphia, Pa) segmentation software. For the statistics, analysis of variance and the Tukey test with a significance level of 0.05, Pearson correlation, and linear regression were used. The pharyngeal space volume, when correlated with mandible and hyoid bone linear and angular measurements, showed significant correlations with skeletal class or facial type. The linear regression performed to estimate the volume of the pharyngeal space showed an R of 0.92 and an adjusted R² of 0.8362. There were significant correlations between pharyngeal space volume, and the mandible and hyoid bone measurements, suggesting that the stomatognathic system should be evaluated in an integral and nonindividualized way. Furthermore, it was possible to develop a linear regression model, resulting in a useful formula for estimating the volume of the pharyngeal space. Copyright © 2018 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
Konietschke, Frank; Libiger, Ondrej; Hothorn, Ludwig A
2012-01-01
Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. 
We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.
Improved importance sampling technique for efficient simulation of digital communication systems
NASA Technical Reports Server (NTRS)
Lu, Dingqing; Yao, Kung
1988-01-01
A new, improved importance sampling (IIS) approach to simulation is considered. Some basic concepts of IS are introduced, and detailed evolutions of simulation estimation variances for Monte Carlo (MC) and IS simulations are given. The general results obtained from these evolutions are applied to the specific previously known conventional importance sampling (CIS) technique and the new IIS technique. The derivation for a linear system with no signal random memory is considered in some detail. For the CIS technique, the optimum input scaling parameter is found, while for the IIS technique, the optimum translation parameter is found. The results are generalized to a linear system with memory and signals. Specific numerical and simulation results are given which show the advantages of CIS over MC and IIS over CIS for simulations of digital communications systems.
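The effect of translating the sampling density can be illustrated on a toy problem: estimating the small tail probability P(X > t) for a standard normal X. The sketch draws from a normal density shifted to the threshold and reweights each sample by the likelihood ratio. It is an assumption-laden stand-in for the translation idea in the abstract, not the authors' communication-system simulation, and the function name, sample size, and threshold are hypothetical.

```python
import math
import random


def tail_prob_importance(threshold, n=20000, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), by mean-translated sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Sample from the biased density N(threshold, 1).
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio of N(0, 1) to N(threshold, 1) evaluated at x.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


t = 4.0
estimate = tail_prob_importance(t)
exact = 0.5 * math.erfc(t / math.sqrt(2.0))
print(f"importance sampling: {estimate:.3e}, exact: {exact:.3e}")
```

A plain Monte Carlo run of the same size would expect fewer than one sample beyond the threshold, which is why biasing the sampling density reduces the estimation variance so sharply here.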
Kim, Minjung; Lamont, Andrea E.; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M. Lee
2015-01-01
Regression mixture models are a novel approach for modeling heterogeneous effects of predictors on an outcome. In the model building process residual variances are often disregarded and simplifying assumptions made without thorough examination of the consequences. This simulation study investigated the impact of an equality constraint on the residual variances across latent classes. We examine the consequence of constraining the residual variances on class enumeration (finding the true number of latent classes) and parameter estimates under a number of different simulation conditions meant to reflect the type of heterogeneity likely to exist in applied analyses. Results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted estimated class sizes and showed the potential to greatly impact parameter estimates in each class. Results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions were made. PMID:26139512
Terry, Douglas P; Puente, Antonio N; Brown, Courtney L; Faraco, Carlos C; Miller, L Stephen
2013-01-01
The personality traits Openness to experience and Neuroticism of the five-factor model have previously been associated with memory performance in nondemented older adults, but this relationship has not been investigated in samples with memory impairment. Our examination of 50 community-dwelling older adults (29 cognitively intact; 21 with questionable dementia as determined by the Clinical Dementia Rating Scale) showed that demographic variables (age, years of education, gender, and estimated premorbid IQ) and current depressive symptoms explained a significant amount of variance of Repeatable Battery of Neuropsychological Status Delayed Memory (adjusted R² = 0.23). After controlling for these variables, a measure of global cognitive status further explained a significant portion of variance in memory performance (ΔR² = 0.13; adjusted R² = 0.36; p < .01). Finally, adding Openness to this hierarchical linear regression model explained a significant additional portion of variance (ΔR² = 0.08; adjusted R² = 0.44; p < .01) but adding Neuroticism did not explain any additional variance. This significant relationship between Openness and better memory performance above and beyond one's cognitive status and demographic variables may suggest that a lifelong pattern of involvement in new cognitive activities could be preserved in old age or protect from memory decline. This study suggests that personality may be a powerful predictor of memory ability and clinically useful in this heterogeneous population.
NASA Astrophysics Data System (ADS)
Graf, Alexander; van de Boer, Anneke; Schüttemeyer, Dirk; Moene, Arnold; Vereecken, Harry
2013-04-01
The displacement height d and roughness length z0 are parameters of the logarithmic wind profile and, as characteristics of the surface, are required in a multitude of meteorological modeling applications. Classically, both parameters are estimated from multi-level measurements of wind speed over a terrain sufficiently homogeneous to avoid footprint-induced differences between the levels. As a rule of thumb, d of a dense, uniform crop or forest canopy is 2/3 to 3/4 of the canopy height h, and z0 about 10% of canopy height in absence of any d. However, the uncertainty of this rule of thumb becomes larger if the surface of interest is not "dense and uniform", in which case a site-specific determination is required again. By means of the eddy covariance method, alternative possibilities to determine z0 and d have become available. Various authors report robust results if either several levels of sonic anemometer measurements, or one such level combined with a classic wind profile is used to introduce direct knowledge on the friction velocity into the estimation procedure. At the same time, however, the eddy covariance method to measure various fluxes has superseded the profile method, leaving many current stations without a wind speed profile with enough levels sufficiently far above the canopy to enable the classic estimation of z0 and d. From single-level eddy covariance measurements at one point in time, only one parameter can be estimated, usually z0 while d is assumed to be known. Even so, results tend to scatter considerably. However, it has been pointed out that the use of multiple points in time providing different stability conditions can enable the estimation of both parameters, if they are assumed constant over the time period regarded. These methods either rely on flux-variance similarity (Weaver 1990 and others following), or on the integrated universal function for momentum (Martano 2000 and others following). 
In both cases, iterations over the range of possible d values are necessary. We extended this set of methods by a non-iterative, regression based approach. Only a stability range of data is used in which the universal function is known to be approximately linear. Then, various types of multiple linear regression can be used to relate the terms of the logarithmic wind profile equation to each other, and derive z0 and d from the regression parameters. Two examples each of the two existing iterative approaches, and the new non-iterative one are compared to each other and to plausibility limits in three different agricultural crops. The study contains periods of growth as well as of constant crop height, also allowing for an examination of the relations between z0, d, and canopy height. Results indicate that estimated z0 values, even in absence of prescribed d values, are fairly robust, plausible and consistent across all methods. The largest deviations are produced by the two flux-variance similarity based methods. Estimates of d, in contrast, can be subject to implausible deviations with all methods, even after quality-filtering of input data. Again, the largest deviations occur with flux-variance similarity based methods. Ensemble averaging between all methods can reduce this problem, offering a potentially useful way of estimating d at more complex sites where the rule-of-thumb cannot be applied easily. Martano P (2000): Estimation of surface roughness length and displacement height from single-level sonic anemometer data. Journal of Applied Meteorology 39:708-715. Weaver HL (1990): Temperature and Humidity flux-variance relations determined by one-dimensional eddy correlation. Boundary-Layer Meteorology 53:77-91.
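The single-level case mentioned in the abstract, where z0 is estimated while d is assumed known, can be sketched for neutral stratification by inverting the logarithmic wind profile u(z) = (u*/κ) ln((z − d)/z0). The code ignores stability corrections, so it is not the regression method proposed by the authors; the von Kármán constant of 0.4, the function names, and the example values are assumptions.

```python
import math

VON_KARMAN = 0.4  # assumed value of the von Karman constant


def log_wind_speed(z, d, z0, friction_velocity):
    """Neutral logarithmic wind profile: u(z) = (u*/k) * ln((z - d) / z0)."""
    return friction_velocity / VON_KARMAN * math.log((z - d) / z0)


def roughness_length(z, d, wind_speed, friction_velocity):
    """Invert the neutral log profile for z0, with d assumed known."""
    if z <= d:
        raise ValueError("measurement height must exceed displacement height")
    return (z - d) * math.exp(-VON_KARMAN * wind_speed / friction_velocity)


# Hypothetical crop: canopy height 1.0 m, d taken as 0.7 m, sensor at 3.0 m.
u = log_wind_speed(z=3.0, d=0.7, z0=0.1, friction_velocity=0.3)
print(u, roughness_length(z=3.0, d=0.7, wind_speed=u, friction_velocity=0.3))
```

Because z0 depends exponentially on the ratio of wind speed to friction velocity, small measurement errors translate into large scatter, consistent with the scatter noted in the abstract.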
Relationships of Measurement Error and Prediction Error in Observed-Score Regression
ERIC Educational Resources Information Center
Moses, Tim
2012-01-01
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values
Tong, Tiejun; Feng, Zeny; Hilton, Julia S.; Zhao, Hongyu
2013-01-01
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance. PMID:24078762
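For context, the fixed-λ estimator that the abstract contrasts with its linear fit can be written in a few lines: p-values above λ are assumed to come mostly from true nulls, which are uniform on [0, 1], so their count is rescaled by the expected proportion 1 − λ. This is the common baseline estimator, not the data-driven method the paper proposes; the function name and the example p-values are assumptions.

```python
def pi0_fixed_lambda(p_values, lam=0.5):
    """Baseline estimate of the true-null proportion.

    pi0_hat = #{p_i > lam} / (m * (1 - lam)), capped at 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical mix: 60 small p-values (signals) plus 40 spread over (0.5, 1).
p_vals = [0.01] * 60 + [0.55, 0.65, 0.75, 0.85] * 10
print(pi0_fixed_lambda(p_vals, lam=0.5))
```

The estimate depends on the single chosen λ, which is one motivation for fitting the pattern of p-values over the whole of (λ, 1] instead.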
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harkness, A. L.
1977-09-01
Nine elements from each batch of fuel elements manufactured for the EBR-II reactor have been analyzed for ²³⁵U content by NDA methods. These values, together with those of the manufacturer, are used to estimate the product variance and the variances of the two measuring methods. These variances are compared with the variances computed from the stipulations of the contract. A method is derived for resolving the several variances into their within-batch and between-batch components. Some of these variance components have also been estimated by independent and more familiar conventional methods for comparison.
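The "more familiar conventional methods" mentioned above plausibly include the one-way analysis-of-variance estimator. A minimal sketch for a balanced layout follows. It is the textbook method-of-moments estimator, not the method derived in the report, and the function name `variance_components` is hypothetical.

```python
import numpy as np

def variance_components(batches):
    """Method-of-moments within/between variance for a balanced layout.

    batches: 2-D array-like, one row per batch, one column per element.
    """
    x = np.asarray(batches, dtype=float)
    k, n = x.shape                      # k batches, n elements per batch
    batch_means = x.mean(axis=1)
    # Mean square within batches: pooled scatter around each batch mean.
    ms_within = ((x - batch_means[:, None]) ** 2).sum() / (k * (n - 1))
    # Mean square between batches: scatter of batch means around the grand mean.
    ms_between = n * ((batch_means - x.mean()) ** 2).sum() / (k - 1)
    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / n)  # truncate at zero
    return var_within, var_between
```

With nine elements per batch, as in the abstract, each row of `batches` would have nine columns.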
Vitezica, Zulma G; Varona, Luis; Elsen, Jean-Michel; Misztal, Ignacy; Herring, William; Legarra, Andrès
2016-01-29
Most developments in quantitative genetics theory focus on the study of intra-breed/line concepts. With the availability of massive genomic information, it becomes necessary to revisit the theory for crossbred populations. We propose methods to construct genomic covariances with additive and non-additive (dominance) inheritance in the case of pure lines and crossbred populations. We describe substitution effects and dominant deviations across two pure parental populations and the crossbred population. Gene effects are assumed to be independent of the origin of alleles and allelic frequencies can differ between parental populations. Based on these assumptions, the theoretical variance components (additive and dominant) are obtained as a function of marker effects and allelic frequencies. The additive genetic variance in the crossbred population includes the biological additive and dominant effects of a gene and a covariance term. Dominance variance in the crossbred population is proportional to the product of the heterozygosity coefficients of both parental populations. A genomic BLUP (best linear unbiased prediction) equivalent model is presented. We illustrate this approach by using pig data (two pure lines and their cross, including 8265 phenotyped and genotyped sows). For the total number of piglets born, the dominance variance in the crossbred population represented about 13 % of the total genetic variance. Dominance variation is only marginally important for litter size in the crossbred population. We present a coherent marker-based model that includes purebred and crossbred data and additive and dominant actions. Using this model, it is possible to estimate breeding values, dominant deviations and variance components in a dataset that comprises data on purebred and crossbred individuals. These methods can be exploited to plan assortative mating in pig, maize or other species, in order to generate superior crossbred individuals in terms of performance.
NASA Astrophysics Data System (ADS)
Lee, K. C.
2013-02-01
Multifractional Brownian motions have become popular as flexible models in describing real-life signals of high-frequency features in geoscience, microeconomics, and turbulence, to name a few. The time-changing Hurst exponent, which describes regularity levels depending on time measurements, and variance, which relates to an energy level, are two parameters that characterize multifractional Brownian motions. This research suggests a combined method of estimating the time-changing Hurst exponent and variance using the local variation of sampled paths of signals. The method consists of two phases: initially estimating global variance and then accurately estimating the time-changing Hurst exponent. A simulation study shows its performance in estimation of the parameters. The proposed method is applied to characterization of atmospheric stability in which descriptive statistics from the estimated time-changing Hurst exponent and variance classify stable atmosphere flows from unstable ones.
Optimal design criteria - prediction vs. parameter estimation
NASA Astrophysics Data System (ADS)
Waldl, Helmut
2014-05-01
G-optimality is a popular design criterion for optimal prediction; it tries to minimize the kriging variance over the whole design region. A G-optimal design minimizes the maximum variance of all predicted values. If we use kriging methods for prediction, it is natural to use the kriging variance as a measure of uncertainty for the estimates. However, computing the kriging variance, and even more so the empirical kriging variance, is computationally very costly, and finding the maximum kriging variance in high-dimensional regions can be so time-consuming that, in practice, the G-optimal design cannot really be found with currently available computer equipment. We cannot always avoid this problem by using space-filling designs because small designs that minimize the empirical kriging variance are often non-space-filling. D-optimality is the design criterion related to parameter estimation. A D-optimal design maximizes the determinant of the information matrix of the estimates. D-optimality in terms of trend parameter estimation and D-optimality in terms of covariance parameter estimation yield basically different designs. The Pareto frontier of these two competing determinant criteria corresponds to designs that perform well under both criteria. Under certain conditions, searching for the G-optimal design on this Pareto frontier yields almost as good results as searching for the G-optimal design in the whole design region. In doing so, the maximum of the empirical kriging variance has to be computed only a few times. The method is demonstrated by means of a computer simulation experiment based on data provided by the Belgian institute Management Unit of the North Sea Mathematical Models (MUMM) that describe the evolution of inorganic and organic carbon and nutrients, phytoplankton, bacteria and zooplankton in the Southern Bight of the North Sea.
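The two criteria can be made concrete for the simpler case of an ordinary linear regression with uncorrelated errors. The sketch below is an illustration under that assumption, not the kriging setting of the abstract; in a kriging model the scaled prediction variance used here would be replaced by the kriging variance. The function names are hypothetical.

```python
import numpy as np

def d_criterion(design):
    """Determinant of the information matrix X^T X (larger is better)."""
    x = np.asarray(design, dtype=float)
    return float(np.linalg.det(x.T @ x))

def g_criterion(design, candidates):
    """Maximum scaled prediction variance x^T (X^T X)^{-1} x over the
    candidate points (smaller is better)."""
    x = np.asarray(design, dtype=float)
    c = np.asarray(candidates, dtype=float)
    info_inv = np.linalg.inv(x.T @ x)
    # Quadratic form evaluated row by row for every candidate point.
    variances = np.einsum("ij,jk,ik->i", c, info_inv, c)
    return float(variances.max())
```

Each row of `design` and `candidates` is a model vector such as (1, x) for a straight-line fit.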
Consistent Small-Sample Variances for Six Gamma-Family Measures of Ordinal Association
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Gamma-family measures are bivariate ordinal correlation measures that form a family because they all reduce to Goodman and Kruskal's gamma in the absence of ties (1954). For several gamma-family indices, more than one variance estimator has been introduced. In previous research, the "consistent" variance estimator described by Cliff and…
ERIC Educational Resources Information Center
Lucas, Richard E.; Donnellan, M. Brent
2012-01-01
Life satisfaction is often assessed using single-item measures. However, estimating the reliability of these measures can be difficult because internal consistency coefficients cannot be calculated. Existing approaches use longitudinal data to isolate occasion-specific variance from variance that is either completely stable or variance that…
Estimation of Variance in the Case of Complex Samples.
ERIC Educational Resources Information Center
Groenewald, A. C.; Stoker, D. J.
In a complex sampling scheme it is desirable to select the primary sampling units (PSUs) without replacement to prevent duplications in the sample. Since the estimation of the sampling variances is more complicated when the PSUs are selected without replacement, L. Kish (1965) recommends that the variance be calculated using the formulas…
Genetic basis of between-individual and within-individual variance of docility.
Martin, J G A; Pirotta, E; Petelle, M B; Blumstein, D T
2017-04-01
Between-individual variation in phenotypes within a population is the basis of evolution. However, evolutionary and behavioural ecologists have mainly focused on estimating between-individual variance in mean trait and neglected variation in within-individual variance, or predictability of a trait. In fact, an important assumption of mixed-effects models used to estimate between-individual variance in mean traits is that within-individual residual variance (predictability) is identical across individuals. Individual heterogeneity in the predictability of behaviours is a potentially important effect but rarely estimated and accounted for. We used 11 389 measures of docility behaviour from 1576 yellow-bellied marmots (Marmota flaviventris) to estimate between-individual variation in both mean docility and its predictability. We then implemented a double hierarchical animal model to decompose the variances of both mean trait and predictability into their environmental and genetic components. We found that individuals differed both in their docility and in their predictability of docility with a negative phenotypic covariance. We also found significant genetic variance for both mean docility and its predictability but no genetic covariance between the two. This analysis is one of the first to estimate the genetic basis of both mean trait and within-individual variance in a wild population. Our results indicate that equal within-individual variance should not be assumed. We demonstrate the evolutionary importance of the variation in the predictability of docility and illustrate potential bias in models ignoring variation in predictability. We conclude that the variability in the predictability of a trait should not be ignored, and present a coherent approach for its quantification. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
Comparison of Brownian-dynamics-based estimates of polymer tension with direct force measurements.
Arsenault, Mark E; Purohit, Prashant K; Goldman, Yale E; Shuman, Henry; Bau, Haim H
2010-11-01
With the aid of Brownian dynamics models, it is possible to estimate polymer tension by monitoring polymers' transverse thermal fluctuations. To assess the precision of the approach, Brownian dynamics-based tension estimates were compared with the force applied to rhodamine-phalloidin labeled actin filaments bound to polymer beads and suspended between two optical traps. The transverse thermal fluctuations of each filament were monitored with a CCD camera, and the images were analyzed to obtain the filament's transverse displacement variance as a function of position along the filament, the filament's tension, and the camera's exposure time. A linear Brownian dynamics model was used to estimate the filament's tension. The estimated force agreed with the applied trap force to within 30% (when the tension was <0.1 pN) and 70% (when the tension was <1 pN). In addition, the paper presents concise asymptotic expressions for the mechanical compliance of a system consisting of a filament attached tangentially to bead handles (dumbbell system). The techniques described here can be used for noncontact estimates of polymers' and fibers' tension.
Influence of particle composition on thorium cycling along the U.S. GEOTRACES North Atlantic Section
NASA Astrophysics Data System (ADS)
Lerner, Paul; Marchal, Olivier; Lam, Phoebe
2017-04-01
Our current knowledge about the behaviour of particle-reactive substances in the ocean stems largely from measurements of thorium radio-isotopes (Th-228, Th-230, Th-234) on seawater samples. The oceanic Th database has increased dramatically over the recent years, thanks in particular to the GEOTRACES program, an international study of the marine biogeochemical cycles of trace elements and their isotopes. Here we present an analysis of data collected at several stations of the U.S. GEOTRACES North Atlantic section (section GA03). Data originating from eleven stations situated west and east of the Middle-Atlantic Ridge are analyzed. First, at each station, the rate parameters of a single-particle class model of Th and particle cycling in the ocean water column are estimated from a least-squares fit to an eclectic data set, including (i) measurements of Th-228, Th-230, Th-234 activities in different size fractions, (ii) measurements of particle concentration, and (iii) measurements, or observational estimates, of the activities of the radio-active parents Ra-228, U-234, and U-238. Among our most salient results is a significant decrease in the apparent rate constant of Th adsorption (k1) with depth, with maxima in the meso-pelagic zone (ca. 100 - 1000 m) and minima below, at most stations. Second, we explore whether our k1 estimates can be related to changes in particle composition, both along the water column and laterally along GA03. We apply (i) multiple linear regression to quantify the amount of variance in k1 that can be explained by linear regression against particle composition data, and (ii) relative importance analysis to determine the relative contribution of different particulate phases to the explained variance in k1. Finally, the implications of our results for the interpretation of field Th isotope data and for the description of particle scavenging in ocean-biogeochemistry models are clarified.
Population-based absolute risk estimation with survey data
Kovalchik, Stephanie A.; Pfeiffer, Ruth M.
2013-01-01
Absolute risk is the probability that a cause-specific event occurs in a given time interval in the presence of competing events. We present methods to estimate population-based absolute risk from a complex survey cohort that can accommodate multiple exposure-specific competing risks. The hazard function for each event type consists of an individualized relative risk multiplied by a baseline hazard function, which is modeled nonparametrically or parametrically with a piecewise exponential model. An influence method is used to derive a Taylor-linearized variance estimate for the absolute risk estimates. We introduce novel measures of the cause-specific influences that can guide modeling choices for the competing event components of the model. To illustrate our methodology, we build and validate cause-specific absolute risk models for cardiovascular and cancer deaths using data from the National Health and Nutrition Examination Survey. Our applications demonstrate the usefulness of survey-based risk prediction models for predicting health outcomes and quantifying the potential impact of disease prevention programs at the population level. PMID:23686614
Roux, C Z
2009-05-01
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA-genes with slow substitution rates. For consistently short distances, it is proved that in the completely singular limit of the covariance matrix ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.
Secure Fusion Estimation for Bandwidth Constrained Cyber-Physical Systems Under Replay Attacks.
Chen, Bo; Ho, Daniel W C; Hu, Guoqiang; Yu, Li
2018-06-01
State estimation plays an essential role in the monitoring and supervision of cyber-physical systems (CPSs), and its importance has made the security and estimation performance a major concern. In this case, multisensor information fusion estimation (MIFE) provides an attractive alternative to study secure estimation problems because MIFE can potentially improve estimation accuracy and enhance reliability and robustness against attacks. From the perspective of the defender, the secure distributed Kalman fusion estimation problem is investigated in this paper for a class of CPSs under replay attacks, where each local estimate obtained by the sink node is transmitted to a remote fusion center through bandwidth-constrained communication channels. A new mathematical model with a compensation strategy is proposed to characterize the replay attacks and bandwidth constraints, and then a recursive distributed Kalman fusion estimator (DKFE) is designed in the linear minimum variance sense. According to different communication frameworks, two classes of data compression and compensation algorithms are developed such that the DKFEs can achieve the desired performance. Several attack-dependent and bandwidth-dependent conditions are derived such that the DKFEs are secure under replay attacks. An illustrative example is given to demonstrate the effectiveness of the proposed methods.
Nonparametric estimation of plant density by the distance method
Patil, S.A.; Burnham, K.P.; Kovner, J.L.
1979-01-01
A relation between the plant density and the probability density function of the nearest neighbor distance (squared) from a random point is established under fairly broad conditions. Based upon this relationship, a nonparametric estimator for the plant density is developed and presented in terms of order statistics. Consistency and asymptotic normality of the estimator are discussed. An interval estimator for the density is obtained. The modifications of this estimator and its variance are given when the distribution is truncated. Simulation results are presented for regular, random and aggregated populations to illustrate the nonparametric estimator and its variance. A numerical example from field data is given. Merits and deficiencies of the estimator are discussed with regard to its robustness and variance.
Reducing the number of reconstructions needed for estimating channelized observer performance
NASA Astrophysics Data System (ADS)
Pineda, Angel R.; Miedema, Hope; Brenner, Melissa; Altaf, Sana
2018-03-01
A challenge for task-based optimization is the time required for each reconstructed image in applications where reconstructions are time consuming. Our goal is to reduce the number of reconstructions needed to estimate the area under the receiver operating characteristic curve (AUC) of the infinitely-trained optimal channelized linear observer. We explore the use of classifiers which either do not invert the channel covariance matrix or do feature selection. We also study the assumption that multiple low contrast signals in the same image of a non-linear reconstruction do not significantly change the estimate of the AUC. We compared the AUC of several classifiers (Hotelling, logistic regression, logistic regression using Firth bias reduction and the least absolute shrinkage and selection operator (LASSO)) with a small number of observations both for normal simulated data and images from a total variation reconstruction in magnetic resonance imaging (MRI). We used 10 Laguerre-Gauss channels and the Mann-Whitney estimator for AUC. For this data, our results show that at small sample sizes feature selection using the LASSO technique can decrease bias of the AUC estimation with increased variance and that for large sample sizes the difference between these classifiers is small. We also compared the use of multiple signals in a single reconstructed image to reduce the number of reconstructions in a total variation reconstruction for accelerated imaging in MRI. We found that AUC estimation using multiple low contrast signals in the same image resulted in similar AUC estimates as doing a single reconstruction per signal leading to a 13x reduction in the number of reconstructions needed.
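The Mann-Whitney estimator of AUC named above is the fraction of (signal, noise) score pairs that are correctly ordered, with ties counted as one half. A minimal sketch, assuming the observer outputs are available as two score arrays (the function name and argument names are hypothetical):

```python
import numpy as np

def mann_whitney_auc(signal_scores, noise_scores):
    """AUC as the proportion of correctly ordered signal/noise pairs."""
    s = np.asarray(signal_scores, dtype=float)[:, None]
    n = np.asarray(noise_scores, dtype=float)[None, :]
    # Broadcasting compares every signal score with every noise score.
    wins = (s > n).sum() + 0.5 * (s == n).sum()
    return float(wins / (s.size * n.size))
```

The pairwise comparison costs memory proportional to the product of the two sample sizes, which is unproblematic at the small sample sizes the abstract is concerned with.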
Empirical single sample quantification of bias and variance in Q-ball imaging.
Hainline, Allison E; Nath, Vishwesh; Parvathaneni, Prasanna; Blaber, Justin A; Schilling, Kurt G; Anderson, Adam W; Kang, Hakmook; Landman, Bennett A
2018-02-06
The bias and variance of high angular resolution diffusion imaging methods have not been thoroughly explored in the literature and may benefit from the simulation extrapolation (SIMEX) and bootstrap techniques to estimate bias and variance of high angular resolution diffusion imaging metrics. The SIMEX approach is well established in the statistics literature and uses simulation of increasingly noisy data to extrapolate back to a hypothetical case with no noise. The bias of calculated metrics can then be computed by subtracting the SIMEX estimate from the original pointwise measurement. The SIMEX technique has been studied in the context of diffusion imaging to accurately capture the bias in fractional anisotropy measurements in DTI. Herein, we extend the application of SIMEX and bootstrap approaches to characterize bias and variance in metrics obtained from a Q-ball imaging reconstruction of high angular resolution diffusion imaging data. The results demonstrate that SIMEX and bootstrap approaches provide consistent estimates of the bias and variance of generalized fractional anisotropy, respectively. The RMSE for the generalized fractional anisotropy estimates shows a 7% decrease in white matter and an 8% decrease in gray matter when compared with the observed generalized fractional anisotropy estimates. On average, the bootstrap technique results in SD estimates that are approximately 97% of the true variation in white matter, and 86% in gray matter. Both SIMEX and bootstrap methods are flexible, estimate population characteristics based on single scans, and may be extended for bias and variance estimation on a variety of high angular resolution diffusion imaging metrics. © 2018 International Society for Magnetic Resonance in Medicine.
Combining Study Outcome Measures Using Dominance Adjusted Weights
ERIC Educational Resources Information Center
Makambi, Kepher H.; Lu, Wenxin
2013-01-01
Weighting of studies in meta-analysis is usually implemented by using the estimated inverse variances of treatment effect estimates. However, there is a possibility of one study dominating other studies in the estimation process by taking on a weight that is above some upper limit. We implement an estimator of the heterogeneity variance that takes…
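The standard inverse-variance weighting that the abstract starts from can be sketched as follows. This is the conventional fixed-effect pooling only; the dominance adjustment itself is not reproduced because the abstract is truncated before describing it. The function name is hypothetical.

```python
import numpy as np

def inverse_variance_pool(effects, variances):
    """Fixed-effect pooled estimate with inverse-variance weights."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    weights = w / w.sum()               # normalized weights, sum to 1
    pooled = float((weights * y).sum())
    pooled_var = float(1.0 / w.sum())   # variance of the pooled estimate
    return pooled, pooled_var, weights
```

For effects (1, 3) with variances (1, 0.25), the second study receives weight 0.8, which illustrates how one precise study can dominate the pooled result.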
Rasul, Golam; Glover, Karl D; Krishnan, Padmanaban G; Wu, Jixiang; Berzonsky, William A; Ibrahim, Amir M H
2015-12-01
Elevated levels of late maturity α-amylase activity (LMAA) can result in low falling number scores, reduced grain quality, and downgrade of wheat (Triticum aestivum L.) class. A mating population was developed by crossing parents with different levels of LMAA. The F2 and F3 hybrids and their parents were evaluated for LMAA, and data were analyzed using the R software package 'qgtools' integrated with an additive-dominance genetic model and a mixed linear model approach. Simulated results showed high testing powers for additive and additive × environment variances, and comparatively low powers for dominance and dominance × environment variances. All variance components and their proportions to the phenotypic variance for the parents and hybrids were significant except for the dominance × environment variance. The estimated narrow-sense heritability and broad-sense heritability for LMAA were 14 and 54%, respectively. Highly significant negative additive effects for parents suggest that spring wheat cultivars 'Lancer' and 'Chester' can serve as good general combiners, and that 'Kinsman' and 'Seri-82' had negative specific combining ability in some hybrids despite their own significant positive additive effects, suggesting they can be used as parents to reduce LMAA levels. Seri-82 showed a very good general combining ability effect when used as a male parent, indicating the importance of reciprocal effects. Highly significant negative dominance effects and high-parent heterosis for hybrids demonstrated that the specific hybrid combinations Chester × Kinsman, 'Lerma52' × Lancer, Lerma52 × 'LoSprout' and 'Janz' × Seri-82 could be generated to produce cultivars with significantly reduced LMAA levels.
Junttila, Virpi; Kauranne, Tuomo; Finley, Andrew O.; Bradford, John B.
2015-01-01
Modern operational forest inventory often uses remotely sensed data that cover the whole inventory area to produce spatially explicit estimates of forest properties through statistical models. The data obtained by airborne light detection and ranging (LiDAR) correlate well with many forest inventory variables, such as the tree height, the timber volume, and the biomass. To construct an accurate model over thousands of hectares, LiDAR data must be supplemented with several hundred field sample measurements of forest inventory variables. This can be costly and time consuming. Different LiDAR-data-based and spatial-data-based sampling designs can reduce the number of field sample plots needed. However, problems arising from the features of the LiDAR data, such as a large number of predictors compared with the sample size (overfitting) or a strong correlation among predictors (multicollinearity), may decrease the accuracy and precision of the estimates and predictions. To overcome these problems, a Bayesian linear model with the singular value decomposition of predictors, combined with regularization, is proposed. The model performance in predicting different forest inventory variables is verified in ten inventory areas from two continents, where the number of field sample plots is reduced using different sampling designs. The results show that, with an appropriate field plot selection strategy and the proposed linear model, the total relative error of the predicted forest inventory variables is only 5%–15% larger using 50 field sample plots than the error of a linear model estimated with several hundred field sample plots when we sum up the error due to both the model noise variance and the model’s lack of fit.
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
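For reference, the point estimate of kappa from a square agreement table can be sketched as below. This is the ordinary Cohen's kappa for independent pairs; it ignores clustering and does not implement the variance estimator proposed in the abstract. The function name is hypothetical.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa from a square table of paired classifications."""
    t = np.asarray(table, dtype=float)
    total = t.sum()
    observed = np.trace(t) / total      # proportion of agreeing pairs
    # Agreement expected by chance from the row and column margins.
    expected = (t.sum(axis=1) * t.sum(axis=0)).sum() / total**2
    return float((observed - expected) / (1.0 - expected))
```

Rows index the first procedure's classification and columns the second's, so the diagonal holds the agreeing pairs.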
NASA Astrophysics Data System (ADS)
Maltz, Jonathan S.
2000-11-01
We present an algorithm of reduced computational cost which is able to estimate kinetic model parameters directly from dynamic ECT sinograms made up of temporally inconsistent projections. The algorithm exploits the extreme degree of parameter redundancy inherent in linear combinations of the exponential functions which represent the modes of first-order compartmental systems. The singular value decomposition is employed to find a small set of orthogonal functions, the linear combinations of which are able to accurately represent all modes within the physiologically anticipated range in a given study. The reduced-dimension basis is formed as the convolution of this orthogonal set with a measured input function. The Moore-Penrose pseudoinverse is used to find coefficients of this basis. Algorithm performance is evaluated at realistic count rates using MCAT phantom and clinical 99mTc-teboroxime myocardial study data. Phantom data are modelled as originating from a Poisson process. For estimates recovered from a single slice projection set containing 2.5×10⁵ total counts, recovered tissue responses compare favourably with those obtained using more computationally intensive methods. The corresponding kinetic parameter estimates (coefficients of the new basis) exhibit negligible bias, while parameter variances are low, falling within 30% of the Cramér-Rao lower bound.
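The core idea of compressing a family of exponential modes into a few orthogonal functions can be sketched with a plain singular value decomposition. This is a simplified illustration under stated assumptions: it works directly on sampled time curves, omits the convolution with a measured input function and the sinogram geometry, and the function names, time grid and rate grid are all hypothetical.

```python
import numpy as np

def exponential_basis(times, rates, rank):
    """Leading left singular vectors of the matrix of decaying modes."""
    t = np.asarray(times, dtype=float)[:, None]
    k = np.asarray(rates, dtype=float)[None, :]
    modes = np.exp(-t * k)              # column j holds exp(-rates[j] * t)
    u, s, _ = np.linalg.svd(modes, full_matrices=False)
    return u[:, :rank], s               # orthonormal basis, singular values

def fit_coefficients(basis, curve):
    """Least-squares coefficients via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(basis) @ np.asarray(curve, dtype=float)
```

Inspecting how quickly the returned singular values decay is one way to choose `rank` for a given range of rate constants.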
Control algorithms for dynamic attenuators.
Hsieh, Scott S; Pelc, Norbert J
2014-06-01
The authors describe algorithms to control dynamic attenuators in CT and compare their performance using simulated scans. Dynamic attenuators are prepatient beam shaping filters that modulate the distribution of x-ray fluence incident on the patient on a view-by-view basis. These attenuators can reduce dose while improving key image quality metrics such as peak or mean variance. In each view, the attenuator presents several degrees of freedom which may be individually adjusted. The total number of degrees of freedom across all views is very large, making many optimization techniques impractical. The authors develop a theory for optimally controlling these attenuators. Special attention is paid to a theoretically perfect attenuator which controls the fluence for each ray individually, but the authors also investigate and compare three other, practical attenuator designs which have been previously proposed: the piecewise-linear attenuator, the translating attenuator, and the double wedge attenuator. The authors pose and solve the optimization problems of minimizing the mean and peak variance subject to a fixed dose limit. For a perfect attenuator and mean variance minimization, this problem can be solved in simple, closed form. For other attenuator designs, the problem can be decomposed into separate problems for each view to greatly reduce the computational complexity. Peak variance minimization can be approximately solved using iterated, weighted mean variance (WMV) minimization. Also, the authors develop heuristics for the perfect and piecewise-linear attenuators which do not require a priori knowledge of the patient anatomy. The authors compare these control algorithms on different types of dynamic attenuators using simulated raw data from forward projected DICOM files of a thorax and an abdomen. 
The translating and double wedge attenuators reduce dose by an average of 30% relative to current techniques (bowtie filter with tube current modulation) without increasing peak variance. The 15-element piecewise-linear dynamic attenuator reduces dose by an average of 42%, and the perfect attenuator reduces dose by an average of 50%. Improvements in peak variance are several times larger than improvements in mean variance. Heuristic control eliminates the need for a prescan. For the piecewise-linear attenuator, the cost of heuristic control is an increase in dose of 9%. The proposed iterated WMV minimization produces results that are within a few percent of the true solution. Dynamic attenuators show potential for significant dose reduction. A wide class of dynamic attenuators can be accurately controlled using the described methods.
Characterizing the performance of the Conway-Maxwell Poisson generalized linear model.
Francis, Royce A; Geedipally, Srinivas Reddy; Guikema, Seth D; Dhavala, Soma Sekhar; Lord, Dominique; LaRocca, Sarah
2012-01-01
Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression. © 2011 Society for Risk Analysis.
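As a rough illustration of why the COM-Poisson distribution can cover under-, equi-, and overdispersed counts, the sketch below evaluates its probability mass function, P(Y = y) proportional to lam^y / (y!)^nu. The function name, the parameter names `lam` and `nu`, and the truncation of the normalizing sum at `max_terms` are assumptions made for this example; this is not the maximum likelihood GLM implementation assessed in the article.

```python
import math

def com_poisson_pmf(y, lam, nu, max_terms=200):
    """P(Y = y) for a COM-Poisson(lam, nu) variable, using a truncated normalizing sum."""
    # Work in log space so lam**j and (j!)**nu cannot overflow.
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    peak = max(log_terms)
    log_z = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
```

With `nu = 1` the expression reduces to the ordinary Poisson distribution; `nu < 1` gives overdispersion and `nu > 1` gives underdispersion. The truncation is only reasonable when the terms have decayed well before `max_terms`.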
Jiang, Jiang; DeAngelis, Donald L.; Zhang, B.; Cohen, J.E.
2014-01-01
Taylor's power law describes an empirical relationship between the mean and variance of population densities in field data, in which the variance varies as a power, b, of the mean. Most studies report values of b varying between 1 and 2. However, Cohen (2014a) showed recently that smooth changes in environmental conditions in a model can lead to an abrupt, infinite change in b. To understand what factors can influence the occurrence of an abrupt change in b, we used both mathematical analysis and Monte Carlo samples from a model in which populations of the same species settled on patches, and each population followed independently a stochastic linear birth-and-death process. We investigated how the power relationship responds to a smooth change of population growth rate, under different sampling strategies, initial population density, and population age. We showed analytically that, if the initial populations differ only in density, and samples are taken from all patches after the same time period following a major invasion event, Taylor's law holds with exponent b=1, regardless of the population growth rate. If samples are taken at different times from patches that have the same initial population densities, we calculate an abrupt shift of b, as predicted by Cohen (2014a). The loss of linearity between log variance and log mean is a leading indicator of the abrupt shift. If both initial population densities and population ages vary among patches, estimates of b lie between 1 and 2, as in most empirical studies. But the value of b declines to ~1 as the system approaches a critical point. Our results can inform empirical studies that might be designed to demonstrate an abrupt shift in Taylor's law.
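The exponent b in Taylor's law, variance = a * mean^b, is commonly estimated as the slope of a straight-line fit of log variance against log mean. A minimal sketch of that fit follows, assuming per-patch means and variances are already available; the function name and inputs are illustrative and not taken from the paper.

```python
import math

def taylor_exponent(means, variances):
    """Least-squares fit of log(variance) = log(a) + b * log(mean); returns (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = Taylor exponent
    a = math.exp(y_bar - b * x_bar)    # intercept back-transformed to the original scale
    return a, b
```

A clear departure of the points from a straight line on the log-log scale is the loss of linearity that the abstract describes as a leading indicator of the abrupt shift.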
Procedures for estimating confidence intervals for selected method performance parameters.
McClure, F D; Lee, J K
2001-01-01
Procedures for estimating confidence intervals (CIs) for the repeatability variance ($\sigma_r^2$), reproducibility variance ($\sigma_R^2 = \sigma_L^2 + \sigma_r^2$), laboratory component ($\sigma_L^2$), and their corresponding standard deviations $\sigma_r$, $\sigma_R$, and $\sigma_L$, respectively, are presented. In addition, CIs for the ratio of the repeatability component to the reproducibility variance ($\sigma_r^2/\sigma_R^2$) and the ratio of the laboratory component to the reproducibility variance ($\sigma_L^2/\sigma_R^2$) are also presented.
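The point estimates to which such confidence intervals attach are usually obtained from a one-way analysis of variance. The sketch below covers only those point estimates, assuming a balanced design (every laboratory reports the same number of replicates); the function name and data layout are hypothetical, and the CI procedures of the paper are not reproduced here.

```python
def variance_components(lab_results):
    """Balanced one-way ANOVA estimates of (repeatability, laboratory, reproducibility) variances.

    lab_results: list of per-laboratory lists, each holding the same number n >= 2 of replicates.
    """
    p = len(lab_results)
    n = len(lab_results[0])
    lab_means = [sum(lab) / n for lab in lab_results]
    grand_mean = sum(lab_means) / p
    # The within-laboratory mean square estimates the repeatability variance.
    ms_within = sum(
        (x - m) ** 2 for lab, m in zip(lab_results, lab_means) for x in lab
    ) / (p * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (p - 1)
    # The laboratory component is truncated at zero when the between mean square is smaller.
    s_lab = max(0.0, (ms_between - ms_within) / n)
    return ms_within, s_lab, ms_within + s_lab
```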
Saviane, Chiara; Silver, R Angus
2006-06-15
Synapses play a crucial role in information processing in the brain. Amplitude fluctuations of synaptic responses can be used to extract information about the mechanisms underlying synaptic transmission and its modulation. In particular, multiple-probability fluctuation analysis can be used to estimate the number of functional release sites, the mean probability of release and the amplitude of the mean quantal response from fits of the relationship between the variance and mean amplitude of postsynaptic responses, recorded at different probabilities. To determine these quantal parameters, calculate their uncertainties and the goodness-of-fit of the model, it is important to weight the contribution of each data point in the fitting procedure. We therefore investigated the errors associated with measuring the variance by determining the best estimators of the variance of the variance and have used simulations of synaptic transmission to test their accuracy and reliability under different experimental conditions. For central synapses, which generally have a low number of release sites, the amplitude distribution of synaptic responses is not normal, thus the use of a theoretical variance of the variance based on the normal assumption is not a good approximation. However, appropriate estimators can be derived for the population and for limited sample sizes using a more general expression that involves higher moments and introducing unbiased estimators based on the h-statistics. Our results are likely to be relevant for various applications of fluctuation analysis when few channels or release sites are present.
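For reference, the standard result behind the "variance of the variance" can be written as follows, where $s^2$ is the unbiased sample variance of $n$ independent observations, $\sigma^2$ the population variance, and $\mu_4$ the fourth central moment. These symbols are introduced here for illustration and are not taken from the paper, whose h-statistic estimators are not reproduced.

```latex
\operatorname{Var}\!\left(s^{2}\right)
  = \frac{1}{n}\left(\mu_{4} - \frac{n-3}{n-1}\,\sigma^{4}\right),
\qquad
\text{and for normal data } (\mu_{4} = 3\sigma^{4}):\quad
\operatorname{Var}\!\left(s^{2}\right) = \frac{2\sigma^{4}}{n-1}.
```

The normal-theory expression is the special case that the abstract warns against for amplitude distributions that are not normal; the general expression requires an estimate of the fourth moment.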
Kim, Minjung; Lamont, Andrea E; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M Lee
2016-06-01
Regression mixture models are a novel approach to modeling the heterogeneous effects of predictors on an outcome. In the model-building process, often residual variances are disregarded and simplifying assumptions are made without thorough examination of the consequences. In this simulation study, we investigated the impact of an equality constraint on the residual variances across latent classes. We examined the consequences of constraining the residual variances on class enumeration (finding the true number of latent classes) and on the parameter estimates, under a number of different simulation conditions meant to reflect the types of heterogeneity likely to exist in applied analyses. The results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted on the estimated class sizes and showed the potential to greatly affect the parameter estimates in each class. These results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions are made.
Garabedian, Stephen P.; LeBlanc, Dennis R.; Gelhar, Lynn W.; Celia, Michael A.
1991-01-01
A large-scale natural gradient tracer test was conducted to examine the transport of reactive and nonreactive tracers in a sand and gravel aquifer on Cape Cod, Massachusetts. As part of this test the transport of bromide, a nonreactive tracer, was monitored for about 280 m and quantified using spatial moments. The calculated mass of bromide for each sampling date varied between 85% and 105% of the injected mass using an estimated porosity of 0.39, and the center of mass moved at a nearly constant horizontal velocity of 0.42 m per day. A nonlinear change in the bromide longitudinal variance was observed during the first 26 m of travel distance, but afterward the variance followed a linear trend, indicating the longitudinal dispersivity had reached a constant value of 0.96 m. The transverse dispersivities were much smaller; transverse horizontal dispersivity was 1.8 cm, and transverse vertical dispersivity was about 1.5 mm.
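The link between a linear variance trend and a constant dispersivity can be sketched as follows, assuming the usual spatial-moments relation for a plume moving at constant mean velocity, in which dispersivity equals half the slope of variance against travel distance. The function name is hypothetical, and any numbers passed to it are illustrative rather than the Cape Cod measurements.

```python
def dispersivity_from_variance(distances_m, variances_m2):
    """Half the least-squares slope of spatial variance (m^2) versus travel distance (m)."""
    n = len(distances_m)
    x_bar = sum(distances_m) / n
    y_bar = sum(variances_m2) / n
    sxx = sum((x - x_bar) ** 2 for x in distances_m)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distances_m, variances_m2))
    slope = sxy / sxx      # d(variance)/d(distance), in metres
    return 0.5 * slope     # dispersivity, in metres
```

The fit should be restricted to the part of the record where the trend is linear; including the early nonlinear portion would bias the slope.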
NASA Astrophysics Data System (ADS)
Mayvan, Ali D.; Aghaeinia, Hassan; Kazemi, Mohammad
2017-12-01
This paper focuses on robust transceiver design for throughput enhancement on the interference channel (IC), under imperfect channel state information (CSI). In this paper, two algorithms are proposed to improve the throughput of the multi-input multi-output (MIMO) IC. Each transmitter and receiver has, respectively, M and N antennas and IC operates in a time division duplex mode. In the first proposed algorithm, each transceiver adjusts its filter to maximize the expected value of signal-to-interference-plus-noise ratio (SINR). On the other hand, the second algorithm tries to minimize the variances of the SINRs to hedge against the variability due to CSI error. Taylor expansion is exploited to approximate the effect of CSI imperfection on mean and variance. The proposed robust algorithms utilize the reciprocity of wireless networks to optimize the estimated statistical properties in two different working modes. Monte Carlo simulations are employed to investigate sum rate performance of the proposed algorithms and the advantage of incorporating variation minimization into the transceiver design.
ERIC Educational Resources Information Center
Jackson, Dan; Bowden, Jack; Baker, Rose
2015-01-01
Moment-based estimators of the between-study variance are very popular when performing random effects meta-analyses. This type of estimation has many advantages including computational and conceptual simplicity. Furthermore, by using these estimators in large samples, valid meta-analyses can be performed without the assumption that the treatment…
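One widely used moment-based estimator of the between-study variance is the DerSimonian-Laird estimator. The sketch below implements it as an example of this class of estimators; whether it is the specific estimator examined in the article cannot be told from the truncated abstract, and the function and argument names are assumptions.

```python
def dersimonian_laird_tau2(effects, within_variances):
    """DerSimonian-Laird moment estimate of the between-study variance tau^2."""
    weights = [1.0 / v for v in within_variances]
    total_w = sum(weights)
    # Fixed-effect (inverse-variance weighted) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q statistic.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    scale = total_w - sum(w ** 2 for w in weights) / total_w
    # Match Q to its expectation and truncate negative estimates at zero.
    return max(0.0, (q - (k - 1)) / scale)
```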
Malek, Lenka; Umberger, Wendy J; Makrides, Maria; ShaoJia, Zhou
2017-09-01
This study aims to aid in the development of more effective healthy eating intervention strategies for pregnant women by understanding the relationship between healthy eating intention and actual eating behaviour. Specifically, the study explored whether Theory of Planned Behaviour (TPB) constructs [attitude, subjective-norm, perceived-behavioural-control (PBC)] and additional psychosocial variables (perceived stress, health value and self-identity as a healthy eater) are useful in explaining variance in women's 1) intentions to consume a healthy diet during pregnancy and 2) food consumption behaviour (e.g. adherence to food group recommendations) during pregnancy. A cross-sectional sample of 455 Australian pregnant women completed a TPB questionnaire as part of a larger comprehensive web-based nutrition questionnaire. Women's perceived stress, health value and self-identity as a healthy eater were also measured. Dietary intake was assessed using six-items based on the 2013 Australian Dietary Guidelines. Hierarchical multiple linear regression models were estimated (significance level <0.05), which explained 70% of the variance in healthy eating intention scores and 12% of the variance in adherence to food group recommendations. TPB constructs explained 66% of the total variance in healthy eating intention. Significant predictors of stronger healthy eating intention were greater PBC and subjective norm, followed by positive attitude and stronger self-identity as a healthy eater. Conversely, TPB constructs collectively explained only 3.4% of total variance in adherence to food group recommendations. These findings reveal that the TPB framework explains considerable variance in healthy eating intention during pregnancy, but explains little variance in actual food consumption behaviour. Further research is required to understand this weak relationship between healthy eating intention and behaviour during pregnancy. 
Alternative behavioural frameworks, particularly those that account for the automatic nature of most dietary choices, should also be considered. Copyright © 2017 Elsevier Ltd. All rights reserved.
Evaluation of three lidar scanning strategies for turbulence measurements
NASA Astrophysics Data System (ADS)
Newman, J. F.; Klein, P. M.; Wharton, S.; Sathe, A.; Bonin, T. A.; Chilson, P. B.; Muschinski, A.
2015-11-01
Several errors occur when a traditional Doppler-beam swinging (DBS) or velocity-azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
Evaluation of three lidar scanning strategies for turbulence measurements
NASA Astrophysics Data System (ADS)
Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia; Sathe, Ameya; Bonin, Timothy A.; Chilson, Phillip B.; Muschinski, Andreas
2016-05-01
Several errors occur when a traditional Doppler beam swinging (DBS) or velocity-azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
Human Facial Shape and Size Heritability and Genetic Correlations.
Cole, Joanne B; Manyama, Mange; Larson, Jacinda R; Liberton, Denise K; Ferrara, Tracey M; Riccardi, Sheri L; Li, Mao; Mio, Washington; Klein, Ophir D; Santorico, Stephanie A; Hallgrímsson, Benedikt; Spritz, Richard A
2017-02-01
The human face is an array of variable physical features that together make each of us unique and distinguishable. Striking familial facial similarities underscore a genetic component, but little is known of the genes that underlie facial shape differences. Numerous studies have estimated facial shape heritability using various methods. Here, we used advanced three-dimensional imaging technology and quantitative human genetics analysis to estimate narrow-sense heritability, heritability explained by common genetic variation, and pairwise genetic correlations of 38 measures of facial shape and size in normal African Bantu children from Tanzania. Specifically, we fit a linear mixed model of genetic relatedness between close and distant relatives to jointly estimate variance components that correspond to heritability explained by genome-wide common genetic variation and variance explained by uncaptured genetic variation, the sum representing total narrow-sense heritability. Our significant estimates for narrow-sense heritability of specific facial traits range from 28 to 67%, with horizontal measures being slightly more heritable than vertical or depth measures. Furthermore, for over half of facial traits, >90% of narrow-sense heritability can be explained by common genetic variation. We also find high absolute genetic correlation between most traits, indicating large overlap in underlying genetic loci. Not surprisingly, traits measured in the same physical orientation (i.e., both horizontal or both vertical) have high positive genetic correlations, whereas traits in opposite orientations have high negative correlations. The complex genetic architecture of facial shape informs our understanding of the intricate relationships among different facial features as well as overall facial development. Copyright © 2017 by the Genetics Society of America.
Bernard R. Parresol
1993-01-01
In the context of forest modeling, it is often reasonable to assume a multiplicative heteroscedastic error structure to the data. Under such circumstances ordinary least squares no longer provides minimum variance estimates of the model parameters. Through study of the error structure, a suitable error variance model can be specified and its parameters estimated. This...
Pearson correlation estimation for irregularly sampled time series
NASA Astrophysics Data System (ADS)
Rehfeld, K.; Marwan, N.; Heitzig, J.; Kurths, J.
2012-04-01
Many applications in the geosciences call for the joint and objective analysis of irregular time series. For automated processing, robust measures of linear and nonlinear association are needed. Up to now, the standard approach would have been to reconstruct the time series on a regular grid, using linear or spline interpolation. Interpolation, however, comes with systematic side-effects, as it increases the auto-correlation in the time series. We have searched for the best method to estimate Pearson correlation for irregular time series, i.e. the one with the lowest estimation bias and variance. We adapted a kernel-based approach, using Gaussian weights. Pearson correlation is calculated, in principle, as a mean over products of previously centralized observations. In the regularly sampled case, observations in both time series were observed at the same time and thus the allocation of measurement values into pairs of products is straightforward. In the irregularly sampled case, however, measurements were not necessarily observed at the same time. Now, the key idea of the kernel-based method is to calculate weighted means of products, with the weight depending on the time separation between the observations. If the lagged correlation function is desired, the weights depend on the absolute difference between observation time separation and the estimation lag. To assess the applicability of the approach we used extensive simulations to determine the extent of interpolation side-effects with increasing irregularity of time series. We compared different approaches, based on (linear) interpolation, the Lomb-Scargle Fourier Transform, the sinc kernel and the Gaussian kernel. We investigated the role of kernel bandwidth and signal-to-noise ratio in the simulations. We found that the Gaussian kernel approach offers significant advantages and low Root-Mean Square Errors for regular, slightly irregular and very irregular time series. 
We therefore conclude that it is a good (linear) similarity measure that is appropriate for irregular time series with skewed inter-sampling time distributions.
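The kernel idea described above, a weighted mean of products with weights that depend on the time separation of each observation pair, can be sketched as follows. This is a simplified reading of the abstract, not the authors' code: the function name, the use of population standard deviations for standardization, and the default bandwidth are assumptions.

```python
import math

def kernel_correlation(tx, x, ty, y, lag=0.0, bandwidth=1.0):
    """Gaussian-kernel estimate of the (lagged) Pearson correlation for irregular sampling."""
    def standardize(values):
        n = len(values)
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        return [(v - mean) / sd for v in values]

    xs, ys = standardize(x), standardize(y)
    num = 0.0
    den = 0.0
    for ti, xi in zip(tx, xs):
        for tj, yj in zip(ty, ys):
            # The weight decays with the mismatch between the pair's time separation and the lag.
            d = (tj - ti) - lag
            w = math.exp(-0.5 * (d / bandwidth) ** 2)
            num += w * xi * yj
            den += w
    return num / den
```

For regularly sampled series and a very small bandwidth only simultaneous pairs receive weight, so the estimate reduces to the ordinary Pearson correlation. In general the estimate is not guaranteed to lie within [-1, 1], and the double loop costs time proportional to the product of the two series lengths.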
Croft, Stephen; Burr, Thomas Lee; Favalli, Andrea; ...
2015-12-10
We report that the declared linear density of 238U and 235U in fresh low enriched uranium light water reactor fuel assemblies can be verified for nuclear safeguards purposes using a neutron coincidence counter collar in passive and active mode, respectively. The active mode calibration of the Uranium Neutron Collar – Light water reactor fuel (UNCL) instrument is normally performed using a non-linear fitting technique. The fitting technique relates the measured neutron coincidence rate (the predictor) to the linear density of 235U (the response) in order to estimate model parameters of the nonlinear Padé equation, which traditionally is used to model the calibration data. Alternatively, following a simple data transformation, the fitting can also be performed using standard linear fitting methods. This paper compares performance of the nonlinear technique to the linear technique, using a range of possible error variance magnitudes in the measured neutron coincidence rate. We develop the required formalism and then apply the traditional (nonlinear) and alternative approaches (linear) to the same experimental and corresponding simulated representative datasets. Lastly, we find that, in this context, because of the magnitude of the errors in the predictor, it is preferable not to transform to a linear model, and it is preferable not to adjust for the errors in the predictor when inferring the model parameters.
Joint Center Estimation Using Single-Frame Optimization: Part 1: Numerical Simulation.
Frick, Eric; Rahmatalla, Salam
2018-04-04
The biomechanical models used to refine and stabilize motion capture processes are almost invariably driven by joint center estimates, and any errors in joint center calculation carry over and can be compounded when calculating joint kinematics. Unfortunately, accurate determination of joint centers is a complex task, primarily due to measurements being contaminated by soft-tissue artifact (STA). This paper proposes a novel approach to joint center estimation implemented via sequential application of single-frame optimization (SFO). First, the method minimizes the variance of individual time frames' joint center estimations via the developed variance minimization method to obtain accurate overall initial conditions. These initial conditions are used to stabilize an optimization-based linearization of human motion that determines a time-varying joint center estimation. In this manner, the complex and nonlinear behavior of human motion contaminated by STA can be captured as a continuous series of unique rigid-body realizations without requiring a complex analytical model to describe the behavior of STA. This article intends to offer proof of concept, and the presented method must be further developed before it can be reasonably applied to human motion. Numerical simulations were introduced to verify and substantiate the efficacy of the proposed methodology. When directly compared with a state-of-the-art inertial method, SFO reduced the error due to soft-tissue artifact in all cases by more than 45%. Instead of producing a single vector value to describe the joint center location during a motion capture trial as existing methods often do, the proposed method produced time-varying solutions that were highly correlated (r > 0.82) with the true, time-varying joint center solution.
Aqil, Muhammad; Jeong, Myung Yung
2018-04-24
The robust characterization of real-time brain activity carries potential for many applications. However, the contamination of measured signals by various instrumental, environmental, and physiological sources of noise introduces a substantial amount of signal variance and, consequently, challenges real-time estimation of contributions from underlying neuronal sources. Functional near infra-red spectroscopy (fNIRS) is an emerging imaging modality whose real-time potential is yet to be fully explored. The objectives of the current study are to (i) validate a time-dependent linear model of hemodynamic responses in fNIRS, and (ii) test the robustness of this approach against measurement noise (instrumental and physiological) and mis-specification of the hemodynamic response basis functions (amplitude, latency, and duration). We propose a linear hemodynamic model with time-varying parameters, which are estimated (adapted and tracked) using a dynamic recursive least square algorithm. Owing to the linear nature of the activation model, the problem of achieving robust convergence to an accurate estimation of the model parameters is recast as a problem of parameter error stability around the origin. We show that robust convergence of the proposed method is guaranteed in the presence of an acceptable degree of model misspecification and we derive an upper bound on noise under which reliable parameters can still be inferred. We also derived a lower bound on signal-to-noise-ratio over which the reliable parameters can still be inferred from a channel/voxel. Whilst here applied to fNIRS, the proposed methodology is applicable to other hemodynamic-based imaging technologies such as functional magnetic resonance imaging. Copyright © 2018 Elsevier Inc. All rights reserved.
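To make the idea of tracking time-varying parameters of a linear model concrete, here is a textbook recursive least squares step with a forgetting factor. It is a generic sketch under assumed names (`theta`, `P`, `phi`, `forgetting`), not the dynamic algorithm proposed in the paper, whose details are not given in the abstract.

```python
def rls_update(theta, P, phi, y, forgetting=0.99):
    """One recursive least squares step for the linear model y = phi . theta + noise.

    theta: current parameter estimates (list of length n)
    P:     current n x n matrix (list of lists), proportional to the estimate covariance
    phi:   regressor vector for this sample (list of length n)
    y:     measured output for this sample
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error using the estimate from before this sample.
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals P_phi transposed.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return theta_new, P_new
```

A forgetting factor below 1 discounts older samples so the estimates can follow slow parameter drift, at the price of noisier estimates; a value of 1 recovers ordinary recursive least squares.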
van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V
2017-03-21
Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in in vivo sideways falls without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650±916N) estimated by the final model were comparable with measured values (3698±689N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Thoonsaengngam, Rattapol; Tangsangiumvisai, Nisachon
This paper proposes an enhanced method for estimating the a priori Signal-to-Disturbance Ratio (SDR) to be employed in the Acoustic Echo and Noise Suppression (AENS) system for full-duplex hands-free communications. The proposed a priori SDR estimation technique is modified based upon the Two-Step Noise Reduction (TSNR) algorithm to suppress the background noise while preserving speech spectral components. In addition, a practical approach to determine accurately the Echo Spectrum Variance (ESV) is presented based upon the linear relationship assumption between the power spectrum of far-end speech and acoustic echo signals. The ESV estimation technique is then employed to alleviate the acoustic echo problem. The performance of the AENS system that employs these two proposed estimation techniques is evaluated through the Echo Attenuation (EA), Noise Attenuation (NA), and two speech distortion measures. Simulation results based upon real speech signals guarantee that our improved AENS system is able to mitigate efficiently the problem of acoustic echo and background noise, while preserving the speech quality and speech intelligibility.
Extended Kalman filtering for the detection of damage in linear mechanical structures
NASA Astrophysics Data System (ADS)
Liu, X.; Escamilla-Ambrosio, P. J.; Lieven, N. A. J.
2009-09-01
This paper addresses the problem of assessing the location and extent of damage in a vibrating structure by means of vibration measurements. Frequency domain identification methods (e.g. finite element model updating) have been widely used in this area, while time domain methods such as the extended Kalman filter (EKF) method are more sparsely represented. The difficulty of applying the EKF to damage identification and localisation in mechanical systems lies in the high computational cost and in the dependence of the estimation results on the initial estimation error covariance matrix P(0), on the initial values of the parameters to be estimated, and on the statistics of the measurement noise R and process noise Q. To resolve these problems, a multiple model adaptive estimator consisting of a bank of EKFs in the modal domain was designed, with each filter in the bank based on a different P(0). The algorithm was iterated by using the weighted global iteration method. A fuzzy logic model was incorporated in each filter to estimate the variance of the measurement noise R. The application of the method is illustrated by simulated and real examples.
SU-F-207-16: CT Protocols Optimization Using Model Observer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tseng, H; Fan, J; Kupinski, M
2015-06-15
Purpose: To quantitatively evaluate the performance of different CT protocols using task-based measures of image quality. This work studies the task of estimating the size and contrast of rods with different iodine concentrations inserted in head- and body-sized phantoms using different imaging protocols. These protocols are designed to have the same dose level (CTDIvol) but use different X-ray tube voltage settings (kVp). Methods: Iodine objects of different concentrations inserted in a head-sized phantom and a body-sized phantom are imaged on a 64-slice commercial CT scanner. Scanning protocols with various tube voltages (80, 100, and 120 kVp) and current settings are selected, which output the same absorbed dose level (CTDIvol). Because the phantom design (size of the iodine objects, the air gap between the inserted objects and the phantom) is not ideal for a model observer study, the acquired CT images are used to generate simulation images with iodine objects of four different sizes and five different contrasts. For each type of object, 500 images (100 x 100 pixels) are generated for the observer study. The observer selected in this study is the channelized scanning linear observer, which can be applied to estimate the size and the contrast. The figure of merit used is the correct estimation ratio. The mean and the variance are estimated by the shuffle method. Results: The results indicate that the protocols with the 100 kVp tube voltage setting provide the best performance for iodine insert size and contrast estimation for both head and body phantom cases. Conclusion: This work presents a practical and robust quantitative approach using the channelized scanning linear observer to study contrast and size estimation performance from different CT protocols. Different protocols at the same CTDIvol setting can result in different image quality performance. The relationship between the absorbed dose and the diagnostic image quality is not linear.
How many days of accelerometer monitoring predict weekly physical activity behaviour in obese youth?
Vanhelst, Jérémy; Fardy, Paul S; Duhamel, Alain; Béghin, Laurent
2014-09-01
The aim of this study was to determine the type and the number of accelerometer monitoring days needed to predict weekly sedentary behaviour and physical activity in obese youth. Fifty-three obese youth wore a triaxial accelerometer for 7 days to measure physical activity in free-living conditions. Repeated-measures analyses of variance, intraclass correlation coefficients (ICC) and linear regression analyses were used. Obese youth spent significantly less time in physical activity on weekends or free days compared with school days. ICC analyses indicated that a minimum of 2 days is needed to estimate physical activity behaviour. ICCs were 0·80 between weekly physical activity and weekdays and 0·92 between physical activity and weekend days. The model has to include a weekday and a weekend day. Using any combination of one weekday and one weekend day, the percentage of variance explained is >90%. Results indicate that 2 days of monitoring are needed to estimate the weekly physical activity behaviour in obese youth with an accelerometer. Our results also showed the importance of taking into consideration school day versus free day and weekday versus weekend day in assessing physical activity in obese youth. © 2013 Scandinavian Society of Clinical Physiology and Nuclear Medicine. Published by John Wiley & Sons Ltd.
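Note: the abstract does not state which ICC form was used. As a minimal sketch, assuming a one-way random-effects ICC(1,1) and a hypothetical subjects-by-days data matrix, the coefficient could be computed as follows.

```python
import numpy as np

def icc_oneway(data):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_measurements) array.

    Assumption: this ICC form is chosen for illustration only; the study
    may have used a different variant.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand_mean = data.mean()
    subject_means = data.mean(axis=1)
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Each row would hold one participant and each column one monitoring day; values near 1 indicate that a single day ranks participants much as the other days do.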
Influence of wind and river discharge on the vertical exchange process in the Pearl River Estuary
NASA Astrophysics Data System (ADS)
Hong, B.; Peng, S.
2016-02-01
The vertical exchange process is controlled by the buoyancy input from river discharge and the momentum input from wind forcing. This study investigates the vertical exchange process in the Pearl River Estuary using a 3-D numerical model. The vertical exchange time (VET) is used to quantify the magnitude of the vertical exchange process in response to changing local wind and river discharge. During the dry season, it takes only about 2 days for the surface-layer water mass to be transported to the bottom layer. During the wet season, such transport takes more than 20 days in a large portion of the main channel. The water in the slope area can be well ventilated. Linear regression of VET indicates that water column stratification can be used to estimate the VET, accounting for up to 71% of the variance. Estimation using river runoff can account for only about 49% of the variance. The effects of wind speed and direction are investigated separately. Neither river runoff nor stratification can properly predict the VET during the typical wet season. Further investigations are needed to reveal the dynamics of the vertical exchange process and to identify other factors that influence the VET during the wet season.
A perspective on interaction effects in genetic association studies
2016-01-01
The identification of gene–gene and gene–environment interaction in human traits and diseases is an active area of research that generates high expectations and most often leads to high disappointment. This is partly explained by a misunderstanding of the inherent characteristics of standard regression-based interaction analyses. Here, I revisit and untangle major theoretical aspects of interaction tests in the special case of linear regression; in particular, I discuss variable coding schemes, interpretation of effect estimates, statistical power, and estimation of variance explained with regard to various hypothetical interaction patterns. Linking these components, it appears first that the simplest biological interaction models (in which the magnitude of a genetic effect depends on a common exposure) are among the most difficult to identify. Second, I highlight the demerit of the current strategy to evaluate the contribution of interaction effects to the variance of quantitative outcomes and argue for the use of new approaches to overcome this issue. Finally, I explore the advantages and limitations of multivariate interaction models, when testing for interaction between multiple SNPs and/or multiple exposures, over univariate approaches. Together, these new insights can be leveraged for future method development and to improve our understanding of the genetic architecture of multifactorial traits. PMID:27390122
Linear error analysis of slope-area discharge determinations
Kirby, W.H.
1987-01-01
The slope-area method can be used to calculate peak flood discharges when current-meter measurements are not possible. This calculation depends on several quantities, such as water-surface fall, that are subject to large measurement errors. Other critical quantities, such as Manning's n, are not even amenable to direct measurement but can only be estimated. Finally, scour and fill may cause gross discrepancies between the observed condition of the channel and the hydraulic conditions during the flood peak. The effects of these potential errors on the accuracy of the computed discharge have been estimated by statistical error analysis using a Taylor-series approximation of the discharge formula and the well-known formula for the variance of a sum of correlated random variates. The resultant error variance of the computed discharge is a weighted sum of covariances of the various observational errors. The weights depend on the hydraulic and geometric configuration of the channel. The mathematical analysis confirms the rule of thumb that relative errors in computed discharge increase rapidly when velocity heads exceed the water-surface fall, when the flow field is expanding and when lateral velocity variation (alpha) is large. It also confirms the extreme importance of accurately assessing the presence of scour or fill. © 1987.
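Note: the Taylor-series error analysis described above can be sketched numerically. The following is a minimal illustration, not the report's formulation: it assumes a simplified uniform-flow Manning equation as a hypothetical stand-in for the full slope-area discharge formula, and approximates the error variance as g^T C g, where g is the gradient of the discharge and C is the covariance matrix of the input errors.

```python
import numpy as np

def manning_discharge(x):
    """Hypothetical stand-in for the discharge formula.

    x = [n, area, hydraulic_radius, slope]; Q = A * R**(2/3) * sqrt(S) / n.
    """
    n, area, radius, slope = x
    return area * radius ** (2.0 / 3.0) * np.sqrt(slope) / n

def propagate_variance(f, x, cov, rel_step=1e-6):
    """First-order (Taylor-series) variance of f(x) given input covariance cov."""
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        h = rel_step * (abs(x[i]) or 1.0)  # relative step; 1.0 if x[i] == 0
        up, down = x.copy(), x.copy()
        up[i] += h
        down[i] -= h
        grad[i] = (f(up) - f(down)) / (2.0 * h)  # central difference
    # Weighted sum of covariances: sum_i sum_j g_i g_j C_ij.
    return float(grad @ np.asarray(cov, dtype=float) @ grad)
```

With independent errors, the relative variance of this stand-in reduces to the sum of squared coefficients of variation weighted by the squared exponents (1 for n and A, 2/3 for R, 1/2 for S); off-diagonal entries of `cov` add the correlated-error terms.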
Gianola, Daniel; Fariello, Maria I.; Naya, Hugo; Schön, Chris-Carolin
2016-01-01
Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed, to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the marker(s) tested is(are) used for building G, provided variance components are unaffected by exclusion of such marker(s) from G. The result is arrived at by using a matrix expression such that one can find many inverses of genomic relationship, or of phenotypic covariance matrices, stemming from removing markers tested as fixed, but carrying out a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. Concepts were illustrated using genomic data on 599 wheat inbred lines, with grain yield as target trait, and on close to 200 Arabidopsis thaliana accessions. PMID:27520956
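Note: for readers unfamiliar with the generalized least-squares (GLS) estimator mentioned above, a minimal sketch follows. It assumes the phenotypic covariance matrix V is already available (for example, a genomic variance times G plus a residual variance times the identity); the function name and argument layout are illustrative, not taken from the paper.

```python
import numpy as np

def gls_estimate(X, y, V):
    """GLS estimate for y = X beta + e with Var(e) = V.

    Returns (beta_hat, cov_beta), where
    beta_hat = (X' V^-1 X)^-1 X' V^-1 y and cov_beta = (X' V^-1 X)^-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.asarray(V, dtype=float)
    Vinv_X = np.linalg.solve(V, X)  # V^-1 X without forming V^-1 explicitly
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    beta_hat = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    return beta_hat, np.linalg.inv(XtVinvX)
```

Here X would hold an intercept column plus the coding of the tested marker(s); when V is the identity the estimate reduces to ordinary least squares.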
Unbiased Estimates of Variance Components with Bootstrap Procedures
ERIC Educational Resources Information Center
Brennan, Robert L.
2007-01-01
This article provides general procedures for obtaining unbiased estimates of variance components for any random-model balanced design under any bootstrap sampling plan, with the focus on designs of the type typically used in generalizability theory. The results reported here are particularly helpful when the bootstrap is used to estimate standard…
Control Variate Estimators of Survivor Growth from Point Samples
Francis A. Roesch; Paul C. van Deusen
1993-01-01
Two estimators of the control variate type for survivor growth from remeasured point samples are proposed and compared with more familiar estimators. The large reductions in variance, observed in many cases for estimators constructed with control variates, are also realized in this application. A simulation study yielded consistent reductions in variance which were often...
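Note: the general control variate idea can be illustrated with a short sketch. This is a generic textbook form, not the survivor-growth estimators proposed in the paper: it assumes an auxiliary variable x whose population mean is known, and adjusts the sample mean of y by the estimated regression coefficient of y on x.

```python
import numpy as np

def control_variate_mean(y, x, x_mean_known):
    """Estimate E[y] using x (with known mean) as a control variate.

    The coefficient beta = cov(x, y) / var(x) is estimated from the sample,
    which is an assumption of this sketch rather than the paper's method.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    c = np.cov(x, y, ddof=1)  # 2x2 sample covariance matrix
    beta = c[0, 1] / c[0, 0]
    return y.mean() - beta * (x.mean() - x_mean_known)
```

The stronger the correlation between y and x, the larger the variance reduction relative to the plain sample mean of y.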
NASA Astrophysics Data System (ADS)
Caimmi, R.
2011-08-01
Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. 
For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both heteroscedastic and homoscedastic data. Conversely, samples related to different methods produce discrepant results, due to the presence of (still undetected) systematic errors, which implies no definitive statement can be made at present. A comparison is also made between different expressions of regression line slope and intercept variance estimators, where fractional discrepancies are found not to exceed a few percent, growing to about 20% in the presence of large-dispersion data. An extension of the formalism to structural models is left to a forthcoming paper.
Gebreyesus, Grum; Lund, Mogens S; Buitenhuis, Bart; Bovenhuis, Henk; Poulsen, Nina A; Janss, Luc G
2017-12-05
Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci of large effect. The amount of variation explained may vary between regions leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls. Single-nucleotide polymorphisms (SNPs), from 50K SNP arrays, were grouped into non-overlapping genome segments. A segment was defined as one SNP, or a group of 50, 100, or 200 adjacent SNPs, or one chromosome, or the whole genome. Traditional univariate and bivariate genomic best linear unbiased prediction (GBLUP) models were also run for comparison. Reliabilities were calculated through a resampling strategy and using deterministic formula. BayesAS models improved prediction reliability for most of the traits compared to GBLUP models and this gain depended on segment size and genetic architecture of the traits. The gain in prediction reliability was especially marked for the protein composition traits β-CN, κ-CN and β-LG, for which prediction reliabilities were improved by 49 percentage points on average using the MT-BayesAS model with a 100-SNP segment size compared to the bivariate GBLUP. Prediction reliabilities were highest with the BayesAS model that uses a 100-SNP segment size. 
The bivariate versions of our BayesAS models resulted in extra gains of up to 6% in prediction reliability compared to the univariate versions. Substantial improvement in prediction reliability was possible for most of the traits related to milk protein composition using our novel BayesAS models. Grouping adjacent SNPs into segments provided enhanced information to estimate parameters and allowing the segments to have different (co)variances helped disentangle heterogeneous (co)variances across the genome.
NASA Technical Reports Server (NTRS)
Fu, Lee-Lueng; Vazquez, Jorge; Perigaud, Claire
1991-01-01
Free, equatorially trapped sinusoidal wave solutions to a linear model on an equatorial beta plane are used to fit the Geosat altimetric sea level observations in the tropical Pacific Ocean. The Kalman filter technique is used to estimate the wave amplitude and phase from the data. The estimation is performed at each time step by combining the model forecast with the observation in an optimal fashion utilizing the respective error covariances. The model error covariance is determined such that the performance of the model forecast is optimized. It is found that the dominant observed features can be described qualitatively by basin-scale Kelvin waves and the first meridional-mode Rossby waves. Quantitatively, however, only 23 percent of the signal variance can be accounted for by this simple model.
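Note: the step of combining the model forecast with the observation using their error covariances is the standard Kalman filter analysis update. A minimal sketch follows; the variable names are illustrative, and the state vector here is a generic stand-in for the wave amplitudes and phases estimated in the study.

```python
import numpy as np

def kalman_update(x_forecast, P_forecast, z, H, R):
    """Combine a forecast (x, P) with an observation z = H x + noise(R)."""
    S = H @ P_forecast @ H.T + R             # innovation covariance
    K = P_forecast @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_analysis = x_forecast + K @ (z - H @ x_forecast)
    P_analysis = (np.eye(len(x_forecast)) - K @ H) @ P_forecast
    return x_analysis, P_analysis
```

When the forecast and observation error variances are equal, the update lands halfway between forecast and observation and halves the error variance.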
GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment
Rietveld, Cornelius A.; Medland, Sarah E.; Derringer, Jaime; Yang, Jian; Esko, Tõnu; Martin, Nicolas W.; Westra, Harm-Jan; Shakhbazov, Konstantin; Abdellaoui, Abdel; Agrawal, Arpana; Albrecht, Eva; Alizadeh, Behrooz Z.; Amin, Najaf; Barnard, John; Baumeister, Sebastian E.; Benke, Kelly S.; Bielak, Lawrence F.; Boatman, Jeffrey A.; Boyle, Patricia A.; Davies, Gail; de Leeuw, Christiaan; Eklund, Niina; Evans, Daniel S.; Ferhmann, Rudolf; Fischer, Krista; Gieger, Christian; Gjessing, Håkon K.; Hägg, Sara; Harris, Jennifer R.; Hayward, Caroline; Holzapfel, Christina; Ibrahim-Verbaas, Carla A.; Ingelsson, Erik; Jacobsson, Bo; Joshi, Peter K.; Jugessur, Astanand; Kaakinen, Marika; Kanoni, Stavroula; Karjalainen, Juha; Kolcic, Ivana; Kristiansson, Kati; Kutalik, Zoltán; Lahti, Jari; Lee, Sang H.; Lin, Peng; Lind, Penelope A.; Liu, Yongmei; Lohman, Kurt; Loitfelder, Marisa; McMahon, George; Vidal, Pedro Marques; Meirelles, Osorio; Milani, Lili; Myhre, Ronny; Nuotio, Marja-Liisa; Oldmeadow, Christopher J.; Petrovic, Katja E.; Peyrot, Wouter J.; Polašek, Ozren; Quaye, Lydia; Reinmaa, Eva; Rice, John P.; Rizzi, Thais S.; Schmidt, Helena; Schmidt, Reinhold; Smith, Albert V.; Smith, Jennifer A.; Tanaka, Toshiko; Terracciano, Antonio; van der Loos, Matthijs J.H.M.; Vitart, Veronique; Völzke, Henry; Wellmann, Jürgen; Yu, Lei; Zhao, Wei; Allik, Jüri; Attia, John R.; Bandinelli, Stefania; Bastardot, François; Beauchamp, Jonathan; Bennett, David A.; Berger, Klaus; Bierut, Laura J.; Boomsma, Dorret I.; Bültmann, Ute; Campbell, Harry; Chabris, Christopher F.; Cherkas, Lynn; Chung, Mina K.; Cucca, Francesco; de Andrade, Mariza; De Jager, Philip L.; De Neve, Jan-Emmanuel; Deary, Ian J.; Dedoussis, George V.; Deloukas, Panos; Dimitriou, Maria; Eiriksdottir, Gudny; Elderson, Martin F.; Eriksson, Johan G.; Evans, David M.; Faul, Jessica D.; Ferrucci, Luigi; Garcia, Melissa E.; Grönberg, Henrik; Gudnason, Vilmundur; Hall, Per; Harris, Juliette M.; Harris, Tamara B.; Hastie, Nicholas D.; 
Heath, Andrew C.; Hernandez, Dena G.; Hoffmann, Wolfgang; Hofman, Adriaan; Holle, Rolf; Holliday, Elizabeth G.; Hottenga, Jouke-Jan; Iacono, William G.; Illig, Thomas; Järvelin, Marjo-Riitta; Kähönen, Mika; Kaprio, Jaakko; Kirkpatrick, Robert M.; Kowgier, Matthew; Latvala, Antti; Launer, Lenore J.; Lawlor, Debbie A.; Lehtimäki, Terho; Li, Jingmei; Lichtenstein, Paul; Lichtner, Peter; Liewald, David C.; Madden, Pamela A.; Magnusson, Patrik K. E.; Mäkinen, Tomi E.; Masala, Marco; McGue, Matt; Metspalu, Andres; Mielck, Andreas; Miller, Michael B.; Montgomery, Grant W.; Mukherjee, Sutapa; Nyholt, Dale R.; Oostra, Ben A.; Palmer, Lyle J.; Palotie, Aarno; Penninx, Brenda; Perola, Markus; Peyser, Patricia A.; Preisig, Martin; Räikkönen, Katri; Raitakari, Olli T.; Realo, Anu; Ring, Susan M.; Ripatti, Samuli; Rivadeneira, Fernando; Rudan, Igor; Rustichini, Aldo; Salomaa, Veikko; Sarin, Antti-Pekka; Schlessinger, David; Scott, Rodney J.; Snieder, Harold; Pourcain, Beate St; Starr, John M.; Sul, Jae Hoon; Surakka, Ida; Svento, Rauli; Teumer, Alexander; Tiemeier, Henning; Rooij, Frank JAan; Van Wagoner, David R.; Vartiainen, Erkki; Viikari, Jorma; Vollenweider, Peter; Vonk, Judith M.; Waeber, Gérard; Weir, David R.; Wichmann, H.-Erich; Widen, Elisabeth; Willemsen, Gonneke; Wilson, James F.; Wright, Alan F.; Conley, Dalton; Davey-Smith, George; Franke, Lude; Groenen, Patrick J. F.; Hofman, Albert; Johannesson, Magnus; Kardia, Sharon L.R.; Krueger, Robert F.; Laibson, David; Martin, Nicholas G.; Meyer, Michelle N.; Posthuma, Danielle; Thurik, A. Roy; Timpson, Nicholas J.; Uitterlinden, André G.; van Duijn, Cornelia M.; Visscher, Peter M.; Benjamin, Daniel J.; Cesarini, David; Koellinger, Philipp D.
2013-01-01
A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effect sizes are small (R² ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics. PMID:23722424
The PX-EM algorithm for fast stable fitting of Henderson's mixed model
Foulley, Jean-Louis; Van Dyk, David A
2000-01-01
This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression. PMID:14736399
On the estimation variance for the specific Euler-Poincaré characteristic of random networks.
Tscheschel, A; Stoyan, D
2003-07-01
The specific Euler number is an important topological characteristic in many applications. It is considered here for the case of random networks, which may appear in microscopy either as primary objects of investigation or as secondary objects describing in an approximate way other structures such as, for example, porous media. For random networks there is a simple and natural estimator of the specific Euler number. For its estimation variance, a simple Poisson approximation is given. It is based on the general exact formula for the estimation variance. Application of the formulas is demonstrated in two examples of quite different nature and topology.
An empirical analysis of the distribution of overshoots in a stationary Gaussian stochastic process
NASA Technical Reports Server (NTRS)
Carter, M. C.; Madison, M. W.
1973-01-01
The frequency distribution of overshoots in a stationary Gaussian stochastic process is analyzed. The primary processes involved in this analysis are computer simulation and statistical estimation. Computer simulation is used to simulate stationary Gaussian stochastic processes that have selected autocorrelation functions. An analysis of the simulation results reveals a frequency distribution for overshoots with a functional dependence on the mean and variance of the process. Statistical estimation is then used to estimate the mean and variance of a process. It is shown that, given an autocorrelation function and the mean and variance of the number of overshoots, a frequency distribution for overshoots can be estimated.
Unraveling additive from nonadditive effects using genomic relationship matrices.
Muñoz, Patricio R; Resende, Marcio F R; Gezan, Salvador A; Resende, Marcos Deon Vilela; de Los Campos, Gustavo; Kirst, Matias; Huber, Dudley; Peter, Gary F
2014-12-01
The application of quantitative genetics in plant and animal breeding has largely focused on additive models, which may also capture dominance and epistatic effects. Partitioning genetic variance into its additive and nonadditive components using pedigree-based models (P-genomic best linear unbiased predictor) (P-BLUP) is difficult with most commonly available family structures. However, the availability of dense panels of molecular markers makes possible the use of additive- and dominance-realized genomic relationships for the estimation of variance components and the prediction of genetic values (G-BLUP). We evaluated height data from a multifamily population of the tree species Pinus taeda with a systematic series of models accounting for additive, dominance, and first-order epistatic interactions (additive by additive, dominance by dominance, and additive by dominance), using either pedigree- or marker-based information. We show that, compared with the pedigree, use of realized genomic relationships in marker-based models yields a substantially more precise separation of additive and nonadditive components of genetic variance. We conclude that the marker-based relationship matrices in a model including additive and nonadditive effects performed better, improving breeding value prediction. Moreover, our results suggest that, for tree height in this population, the additive and nonadditive components of genetic variance are similar in magnitude. This novel result improves our current understanding of the genetic control and architecture of a quantitative trait and should be considered when developing breeding strategies. Copyright © 2014 by the Genetics Society of America.
Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction
Mai, Uyen; Sayyari, Erfan; Mirarab, Siavash
2017-01-01
Phylogenetic trees inferred using commonly-used models of sequence evolution are unrooted, but the root position matters both for interpretation and downstream applications. This issue has been long recognized; however, whether the potential for discordance between the species tree and gene trees impacts methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution; our method, which minimizes the variance of root to tip distances, is inspired by the traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented in a linear time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its relative accuracy compared to using outgroups to root gene trees depends on the size of the dataset and levels of deviations from the strict clock. We show high levels of error for all methods of rooting estimated gene trees due to factors that include effects of gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely, STAR and NJst. Nevertheless, our results point to limitations of existing scalable rooting methods. PMID:28800608
Estimation of sampling error uncertainties in observed surface air temperature change in China
NASA Astrophysics Data System (ADS)
Hua, Wei; Shen, Samuel S. P.; Weithmann, Alexander; Wang, Huijun
2017-08-01
This study examines the sampling error uncertainties in the monthly surface air temperature (SAT) change in China over recent decades, focusing on the uncertainties of gridded data, national averages, and linear trends. Results indicate that large sampling error variances appear at the station-sparse area of northern and western China with the maximum value exceeding 2.0 K² while small sampling error variances are found at the station-dense area of southern and eastern China with most grid values being less than 0.05 K². In general, the negative temperature existed in each month prior to the 1980s, and a warming in temperature began thereafter, which accelerated in the early and mid-1990s. The increasing trend in the SAT series was observed for each month of the year with the largest temperature increase and highest uncertainty of 0.51 ± 0.29 K (10 year)⁻¹ occurring in February and the weakest trend and smallest uncertainty of 0.13 ± 0.07 K (10 year)⁻¹ in August. The sampling error uncertainties in the national average annual mean SAT series are not sufficiently large to alter the conclusion of the persistent warming in China. In addition, the sampling error uncertainties in the SAT series show a clear variation compared with other uncertainty estimation methods, which is a plausible reason for the inconsistent variations between our estimate and other studies during this period.
NASA Astrophysics Data System (ADS)
Kitterød, Nils-Otto
2017-08-01
Unconsolidated sediment cover thickness (D) above bedrock was estimated by using a publicly available well database from Norway, GRANADA. General challenges associated with such databases typically involve clustering and bias. However, if information about the horizontal distance to the nearest bedrock outcrop (L) is included, does the spatial estimation of D improve? This idea was tested by comparing two cross-validation results: ordinary kriging (OK) where L was disregarded; and co-kriging (CK) where cross-covariance between D and L was included. The analysis showed only minor differences between OK and CK with respect to differences between estimation and true values. However, the CK results gave in general less estimation variance compared to the OK results. All observations were declustered and transformed to standard normal probability density functions before estimation and back-transformed for the cross-validation analysis. The semivariogram analysis gave correlation lengths for D and L of approx. 10 and 6 km. These correlations reduce the estimation variance in the cross-validation analysis because more than 50 % of the data material had two or more observations within a radius of 5 km. The small-scale variance of D, however, was about 50 % of the total variance, which gave an accuracy of less than 60 % for most of the cross-validation cases. Despite the noisy character of the observations, the analysis demonstrated that L can be used as secondary information to reduce the estimation variance of D.
Diep, Pham Bich; Tan, Frans E. S.; Knibbe, Ronald A.; De Vries, Nanne
2016-01-01
Background: This study used multi-level analysis to estimate which type of factor explains most of the variance in alcohol consumption of Vietnamese students. Methods: Data were collected among 6011 students attending 12 universities/faculties in four provinces in Vietnam. The three most recent drinking occasions were investigated per student, resulting in 12,795 drinking occasions among 4265 drinkers. Students reported on 10 aspects of the drinking context per drinking occasion. A multi-level mixed-effects linear regression model was constructed in which aspects of drinking context composed the first level; the age of students and four drinking motives comprised the second level. The dependent variable was the number of drinks. Results: Of the aspects of context, drinking duration had the strongest association with alcohol consumption while, at the individual level, coping motive had the strongest association. The drinking context characteristics explained more variance than the individual characteristics in alcohol intake per occasion. Conclusions: These findings suggest that, among students in Vietnam, the drinking context explains a larger proportion of the variance in alcohol consumption than the drinking motives. Therefore, measures that reduce the availability of alcohol in specific drinking situations are an essential part of an effective prevention policy. PMID:27420089
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
Liu, Ruijie; Holik, Aliaksei Z.; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E.; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.
2015-01-01
Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package. PMID:25925576
NASA Astrophysics Data System (ADS)
Chen, Cheng; Xu, Weijie; Guo, Tong; Chen, Kai
2017-10-01
Uncertainties in structure properties can result in different responses in hybrid simulations. Quantification of the effect of these uncertainties would enable researchers to estimate the variances of structural responses observed from experiments. This poses challenges for real-time hybrid simulation (RTHS) due to the existence of actuator delay. Polynomial chaos expansion (PCE) projects the model outputs on a basis of orthogonal stochastic polynomials to account for influences of model uncertainties. In this paper, PCE is utilized to evaluate effect of actuator delay on the maximum displacement from real-time hybrid simulation of a single degree of freedom (SDOF) structure when accounting for uncertainties in structural properties. The PCE is first applied for RTHS without delay to determine the order of PCE, the number of sample points as well as the method for coefficients calculation. The PCE is then applied to RTHS with actuator delay. The mean, variance and Sobol indices are compared and discussed to evaluate the effects of actuator delay on uncertainty quantification for RTHS. Results show that the mean and the variance of the maximum displacement increase linearly and exponentially with respect to actuator delay, respectively. Sensitivity analysis through Sobol indices also indicates the influence of the single random variable decreases while the coupling effect increases with the increase of actuator delay.
Roy, Banibrata; Ripstein, Ira; Perry, Kyle; Cohen, Barry
2016-01-01
To determine whether the pre-medical Grade Point Average (GPA), Medical College Admission Test (MCAT), Internal examinations (Block) and National Board of Medical Examiners (NBME) scores are correlated with and predict the Medical Council of Canada Qualifying Examination Part I (MCCQE-1) scores. Data from 392 admitted students in the graduating classes of 2010-2013 at University of Manitoba (UofM), College of Medicine were considered. Pearson's correlation to assess the strength of the relationship, multiple linear regression to estimate MCCQE-1 score and stepwise linear regression to investigate the amount of variance were employed. Complete data from 367 (94%) students were studied. The MCCQE-1 had a moderate-to-large positive correlation with NBME scores and Block scores but a low correlation with GPA and MCAT scores. The multiple linear regression model gives a good estimate of the MCCQE-1 (R² = 0.604). Stepwise regression analysis demonstrated that 59.2% of the variation in the MCCQE-1 was accounted for by the NBME, but only 1.9% by the Block exams, and negligible variation came from the GPA and the MCAT. Amongst all the examinations used at UofM, the NBME is most closely correlated with MCCQE-1.
Optimal distribution of integration time for intensity measurements in Stokes polarimetry.
Li, Xiaobo; Liu, Tiegen; Huang, Bingjing; Song, Zhanjie; Hu, Haofeng
2015-10-19
We consider the typical Stokes polarimetry system, which performs four intensity measurements to estimate a Stokes vector. We show that if the total integration time of intensity measurements is fixed, the variance of the Stokes vector estimator depends on the distribution of the integration time at four intensity measurements. Therefore, by optimizing the distribution of integration time, the variance of the Stokes vector estimator can be decreased. In this paper, we obtain the closed-form solution of the optimal distribution of integration time by employing Lagrange multiplier method. According to the theoretical analysis and real-world experiment, it is shown that the total variance of the Stokes vector estimator can be significantly decreased about 40% in the case discussed in this paper. The method proposed in this paper can effectively decrease the measurement variance and thus statistically improves the measurement accuracy of the polarimetric system.
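The Lagrange-multiplier argument mentioned in this abstract can be illustrated with a minimal sketch. It rests on a simplifying assumption that is not taken from the paper: each of the four intensity measurements is assumed to contribute a variance of c_i / t_i to the total, where t_i is its integration time and c_i is a hypothetical noise coefficient. Under that assumption, minimizing the sum of c_i / t_i subject to a fixed total time gives t_i proportional to sqrt(c_i). The function names and the coefficient values are illustrative only.

```python
import math


def optimal_times(c, total_time):
    """Minimize sum(c_i / t_i) subject to sum(t_i) == total_time.

    Setting the derivative of the Lagrangian to zero gives
    -c_i / t_i**2 + lam = 0, so t_i is proportional to sqrt(c_i).
    """
    roots = [math.sqrt(ci) for ci in c]
    scale = total_time / sum(roots)
    return [scale * r for r in roots]


def total_variance(c, t):
    """Total variance under the assumed c_i / t_i noise model."""
    return sum(ci / ti for ci, ti in zip(c, t))


# Hypothetical noise coefficients for the four intensity measurements.
c = [1.0, 4.0, 9.0, 16.0]
T = 1.0

t_opt = optimal_times(c, T)          # unequal split favouring noisier channels
t_equal = [T / len(c)] * len(c)      # naive equal split

var_opt = total_variance(c, t_opt)
var_equal = total_variance(c, t_equal)
```

With these made-up coefficients the optimized split gives a lower total variance than the equal split; the size of the gain depends entirely on the assumed coefficients and says nothing about the 40% figure reported in the abstract.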
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beer, M.
1980-12-01
The maximum likelihood method for the multivariate normal distribution is applied to the case of several individual eigenvalues. Correlated Monte Carlo estimates of the eigenvalue are assumed to follow this prescription and aspects of the assumption are examined. Monte Carlo cell calculations using the SAM-CE and VIM codes for the TRX-1 and TRX-2 benchmark reactors, and SAM-CE full core results are analyzed with this method. Variance reductions of a few percent to a factor of 2 are obtained from maximum likelihood estimation as compared with the simple average and the minimum variance individual eigenvalue. The numerical results verify that the use of sample variances and correlation coefficients in place of the corresponding population statistics still leads to nearly minimum variance estimation for a sufficient number of histories and aggregates.
Calibrating SALT: a sampling scheme to improve estimates of suspended sediment yield
Robert B. Thomas
1986-01-01
Abstract - SALT (Selection At List Time) is a variable probability sampling scheme that provides unbiased estimates of suspended sediment yield and its variance. SALT performs better than standard schemes which are estimate variance. Sampling probabilities are based on a sediment rating function which promotes greater sampling intensity during periods of high...
Adaptive Filtering Using Recurrent Neural Networks
NASA Technical Reports Server (NTRS)
Parlos, Alexander G.; Menon, Sunil K.; Atiya, Amir F.
2005-01-01
A method for adaptive (or, optionally, nonadaptive) filtering has been developed for estimating the states of complex process systems (e.g., chemical plants, factories, or manufacturing processes at some level of abstraction) from time series of measurements of system inputs and outputs. The method is based partly on the fundamental principles of the Kalman filter and partly on the use of recurrent neural networks. The standard Kalman filter involves an assumption of linearity of the mathematical model used to describe a process system. The extended Kalman filter accommodates a nonlinear process model but still requires linearization about the state estimate. Both the standard and extended Kalman filters involve the often unrealistic assumption that process and measurement noise are zero-mean, Gaussian, and white. In contrast, the present method does not involve any assumptions of linearity of process models or of the nature of process noise; on the contrary, few (if any) assumptions are made about process models, noise models, or the parameters of such models. In this regard, the method can be characterized as one of nonlinear, nonparametric filtering. The method exploits the unique ability of neural networks to approximate nonlinear functions. In a given case, the process model is limited mainly by limitations of the approximation ability of the neural networks chosen for that case. Moreover, despite the lack of assumptions regarding process noise, the method yields minimum-variance filters. In that they do not require statistical models of noise, the neural-network-based state filters of this method are comparable to conventional nonlinear least-squares estimators.
Evaluation of three lidar scanning strategies for turbulence measurements
Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia; ...
2016-05-03
Several errors occur when a traditional Doppler beam swinging (DBS) or velocity–azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
Evaluation of three lidar scanning strategies for turbulence measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia
Several errors occur when a traditional Doppler beam swinging (DBS) or velocity–azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
A two step Bayesian approach for genomic prediction of breeding values.
Shariati, Mohammad M; Sørensen, Peter; Janss, Luc
2012-05-21
In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each 150 markers were assigned to one group with a common variance. In further analyses, subsets of 1500 and 450 markers with largest effects in step 2 were kept in the prediction model. Grouping markers outperformed SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. A prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.
Monthly hydroclimatology of the continental United States
NASA Astrophysics Data System (ADS)
Petersen, Thomas; Devineni, Naresh; Sankarasubramanian, A.
2018-04-01
Physical/semi-empirical models that do not require any calibration are of paramount need for estimating hydrological fluxes for ungauged sites. We develop semi-empirical models for estimating the mean and variance of the monthly streamflow based on Taylor Series approximation of a lumped physically based water balance model. The proposed models require mean and variance of monthly precipitation and potential evapotranspiration, co-variability of precipitation and potential evapotranspiration and regionally calibrated catchment retention sensitivity, atmospheric moisture uptake sensitivity, groundwater-partitioning factor, and the maximum soil moisture holding capacity parameters. Estimates of mean and variance of monthly streamflow using the semi-empirical equations are compared with the observed estimates for 1373 catchments in the continental United States. Analyses show that the proposed models explain the spatial variability in monthly moments for basins in lower elevations. A regionalization of parameters for each water resources region show good agreement between observed moments and model estimated moments during January, February, March and April for mean and all months except May and June for variance. Thus, the proposed relationships could be employed for understanding and estimating the monthly hydroclimatology of ungauged basins using regional parameters.
Detection of gene-environment interaction in pedigree data using genome-wide genotypes.
Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V
2016-12-01
Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits.
NASA Astrophysics Data System (ADS)
Asanuma, Jun
Variances of the velocity components and scalars are important as indicators of the turbulence intensity. They can also be utilized to estimate surface fluxes in several types of "variance methods", and the estimated fluxes can be regional values if the variances from which they are calculated are regionally representative measurements. With these motivations, variances measured by an aircraft in the unstable ABL over a flat pine forest during HAPEX-Mobilhy were analyzed within the context of the similarity scaling arguments. The variances of temperature and vertical velocity within the atmospheric surface layer were found to follow closely the Monin-Obukhov similarity theory, and to yield reasonable estimates of the surface sensible heat fluxes when they are used in variance methods. This validates the variance methods with aircraft measurements. On the other hand, the specific humidity variances were influenced by the surface heterogeneity and clearly fail to obey MOS. A simple analysis based on the similarity law for free convection produced a comprehensible and quantitative picture regarding the effect of the surface flux heterogeneity on the statistical moments, and revealed that variances of the active and passive scalars become dissimilar because of their different roles in turbulence. The analysis also indicated that the mean quantities are also affected by the heterogeneity but to a lesser extent than the variances. The temperature variances in the mixed layer (ML) were examined by using a generalized top-down bottom-up diffusion model with some combinations of velocity scales and inversion flux models. The results showed that the surface shear stress exerts considerable influence on the lower ML. ML variance methods were also tested with the temperature and vertical velocity variances, and their feasibility was investigated. Finally, the variances in the ML were analyzed in terms of the local similarity concept; the results confirmed the original hypothesis by Panofsky and McCormick that the local scaling in terms of the local buoyancy flux defines the lower bound of the moments.
Arambasić, M B; Jatić-Slavković, D
2004-05-01
This paper presents the application of a regression analysis program and a program for comparing linear regressions (a modified method for one-way analysis of variance), written in the BASIC programming language, to, for instance, the determination of the content of Diclofenac-Sodium (the active ingredient in DIKLOFEN injections, ampules à 75 mg/3 ml). Stability testing of Diclofenac-Sodium was done by the isothermal method of accelerated aging at 4 different temperatures (30, 40, 50 and 60 degrees C) as a function of time (4 different durations of treatment: 0-155, 0-145, 0-74 and 0-44 days). The decrease in stability (the decrease in the mean value of the Diclofenac-Sodium content, in %) at different temperatures as a function of time can be described by a linear dependence. From the regression equations, the times are estimated in which the content of Diclofenac-Sodium (in %) will decrease by 10% of the initial value. These times are as follows: 761.02 days at 30 degrees C, 397.26 days at 40 degrees C, 201.96 days at 50 degrees C and 58.85 days at 60 degrees C. The estimated times (in days) in which the mean value of the Diclofenac-Sodium content (in %) will fall by 10% of the initial value, as a function of time, are most suitably described by a 3rd order parabola. Based on the parameter values which describe the 3rd order parabola, the time was estimated in which the mean Diclofenac-Sodium content (in %) will fall by 10% of the initial value at average ambient temperatures of 20 degrees C and 25 degrees C. The times are: 1409.47 days (20 degrees C) and 1042.39 days (25 degrees C). Based on the value of Fisher's coefficient (F), the comparison of the trends of Diclofenac-Sodium content (in %) shows that, under the influence of different temperatures as a function of time, there is, depending on the temperature: a statistically very significant difference (P < .05) at 50 degrees C and lower versus 60 degrees C; a statistically probably significant difference (P > 0.01) at 40 degrees C and lower versus 50 degrees C; and no statistically significant difference (P > 0.05) at 30 degrees C versus 40 degrees C.
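The shelf-life calculation described in this abstract (fit a straight line to content versus time, then solve for the time at which the content has dropped by 10% of its initial value) can be sketched as follows. This is an illustration only, written in Python rather than the BASIC programs the abstract mentions; the data points are invented, and the function names `fit_line` and `time_to_loss` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def time_to_loss(intercept, slope, fraction=0.10):
    """Time at which the fitted content falls by `fraction` of its initial value.

    Solves intercept + slope * t = (1 - fraction) * intercept for t.
    """
    if slope >= 0:
        raise ValueError("content is not decreasing; no loss time exists")
    return -fraction * intercept / slope


# Invented example data: content (% of label claim) measured over 40 days.
days = [0, 10, 20, 30, 40]
content = [100.0, 98.0, 96.0, 94.0, 92.0]

intercept, slope = fit_line(days, content)
t10 = time_to_loss(intercept, slope)  # days until a 10% drop at this temperature
```

Repeating the fit at each storage temperature yields one such time per temperature, which is the quantity the abstract then extrapolates to ambient conditions; that extrapolation step is not reproduced here.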
Chen, Lin; Ray, Shonket; Keller, Brad M; Pertuz, Said; McDonald, Elizabeth S; Conant, Emily F; Kontos, Despina
2016-09-01
Purpose To investigate the impact of radiation dose on breast density estimation in digital mammography. Materials and Methods With institutional review board approval and Health Insurance Portability and Accountability Act compliance under waiver of consent, a cohort of women from the American College of Radiology Imaging Network Pennsylvania 4006 trial was retrospectively analyzed. All patients underwent breast screening with a combination of dose protocols, including standard full-field digital mammography, low-dose digital mammography, and digital breast tomosynthesis. A total of 5832 images from 486 women were analyzed with previously validated, fully automated software for quantitative estimation of density. Clinical Breast Imaging Reporting and Data System (BI-RADS) density assessment results were also available from the trial reports. The influence of image acquisition radiation dose on quantitative breast density estimation was investigated with analysis of variance and linear regression. Pairwise comparisons of density estimations at different dose levels were performed with Student t test. Agreement of estimation was evaluated with quartile-weighted Cohen kappa values and Bland-Altman limits of agreement. Results Radiation dose of image acquisition did not significantly affect quantitative density measurements (analysis of variance, P = .37 to P = .75), with percent density demonstrating a high overall correlation between protocols (r = 0.88-0.95; weighted κ = 0.83-0.90). However, differences in breast percent density (1.04% and 3.84%, P < .05) were observed within high BI-RADS density categories, although they were significantly correlated across the different acquisition dose levels (r = 0.76-0.92, P < .05). 
Conclusion Precision and reproducibility of automated breast density measurements with digital mammography are not substantially affected by variations in radiation dose; thus, the use of low-dose techniques for the purpose of density estimation may be feasible. © RSNA, 2016 Online supplemental material is available for this article.
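The Bland-Altman limits of agreement used in this study are conventionally computed as the mean of the paired differences plus or minus 1.96 times their standard deviation. A minimal sketch of that convention is given below; the paired density values are invented for illustration and the function name is hypothetical, so nothing here reproduces the study's data or software.

```python
import statistics


def bland_altman_limits(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b.

    bias is the mean difference a - b; the limits are bias +/- 1.96 * SD,
    using the sample standard deviation of the differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented percent-density readings for the same breasts under two protocols.
standard_dose = [10.0, 20.0, 30.0, 40.0]
low_dose = [11.0, 19.0, 32.0, 38.0]

bias, lower, upper = bland_altman_limits(standard_dose, low_dose)
```

A bias near zero with narrow limits would indicate that the two protocols give interchangeable density estimates for practical purposes.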
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using the regression analyses by Bayesian inference applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively; the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21-0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that the selection for any trait test day will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be due to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Chen, Lin; Ray, Shonket; Keller, Brad M.; Pertuz, Said; McDonald, Elizabeth S.; Conant, Emily F.
2016-01-01
Purpose To investigate the impact of radiation dose on breast density estimation in digital mammography. Materials and Methods With institutional review board approval and Health Insurance Portability and Accountability Act compliance under waiver of consent, a cohort of women from the American College of Radiology Imaging Network Pennsylvania 4006 trial was retrospectively analyzed. All patients underwent breast screening with a combination of dose protocols, including standard full-field digital mammography, low-dose digital mammography, and digital breast tomosynthesis. A total of 5832 images from 486 women were analyzed with previously validated, fully automated software for quantitative estimation of density. Clinical Breast Imaging Reporting and Data System (BI-RADS) density assessment results were also available from the trial reports. The influence of image acquisition radiation dose on quantitative breast density estimation was investigated with analysis of variance and linear regression. Pairwise comparisons of density estimations at different dose levels were performed with Student t test. Agreement of estimation was evaluated with quartile-weighted Cohen kappa values and Bland-Altman limits of agreement. Results Radiation dose of image acquisition did not significantly affect quantitative density measurements (analysis of variance, P = .37 to P = .75), with percent density demonstrating a high overall correlation between protocols (r = 0.88–0.95; weighted κ = 0.83–0.90). However, differences in breast percent density (1.04% and 3.84%, P < .05) were observed within high BI-RADS density categories, although they were significantly correlated across the different acquisition dose levels (r = 0.76–0.92, P < .05). 
Conclusion Precision and reproducibility of automated breast density measurements with digital mammography are not substantially affected by variations in radiation dose; thus, the use of low-dose techniques for the purpose of density estimation may be feasible. © RSNA, 2016 Online supplemental material is available for this article. PMID:27002418
The applicability of dental wear in age estimation for a modern American population.
Faillace, Katie E; Bethard, Jonathan D; Marks, Murray K
2017-12-01
Though applied in bioarchaeology, dental wear is an underexplored age indicator in the biological anthropology of contemporary populations, although research has been conducted on dental attrition in forensic contexts (Kim et al., , Journal of Forensic Sciences, 45, 303; Prince et al., , Journal of Forensic Sciences, 53, 588; Yun et al., , Journal of Forensic Sciences, 52, 678). The purpose of this study is to apply and adapt existing techniques for age estimation based on dental wear to a modern American population, with the aim of producing accurate age range estimates for individuals from an industrialized context. Methodologies following Yun and Prince were applied to a random sample from the University of New Mexico (n = 583) and Universidade de Coimbra (n = 50) cast and skeletal collections. Analysis of variance (ANOVA) and linear regression analyses were conducted to examine the relationship between tooth wear scores and age. Application of both Yun et al. () and Prince et al. () methodologies resulted in inaccurate age estimates. Recalibrated sectioning points correctly classified individuals as over or under 50 years for 88% of the sample. Linear regression demonstrated 60% of age estimates fell within ±10 years of the actual age, and accuracy improved for individuals under 45 years, with 74% of predictions within ±10 years. This study demonstrates age estimation from dental wear is possible for modern populations, with comparable age intervals to other established methods. It provides a quantifiable method of seriation into "older" and "younger" adult categories, and provides more reliable age interval estimates than cranial sutures in instances where only the skull is available. © 2017 Wiley Periodicals, Inc.
Pisharady, Pramod Kumar; Sotiropoulos, Stamatios N; Duarte-Carvajalino, Julio M; Sapiro, Guillermo; Lenglet, Christophe
2018-02-15
We present a sparse Bayesian unmixing algorithm BusineX: Bayesian Unmixing for Sparse Inference-based Estimation of Fiber Crossings (X), for estimation of white matter fiber parameters from compressed (under-sampled) diffusion MRI (dMRI) data. BusineX combines compressive sensing with linear unmixing and introduces sparsity to the previously proposed multiresolution data fusion algorithm RubiX, resulting in a method for improved reconstruction, especially from data with lower number of diffusion gradients. We formulate the estimation of fiber parameters as a sparse signal recovery problem and propose a linear unmixing framework with sparse Bayesian learning for the recovery of sparse signals, the fiber orientations and volume fractions. The data is modeled using a parametric spherical deconvolution approach and represented using a dictionary created with the exponential decay components along different possible diffusion directions. Volume fractions of fibers along these directions define the dictionary weights. The proposed sparse inference, which is based on the dictionary representation, considers the sparsity of fiber populations and exploits the spatial redundancy in data representation, thereby facilitating inference from under-sampled q-space. The algorithm improves parameter estimation from dMRI through data-dependent local learning of hyperparameters, at each voxel and for each possible fiber orientation, that moderate the strength of priors governing the parameter variances. Experimental results on synthetic and in-vivo data show improved accuracy with a lower uncertainty in fiber parameter estimates. BusineX resolves a higher number of second and third fiber crossings. For under-sampled data, the algorithm is also shown to produce more reliable estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
’Exact’ Two-Sided Confidence Intervals on Nonnegative Linear Combinations of Variances.
1980-07-01
Colorado State University; Office of Naval Research, Statistics and... Nonnegative Linear Combinations of Variances, by Franklin A. Graybill (Colorado State University) and Chih-Ming Wang (SPSS Inc.). 1. Introduction. In a paper to soon... called the Modified Large Sample (MLS) confidence interval, is in...
Tang, Yongqiang
2017-12-01
Control-based pattern mixture models (PMM) and delta-adjusted PMMs are commonly used as sensitivity analyses in clinical trials with non-ignorable dropout. These PMMs assume that the statistical behavior of outcomes varies by pattern in the experimental arm in the imputation procedure, but the imputed data are typically analyzed by a standard method such as the primary analysis model. In the multiple imputation (MI) inference, Rubin's variance estimator is generally biased when the imputation and analysis models are uncongenial. One objective of the article is to quantify the bias of Rubin's variance estimator in the control-based and delta-adjusted PMMs for longitudinal continuous outcomes. These PMMs assume the same observed data distribution as the mixed effects model for repeated measures (MMRM). We derive analytic expressions for the MI treatment effect estimator and the associated Rubin's variance in these PMMs and MMRM as functions of the maximum likelihood estimator from the MMRM analysis and the observed proportion of subjects in each dropout pattern when the number of imputations is infinite. The asymptotic bias is generally small or negligible in the delta-adjusted PMM, but can be sizable in the control-based PMM. This indicates that the inference based on Rubin's rule is approximately valid in the delta-adjusted PMM. A simple variance estimator is proposed to ensure asymptotically valid MI inferences in these PMMs, and compared with the bootstrap variance. The proposed method is illustrated by the analysis of an antidepressant trial, and its performance is further evaluated via a simulation study. © 2017, The International Biometric Society.
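For readers unfamiliar with the quantity under study, the standard form of Rubin's rule combines m imputed-data analyses as T = W + (1 + 1/m) B, where W is the average within-imputation variance and B is the between-imputation variance of the point estimates. The sketch below shows only this standard rule; it is not the alternative variance estimator proposed in the article, and the input numbers and function name are invented for illustration.

```python
import statistics


def rubin_combine(estimates, variances):
    """Pool m multiply-imputed analyses with Rubin's rule.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)          # W: average within-imputation variance
    between = statistics.variance(estimates)     # B: sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented treatment-effect estimates and variances from m = 3 imputations.
est = [1.0, 2.0, 3.0]
var = [0.5, 0.5, 0.5]

pooled, total = rubin_combine(est, var)
```

As the abstract notes, this total variance can be biased when the imputation and analysis models are uncongenial, which is the situation the article quantifies for control-based and delta-adjusted PMMs.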
Rössler, Wulf; Hengartner, Michael P; Ajdacic-Gross, Vladeta; Haker, Helene; Angst, Jules
2013-10-01
Our aim was to deconstruct the variance underlying the expression of sub-clinical psychosis symptoms into portions associated with latent time-dependent states and time-invariant traits. We analyzed data of 335 subjects from the general population of Zurich, Switzerland, who had been repeatedly measured between 1979 (age 20/21) and 2008 (age 49/50). We applied two measures of sub-clinical psychosis derived from the SCL-90-R, namely schizotypal signs (STS) and schizophrenia nuclear symptoms (SNS). Variance was decomposed with latent state-trait analysis and associations with covariates were examined with generalized linear models. At ages 19/20 and 49/50, the latent states underlying STS accounted for 48% and 51% of variance, whereas for SNS those estimates were 62% and 50%. Between those age classes, however, expression of sub-clinical psychosis was strongly associated with stable traits (75% and 89% of total variance in STS and SNS, respectively, at age 27/28). Latent states underlying variance in STS and SNS were particularly related to partnership problems over almost the entire observation period. STS was additionally related to employment problems, whereas drug-use was a strong predictor of states underlying both syndromes at age 19/20. The latent trait underlying expression of STS and SNS was particularly related to low sense of mastery and self-esteem and to high depressiveness. Although most psychosis symptoms are transient and episodic in nature, the variability in their expression is predominantly caused by stable traits. Those time-invariant and rather consistent effects are particularly influential around age 30, whereas the occasion-specific states appear to be particularly influential at ages 20 and 50. © 2013.
Nonlinear problems in data-assimilation : Can synchronization help?
NASA Astrophysics Data System (ADS)
Tribbia, J. J.; Duane, G. S.
2009-12-01
Over the past several years, operational weather centers have initiated ensemble prediction and assimilation techniques to estimate the error covariance of forecasts in the short and the medium range. The ensemble techniques used are based on linear methods. This technique has been shown to be a useful indicator of skill in the linear range where forecast errors are small relative to climatological variance. While this advance has been impressive, there are still ad hoc aspects of its use in practice, like the need for covariance inflation, which are troubling. Furthermore, to be of utility in the nonlinear range an ensemble assimilation and prediction method must be capable of giving probabilistic information for the situation where a probability density forecast becomes multi-modal. A prototypical, simplest example of such a situation is the planetary-wave regime transition where the pdf is bimodal. Our recent research shows how the inconsistencies and extensions of linear methodology can be consistently treated using the paradigm of synchronization which views the problems of assimilation and forecasting as that of optimizing the forecast model state with respect to the future evolution of the atmosphere.
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Ozay, Guner; Seyhan, Ferda; Yilmaz, Aysun; Whitaker, Thomas B; Slate, Andrew B; Giesbrecht, Francis
2006-01-01
The variability associated with the aflatoxin test procedure used to estimate aflatoxin levels in bulk shipments of hazelnuts was investigated. Sixteen 10 kg samples of shelled hazelnuts were taken from each of 20 lots that were suspected of aflatoxin contamination. The total variance associated with testing shelled hazelnuts was estimated and partitioned into sampling, sample preparation, and analytical variance components. Each variance component increased as aflatoxin concentration (either B1 or total) increased. With the use of regression analysis, mathematical expressions were developed to model the relationship between aflatoxin concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific aflatoxin concentration. The sampling, sample preparation, and analytical variances associated with estimating aflatoxin in a hazelnut lot at a total aflatoxin level of 10 ng/g and using a 10 kg sample, a 50 g subsample, dry comminution with a Robot Coupe mill, and a high-performance liquid chromatographic analytical method are 174.40, 0.74, and 0.27, respectively. The sampling, sample preparation, and analytical steps of the aflatoxin test procedure accounted for 99.4, 0.4, and 0.2% of the total variability, respectively.
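The percentage breakdown quoted at the end of this abstract follows directly from the three reported variance components. A minimal sketch of that arithmetic, using only the figures given in the abstract; the function name `variance_shares` is hypothetical and not taken from the paper:

```python
# Variance components reported in the abstract for a lot at 10 ng/g total
# aflatoxin (10 kg sample, 50 g subsample, HPLC analysis).
components = {
    "sampling": 174.40,
    "sample preparation": 0.74,
    "analytical": 0.27,
}


def variance_shares(parts):
    """Return each component's percentage share of the total variance."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}


shares = variance_shares(components)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, the shares match the 99.4, 0.4, and 0.2% stated in the abstract.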
Genomic analysis of cow mortality and milk production using a threshold-linear model.
Tsuruta, S; Lourenco, D A L; Misztal, I; Lawlor, T J
2017-09-01
The objective of this study was to investigate the feasibility of genomic evaluation for cow mortality and milk production using a single-step methodology. Genomic relationships between cow mortality and milk production were also analyzed. Data included 883,887 (866,700) first-parity, 733,904 (711,211) second-parity, and 516,256 (492,026) third-parity records on cow mortality (305-d milk yields) of Holsteins from Northeast states in the United States. The pedigree consisted of up to 1,690,481 animals including 34,481 bulls genotyped with 36,951 SNP markers. Analyses were conducted with a bivariate threshold-linear model for each parity separately. Genomic information was incorporated as a genomic relationship matrix in the single-step BLUP. Traditional and genomic estimated breeding values (GEBV) were obtained with Gibbs sampling using fixed variances, whereas reliabilities were calculated from variances of GEBV samples. Genomic EBV were then converted into single nucleotide polymorphism (SNP) marker effects. Those SNP effects were categorized according to values corresponding to 1 to 4 standard deviations. Moving averages and variances of SNP effects were calculated for windows of 30 adjacent SNP, and Manhattan plots were created for SNP variances with the same window size. Using Gibbs sampling, the reliability for genotyped bulls for cow mortality was 28 to 30% in EBV and 70 to 72% in GEBV. The reliability for genotyped bulls for 305-d milk yields was 53 to 65% in EBV and 81 to 85% in GEBV. Correlations of SNP effects between mortality and 305-d milk yields within categories were the highest with the largest SNP effects and reached >0.7 at 4 standard deviations. All SNP regions explained less than 0.6% of the genetic variance for both traits, except regions close to the DGAT1 gene, which explained up to 2.5% for cow mortality and 4% for 305-d milk yields. Reliability for GEBV with a moderate number of genotyped animals can be calculated by Gibbs samples. 
Genomic information can greatly increase the reliability of predictions not only for milk but also for mortality. The existence of a common region on Bos taurus autosome 14 affecting both traits may indicate a major gene with a pleiotropic effect on milk and mortality. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Polarized-interferometer feasibility study
NASA Technical Reports Server (NTRS)
Raab, F. H.
1983-01-01
The feasibility of using a polarized-interferometer system as a rendezvous and docking sensor for two cooperating spacecraft was studied. The polarized interferometer is a radio frequency system for long range, real time determination of relative position and attitude. Range is determined by round trip signal timing. Direction is determined by radio interferometry. Relative roll is determined from signal polarization. Each spacecraft is equipped with a transponder and an antenna array. The antenna arrays consist of four crossed dipoles that can transmit or receive either circularly or linearly polarized signals. The active spacecraft is equipped with a sophisticated transponder and makes all measurements. The transponder on the passive spacecraft is a relatively simple repeater. An initialization algorithm is developed to estimate position and attitude without any a priori information. A tracking algorithm based upon minimum variance linear estimators is also developed. Techniques to simplify the transponder on the passive spacecraft are investigated and a suitable configuration is determined. A multiple carrier CW signal format is selected. The dependence of range accuracy and ambiguity resolution error probability are derived and used to design a candidate system. The validity of the design and the feasibility of the polarized interferometer concept are verified by simulation.
NASA Astrophysics Data System (ADS)
Schaperow, J.; Cooper, M. G.; Cooley, S. W.; Alam, S.; Smith, L. C.; Lettenmaier, D. P.
2017-12-01
As climate regimes shift, streamflows and our ability to predict them will change, as well. Elasticity of summer minimum streamflow is estimated for 138 unimpaired headwater river basins across the maritime western US mountains to better understand how climatologic variables and geologic characteristics interact to determine the response of summer low flows to winter precipitation (PPT), spring snow water equivalent (SWE), and summertime potential evapotranspiration (PET). Elasticities are calculated using log-log linear regression, and linear reservoir storage coefficients are used to represent basin geology. Storage coefficients are estimated using baseflow recession analysis. On average, SWE, PET, and PPT explain about 1/3 of the summertime low flow variance. Snow-dominated basins with long timescales of baseflow recession are least sensitive to changes in SWE, PPT, and PET, while rainfall-dominated, faster draining basins are most sensitive. There are also implications for the predictability of summer low flows. The R² between streamflow and SWE drops from 0.62 to 0.47 from snow-dominated to rain-dominated basins, while there is no corresponding increase in R² between streamflow and PPT.
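In a log-log linear regression, the fitted slope is the elasticity: the approximate percentage change in flow per one percent change in the driver. The sketch below illustrates that idea for a single driver, assuming NumPy is available. The function name `elasticity` and the synthetic data are illustrative assumptions; the abstract does not say whether the study fitted the three drivers jointly or separately.

```python
import numpy as np


def elasticity(flow, driver):
    """Slope of ln(flow) regressed on ln(driver) by ordinary least squares."""
    x = np.log(np.asarray(driver, dtype=float))
    y = np.log(np.asarray(flow, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)  # degree-1 fit: [slope, intercept]
    return slope


# Synthetic (not observed) data constructed so that flow = 2 * swe**0.5,
# i.e. a true elasticity of 0.5.
swe = np.array([100.0, 200.0, 400.0, 800.0])
flow = 2.0 * swe ** 0.5

est = elasticity(flow, swe)
print(est)
```

Because the synthetic relationship is an exact power law, the fitted slope recovers the exponent 0.5 up to floating-point error; with real basin data the slope would carry sampling uncertainty.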
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R²=0.58 vs. 0.55) or a cross-validation procedure (R²=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Donoghue, K A; Rekaya, R; Bertrand, J K; Misztal, I
2004-04-01
Mating and calving records for 47,533 first-calf heifers in Australian Angus herds were used to examine the relationship between days to calving (DC) and two measures of fertility in AI data: 1) calving to first insemination (CFI) and 2) calving success (CS). Calving to first insemination and calving success were defined as binary traits. A threshold-linear Bayesian model was employed for both analyses: 1) DC and CFI and 2) DC and CS. Posterior means (SD) of additive covariance and corresponding genetic correlation between the DC and CFI were -0.62 d (0.19 d) and -0.66 (0.12), respectively. The corresponding point estimates between the DC and CS were -0.70 d (0.14 d) and -0.73 (0.06), respectively. These genetic correlations indicate a strong, negative relationship between DC and both measures of fertility in AI data. Selecting for animals with shorter DC intervals genetically will lead to correlated increases in both CS and CFI. Posterior means (SD) for additive and residual variance and heritability for DC for the DC-CFI analysis were 23.5 d² (4.1 d²), 363.2 d² (4.8 d²), and 0.06 (0.01), respectively. The corresponding parameter estimates for the DC-CS analysis were very similar. Posterior means (SD) for additive, herd-year and service sire variance and heritability for CFI were 0.04 (0.01), 0.06 (0.06), 0.14 (0.16), and 0.03 (0.01), respectively. Posterior means (SD) for additive, herd-year, and service sire variance and heritability for CS were 0.04 (0.01), 0.07 (0.07), 0.14 (0.16), and 0.03 (0.01), respectively. The similarity of the parameter estimates for CFI and CS suggests that either trait could be used as a measure of fertility in AI data. However, the definition of CFI allows the identification of animals that not only record a calving event, but calve to their first insemination, and the value of this trait would be even greater in a more complete dataset than that used in this study. 
The magnitude of the correlations between DC and CS-CFI suggests that it may be possible to use a multitrait approach in the evaluation of AI and natural service data, and to report one genetic value that could be used for selection purposes.
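The heritability reported for DC is consistent with the usual ratio of additive to total variance. The sketch below checks this under the assumption that only the additive and residual components listed in the abstract contribute to the phenotypic variance of DC; the function name `heritability` is hypothetical. Note also that a posterior mean of a ratio need not equal the ratio of posterior means, so this is a plausibility check and not a reproduction of the authors' computation.

```python
def heritability(additive_var, other_vars):
    """Narrow-sense heritability: additive variance over total variance."""
    total = additive_var + sum(other_vars)
    return additive_var / total


# Posterior means for DC from the DC-CFI analysis, in d^2 (from the abstract).
additive_dc = 23.5
residual_dc = 363.2

h2_dc = heritability(additive_dc, [residual_dc])
print(f"h2 for DC: {h2_dc:.2f}")
```

The result rounds to 0.06, in line with the posterior mean quoted in the abstract.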
Uncertainty importance analysis using parametric moment ratio functions.
Wei, Pengfei; Lu, Zhenzhou; Song, Jingwen
2014-02-01
This article presents a new importance analysis framework, called parametric moment ratio function, for measuring the reduction of model output uncertainty when the distribution parameters of inputs are changed, and the emphasis is put on the mean and variance ratio functions with respect to the variances of model inputs. The proposed concepts efficiently guide the analyst to achieve a targeted reduction on the model output mean and variance by operating on the variances of model inputs. The unbiased and progressive unbiased Monte Carlo estimators are also derived for the parametric mean and variance ratio functions, respectively. Only a set of samples is needed for implementing the proposed importance analysis by the proposed estimators, thus the computational cost is free of input dimensionality. An analytical test example with highly nonlinear behavior is introduced for illustrating the engineering significance of the proposed importance analysis technique and verifying the efficiency and convergence of the derived Monte Carlo estimators. Finally, the moment ratio function is applied to a planar 10-bar structure for achieving a targeted 50% reduction of the model output variance. © 2013 Society for Risk Analysis.
Jongerling, Joran; Laurenceau, Jean-Philippe; Hamaker, Ellen L
2015-01-01
In this article we consider a multilevel first-order autoregressive [AR(1)] model with random intercepts, random autoregression, and random innovation variance (i.e., the level 1 residual variance). Including random innovation variance is an important extension of the multilevel AR(1) model for two reasons. First, between-person differences in innovation variance are important from a substantive point of view, in that they capture differences in sensitivity and/or exposure to unmeasured internal and external factors that influence the process. Second, using simulation methods we show that modeling the innovation variance as fixed across individuals, when it should be modeled as a random effect, leads to biased parameter estimates. Additionally, we use simulation methods to compare maximum likelihood estimation to Bayesian estimation of the multilevel AR(1) model and investigate the trade-off between the number of individuals and the number of time points. We provide an empirical illustration by applying the extended multilevel AR(1) model to daily positive affect ratings from 89 married women over the course of 42 consecutive days.
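To make the model structure concrete, the following sketch simulates data from a multilevel AR(1) process in which the intercept, the autoregressive coefficient, and the innovation variance all vary across persons. It assumes NumPy is available. The function name, the normal and log-normal random-effect distributions, and every hyperparameter value are illustrative assumptions, since the abstract does not specify them; only the dimensions (89 persons, 42 days) come from the empirical illustration.

```python
import numpy as np


def simulate_multilevel_ar1(n_persons=89, n_days=42, seed=0):
    """Simulate y[i, t] = mu[i] + phi[i] * (y[i, t-1] - mu[i]) + e[i, t],
    with e[i, t] ~ N(0, sigma2[i]) and person-specific mu, phi, sigma2."""
    rng = np.random.default_rng(seed)
    # Assumed random-effect distributions (hypothetical values).
    mu = rng.normal(0.0, 1.0, n_persons)                        # random intercepts
    phi = np.clip(rng.normal(0.3, 0.1, n_persons), -0.9, 0.9)   # random autoregression
    sigma2 = np.exp(rng.normal(0.0, 0.5, n_persons))            # random innovation variance

    y = np.empty((n_persons, n_days))
    # Draw the first observation from each person's stationary distribution.
    y[:, 0] = mu + rng.normal(0.0, np.sqrt(sigma2 / (1.0 - phi ** 2)))
    for t in range(1, n_days):
        innovations = rng.normal(0.0, np.sqrt(sigma2))
        y[:, t] = mu + phi * (y[:, t - 1] - mu) + innovations
    return y, mu, phi, sigma2


y, mu, phi, sigma2 = simulate_multilevel_ar1()
print(y.shape)
```

Fitting a model with a single shared innovation variance to data generated this way is the kind of misspecification the abstract reports as producing biased estimates.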
Fisher information and asymptotic normality in system identification for quantum Markov chains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guta, Madalin
2011-06-15
This paper deals with the problem of estimating the coupling constant θ of a mixing quantum Markov chain. For a repeated measurement on the chain's output we show that the outcomes' time average has an asymptotically normal (Gaussian) distribution, and we give the explicit expressions of its mean and variance. In particular, we obtain a simple estimator of θ whose classical Fisher information can be optimized over different choices of measured observables. We then show that the quantum state of the output together with the system is itself asymptotically Gaussian and compute its quantum Fisher information, which sets an absolute bound to the estimation error. The classical and quantum Fisher information are compared in a simple example. In the vicinity of θ=0 we find that the quantum Fisher information has a quadratic rather than linear scaling in output size, and asymptotically the Fisher information is localized in the system, while the output is independent of the parameter.
Non-Gaussian probabilistic MEG source localisation based on kernel density estimation
Mohseni, Hamid R.; Kringelbach, Morten L.; Woolrich, Mark W.; Baker, Adam; Aziz, Tipu Z.; Probert-Smith, Penny
2014-01-01
There is strong evidence to suggest that data recorded from magnetoencephalography (MEG) follows a non-Gaussian distribution. However, existing standard methods for source localisation model the data using only second order statistics, and therefore use the inherent assumption of a Gaussian distribution. In this paper, we present a new general method for non-Gaussian source estimation of stationary signals for localising brain activity from MEG data. By providing a Bayesian formulation for MEG source localisation, we show that the source probability density function (pdf), which is not necessarily Gaussian, can be estimated using multivariate kernel density estimators. In the case of Gaussian data, the solution of the method is equivalent to that of widely used linearly constrained minimum variance (LCMV) beamformer. The method is also extended to handle data with highly correlated sources using the marginal distribution of the estimated joint distribution, which, in the case of Gaussian measurements, corresponds to the null-beamformer. The proposed non-Gaussian source localisation approach is shown to give better spatial estimates than the LCMV beamformer, both in simulations incorporating non-Gaussian signals, and in real MEG measurements of auditory and visual evoked responses, where the highly correlated sources are known to be difficult to estimate. PMID:24055702
Control algorithms for dynamic attenuators
Hsieh, Scott S.; Pelc, Norbert J.
2014-01-01
Purpose: The authors describe algorithms to control dynamic attenuators in CT and compare their performance using simulated scans. Dynamic attenuators are prepatient beam shaping filters that modulate the distribution of x-ray fluence incident on the patient on a view-by-view basis. These attenuators can reduce dose while improving key image quality metrics such as peak or mean variance. In each view, the attenuator presents several degrees of freedom which may be individually adjusted. The total number of degrees of freedom across all views is very large, making many optimization techniques impractical. The authors develop a theory for optimally controlling these attenuators. Special attention is paid to a theoretically perfect attenuator which controls the fluence for each ray individually, but the authors also investigate and compare three other, practical attenuator designs which have been previously proposed: the piecewise-linear attenuator, the translating attenuator, and the double wedge attenuator. Methods: The authors pose and solve the optimization problems of minimizing the mean and peak variance subject to a fixed dose limit. For a perfect attenuator and mean variance minimization, this problem can be solved in simple, closed form. For other attenuator designs, the problem can be decomposed into separate problems for each view to greatly reduce the computational complexity. Peak variance minimization can be approximately solved using iterated, weighted mean variance (WMV) minimization. Also, the authors develop heuristics for the perfect and piecewise-linear attenuators which do not require a priori knowledge of the patient anatomy. The authors compare these control algorithms on different types of dynamic attenuators using simulated raw data from forward projected DICOM files of a thorax and an abdomen. 
Results: The translating and double wedge attenuators reduce dose by an average of 30% relative to current techniques (bowtie filter with tube current modulation) without increasing peak variance. The 15-element piecewise-linear dynamic attenuator reduces dose by an average of 42%, and the perfect attenuator reduces dose by an average of 50%. Improvements in peak variance are several times larger than improvements in mean variance. Heuristic control eliminates the need for a prescan. For the piecewise-linear attenuator, the cost of heuristic control is an increase in dose of 9%. The proposed iterated WMV minimization produces results that are within a few percent of the true solution. Conclusions: Dynamic attenuators show potential for significant dose reduction. A wide class of dynamic attenuators can be accurately controlled using the described methods. PMID:24877818
Asymptotic Effect of Misspecification in the Random Part of the Multilevel Model
ERIC Educational Resources Information Center
Berkhof, Johannes; Kampen, Jarl Kennard
2004-01-01
The authors examine the asymptotic effect of omitting a random coefficient in the multilevel model and derive expressions for the change in (a) the variance components estimator and (b) the estimated variance of the fixed effects estimator. They apply the method of moments, which yields a closed form expression for the omission effect. In…
Sampling in freshwater environments: suspended particle traps and variability in the final data.
Barbizzi, Sabrina; Pati, Alessandra
2008-11-01
This paper reports one practical method to estimate the measurement uncertainty including sampling, derived by the approach implemented by Ramsey for soil investigations. The methodology has been applied to estimate the measurement uncertainty (sampling and analyses) of ¹³⁷Cs activity concentration (Bq kg⁻¹) and total carbon content (%) in suspended particle sampling in a freshwater ecosystem. Uncertainty estimates for between locations, sampling and analysis components have been evaluated. For the considered measurands, the relative expanded measurement uncertainties are 12.3% for ¹³⁷Cs and 4.5% for total carbon. For ¹³⁷Cs, the measurement (sampling+analysis) variance gives the major contribution to the total variance, while for total carbon the spatial variance is the dominant contributor to the total variance. The limitations and advantages of this basic method are discussed.
Systems Engineering Programmatic Estimation Using Technology Variance
NASA Technical Reports Server (NTRS)
Mog, Robert A.
2000-01-01
Unique and innovative system programmatic estimation is conducted using the variance of the packaged technologies. Covariance analysis is performed on the subsystems and components comprising the system of interest. Technological "return" and "variation" parameters are estimated. These parameters are combined with the model error to arrive at a measure of system development stability. The resulting estimates provide valuable information concerning the potential cost growth of the system under development.
Heat and solute tracers: how do they compare in heterogeneous aquifers?
Irvine, Dylan J; Simmons, Craig T; Werner, Adrian D; Graf, Thomas
2015-04-01
A comparison of groundwater velocity in heterogeneous aquifers estimated from hydraulic methods, heat and solute tracers was made using numerical simulations. Aquifer heterogeneity was described by geostatistical properties of the Borden, Cape Cod, North Bay, and MADE aquifers. Both heat and solute tracers displayed little systematic under- or over-estimation in velocity relative to a hydraulic control. The worst cases were under-estimates of 6.63% for solute and 2.13% for the heat tracer. Both under- and over-estimation of velocity from the heat tracer relative to the solute tracer occurred. Differences between the estimates from the tracer methods increased as the mean velocity decreased, owing to differences in rates of molecular diffusion and thermal conduction. The variance in estimated velocity using all methods increased as the variance in log-hydraulic conductivity (K) and correlation length scales increased. The variance in velocity for each scenario was remarkably small when compared to σ² ln(K) for all methods tested. The largest variability identified was for the solute tracer where 95% of velocity estimates ranged by a factor of 19 in simulations where 95% of the K values varied by almost four orders of magnitude. For the same K-fields, this range was a factor of 11 for the heat tracer. The variance in estimated velocity was always lowest when using heat as a tracer. The study results suggest that a solute tracer will provide more understanding about the variance in velocity caused by aquifer heterogeneity and a heat tracer provides a better approximation of the mean velocity. © 2013, National Ground Water Association.
Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D
2016-06-15
The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating the accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can often be clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on the prognostic ability of the same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
A sibling method for identifying vQTLs
Domingue, Ben; Dawes, Christopher; Boardman, Jason; Siegal, Mark
2018-01-01
The propensity of a trait to vary within a population may have evolutionary, ecological, or clinical significance. In the present study we deploy sibling models to offer a novel and unbiased way to ascertain loci associated with the extent to which phenotypes vary (variance-controlling quantitative trait loci, or vQTLs). Previous methods for vQTL-mapping either exclude genetically related individuals or treat genetic relatedness among individuals as a complicating factor addressed by adjusting estimates for non-independence in phenotypes. The present method uses genetic relatedness as a tool to obtain unbiased estimates of variance effects rather than as a nuisance. The family-based approach, which utilizes random variation between siblings in minor allele counts at a locus, also allows controls for parental genotype, mean effects, and non-linear (dominance) effects that may spuriously appear to generate variation. Simulations show that the approach performs equally well as two existing methods (squared Z-score and DGLM) in controlling type I error rates when there is no unobserved confounding, and performs significantly better than these methods in the presence of small degrees of confounding. Using height and BMI as empirical applications, we investigate SNPs that alter within-family variation in height and BMI, as well as pathways that appear to be enriched. One significant SNP for BMI variability, in the MAST4 gene, replicated. Pathway analysis revealed one gene set, encoding members of several signaling pathways related to gap junction function, which appears significantly enriched for associations with within-family height variation in both datasets (while not enriched in analysis of mean levels). We recommend approximating laboratory random assignment of genotype using family data and more careful attention to the possible conflation of mean and variance effects. PMID:29617452
NASA Astrophysics Data System (ADS)
Ťupek, Boris; Launiainen, Samuli; Peltoniemi, Mikko; Heikkinen, Jukka; Lehtonen, Aleksi
2016-04-01
Litter decomposition rates of the most process based soil carbon models affected by environmental conditions are linked with soil heterotrophic CO2 emissions and serve for estimating soil carbon sequestration; thus due to the mass balance equation the variation in measured litter inputs and measured heterotrophic soil CO2 effluxes should indicate soil carbon stock changes, needed by soil carbon management for mitigation of anthropogenic CO2 emissions, if sensitivity functions of the applied model suit to the environmental conditions e.g. soil temperature and moisture. We evaluated the response forms of autotrophic and heterotrophic forest floor respiration to soil temperature and moisture in four boreal forest sites of the International Cooperative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests) by a soil trenching experiment during year 2015 in southern Finland. As expected both autotrophic and heterotrophic forest floor respiration components were primarily controlled by soil temperature and exponential regression models generally explained more than 90% of the variance. Soil moisture regression models on average explained less than 10% of the variance and the response forms varied between Gaussian for the autotrophic forest floor respiration component and linear for the heterotrophic forest floor respiration component. Although the percentage of explained variance of soil heterotrophic respiration by the soil moisture was small, the observed reduction of CO2 emissions with higher moisture levels suggested that soil moisture response of soil carbon models not accounting for the reduction due to excessive moisture should be re-evaluated in order to estimate right levels of soil carbon stock changes. Our further study will include evaluation of process based soil carbon models by the annual heterotrophic respiration and soil carbon stocks.
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Funk, Christopher C.; Michaelsen, Joel C.
2004-01-01
An extension of Sinclair's diagnostic model of orographic precipitation (“VDEL”) is developed for use in data-poor regions to enhance rainfall estimates. This extension (VDELB) combines a 2D linearized internal gravity wave calculation with the dot product of the terrain gradient and surface wind to approximate terrain-induced vertical velocity profiles. Slope, wind speed, and stability determine the velocity profile, with either sinusoidal or vertically decaying (evanescent) solutions possible. These velocity profiles replace the parameterized functions in the original VDEL, creating VDELB, a diagnostic accounting for buoyancy effects. A further extension (VDELB*) uses an on/off constraint derived from reanalysis precipitation fields. A validation study over 365 days in the Pacific Northwest suggests that VDELB* can best capture seasonal and geographic variations. A new statistical data-fusion technique is presented and is used to combine VDELB*, reanalysis, and satellite rainfall estimates in southern Africa. The technique, matched filter regression (MFR), sets the variance of the predictors equal to their squared correlation with observed gauge data and predicts rainfall based on the first principal component of the combined data. In the test presented here, mean absolute errors from the MFR technique were 35% lower than the satellite estimates alone. VDELB assumes a linear solution to the wave equations and a Boussinesq atmosphere, and it may give unrealistic responses under extreme conditions. Nonetheless, the results presented here suggest that diagnostic models, driven by reanalysis data, can be used to improve satellite rainfall estimates in data-sparse regions.
Baldwin, Alex S.; Baker, Daniel H.; Hess, Robert F.
2016-01-01
The internal noise present in a linear system can be quantified by the equivalent noise method. By measuring the effect that applying external noise to the system’s input has on its output one can estimate the variance of this internal noise. By applying this simple “linear amplifier” model to the human visual system, one can entirely explain an observer’s detection performance by a combination of the internal noise variance and their efficiency relative to an ideal observer. Studies using this method rely on two crucial factors: firstly that the external noise in their stimuli behaves like the visual system’s internal noise in the dimension of interest, and secondly that the assumptions underlying their model are correct (e.g. linearity). Here we explore the effects of these two factors while applying the equivalent noise method to investigate the contrast sensitivity function (CSF). We compare the results at 0.5 and 6 c/deg from the equivalent noise method against those we would expect based on pedestal masking data collected from the same observers. We find that the loss of sensitivity with increasing spatial frequency results from changes in the saturation constant of the gain control nonlinearity, and that this only masquerades as a change in internal noise under the equivalent noise method. Part of the effect we find can be attributed to the optical transfer function of the eye. The remainder can be explained by either changes in effective input gain, divisive suppression, or a combination of the two. Given these effects the efficiency of our observers approaches the ideal level. We show the importance of considering these factors in equivalent noise studies. PMID:26953796
The Linear Predictability of Sea Level: A Benchmark
NASA Astrophysics Data System (ADS)
Sonnewald, M.; Wunsch, C.; Heimbach, P.
2016-12-01
A benchmark of linear predictive skill of global sea level is presented, complementing more complicated model studies of future predictive skill. Sea level is of great socioeconomic interest, as most of the world's population lives by the sea. Currently, the spread in model projections suggests poor predictive skill outside the seasonal cycle. We use 20 years of data from the ECCOv4 state estimate (1992-2012), assessing the variance attributable to the seasons and the linear predictability potential of the deseasoned component of sea level. The Northern Hemisphere has large regions where the seasons make up >90% of the variance, particularly in the western boundary current regions and zonal bands along the equator. The deseasoned sea level is more dominant in the Southern Hemisphere, particularly in the Southern Ocean. We treat the deseasoned sea level as a weakly stationary random process, whose predictability is given by the covariance structure. Fitting an ARMA(n,m) model, we choose the order using the Akaike and Bayesian Information Criteria (AIC and BIC). The AIC is more appropriate, with generally higher orders chosen and offering slightly more predictive accuracy. Monthly detrended data show skill generally of the order of a few months, with isolated regions of twelve months or more. With the trend, the predictive skill increases, particularly in the South Pacific. We assess the annually averaged data, although our time series is too short to assess the variability. There is some predictive skill, which is enhanced if the trend is not removed. A major caveat of our approach is that we test and train our model on the same dataset due to the short duration of available data.
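The order-selection step described in this abstract can be sketched in a simplified form. The example below swaps the ARMA(n,m) model for a pure autoregressive AR(p) model fitted by ordinary least squares, which keeps the sketch dependent on NumPy only; the function names, the Gaussian form of the criteria, and the synthetic series are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def fit_ar_least_squares(x, p, p_max):
    """Fit AR(p) by least squares, using the same targets x[p_max:] for every p
    so that information criteria are comparable across orders."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    y = x[p_max:]
    if p == 0:
        coef, resid = np.zeros(0), y
    else:
        # column k holds the series lagged by k steps
        X = np.column_stack([x[p_max - k: len(x) - k] for k in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
    return coef, float(np.mean(resid ** 2)), len(y)

def select_ar_order(x, p_max=6):
    """Return (order chosen by AIC, order chosen by BIC, all scores)."""
    scores = {}
    for p in range(p_max + 1):
        _, s2, n = fit_ar_least_squares(x, p, p_max)
        aic = n * np.log(s2) + 2 * p          # Gaussian AIC up to a constant
        bic = n * np.log(s2) + p * np.log(n)  # Gaussian BIC up to a constant
        scores[p] = (aic, bic)
    best_aic = min(scores, key=lambda p: scores[p][0])
    best_bic = min(scores, key=lambda p: scores[p][1])
    return best_aic, best_bic, scores

# Synthetic monthly-like series (240 points, made-up AR(2) coefficients)
rng = np.random.default_rng(1)
noise = rng.standard_normal(240)
series = np.zeros(240)
for t in range(2, 240):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + noise[t]
order_aic, order_bic, _ = select_ar_order(series)
```

Because the BIC penalty per parameter exceeds the AIC penalty whenever log(n) > 2, the BIC never selects a higher order than the AIC in this setup, which is consistent with the abstract's remark that the AIC generally chooses higher orders.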
Propagation of uncertainty by Monte Carlo simulations in case of basic geodetic computations
NASA Astrophysics Data System (ADS)
Wyszkowska, Patrycja
2017-12-01
Determining the accuracy of functions of measured or adjusted values may be a problem in geodetic computations. The general law of covariance propagation or, in the case of uncorrelated observations, the propagation of variance (the Gaussian formula) is commonly used for that purpose. That approach is theoretically justified for linear functions. For non-linear functions, the first-order Taylor series expansion is usually used, but that solution is affected by the expansion error. The aim of the study is to determine the applicability of the general variance propagation law to the non-linear functions used in basic geodetic computations. The paper presents the errors that result from neglecting the higher-order terms and determines the range over which that simplification is acceptable. The analysis is based on a comparison of the results obtained by the law of propagation of variance and by a probabilistic approach, namely Monte Carlo simulations. Both methods are used to determine the accuracy of the following geodetic computations: the Cartesian coordinates of an unknown point in the three-point resection problem, azimuths and distances computed from Cartesian coordinates, and height differences in trigonometric and geometric levelling. These simulations and the analysis of the results confirm that the general law of variance propagation can be applied in basic geodetic computations even if the functions are non-linear. The only condition is the accuracy of the observations, which cannot be too low. Generally, this is not a problem with present geodetic instruments.
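The comparison described in this abstract can be illustrated for one of the listed computations, the distance between two points with Cartesian coordinates. The sketch below is not the paper's code: the function names, the assumption of uncorrelated coordinates with a common standard deviation, and the numerical values are all chosen for illustration.

```python
import numpy as np

def distance(xa, ya, xb, yb):
    """Horizontal distance between points A and B."""
    return np.hypot(xb - xa, yb - ya)

def first_order_sigma(xa, ya, xb, yb, sigma):
    """Standard deviation of the distance from the first-order (Taylor)
    law of variance propagation, assuming uncorrelated coordinates that
    all share the standard deviation sigma."""
    d = distance(xa, ya, xb, yb)
    # partial derivatives of d with respect to (xa, ya, xb, yb)
    jac = np.array([-(xb - xa) / d, -(yb - ya) / d,
                    (xb - xa) / d, (yb - ya) / d])
    cov = sigma ** 2 * np.eye(4)
    return float(np.sqrt(jac @ cov @ jac))

def monte_carlo_sigma(xa, ya, xb, yb, sigma, n=200_000, seed=0):
    """Standard deviation of the distance from simulated normal errors."""
    rng = np.random.default_rng(seed)
    pts = np.array([xa, ya, xb, yb]) + sigma * rng.standard_normal((n, 4))
    d = distance(pts[:, 0], pts[:, 1], pts[:, 2], pts[:, 3])
    return float(d.std(ddof=1))

# Made-up case: a 100 m baseline with 1 cm coordinate accuracy
s_taylor = first_order_sigma(0.0, 0.0, 60.0, 80.0, 0.01)
s_mc = monte_carlo_sigma(0.0, 0.0, 60.0, 80.0, 0.01)
```

When the coordinate errors are small relative to the distance, the two values should agree closely; shrinking the baseline or inflating sigma is one way to see where the first-order approximation starts to drift.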
Concerns about a variance approach to X-ray diffractometric estimation of microfibril angle in wood
Steve P. Verrill; David E. Kretschmann; Victoria L. Herian; Michael C. Wiemann; Harry A. Alden
2011-01-01
In this article, we raise three technical concerns about Evans’ 1999 Appita Journal “variance approach” to estimating microfibril angle (MFA). The first concern is associated with the approximation of the variance of an X-ray intensity half-profile by a function of the MFA and the natural variability of the MFA. The second concern is associated with the approximation...
Steve P. Verrill; David E. Kretschmann; Victoria L. Herian; Michael Wiemann; Harry A. Alden
2010-01-01
In this paper we raise three technical concerns about Evans’s 1999 Appita Journal “variance approach” to estimating microfibril angle. The first concern is associated with the approximation of the variance of an X-ray intensity half-profile by a function of the microfibril angle and the natural variability of the microfibril angle, S2...
Dai, James Y.; Hughes, James P.
2012-01-01
The meta-analytic approach to evaluating surrogate end points assesses the predictiveness of treatment effect on the surrogate toward treatment effect on the clinical end point based on multiple clinical trials. Definition and estimation of the correlation of treatment effects were developed in linear mixed models and later extended to binary or failure time outcomes on a case-by-case basis. In a general regression setting that covers nonnormal outcomes, we discuss in this paper several metrics that are useful in the meta-analytic evaluation of surrogacy. We propose a unified 3-step procedure to assess these metrics in settings with binary end points, time-to-event outcomes, or repeated measures. First, the joint distribution of estimated treatment effects is ascertained by an estimating equation approach; second, the restricted maximum likelihood method is used to estimate the means and the variance components of the random treatment effects; finally, confidence intervals are constructed by a parametric bootstrap procedure. The proposed method is evaluated by simulations and applications to 2 clinical trials. PMID:22394448
Estimation and Partitioning of Heritability in Human Populations using Whole Genome Analysis Methods
Vinkhuyzen, Anna AE; Wray, Naomi R; Yang, Jian; Goddard, Michael E; Visscher, Peter M
2014-01-01
Understanding genetic variation of complex traits in human populations has moved from the quantification of the resemblance between close relatives to the dissection of genetic variation into the contributions of individual genomic loci. But major questions remain unanswered: how much phenotypic variation is genetic, how much of the genetic variation is additive and what is the joint distribution of effect size and allele frequency at causal variants? We review and compare three whole-genome analysis methods that use mixed linear models (MLM) to estimate genetic variation, using the relationship between close or distant relatives based on pedigree or SNPs. We discuss theory, estimation procedures, bias and precision of each method and review recent advances in the dissection of additive genetic variation of complex traits in human populations that are based upon the application of MLM. Using genome wide data, SNPs account for far more of the genetic variation than the highly significant SNPs associated with a trait, but they do not account for all of the genetic variance estimated by pedigree based methods. We explain possible reasons for this ‘missing’ heritability. PMID:23988118
Villandré, Luc; Hutcheon, Jennifer A; Perez Trejo, Maria Esther; Abenhaim, Haim; Jacobsen, Geir; Platt, Robert W
2011-01-01
We present a model for longitudinal measures of fetal weight as a function of gestational age. We use a linear mixed model, with a Box-Cox transformation of fetal weight values, and restricted cubic splines, in order to flexibly but parsimoniously model median fetal weight. We systematically compare our model to other proposed approaches. All proposed methods are shown to yield similar median estimates, as evidenced by overlapping pointwise confidence bands, except after 40 completed weeks, where our method seems to produce estimates more consistent with observed data. Sex-based stratification affects the estimates of the random effects variance-covariance structure, without significantly changing sex-specific fitted median values. We illustrate the benefits of including sex-gestational age interaction terms in the model over stratification. The comparison leads to the conclusion that the selection of a model for fetal weight for gestational age can be based on the specific goals and configuration of a given study without affecting the precision or value of median estimates for most gestational ages of interest. PMID:21931571
Supernovae as probes of cosmic parameters: estimating the bias from under-dense lines of sight
DOE Office of Scientific and Technical Information (OSTI.GOV)
Busti, V.C.; Clarkson, C.; Holanda, R.F.L., E-mail: vinicius.busti@uct.ac.za, E-mail: holanda@uepb.edu.br, E-mail: chris.clarkson@uct.ac.za
2013-11-01
Correctly interpreting observations of sources such as type Ia supernovae (SNe Ia) requires knowledge of the power spectrum of matter on AU scales, which is very hard to model accurately. Because under-dense regions account for much of the volume of the universe, light from a typical source probes a mean density significantly below the cosmic mean. The relative sparsity of sources implies that there could be a significant bias when inferring distances of SNe Ia, and consequently a bias in cosmological parameter estimation. While the weak lensing approximation should in principle give the correct prediction for this, linear perturbation theory predicts an effectively infinite variance in the convergence for ultra-narrow beams. We attempt to quantify the effect that typically under-dense lines of sight might have in parameter estimation by considering three alternative methods for estimating distances, in addition to the usual weak lensing approximation. We find in each case this not only increases the errors in the inferred density parameters, but also introduces a bias in the posterior value.
Gene–Environment Correlation: Difficulties and a Natural Experiment–Based Strategy
Li, Jiang; Liu, Hexuan; Guo, Guang
2013-01-01
Objectives. We explored how gene–environment correlations can result in endogenous models, how natural experiments can protect against this threat, and if unbiased estimates from natural experiments are generalizable to other contexts. Methods. We compared a natural experiment, the College Roommate Study, which measured genes and behaviors of college students and their randomly assigned roommates in a southern public university, with observational data from the National Longitudinal Study of Adolescent Health in 2008. We predicted exposure to exercising peers using genetic markers and estimated environmental effects on alcohol consumption. A mixed-linear model estimated an alcohol consumption variance that was attributable to genetic markers and across peer environments. Results. Peer exercise environment was associated with respondent genotype in observational data, but not in the natural experiment. The effects of peer drinking and presence of a general gene–environment interaction were similar between data sets. Conclusions. Natural experiments, like random roommate assignment, could protect against potential bias introduced by gene–environment correlations. When combined with representative observational data, unbiased and generalizable causal effects could be estimated. PMID:23927502
Statistical analysis of the calibration procedure for personnel radiation measurement instruments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bush, W.J.; Bengston, S.J.; Kalbeitzer, F.L.
1980-11-01
Thermoluminescent analyzer (TLA) calibration procedures were used to estimate personnel radiation exposure levels at the Idaho National Engineering Laboratory (INEL). A statistical analysis is presented herein based on data collected over a six month period in 1979 on four TLAs located in the Department of Energy (DOE) Radiological and Environmental Sciences Laboratory at the INEL. The data were collected according to the day-to-day procedure in effect at that time. Both gamma and beta radiation models are developed. Observed TLA readings of thermoluminescent dosimeters are correlated with known radiation levels. This correlation is then used to predict unknown radiation doses from future analyzer readings of personnel thermoluminescent dosimeters. The statistical techniques applied in this analysis include weighted linear regression, estimation of systematic and random error variances, prediction interval estimation using Scheffe's theory of calibration, the estimation of the ratio of the means of two normal bivariate distributed random variables and their corresponding confidence limits according to Kendall and Stuart, tests of normality, experimental design, a comparison between instruments, and quality control.
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, A.; Künsch, H. R.; Schwierz, C.; Stahel, W. A.
2012-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, in particular in environmental data sets. Outlying observations may result from errors (e.g. in data transcription) or from local perturbations in the processes that are responsible for a given pattern of spatial variation. As an example, the spatial distribution of some trace metal in the soils of a region may be distorted by emissions of local anthropogenic sources. Outliers affect the modelling of the large-scale spatial variation, the so-called external drift or trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) [2] proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) [1] for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation. 
Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and unsampled locations and kriging variances. The method has been implemented in an R package. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of the Tarrawarra soil moisture data set [3].
Athanasopoulos, Leonidas V; Dritsas, Athanasios; Doll, Helen A; Cokkinos, Dennis V
2010-08-01
This study was conducted to explain the variance in quality of life (QoL) and activity capacity of patients with congestive heart failure from pathophysiological changes as estimated by laboratory data. Peak oxygen consumption (peak VO2) and ventilation (VE)/carbon dioxide output (VCO2) slope derived from cardiopulmonary exercise testing, plasma N-terminal prohormone of B-type natriuretic peptide (NT-proBNP), and echocardiographic markers [left atrium (LA), left ventricular ejection fraction (LVEF)] were measured in 62 patients with congestive heart failure, who also completed the Minnesota Living with Heart Failure Questionnaire and the Specific Activity Questionnaire. All regression models were adjusted for age and sex. On linear regression analysis, peak VO2 with P value less than 0.001, VE/VCO2 slope with P value less than 0.01, LVEF with P value less than 0.001, LA with P=0.001, and logNT-proBNP with P value less than 0.01 were found to be associated with QoL. On stepwise multiple linear regression, peak VO2 and LVEF continued to be predictive, accounting for 40% of the variability in Minnesota Living with Heart Failure Questionnaire score. On linear regression analysis, peak VO2 with P value less than 0.001, VE/VCO2 slope with P value less than 0.001, LVEF with P value less than 0.05, LA with P value less than 0.001, and logNT-proBNP with P value less than 0.001 were found to be associated with activity capacity. On stepwise multiple linear regression, peak VO2 and LA continued to be predictive, accounting for 53% of the variability in Specific Activity Questionnaire score. Peak VO2 is independently associated both with QoL and activity capacity. In addition to peak VO2, LVEF is independently associated with QoL, and LA with activity capacity.
Comments Regarding the Binary Power Law for Heterogeneity of Disease Incidence
USDA-ARS's Scientific Manuscript database
The binary power law (BPL) has been successfully used to characterize heterogeneity (overdispersion or small-scale aggregation) of disease incidence for many plant pathosystems. With the BPL, the log of the observed variance is a linear function of the log of the theoretical variance for a binomial...
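The log-log relationship stated in this abstract can be sketched as a straight-line fit. In the example below, the theoretical binomial variance of a proportion is taken as p(1 − p)/n for n plants per sampling unit; that form, the function name, the use of natural logarithms, and the synthetic data are assumptions for illustration, since the truncated abstract does not give them.

```python
import numpy as np

def fit_binary_power_law(p_hat, v_obs, n):
    """Fit log(v_obs) = log(A) + b * log(v_bin) by least squares.

    p_hat : mean disease incidence (proportion) per data set
    v_obs : observed variance of incidence per data set
    n     : number of individuals per sampling unit (assumed constant)
    Returns (A, b). Hypothetical helper for illustration only.
    """
    p_hat = np.asarray(p_hat, dtype=float)
    v_obs = np.asarray(v_obs, dtype=float)
    v_bin = p_hat * (1.0 - p_hat) / n  # binomial variance of a proportion
    b, log_a = np.polyfit(np.log(v_bin), np.log(v_obs), 1)
    return float(np.exp(log_a)), float(b)

# Synthetic data that follow the law exactly (made-up A = 2.0, b = 1.2)
p = np.array([0.05, 0.10, 0.20, 0.35, 0.50])
v_bin = p * (1.0 - p) / 10
A, b = fit_binary_power_law(p, 2.0 * v_bin ** 1.2, n=10)
```

Under this parameterization, A = 1 and b = 1 would correspond to an observed variance equal to the binomial variance, that is, no detectable aggregation.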
Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders
2006-03-13
Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore different methods for small sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RSS), are also expected to result in heavily biased estimates, which in turn translates into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed, indicating that the method in its present form cannot be directly applied to small data sets.
Smoothed Spectra, Ogives, and Error Estimates for Atmospheric Turbulence Data
NASA Astrophysics Data System (ADS)
Dias, Nelson Luís
2018-01-01
A systematic evaluation is conducted of the smoothed spectrum, which is a spectral estimate obtained by averaging over a window of contiguous frequencies. The technique is extended to the ogive, as well as to the cross-spectrum. It is shown that, combined with existing variance estimates for the periodogram, the variance—and therefore the random error—associated with these estimates can be calculated in a straightforward way. The smoothed spectra and ogives are biased estimates; with simple power-law analytical models, correction procedures are devised, as well as a global constraint that enforces Parseval's identity. Several new results are thus obtained: (1) The analytical variance estimates compare well with the sample variance calculated for the Bartlett spectrum and the variance of the inertial subrange of the cospectrum is shown to be relatively much larger than that of the spectrum. (2) Ogives and spectra estimates with reduced bias are calculated. (3) The bias of the smoothed spectrum and ogive is shown to be negligible at the higher frequencies. (4) The ogives and spectra thus calculated have better frequency resolution than the Bartlett spectrum, with (5) gradually increasing variance and relative error towards the low frequencies. (6) Power-law identification and extraction of the rate of dissipation of turbulence kinetic energy are possible directly from the ogive. (7) The smoothed cross-spectrum is a valid inner product and therefore an acceptable candidate for coherence and spectral correlation coefficient estimation by means of the Cauchy-Schwarz inequality. The quadrature, phase function, coherence function and spectral correlation function obtained from the smoothed spectral estimates compare well with the classical ones derived from the Bartlett spectrum.
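The definition of the smoothed spectrum given in this abstract, an average over a window of contiguous frequencies, can be sketched as follows. The sketch uses non-overlapping windows of equal width and a simple one-sided periodogram scaling; both choices, along with the function names and the white-noise test signal, are assumptions for illustration rather than the paper's procedure.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Raw one-sided periodogram (zero frequency dropped).
    The scaling is a common convention, assumed here for illustration."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n, d=dt)
    pxx = (2.0 * dt / n) * np.abs(spec) ** 2
    return freqs[1:], pxx[1:]

def smooth_spectrum(freqs, pxx, width):
    """Average the periodogram over non-overlapping blocks of `width`
    contiguous frequencies; trailing frequencies that do not fill a
    complete block are discarded."""
    m = (len(pxx) // width) * width
    f_s = freqs[:m].reshape(-1, width).mean(axis=1)
    p_s = pxx[:m].reshape(-1, width).mean(axis=1)
    return f_s, p_s

# White-noise example: smoothing should reduce the scatter of the estimate
rng = np.random.default_rng(0)
f, p = periodogram(rng.standard_normal(4096))
f_s, p_s = smooth_spectrum(f, p, width=16)
scatter_raw = p.std() / p.mean()
scatter_smooth = p_s.std() / p_s.mean()
```

Averaging independent periodogram ordinates trades frequency resolution for lower random error, which is the trade-off the abstract quantifies.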
Wang, Yuanjia; Chen, Huaihou
2012-01-01
We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we embed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying-coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10^8 simulations) and asymptotic approximation may be unreliable and conservative. PMID:23020801
Wang, Yuanjia; Chen, Huaihou
2012-12-01
We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we embed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying-coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10^8 simulations) and asymptotic approximation may be unreliable and conservative. © 2012, The International Biometric Society.
Minimum variance geographic sampling
NASA Technical Reports Server (NTRS)
Terrell, G. R. (Principal Investigator)
1980-01-01
Resource inventories require samples with geographical scatter, sometimes not as widely spaced as would be hoped. A simple model of correlation over distances is used to create a minimum variance unbiased estimate of population means. The fitting procedure is illustrated from data used to estimate Missouri corn acreage.
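One standard way to obtain a minimum variance unbiased linear estimate of a mean from spatially correlated samples is to weight the samples by w = R⁻¹1 / (1ᵀR⁻¹1), where R is the correlation matrix of the sample sites. The sketch below uses that textbook result with an exponential decay of correlation with distance; the correlation model, the function name, and the coordinates are assumptions for illustration, since the abstract does not specify the model used in the report.

```python
import numpy as np

def blue_mean_weights(coords, corr_range):
    """Weights of the best linear unbiased estimate of a common mean,
    assuming equal variances and correlation exp(-distance / corr_range)
    between sites. Hypothetical helper for illustration only."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    corr = np.exp(-dist / corr_range)
    w = np.linalg.solve(corr, np.ones(len(coords)))
    return w / w.sum(), corr

# Two clustered sites and one isolated site (made-up coordinates)
sites = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0]]
w, corr = blue_mean_weights(sites, corr_range=1.0)
var_blue = float(w @ corr @ w)            # relative variance with BLUE weights
equal = np.full(3, 1.0 / 3.0)
var_equal = float(equal @ corr @ equal)   # relative variance of the plain average
```

Closely spaced sites carry largely redundant information, so they share weight, while the isolated site receives more; this is the sense in which poorly scattered samples are handled by the weighting rather than by the sample design.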
Bignardi, A B; El Faro, L; Rosa, G J M; Cardoso, V L; Machado, P F; Albuquerque, L G
2012-04-01
A total of 46,089 individual monthly test-day (TD) milk yields (10 test-days), from 7,331 complete first lactations of Holstein cattle were analyzed. A standard multivariate analysis (MV), reduced rank analyses fitting the first 2, 3, and 4 genetic principal components (PC2, PC3, PC4), and analyses that fitted a factor analytic structure considering 2, 3, and 4 factors (FAS2, FAS3, FAS4), were carried out. The models included the random animal genetic effect and fixed effects of the contemporary groups (herd-year-month of test-day), age of cow (linear and quadratic effects), and days in milk (linear effect). The residual covariance matrix was assumed to have full rank. Moreover, 2 random regression models were applied. Variance components were estimated by restricted maximum likelihood method. The heritability estimates ranged from 0.11 to 0.24. The genetic correlation estimates between TD obtained with the PC2 model were higher than those obtained with the MV model, especially on adjacent test-days at the end of lactation close to unity. The results indicate that for the data considered in this study, only 2 principal components are required to summarize the bulk of genetic variation among the 10 traits. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Detection of gene–environment interaction in pedigree data using genome-wide genotypes
Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V
2016-01-01
Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits. PMID:27436263
Bureau, Alexandre; Duchesne, Thierry
2015-12-01
Splitting extended families into their component nuclear families to apply a genetic association method designed for nuclear families is a widespread practice in familial genetic studies. Dependence among genotypes and phenotypes of nuclear families from the same extended family arises because of genetic linkage of the tested marker with a risk variant or because of familial specificity of genetic effects due to gene-environment interaction. This raises concerns about the validity of inference conducted under the assumption of independence of the nuclear families. We indeed prove theoretically that, in a conditional logistic regression analysis applicable to disease cases and their genotyped parents, the naive model-based estimator of the variance of the coefficient estimates underestimates the true variance. However, simulations with realistic effect sizes of risk variants and variation of this effect from family to family reveal that the underestimation is negligible. The simulations also show the greater efficiency of the model-based variance estimator compared to a robust empirical estimator. Our recommendation is therefore, to use the model-based estimator of variance for inference on effects of genetic variants.
Knopman, Debra S.; Voss, Clifford I.
1987-01-01
The spatial and temporal variability of sensitivities has a significant impact on parameter estimation and sampling design for studies of solute transport in porous media. Physical insight into the behavior of sensitivities is offered through an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation. When parameters are estimated in regression models of one-dimensional transport, the spatial and temporal variability in sensitivities influences variance and covariance of parameter estimates. Several principles account for the observed influence of sensitivities on parameter uncertainty. (1) Information about a physical parameter may be most accurately gained at points in space and time with a high sensitivity to the parameter. (2) As the distance of observation points from the upstream boundary increases, maximum sensitivity to velocity during passage of the solute front increases and the consequent estimate of velocity tends to have lower variance. (3) The frequency of sampling must be “in phase” with the S shape of the dispersion sensitivity curve to yield the most information on dispersion. (4) The sensitivity to the dispersion coefficient is usually at least an order of magnitude less than the sensitivity to velocity. (5) The assumed probability distribution of random error in observations of solute concentration determines the form of the sensitivities. (6) If variance in random error in observations is large, trends in sensitivities of observation points may be obscured by noise and thus have limited value in predicting variance in parameter estimates among designs. (7) Designs that minimize the variance of one parameter may not necessarily minimize the variance of other parameters. (8) The time and space interval over which an observation point is sensitive to a given parameter depends on the actual values of the parameters in the underlying physical system.
Optimal control of LQG problem with an explicit trade-off between mean and variance
NASA Astrophysics Data System (ADS)
Qian, Fucai; Xie, Guo; Liu, Ding; Xie, Wenfang
2011-12-01
For discrete-time linear-quadratic Gaussian (LQG) control problems, a utility function on the expectation and the variance of the conventional performance index is considered. The utility function is viewed as an overall objective of the system and can perform the optimal trade-off between the mean and the variance of performance index. The nonlinear utility function is first converted into an auxiliary parameters optimisation problem about the expectation and the variance. Then an optimal closed-loop feedback controller for the nonseparable mean-variance minimisation problem is designed by nonlinear mathematical programming. Finally, simulation results are given to verify the algorithm's effectiveness obtained in this article.
An improved method for bivariate meta-analysis when within-study correlations are unknown.
Hong, Chuan; D Riley, Richard; Chen, Yong
2018-03-01
Multivariate meta-analysis, which jointly analyzes multiple and possibly correlated outcomes in a single analysis, is becoming increasingly popular in recent years. An attractive feature of the multivariate meta-analysis is its ability to account for the dependence between multiple estimates from the same study. However, standard inference procedures for multivariate meta-analysis require the knowledge of within-study correlations, which are usually unavailable. This limits standard inference approaches in practice. Riley et al proposed a working model and an overall synthesis correlation parameter to account for the marginal correlation between outcomes, where the only data needed are those required for a separate univariate random-effects meta-analysis. As within-study correlations are not required, the Riley method is applicable to a wide variety of evidence synthesis situations. However, the standard variance estimator of the Riley method is not entirely correct under many important settings. As a consequence, the coverage of a function of pooled estimates may not reach the nominal level even when the number of studies in the multivariate meta-analysis is large. In this paper, we improve the Riley method by proposing a robust variance estimator, which is asymptotically correct even when the model is misspecified (ie, when the likelihood function is incorrect). Simulation studies of a bivariate meta-analysis, in a variety of settings, show a function of pooled estimates has improved performance when using the proposed robust variance estimator. In terms of individual pooled estimates themselves, the standard variance estimator and robust variance estimator give similar results to the original method, with appropriate coverage. The proposed robust variance estimator performs well when the number of studies is relatively large. Therefore, we recommend the use of the robust method for meta-analyses with a relatively large number of studies (eg, m≥50). 
When the sample size is relatively small, we recommend the use of the robust method under the working independence assumption. We illustrate the proposed method through 2 meta-analyses. Copyright © 2017 John Wiley & Sons, Ltd.
Meteorological adjustment of yearly mean values for air pollutant concentration comparison
NASA Technical Reports Server (NTRS)
Sidik, S. M.; Neustadter, H. E.
1976-01-01
Using multiple linear regression analysis, models which estimate mean concentrations of Total Suspended Particulate (TSP), sulfur dioxide, and nitrogen dioxide as a function of several meteorologic variables, two rough economic indicators, and a simple trend in time are studied. Meteorologic data were obtained and do not include inversion heights. The goodness of fit of the estimated models is partially reflected by the squared coefficient of multiple correlation which indicates that, at the various sampling stations, the models accounted for about 23 to 47 percent of the total variance of the observed TSP concentrations. If the resulting model equations are used in place of simple overall means of the observed concentrations, there is about a 20 percent improvement in either: (1) predicting mean concentrations for specified meteorological conditions; or (2) adjusting successive yearly averages to allow for comparisons devoid of meteorological effects. An application to source identification is presented using regression coefficients of wind velocity predictor variables.
Wildhaber, M.L.; Holan, S.H.; Bryan, J.L.; Gladish, D.W.; Ellersieck, M.
2011-01-01
In 2003, the US Army Corps of Engineers initiated the Pallid Sturgeon Population Assessment Program (PSPAP) to monitor pallid sturgeon and the fish community of the Missouri River. The power analysis of PSPAP presented here was conducted to guide sampling design and effort decisions. The PSPAP sampling design has a nested structure with multiple gear subsamples within a river bend. Power analyses were based on a normal linear mixed model, using a mixed cell means approach, with variance estimates from the original data. It was found that, at current effort levels, at least 20 years for pallid sturgeon and 10 years for shovelnose sturgeon are needed to detect a 5% annual decline. Modified bootstrap simulations suggest power estimates from the original data are conservative due to excessive zero fish counts. In general, the approach presented is applicable to a wide array of animal monitoring programs.
Inverse sequential procedures for the monitoring of time series
NASA Technical Reports Server (NTRS)
Radok, Uwe; Brown, Timothy
1993-01-01
Climate changes traditionally have been detected from long series of observations and long after they happened. The 'inverse sequential' monitoring procedure is designed to detect changes as soon as they occur. Frequency distribution parameters are estimated both from the most recent existing set of observations and from the same set augmented by 1,2,...j new observations. Individual-value probability products ('likelihoods') are then calculated which yield probabilities for erroneously accepting the existing parameter(s) as valid for the augmented data set and vice versa. A parameter change is signaled when these probabilities (or a more convenient and robust compound 'no change' probability) show a progressive decrease. New parameters are then estimated from the new observations alone to restart the procedure. The detailed algebra is developed and tested for Gaussian means and variances, Poisson and chi-square means, and linear or exponential trends; a comprehensive and interactive Fortran program is provided in the appendix.
Physical activity measurement in older adults: relationships with mental health.
Parker, Sarah J; Strath, Scott J; Swartz, Ann M
2008-10-01
This study examined the relationship between physical activity (PA) and mental health among older adults as measured by objective and subjective PA-assessment instruments. Pedometers (PED), accelerometers (ACC), and the Physical Activity Scale for the Elderly (PASE) were administered to measure 1 week of PA among 84 adults age 55-87 (mean = 71) years. General mental health was measured using the Positive and Negative Affect Scale (PANAS) and the Satisfaction With Life Scale (SWL). Linear regressions revealed that PA estimated by PED significantly predicted 18.1%, 8.3%, and 12.3% of variance in SWL and positive and negative affect, respectively, whereas PA estimated by the PASE did not predict any mental health variables. Results from ACC data were mixed. Hotelling-William tests between correlation coefficients revealed that the relationship between PED and SWL was significantly stronger than the relationship between PASE and SWL. Relationships between PA and mental health might depend on the PA measure used.
Robert B. Thomas; Jack Lewis
1993-01-01
Time-stratified sampling of sediment for estimating suspended load is introduced and compared to selection at list time (SALT) sampling. Both methods provide unbiased estimates of load and variance. The magnitude of the variance of the two methods is compared using five storm populations of suspended sediment flux derived from turbidity data. Under like conditions,...
Estimation of the biserial correlation and its sampling variance for use in meta-analysis.
Jacobs, Perke; Viechtbauer, Wolfgang
2017-06-01
Meta-analyses are often used to synthesize the findings of studies examining the correlational relationship between two continuous variables. When only dichotomous measurements are available for one of the two variables, the biserial correlation coefficient can be used to estimate the product-moment correlation between the two underlying continuous variables. Unlike the point-biserial correlation coefficient, biserial correlation coefficients can therefore be integrated with product-moment correlation coefficients in the same meta-analysis. The present article describes the estimation of the biserial correlation coefficient for meta-analytic purposes and reports simulation results comparing different methods for estimating the coefficient's sampling variance. The findings indicate that commonly employed methods yield inconsistent estimates of the sampling variance across a broad range of research situations. In contrast, consistent estimates can be obtained using two methods that appear to be unknown in the meta-analytic literature. A variance-stabilizing transformation for the biserial correlation coefficient is described that allows for the construction of confidence intervals for individual coefficients with close to nominal coverage probabilities in most of the examined conditions. Copyright © 2016 John Wiley & Sons, Ltd.
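As a rough illustration of the coefficient discussed above, the textbook conversion from a point-biserial to a biserial correlation can be sketched as follows. It assumes the dichotomous variable arises from cutting a normally distributed latent variable, with `p` the proportion of cases in the upper group; the function name is hypothetical, and the sketch does not reproduce any of the sampling-variance methods compared in the article.

```python
import math
from statistics import NormalDist


def biserial_from_point_biserial(r_pb, p):
    """Convert a point-biserial correlation to a biserial correlation.

    r_b = r_pb * sqrt(p * (1 - p)) / phi(z_p), where z_p is the standard normal
    quantile at p and phi is the standard normal density.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)
    ordinate = std_normal.pdf(z)
    return r_pb * math.sqrt(p * (1.0 - p)) / ordinate


# With an even split the correction factor is sqrt(pi / 2), about 1.2533.
r_b = biserial_from_point_biserial(0.40, 0.5)
```

The correction factor grows as the split becomes more uneven, so the estimate can exceed 1 in small or skewed samples.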
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number
Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.
2014-01-01
The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470
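For orientation, the point estimate whose uncertainty is studied here can be sketched in a few lines. This uses the usual statement of Rosenthal's formula, N = (sum of Z_i)^2 / z_alpha^2 - k, with a one-tailed critical value; the function name and the default `alpha` are illustrative assumptions, and the sketch does not implement any of the five variance estimators or the confidence intervals from the paper.

```python
from statistics import NormalDist


def rosenthal_fail_safe_n(z_scores, alpha=0.05):
    """Number of unpublished null studies needed to bring the combined
    one-tailed result above the significance level alpha."""
    k = len(z_scores)
    if k == 0:
        raise ValueError("at least one study is required")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


# Five studies, each with Z = 2.0.
n_fs = rosenthal_fail_safe_n([2.0, 2.0, 2.0, 2.0, 2.0])
```

Because the estimate is a function of the study Z scores, it is itself a random quantity, which is why interval estimates are of interest.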
LeDell, Erin; Petersen, Maya; van der Laan, Mark
2015-01-01
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC. PMID:26279737
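A minimal sketch of the idea for a single validation set, assuming no tied scores: the empirical AUC is the fraction of correctly ordered positive/negative pairs, and the variance is estimated by the mean squared influence-curve value divided by n. The influence-curve form used here is the standard one for AUC as the sketch's author understands it; the cross-validated version described above would evaluate it fold by fold. The function name is hypothetical.

```python
def auc_with_ic_variance(scores, labels):
    """Return (auc, variance_estimate, influence_values) for binary labels 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    n, n1, n0 = len(scores), len(pos), len(neg)
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    # Empirical AUC: share of (positive, negative) pairs ranked correctly.
    auc = sum(1 for a in pos for b in neg if a > b) / (n1 * n0)
    p1, p0 = n1 / n, n0 / n
    ic = []
    for s, y in zip(scores, labels):
        if y == 1:
            frac = sum(1 for b in neg if b < s) / n0  # negatives ranked below s
            ic.append((frac - auc) / p1)
        else:
            frac = sum(1 for a in pos if a > s) / n1  # positives ranked above s
            ic.append((frac - auc) / p0)
    variance = sum(v * v for v in ic) / n / n  # mean(IC^2) / n
    return auc, variance, ic


auc, var_hat, ic_values = auc_with_ic_variance(
    [0.9, 0.8, 0.7, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0, 0]
)
```

Each influence value needs one pass over the opposite class, so the whole estimate costs about as much as computing the AUC once, rather than refitting the model on every bootstrap replicate.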
Techniques for the Enhancement of Linear Predictive Speech Coding in Adverse Conditions
NASA Astrophysics Data System (ADS)
Wrench, Alan A.
Available from UMI in association with The British Library. Requires signed TDF. The Linear Prediction model was first applied to speech two and a half decades ago. Since then it has been the subject of intense research and continues to be one of the principal tools in the analysis of speech. Its mathematical tractability makes it a suitable subject for study and its proven success in practical applications makes the study worthwhile. The model is known to be unsuited to speech corrupted by background noise. This has led many researchers to investigate ways of enhancing the speech signal prior to Linear Predictive analysis. In this thesis this body of work is extended. The chosen application is low bit-rate (2.4 kbits/sec) speech coding. For this task the performance of the Linear Prediction algorithm is crucial because there is insufficient bandwidth to encode the error between the modelled speech and the original input. A review of the fundamentals of Linear Prediction and an independent assessment of the relative performance of methods of Linear Prediction modelling are presented. A new method is proposed which is fast and facilitates stability checking, however, its stability is shown to be unacceptably poorer than existing methods. A novel supposition governing the positioning of the analysis frame relative to a voiced speech signal is proposed and supported by observation. The problem of coding noisy speech is examined. Four frequency domain speech processing techniques are developed and tested. These are: (i) Combined Order Linear Prediction Spectral Estimation; (ii) Frequency Scaling According to an Aural Model; (iii) Amplitude Weighting Based on Perceived Loudness; (iv) Power Spectrum Squaring. These methods are compared with the Recursive Linearised Maximum a Posteriori method. Following on from work done in the frequency domain, a time domain implementation of spectrum squaring is developed. 
In addition, a new method of power spectrum estimation is developed based on the Minimum Variance approach. This new algorithm is shown to be closely related to Linear Prediction but produces slightly broader spectral peaks. Spectrum squaring is applied to both the new algorithm and standard Linear Prediction and their relative performance is assessed. (Abstract shortened by UMI.).
Overlap between treatment and control distributions as an effect size measure in experiments.
Hedges, Larry V; Olkin, Ingram
2016-03-01
The proportion π of treatment group observations that exceed the control group mean has been proposed as an effect size measure for experiments that randomly assign independent units into 2 groups. We give the exact distribution of a simple estimator of π based on the standardized mean difference and use it to study the small sample bias of this estimator. We also give the minimum variance unbiased estimator of π under 2 models, one in which the variance of the mean difference is known and one in which the variance is unknown. We show how to use the relation between the standardized mean difference and the overlap measure to compute confidence intervals for π and show that these results can be used to obtain unbiased estimators, large sample variances, and confidence intervals for 3 related effect size measures based on the overlap. Finally, we show how the effect size π can be used in a meta-analysis. (c) 2016 APA, all rights reserved.
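A small sketch of the simple plug-in estimator, assuming normally distributed outcomes with a common variance, so that π = Φ(δ) where δ is the standardized mean difference. The pooled-variance form of d and the function name are illustrative assumptions; the bias corrections and minimum variance unbiased estimators derived in the article are not reproduced.

```python
import math
from statistics import NormalDist, mean, variance


def overlap_pi(treatment, control):
    """Plug-in estimate of the proportion of treatment observations
    exceeding the control mean: Phi(d), with d the standardized mean difference."""
    nt, nc = len(treatment), len(control)
    pooled_var = ((nt - 1) * variance(treatment) + (nc - 1) * variance(control)) / (nt + nc - 2)
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    return NormalDist().cdf(d)


# Groups one pooled standard deviation apart give roughly 0.84.
pi_hat = overlap_pi([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

Because the normal CDF is monotone, a confidence interval for the standardized mean difference maps directly onto an interval for π by applying the same transformation to both endpoints.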
Technical Note: Introduction of variance component analysis to setup error analysis in radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matsuo, Yukinori
Purpose: The purpose of this technical note is to introduce variance component analysis to the estimation of systematic and random components in setup error of radiotherapy. Methods: Balanced data according to the one-factor random effect model were assumed. Results: Analysis-of-variance (ANOVA)-based computation was applied to estimate the values and their confidence intervals (CIs) for systematic and random errors and the population mean of setup errors. The conventional method overestimates systematic error, especially in hypofractionated settings. The CI for systematic error becomes much wider than that for random error. The ANOVA-based estimation can be extended to a multifactor model considering multiple causes of setup errors (e.g., interpatient, interfraction, and intrafraction). Conclusions: Variance component analysis may lead to novel applications to setup error analysis in radiotherapy.
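The ANOVA-based point estimates for the balanced one-factor random effects model can be sketched as below. The layout (one list of per-fraction setup errors per patient) and the names `setup_error_components`, `systematic_var`, and `random_var` are illustrative assumptions; confidence intervals and the multifactor extension mentioned in the note are not covered.

```python
def setup_error_components(errors_by_patient):
    """ANOVA estimates for balanced data: each inner list holds one patient's
    setup errors over the same number of fractions."""
    p = len(errors_by_patient)
    n = len(errors_by_patient[0])
    if p < 2 or n < 2 or any(len(row) != n for row in errors_by_patient):
        raise ValueError("need balanced data with at least 2 patients and 2 fractions")
    patient_means = [sum(row) / n for row in errors_by_patient]
    grand_mean = sum(patient_means) / p
    # Between-patient and within-patient mean squares.
    ms_between = n * sum((m - grand_mean) ** 2 for m in patient_means) / (p - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(errors_by_patient, patient_means) for x in row
    ) / (p * (n - 1))
    return {
        "population_mean": grand_mean,
        "random_var": ms_within,
        # Truncated at zero because the raw ANOVA estimate can be negative.
        "systematic_var": max(0.0, (ms_between - ms_within) / n),
        # Variance of patient means, the quantity the conventional method reports.
        "conventional_systematic_var": ms_between / n,
    }


result = setup_error_components([[1.0, 3.0], [5.0, 7.0]])
```

The variance of the patient means estimates the systematic variance plus the random variance divided by the number of fractions, which is consistent with the overestimation being largest when there are few fractions per patient.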