Mixed model approaches for diallel analysis based on a bio-model.
Zhu, J; Weir, B S
1996-12-01
A MINQUE(1) procedure, the minimum norm quadratic unbiased estimation (MINQUE) method with all prior values set to 1, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML) and MINQUE(θ), which uses the parameter values themselves as priors. MINQUE(1) is almost as efficient as MINQUE(θ) for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jackknife procedure is suggested for estimating the sampling variances of estimated variance and covariance components and of predicted genetic effects. Worked examples are given for estimation of variance and covariance components and for prediction of genetic merits.
Diallel analysis for sex-linked and maternal effects.
Zhu, J; Weir, B S
1996-01-01
Genetic models including sex-linked and maternal effects as well as autosomal gene effects are described. Monte Carlo simulations were conducted to compare efficiencies of estimation by minimum norm quadratic unbiased estimation (MINQUE) and restricted maximum likelihood (REML) methods. MINQUE(1), which has 1 for all prior values, has a similar efficiency to MINQUE(θ), which requires prior estimates of parameter values. MINQUE(1) has the advantage over REML of unbiased estimation and convenient computation. An adjusted unbiased prediction (AUP) method is developed for predicting random genetic effects. AUP is desirable for its easy computation and unbiasedness of both mean and variance of predictors. The jackknife procedure is appropriate for estimating the sampling variances of estimated variances (or covariances) and of predicted genetic effects. A t-test based on jackknife variances is applicable for detecting significance of variation. Worked examples from mice and silkworm data are given in order to demonstrate variance and covariance estimation and genetic effect prediction.
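The jackknife recommended in the two abstracts above is easy to sketch in general form. Below is a minimal delete-one jackknife for the sampling variance of an arbitrary estimator (here the sample variance), in Python with NumPy; it illustrates the generic procedure, not the authors' genetic-model implementation, and the data are simulated placeholders:

```python
import numpy as np

def jackknife_variance(data, estimator):
    """Delete-one jackknife estimate of the sampling variance of `estimator`.

    data: 1-D array of observations; estimator: callable array -> float.
    Returns (jackknife variance, bias-corrected estimate).
    """
    n = len(data)
    theta_full = estimator(data)
    # Leave-one-out replicates of the estimator
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    theta_bar = loo.mean()
    var_jack = (n - 1) / n * np.sum((loo - theta_bar) ** 2)
    theta_bc = n * theta_full - (n - 1) * theta_bar  # bias-corrected estimate
    return var_jack, theta_bc

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=200)
v, v_bc = jackknife_variance(x, lambda d: d.var(ddof=1))
```

The square root of `v` feeds directly into the t-test for significance of variation mentioned above.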
Genetic basis of between-individual and within-individual variance of docility.
Martin, J G A; Pirotta, E; Petelle, M B; Blumstein, D T
2017-04-01
Between-individual variation in phenotypes within a population is the basis of evolution. However, evolutionary and behavioural ecologists have mainly focused on estimating between-individual variance in mean trait values and have neglected variation in within-individual variance, or predictability of a trait. In fact, an important assumption of mixed-effects models used to estimate between-individual variance in mean traits is that within-individual residual variance (predictability) is identical across individuals. Individual heterogeneity in the predictability of behaviours is a potentially important effect but is rarely estimated or accounted for. We used 11 389 measures of docility behaviour from 1576 yellow-bellied marmots (Marmota flaviventris) to estimate between-individual variation in both mean docility and its predictability. We then implemented a double hierarchical animal model to decompose the variances of both the mean trait and its predictability into their environmental and genetic components. We found that individuals differed both in their docility and in the predictability of their docility, with a negative phenotypic covariance. We also found significant genetic variance for both mean docility and its predictability, but no genetic covariance between the two. This analysis is one of the first to estimate the genetic basis of both the mean trait and within-individual variance in a wild population. Our results indicate that equal within-individual variance should not be assumed. We demonstrate the evolutionary importance of variation in the predictability of docility and illustrate the potential bias in models that ignore variation in predictability. We conclude that variability in the predictability of a trait should not be ignored, and present a coherent approach for its quantification. © 2017 European Society For Evolutionary Biology.
Holmes, John B; Dodds, Ken G; Lee, Michael A
2017-03-02
An important issue in genetic evaluation is the comparability of random effects (breeding values), particularly between pairs of animals in different contemporary groups. This is usually referred to as genetic connectedness. While various measures of connectedness have been proposed in the literature, there is general agreement that the most appropriate measure is some function of the prediction error variance-covariance matrix. However, obtaining the prediction error variance-covariance matrix is computationally demanding for large-scale genetic evaluations. Many alternative statistics have been proposed that avoid the computational cost of obtaining the prediction error variance-covariance matrix, such as counts of genetic links between contemporary groups, gene flow matrices, and functions of the variance-covariance matrix of estimated contemporary group fixed effects. In this paper, we show that a correction to the variance-covariance matrix of estimated contemporary group fixed effects will produce the exact prediction error variance-covariance matrix averaged by contemporary group for univariate models in the presence of single or multiple fixed effects and one random effect. We demonstrate the correction for a series of models and show that approximations to the prediction error matrix based solely on the variance-covariance matrix of estimated contemporary group fixed effects are inappropriate in certain circumstances. Our method allows for the calculation of a connectedness measure based on the prediction error variance-covariance matrix by calculating only the variance-covariance matrix of estimated fixed effects. Since the number of fixed effects in genetic evaluation is usually orders of magnitude smaller than the number of random effect levels, the computational requirements for our method should be reduced.
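As background to the prediction error variance-covariance (PEV) matrix discussed above, the following sketch builds Henderson's mixed-model equations for a toy univariate model with one fixed factor and one random effect, then reads the PEV of the random effects off the inverse coefficient matrix. Dimensions and variance components are hypothetical; this illustrates the object being approximated, not the paper's correction:

```python
import numpy as np

# Toy model: y = Xb + Zu + e, u ~ N(0, I*sigma_u2), e ~ N(0, I*sigma_e2).
rng = np.random.default_rng(0)
n, p, q = 30, 2, 5                       # records, fixed-effect levels, random levels
X = np.zeros((n, p)); X[np.arange(n), rng.integers(0, p, n)] = 1.0
Z = np.zeros((n, q)); Z[np.arange(n), rng.integers(0, q, n)] = 1.0
sigma_u2, sigma_e2 = 1.0, 2.0
lam = sigma_e2 / sigma_u2                # variance ratio in Henderson's MME

# Henderson's mixed-model equations coefficient matrix
C = np.block([[X.T @ X,        X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
Cinv = np.linalg.inv(C)

# Prediction error variance-covariance matrix of the random effects:
# PEV = C^{uu} * sigma_e2 (lower-right block of the inverse).
PEV = Cinv[p:, p:] * sigma_e2
# Variance-covariance matrix of the estimated fixed effects (upper-left block).
V_b = Cinv[:p, :p] * sigma_e2
```

For BLUP with known variances, each diagonal PEV element lies between 0 and sigma_u2, which is why connectedness measures built from PEV are bounded and interpretable.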
A two-step Bayesian approach for genomic prediction of breeding values.
Shariati, Mohammad M; Sørensen, Peter; Janss, Luc
2012-05-21
In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects, where markers in a cluster have a common variance. The influence of each marker group of size p on the posterior distribution of the marker variances will then be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency above 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and assigned, 150 at a time, to groups sharing a common variance. In further analyses, subsets of the 1500 and 450 markers with the largest effects in step 2 were kept in the prediction model. Grouping markers outperformed the SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than those from Bayesian methods with marker-specific variances. Grouping markers is less flexible than allowing each marker its own variance but, by grouping, the power to estimate marker variances increases. Prior knowledge of the genetic architecture of the trait is necessary for clustering markers and for appropriate prior parameterization.
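The step-2 grouping described above (rank markers by estimated effect, then assign fixed-size blocks to a shared variance class) can be sketched as follows. The group size of 150 follows the abstract; the effect values are simulated placeholders, not the workshop data:

```python
import numpy as np

def group_markers(effects, group_size=150):
    """Rank markers by the magnitude of their step-1 estimated effects and
    assign consecutive blocks of `group_size` markers to a shared variance
    class. Returns an integer group label per marker (0 = largest effects)."""
    order = np.argsort(-np.abs(effects))          # descending by |effect|
    labels = np.empty(len(effects), dtype=int)
    for rank, idx in enumerate(order):
        labels[idx] = rank // group_size
    return labels

rng = np.random.default_rng(42)
snp_effects = rng.normal(0.0, 1.0, 4500)          # hypothetical step-1 estimates
groups = group_markers(snp_effects)
```

Each label would then index a common variance parameter in the step-2 model, giving every variance parameter group_size df of information instead of one.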
Optimal design criteria - prediction vs. parameter estimation
NASA Astrophysics Data System (ADS)
Waldl, Helmut
2014-05-01
G-optimality is a popular design criterion for optimal prediction: it seeks to minimize the kriging variance over the whole design region, and a G-optimal design minimizes the maximum variance of all predicted values. If we use kriging methods for prediction, it is natural to use the kriging variance as a measure of uncertainty for the estimates. However, computing the kriging variance, and even more so the empirical kriging variance, is computationally very costly, and finding the maximum kriging variance in high-dimensional regions can be so time-demanding that in practice the G-optimal design cannot really be found with currently available computer equipment. We cannot always avoid this problem by using space-filling designs, because small designs that minimize the empirical kriging variance are often non-space-filling. D-optimality is the design criterion related to parameter estimation: a D-optimal design maximizes the determinant of the information matrix of the estimates. D-optimality in terms of trend parameter estimation and D-optimality in terms of covariance parameter estimation yield fundamentally different designs. The Pareto frontier of these two competing determinant criteria corresponds to designs that perform well under both criteria. Under certain conditions, searching for the G-optimal design on this Pareto frontier yields almost as good results as searching for the G-optimal design in the whole design region, while the maximum of the empirical kriging variance has to be computed only a few times. The method is demonstrated by means of a computer simulation experiment based on data provided by the Belgian institute Management Unit of the North Sea Mathematical Models (MUMM) that describe the evolution of inorganic and organic carbon and nutrients, phytoplankton, bacteria and zooplankton in the Southern Bight of the North Sea.
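The kriging variance whose maximum G-optimality targets can be computed directly for a small example. The sketch below uses simple kriging with a hypothetical exponential covariance on a 1-D region; the design points and covariance parameters are illustrative, and it shows why evaluating the maximum over a fine grid is the costly step:

```python
import numpy as np

def kriging_variance(X, x0, sill=1.0, rng_par=0.5):
    """Simple-kriging variance at points x0 given design X, with an
    exponential covariance C(h) = sill * exp(-h / rng_par)."""
    def cov(a, b):
        h = np.abs(a[:, None] - b[None, :])
        return sill * np.exp(-h / rng_par)
    C = cov(X, X) + 1e-10 * np.eye(len(X))   # small jitter for stability
    c0 = cov(X, x0)
    w = np.linalg.solve(C, c0)
    # sigma^2(x0) = C(0) - c0' C^{-1} c0, evaluated column-wise over x0
    return sill - np.einsum('ij,ij->j', c0, w)

design = np.array([0.1, 0.35, 0.6, 0.9])     # hypothetical 1-D design
grid = np.linspace(0.0, 1.0, 201)
kv = kriging_variance(design, grid)
worst = kv.max()   # the quantity a G-optimal design would minimize
```

Evaluating `kv` requires one solve per candidate design, and a G-optimality search repeats this over many designs and a fine grid, which is the computational burden the abstract describes.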
Bouvet, J-M; Makouanzi, G; Cros, D; Vigneron, Ph
2016-01-01
Hybrids are broadly used in plant breeding, and accurate estimation of variance components is crucial for optimizing genetic gain. Genome-wide information may be used to explore models designed to assess the extent of additive and non-additive variance and to test their prediction accuracy for genomic selection. Ten linear mixed models, involving pedigree- and marker-based relationship matrices among parents, were developed to estimate additive (A), dominance (D) and epistatic (AA, AD and DD) effects. Five complementary models, involving the gametic phase to estimate marker-based relationships among hybrid progenies, were developed to assess the same effects. The models were compared using tree height and 3303 single-nucleotide polymorphism markers from 1130 cloned individuals obtained via controlled crosses of 13 Eucalyptus urophylla females with 9 Eucalyptus grandis males. Akaike information criterion (AIC), variance ratios, asymptotic correlation matrices of estimates, goodness-of-fit, prediction accuracy and mean square error (MSE) were used for the comparisons. The variance components and variance ratios differed according to the model. Models with a parent marker-based relationship matrix performed better than those that were pedigree-based, that is, an absence of singularities, lower AIC, higher goodness-of-fit and accuracy and smaller MSE. However, AD and DD variances were estimated with high standard errors. Using the same criteria, progeny gametic phase-based models performed better in fitting the observations and predicting genetic values. However, DD variance could not be separated from the dominance variance, and null estimates were obtained for AA and AD effects. This study highlighted the advantages of progeny models using genome-wide information. PMID:26328760
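A marker-based additive relationship matrix of the kind used in these models is commonly built with VanRaden's first method. The sketch below is a generic illustration with simulated genotypes, not the authors' exact construction, and the dimensions are placeholders:

```python
import numpy as np

def vanraden_G(M):
    """Additive genomic relationship matrix from an (n x m) 0/1/2 genotype
    matrix, following VanRaden's first method: G = WW' / (2 * sum p(1-p)),
    where W centres each marker column by twice its allele frequency."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    W = M - 2.0 * p                           # centre columns
    denom = 2.0 * np.sum(p * (1.0 - p))
    return W @ W.T / denom

rng = np.random.default_rng(7)
M = rng.integers(0, 3, size=(20, 500)).astype(float)  # hypothetical genotypes
G = vanraden_G(M)
```

A dominance relationship matrix for the D term is built analogously from a heterozygosity-based coding rather than the centred 0/1/2 coding.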
Direct and indirect genetic and fine-scale location effects on breeding date in song sparrows.
Germain, Ryan R; Wolak, Matthew E; Arcese, Peter; Losdat, Sylvain; Reid, Jane M
2016-11-01
Quantifying direct and indirect genetic effects of interacting females and males on variation in jointly expressed life-history traits is central to predicting microevolutionary dynamics. However, accurately estimating sex-specific additive genetic variances in such traits remains difficult in wild populations, especially if related individuals inhabit similar fine-scale environments. Breeding date is a key life-history trait that responds to environmental phenology and mediates individual and population responses to environmental change. However, no studies have estimated female (direct) and male (indirect) additive genetic and inbreeding effects on breeding date, and estimated the cross-sex genetic correlation, while simultaneously accounting for fine-scale environmental effects of breeding locations, impeding prediction of microevolutionary dynamics. We fitted animal models to 38 years of song sparrow (Melospiza melodia) phenology and pedigree data to estimate sex-specific additive genetic variances in breeding date, and the cross-sex genetic correlation, thereby estimating the total additive genetic variance while simultaneously estimating sex-specific inbreeding depression. We further fitted three forms of spatial animal model to explicitly estimate variance in breeding date attributable to breeding location, overlap among breeding locations and spatial autocorrelation. We thereby quantified fine-scale location variances in breeding date and quantified the degree to which estimating such variances affected the estimated additive genetic variances. The non-spatial animal model estimated nonzero female and male additive genetic variances in breeding date (sex-specific heritabilities: 0.07 and 0.02, respectively) and a strong, positive cross-sex genetic correlation (0.99), creating substantial total additive genetic variance (0.18). Breeding date varied with female, but not male, inbreeding coefficient, revealing direct, but not indirect, inbreeding depression.
All three spatial animal models estimated small location variance in breeding date, but because relatedness and breeding location were virtually uncorrelated, modelling location variance did not alter the estimated additive genetic variances. Our results show that sex-specific additive genetic effects on breeding date can be strongly positively correlated, which would affect any predicted rates of microevolutionary change in response to sexually antagonistic or congruent selection. Further, we show that inbreeding effects on breeding date can also be sex specific and that genetic effects can exceed phenotypic variation stemming from fine-scale location-based variation within a wild population. © 2016 The Authors. Journal of Animal Ecology © 2016 British Ecological Society.
LeDell, Erin; Petersen, Maya; van der Laan, Mark
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
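An influence-curve variance for the AUC can be sketched via its structural components, the same per-observation quantities used in DeLong-type estimators. This illustrates the idea for the non-cross-validated case with simulated scores; the authors' cross-validated estimator aggregates analogous influence terms across validation folds:

```python
import numpy as np

def auc_and_ic_variance(scores_pos, scores_neg):
    """AUC with an influence-curve (DeLong-type) variance estimate --
    a fast alternative to bootstrapping the AUC."""
    m, n = len(scores_pos), len(scores_neg)
    # Pairwise comparison kernel: 1 if a case outranks a control, 0.5 for ties.
    psi = (scores_pos[:, None] > scores_neg[None, :]).astype(float)
    psi += 0.5 * (scores_pos[:, None] == scores_neg[None, :])
    auc = psi.mean()
    v10 = psi.mean(axis=1)          # per-case structural components
    v01 = psi.mean(axis=0)          # per-control structural components
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    return auc, var

rng = np.random.default_rng(3)
pos = rng.normal(1.0, 1.0, 300)     # hypothetical scores for cases
neg = rng.normal(0.0, 1.0, 300)     # and for controls
auc, var = auc_and_ic_variance(pos, neg)
```

The whole computation is one pass over the pairwise kernel, which is why the influence-curve route scales so much better than bootstrapping the full cross-validation procedure.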
Effect of correlated observation error on parameters, predictions, and uncertainty
Tiedeman, Claire; Green, Christopher T.
2013-01-01
Correlations among observation errors are typically omitted when calculating observation weights for model calibration by inverse methods. We explore the effects of omitting these correlations on estimates of parameters, predictions, and uncertainties. First, we develop a new analytical expression for the difference in parameter variance estimated with and without error correlations for a simple one-parameter two-observation inverse model. Results indicate that omitting error correlations from both the weight matrix and the variance calculation can either increase or decrease the parameter variance, depending on the values of error correlation (ρ) and the ratio of dimensionless scaled sensitivities (rdss). For small ρ, the difference in variance is always small, but for large ρ, the difference varies widely depending on the sign and magnitude of rdss. Next, we consider a groundwater reactive transport model of denitrification with four parameters and correlated geochemical observation errors that are computed by an error-propagation approach that is new for hydrogeologic studies. We compare parameter estimates, predictions, and uncertainties obtained with and without the error correlations. Omitting the correlations modestly to substantially changes parameter estimates, and causes both increases and decreases of parameter variances, consistent with the analytical expression. Differences in predictions for the models calibrated with and without error correlations can be greater than parameter differences when both are considered relative to their respective confidence intervals. These results indicate that including observation error correlations in weighting for nonlinear regression can have important effects on parameter estimates, predictions, and their respective uncertainties.
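The one-parameter, two-observation setting described above can be reproduced numerically. The sketch below compares the parameter variance under full error-covariance weighting with the true variance of the estimator that uses diagonal weights while the errors remain correlated; the sensitivities and ρ are illustrative, and this is a companion to, not a reproduction of, the paper's analytical expression:

```python
import numpy as np

def param_variances(j1, j2, rho, s1=1.0, s2=1.0):
    """Variance of one estimated parameter from two observations with
    sensitivities (j1, j2) and error covariance with correlation rho.
    Returns (variance under full weighting,
             true variance of the diagonally weighted estimator)."""
    J = np.array([[j1], [j2]])
    Sigma = np.array([[s1**2, rho * s1 * s2],
                      [rho * s1 * s2, s2**2]])
    # Full GLS: weight by the inverse error covariance.
    var_gls = np.linalg.inv(J.T @ np.linalg.inv(Sigma) @ J)[0, 0]
    # Diagonal weighting (correlation omitted from the weights), with the
    # estimator's variance evaluated under the true correlated errors.
    W = np.diag(1.0 / np.array([s1**2, s2**2]))
    A = np.linalg.inv(J.T @ W @ J) @ (J.T @ W)
    var_diag = (A @ Sigma @ A.T)[0, 0]
    return var_gls, var_diag

v_full, v_omit = param_variances(1.0, 0.5, rho=0.8)
```

Since GLS is the best linear unbiased estimator, the full-weighting variance can never exceed the diagonally weighted one here; the paper's sign-changing effect concerns the *estimated* variance when the correlation is also omitted from the variance formula itself.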
Sun, Chuanyu; VanRaden, Paul M.; Cole, John B.; O'Connell, Jeffrey R.
2014-01-01
Dominance may be an important source of non-additive genetic variance for many traits of dairy cattle. However, nearly all prediction models for dairy cattle have included only additive effects because of the limited number of cows with both genotypes and phenotypes. The role of dominance in the Holstein and Jersey breeds was investigated for eight traits: milk, fat, and protein yields; productive life; daughter pregnancy rate; somatic cell score; fat percent and protein percent. Additive and dominance variance components were estimated and then used to estimate additive and dominance effects of single nucleotide polymorphisms (SNPs). The predictive abilities of three models with both additive and dominance effects and a model with additive effects only were assessed using ten-fold cross-validation. One procedure estimated dominance values, and another estimated dominance deviations; calculation of the dominance relationship matrix was different for the two methods. The third approach enlarged the dataset by including cows with genotype probabilities derived using genotyped ancestors. For yield traits, dominance variance accounted for 5 and 7% of total variance for Holsteins and Jerseys, respectively; using dominance deviations resulted in smaller dominance and larger additive variance estimates. For non-yield traits, dominance variances were very small for both breeds. For yield traits, including additive and dominance effects fit the data better than including only additive effects; average correlations between estimated genetic effects and phenotypes showed that prediction accuracy increased when both effects rather than just additive effects were included. No corresponding gains in prediction ability were found for non-yield traits. Including cows with derived genotype probabilities from genotyped ancestors did not improve prediction accuracy. 
The largest additive effects were located on chromosome 14 near DGAT1 for yield traits for both breeds; those SNPs also showed the largest dominance effects for fat yield (both breeds) as well as for Holstein milk yield. PMID:25084281
Variance computations for functionals of absolute risk estimates.
Pfeiffer, R M; Petracci, E
2011-07-01
We present a simple influence function based approach to compute the variances of estimates of absolute risk and functions of absolute risk. We apply this approach to criteria that assess the impact of changes in the risk factor distribution on absolute risk for an individual and at the population level. As an illustration we use an absolute risk prediction model for breast cancer that includes modifiable risk factors in addition to standard breast cancer risk factors. Influence function based variance estimates for absolute risk and the criteria are compared to bootstrap variance estimates.
Analysis of conditional genetic effects and variance components in developmental genetics.
Zhu, J
1995-12-01
A genetic model with additive-dominance effects and genotype × environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at the previous time (t-1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by the minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects.
Rönnegård, L; Felleki, M; Fikse, W F; Mulder, H A; Strandberg, E
2013-04-01
Trait uniformity, or micro-environmental sensitivity, may be studied through individual differences in residual variance. These differences appear to be heritable, and the need exists, therefore, to fit models to predict breeding values explaining differences in residual variance. The aim of this paper is to estimate breeding values for micro-environmental sensitivity (vEBV) in milk yield and somatic cell score, and their associated variance components, on a large dairy cattle data set having more than 1.6 million records. Estimation of variance components, ordinary breeding values, and vEBV was performed using standard variance component estimation software (ASReml), applying the methodology for double hierarchical generalized linear models. Estimation using ASReml took less than 7 d on a Linux server. The genetic standard deviations for residual variance were 0.21 and 0.22 for somatic cell score and milk yield, respectively, which indicate moderate genetic variance for residual variance and imply that a standard deviation change in vEBV for one of these traits would alter the residual variance by 20%. This study shows that estimation of variance components, estimated breeding values and vEBV, is feasible for large dairy cattle data sets using standard variance component estimation software. The possibility to select for uniformity in Holstein dairy cattle based on these estimates is discussed. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
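The idea of heritable residual variance underlying vEBV can be illustrated with a stripped-down version of the double hierarchical approach: fit a mean model, then treat the (log) squared residuals per individual as the dispersion response. This toy sketch simulates individual log-variance effects and recovers them; it omits the genetic relationship structure and the full DHGLM iteration that ASReml performs:

```python
import numpy as np

rng = np.random.default_rng(9)
n_id, k = 300, 8                          # individuals and records per individual
v_true = rng.normal(0.0, 0.3, n_id)       # individual effects on log residual variance
ids = np.repeat(np.arange(n_id), k)
y = rng.normal(0.0, np.exp(0.5 * v_true)[ids])  # records with heterogeneous residual SD

# One sweep of the double hierarchical idea:
# 1) fit the mean model (here simply per-individual means),
mu = np.array([y[ids == i].mean() for i in range(n_id)])
# 2) use the squared residuals per individual as the dispersion response (log scale).
resid2 = (y - mu[ids]) ** 2
log_disp = np.log(np.array([resid2[ids == i].mean() for i in range(n_id)]))
```

In the full method, `log_disp` would itself be modelled with fixed and random genetic effects, and the mean and dispersion models would be iterated to convergence.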
Doherty, P.F.; Schreiber, E.A.; Nichols, J.D.; Hines, J.E.; Link, W.A.; Schenk, G.A.; Schreiber, R.W.
2004-01-01
Life-history theory and associated empirical generalizations predict that population growth rate (λ) in long-lived animals should be most sensitive to adult survival; that the rates to which λ is most sensitive should be those with the smallest temporal variances; and that stochastic environmental events should most affect the rates to which λ is least sensitive. To date, most analyses attempting to examine these predictions have been inadequate, their validity being called into question by problems in estimating parameters, in estimating the variability of parameters, and in measuring population sensitivities to parameters. We use improved methodologies in these three areas to test these life-history predictions in a population of red-tailed tropicbirds (Phaethon rubricauda). Our results support the first prediction, that λ is most sensitive to survival rates. However, support for the second prediction, that these rates have the smallest temporal variance, was equivocal. Previous support for the second prediction may be an artifact of a high survival estimate near the upper boundary of 1 and not a result of natural selection canalizing variances alone. We found no support for the third prediction, that effects of environmental stochasticity (El Niño) would most likely be detected in vital rates to which λ is least sensitive and which are thought to have high temporal variances. Comparative data sets on other seabirds, within and among orders, and in other locations, are needed to understand these environmental effects.
Moghaddar, N; van der Werf, J H J
2017-12-01
The objectives of this study were to estimate the additive and dominance variance components of several weight and ultrasound-scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes, and then to investigate the effect of fitting additive and dominance effects on the accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on average information restricted maximum likelihood, using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitted model in the combined cross-bred population. This study showed substantial dominance genetic variance for weight and ultrasound-scanned body composition traits, particularly in the cross-bred population; however, the improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in the combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.
Bernard R. Parresol
1993-01-01
In the context of forest modeling, it is often reasonable to assume a multiplicative heteroscedastic error structure for the data. Under such circumstances, ordinary least squares no longer provides minimum variance estimates of the model parameters. Through study of the error structure, a suitable error variance model can be specified and its parameters estimated. This...
Influence of outliers on accuracy estimation in genomic prediction in plant breeding.
Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter
2014-10-01
Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate by simulation the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies. We simulated 1000 datasets for each of 10 scenarios, defined by the number of genotypes, the marker effect variance, and the magnitude of outliers, to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5, 8, and 10 times the error SD used to simulate the small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations of the estimated from the true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of the genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and most resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performance of the other five methods, which use cross-validation, was less consistent and varied widely across scenarios. The computing time of the methods increased as the size of outliers and the sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.
NASA Astrophysics Data System (ADS)
Behnabian, Behzad; Mashhadi Hossainali, Masoud; Malekzadeh, Ahad
2018-02-01
The cross-validation technique is a popular method to assess and improve the quality of prediction by least squares collocation (LSC). We present a formula for direct estimation of the vector of cross-validation errors (CVEs) in LSC which is much faster than element-wise CVE computation. We show that a quadratic form of the CVEs follows a Chi-squared distribution. Furthermore, an a posteriori noise variance factor is derived from the quadratic form of the CVEs. In order to detect blunders in the observations, the estimated standardized CVE is proposed as the test statistic, which can be applied whether noise variances are known or unknown. We use LSC together with the methods proposed in this research for interpolation of crustal subsidence on the northern coast of the Gulf of Mexico. The results show that after detecting and removing outliers, the root mean square (RMS) of the CVEs and the estimated noise standard deviation are reduced by about 51% and 59%, respectively. In addition, the RMS of the LSC prediction error at data points and the RMS of the estimated noise of observations are decreased by 39% and 67%, respectively. However, the RMS of the LSC prediction error on a regular grid of interpolation points covering the area is reduced by only about 4%, a consequence of the sparse distribution of data points in this case study. The influence of gross errors on LSC prediction results is also investigated using lower cutoff CVEs; after elimination of outliers, the RMS of this type of error is also reduced, by 19.5% for a 5 km radius of vicinity. We propose a method using standardized CVEs for classifying the dataset into three groups with presumably different noise variances. The noise variance components for each of the groups are estimated by the restricted maximum-likelihood method via the Fisher scoring technique. Finally, LSC assessment measures were computed for the estimated heterogeneous noise variance model and compared with those of the homogeneous model. The advantage of the proposed method is the reduction in estimated noise levels for the groups with fewer noisy data points.
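The direct-CVE idea above has a well-known analogue for ordinary least squares, sketched here with made-up data (my illustration, not the paper's LSC formula): leave-one-out cross-validation errors follow directly from the residuals and the hat-matrix diagonal, with no refitting.

```python
import numpy as np

# Direct leave-one-out CV errors for OLS: e_cv_i = e_i / (1 - h_ii),
# where h_ii are hat-matrix diagonals. Illustrative data only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
e = y - H @ y                           # ordinary residuals
cve = e / (1.0 - np.diag(H))            # direct LOO errors, no refits

# Check against brute-force refitting with one point held out
i = 7
mask = np.arange(50) != i
beta_i = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
assert np.isclose(cve[i], y[i] - X[i] @ beta_i)
```

The vectorized formula replaces n separate refits, which is the source of the speed-up the abstract describes for LSC.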
Non-additive genetic variation in growth, carcass and fertility traits of beef cattle.
Bolormaa, Sunduimijid; Pryce, Jennie E; Zhang, Yuandan; Reverter, Antonio; Barendse, William; Hayes, Ben J; Goddard, Michael E
2015-04-02
A better understanding of non-additive variance could lead to increased knowledge on the genetic control and physiology of quantitative traits, and to improved prediction of the genetic value and phenotype of individuals. Genome-wide panels of single nucleotide polymorphisms (SNPs) have been mainly used to map additive effects for quantitative traits, but they can also be used to investigate non-additive effects. We estimated dominance and epistatic effects of SNPs on various traits in beef cattle and the variance explained by dominance, and quantified the increase in accuracy of phenotype prediction by including dominance deviations in its estimation. Genotype data (729 068 real or imputed SNPs) and phenotypes on up to 16 traits of 10 191 individuals from Bos taurus, Bos indicus and composite breeds were used. A genome-wide association study was performed by fitting the additive and dominance effects of single SNPs. The dominance variance was estimated by fitting a dominance relationship matrix constructed from the 729 068 SNPs. The accuracy of predicted phenotypic values was evaluated by best linear unbiased prediction using the additive and dominance relationship matrices. Epistatic interactions (additive × additive) were tested between each of the 28 SNPs that are known to have additive effects on multiple traits, and each of the other remaining 729 067 SNPs. The number of significant dominance effects was greater than expected by chance and most of them were in the direction that is presumed to increase fitness and in the opposite direction to inbreeding depression. Estimates of dominance variance explained by SNPs varied widely between traits, but had large standard errors. The median dominance variance across the 16 traits was equal to 5% of the phenotypic variance. Including a dominance deviation in the prediction did not significantly increase its accuracy for any of the phenotypes. 
The number of additive × additive epistatic effects that were statistically significant was greater than expected by chance. Significant dominance and epistatic effects occur for growth, carcass and fertility traits in beef cattle but they are difficult to estimate precisely and including them in phenotype prediction does not increase its accuracy.
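The single-SNP scan described above can be sketched as a regression with two codings per SNP (an illustration with simulated genotypes, not the study's exact model): an additive coding counting allele copies and a dominance coding flagging heterozygotes.

```python
import numpy as np

# Fit additive and dominance effects of one SNP by least squares.
# Simulated genotypes and effect sizes; real analyses add covariates
# and relationship structure.
rng = np.random.default_rng(1)
n = 2000
g = rng.choice([0, 1, 2], size=n, p=[0.25, 0.5, 0.25])   # allele count
a_true, d_true = 0.5, 0.3
y = a_true * g + d_true * (g == 1) + rng.normal(size=n)

X = np.column_stack([np.ones(n), g, (g == 1).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1] estimates the additive effect, beta[2] the dominance deviation
```

Testing beta[2] against zero per SNP, genome-wide, gives the count of significant dominance effects that the abstract compares with chance expectation.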
[Theory, method and application of method R on estimation of (co)variance components].
Liu, Wen-Zhong
2004-07-01
The theory, method and application of Method R for estimating (co)variance components are reviewed so that the method can be applied appropriately. Estimation requires R values, which are regressions of predicted random effects calculated from a complete dataset on predicted random effects calculated from random subsets of the same data. By using a multivariate iteration algorithm based on a transformation matrix, combined with the preconditioned conjugate gradient method for solving the mixed model equations, the computational efficiency of Method R is much improved. Method R is computationally inexpensive, and sampling errors and approximate credible intervals of the estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data, and biased estimates in small datasets. As an alternative method, Method R can be used on larger datasets. It is necessary to study its theoretical properties and broaden its application range further.
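The R value at the heart of the method can be caricatured in a few lines (a rough illustration of the idea only, not the full Method R algorithm): predict random effects by ridge/BLUP from the complete data and from a random half, then regress the complete-data predictions on the subset predictions.

```python
import numpy as np

# Toy R value: with the correct variance ratio, the regression of
# full-data predictions on subset predictions is near 1 in expectation.
rng = np.random.default_rng(2)
n, q = 400, 50
Z = rng.normal(size=(n, q))
u = rng.normal(size=q)                       # true random effects
y = Z @ u + rng.normal(size=n)

def blup(Zm, ym, lam):
    # ridge solution = BLUP of u for variance ratio lam = s2_e / s2_u
    return np.linalg.solve(Zm.T @ Zm + lam * np.eye(q), Zm.T @ ym)

u_full = blup(Z, y, lam=1.0)                 # complete data
half = rng.choice(n, size=n // 2, replace=False)
u_half = blup(Z[half], y[half], lam=1.0)     # random subset

R = (u_half @ u_full) / (u_half @ u_half)    # regression through origin
```

A mismatched variance ratio pushes R away from 1, which is what iterative Method R exploits to solve for the components.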
Mauya, Ernest William; Hansen, Endre Hofstad; Gobakken, Terje; Bollandsås, Ole Martin; Malimbwi, Rogers Ernest; Næsset, Erik
2015-12-01
Airborne laser scanning (ALS) has recently emerged as a promising tool to acquire auxiliary information for improving aboveground biomass (AGB) estimation in sample-based forest inventories. Under design-based and model-assisted inferential frameworks, the estimation relies on a model that relates the auxiliary ALS metrics to AGB estimated on ground plots. The size of the field plots has been identified as one source of model uncertainty because of so-called boundary effects, which increase with decreasing plot size. Recent research in tropical forests has aimed to quantify the boundary effects on model prediction accuracy, but evidence of the consequences for the final AGB estimates is lacking. In this study we analyzed the effect of field plot size on model prediction accuracy and its implications when used in a model-assisted inferential framework. The results showed that the prediction accuracy of the model improved as the plot size increased: the adjusted R² increased from 0.35 to 0.74, while the relative root mean square error decreased from 63.6% to 29.2%. Indicators of boundary effects were identified and confirmed to have significant effects on the model residuals. Variance estimates of the model-assisted mean AGB relative to corresponding variance estimates of pure field-based AGB decreased with increasing plot size in the range from 200 to 3000 m². The variance ratio of field-based estimates relative to the model-assisted variance ranged from 1.7 to 7.7. This study showed that the relative improvement in precision of AGB estimation when increasing field-plot size was greater for an ALS-assisted inventory than for a pure field-based inventory.
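The model-assisted estimator and the variance ratio reported above can be sketched as follows (simulated population and variable names of my own; the study's estimator includes design details omitted here):

```python
import numpy as np

# Model-assisted (difference) estimator: wall-to-wall model predictions
# plus the mean residual on the field sample. Illustrative data only.
rng = np.random.default_rng(3)
N, n = 10000, 100                                  # pixels, field plots
x_pop = rng.gamma(shape=4.0, scale=5.0, size=N)    # ALS metric everywhere
y_pop = 2.0 + 1.5 * x_pop + rng.normal(scale=4.0, size=N)  # true AGB

s = rng.choice(N, size=n, replace=False)           # field sample
X = np.column_stack([np.ones(n), x_pop[s]])
beta, *_ = np.linalg.lstsq(X, y_pop[s], rcond=None)

pred_pop = beta[0] + beta[1] * x_pop               # predictions, all pixels
resid = y_pop[s] - (beta[0] + beta[1] * x_pop[s])  # sample residuals
ma_mean = pred_pop.mean() + resid.mean()           # model-assisted mean
ma_var = resid.var(ddof=1) / n                     # approximate variance
field_var = y_pop[s].var(ddof=1) / n               # pure field-based
ratio = field_var / ma_var                         # precision gain
```

The better the ALS model fits (smaller residual variance), the larger this ratio, mirroring the 1.7 to 7.7 range in the abstract.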
Eaton, Jeffrey W.; Bao, Le
2017-01-01
Objectives The aim of the study was to propose and demonstrate an approach to allow for additional nonsampling uncertainty about HIV prevalence measured at antenatal clinic sentinel surveillance (ANC-SS) in model-based inferences about trends in HIV incidence and prevalence. Design Mathematical model fitted to surveillance data with Bayesian inference. Methods We introduce a variance inflation parameter σinfl² that accounts for the uncertainty of nonsampling errors in ANC-SS prevalence. It is additive to the sampling error variance. Three approaches are tested for estimating σinfl² using ANC-SS and household survey data from 40 subnational regions in nine countries in sub-Saharan Africa, as defined in the UNAIDS 2016 estimates. Methods were compared using in-sample fit and out-of-sample prediction of ANC-SS data, fit to household survey prevalence data, and their computational implications. Results Introducing the additional variance parameter σinfl² increased the error variance around ANC-SS prevalence observations by a median of 2.7 times (interquartile range 1.9–3.8). Using only sampling error in ANC-SS prevalence (σinfl² = 0), coverage of 95% prediction intervals was 69% in out-of-sample prediction tests. This increased to 90% after introducing the additional variance parameter σinfl². The revised probabilistic model improved model fit to household survey prevalence and increased epidemic uncertainty intervals most during the early epidemic period before 2005. Estimating σinfl² did not increase the computational cost of model fitting. Conclusions We recommend estimating nonsampling error in ANC-SS as an additional parameter in Bayesian inference using the Estimation and Projection Package model. This approach may prove useful for incorporating other data sources, such as routine prevalence from prevention of mother-to-child transmission testing, into future epidemic estimates. PMID:28296801
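The additive inflation is simple to state in code (a minimal sketch; the EPP model applies the inflation on a transformed prevalence scale, which is omitted here):

```python
# Total error variance for an ANC-SS prevalence observation:
# binomial sampling variance plus an additive non-sampling term.
# Values below are illustrative, not from the study.
def total_variance(p, n, sigma_infl2):
    sampling_var = p * (1 - p) / n      # sampling error variance
    return sampling_var + sigma_infl2   # inflated total variance

v0 = total_variance(0.15, 300, 0.0)     # sampling error only
v1 = total_variance(0.15, 300, 0.001)   # with non-sampling inflation
# wider prediction intervals follow from the larger total variance
```

Widening the variance this way is what raised out-of-sample interval coverage from 69% to 90% in the study.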
Generalized Variance Function Applications in Forestry
James Alegria; Charles T. Scott; Charles T. Scott
1991-01-01
Adequately predicting the sampling errors of tabular data can reduce printing costs by eliminating the need to publish separate sampling error tables. Two generalized variance functions (GVFs) found in the literature and three GVFs derived for this study were evaluated for their ability to predict the sampling error of tabular forestry estimates. The recommended GVFs...
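A generalized variance function can be illustrated in a few lines (made-up data and a common GVF form, relvariance = a + b/x; the paper's recommended functions may differ):

```python
import numpy as np

# Fit a GVF so sampling errors of any tabular estimate can be predicted
# from the estimate itself, avoiding separate sampling error tables.
x = np.array([50.0, 100.0, 250.0, 500.0, 1000.0])   # tabular estimates
relvar = 0.002 + 0.9 / x                            # "observed" relvariances

A = np.column_stack([np.ones_like(x), 1.0 / x])
a, b = np.linalg.lstsq(A, relvar, rcond=None)[0]    # fit relvar = a + b/x

def predicted_se(estimate):
    # standard error implied by the fitted GVF
    return estimate * np.sqrt(a + b / estimate)
```

Once a and b are published, any table cell's sampling error follows from `predicted_se`, which is the printing-cost saving the abstract mentions.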
Kumar, Satish; Molloy, Claire; Muñoz, Patricio; Daetwyler, Hans; Chagné, David; Volz, Richard
2015-01-01
The nonadditive genetic effects may have an important contribution to total genetic variation of phenotypes, so estimates of both the additive and nonadditive effects are desirable for breeding and selection purposes. Our main objectives were to: estimate additive, dominance and epistatic variances of apple (Malus × domestica Borkh.) phenotypes using relationship matrices constructed from genome-wide dense single nucleotide polymorphism (SNP) markers; and compare the accuracy of genomic predictions using genomic best linear unbiased prediction models with or without including nonadditive genetic effects. A set of 247 clonally replicated individuals was assessed for six fruit quality traits at two sites, and also genotyped using an Illumina 8K SNP array. Across several fruit quality traits, the additive, dominance, and epistatic effects contributed about 30%, 16%, and 19%, respectively, to the total phenotypic variance. Models ignoring nonadditive components yielded upwardly biased estimates of additive variance (heritability) for all traits in this study. The accuracy of genomic predicted genetic values (GEGV) varied from about 0.15 to 0.35 for various traits, and these were almost identical for models with or without including nonadditive effects. However, models including nonadditive genetic effects further reduced the bias of GEGV. Between-site genotypic correlations were high (>0.85) for all traits, and genotype-site interaction accounted for <10% of the phenotypic variability. The accuracy of prediction, when the validation set was present only at one site, was generally similar for both sites, and varied from about 0.50 to 0.85. The prediction accuracies were strongly influenced by trait heritability, and genetic relatedness between the training and validation families. PMID:26497141
NASA Astrophysics Data System (ADS)
Almosallam, Ibrahim A.; Jarvis, Matt J.; Roberts, Stephen J.
2016-10-01
The next generation of cosmology experiments will be required to use photometric redshifts rather than spectroscopic redshifts. Obtaining accurate and well-characterized photometric redshift distributions is therefore critical for Euclid, the Large Synoptic Survey Telescope and the Square Kilometre Array. However, determining accurate variance predictions alongside single point estimates is crucial, as they can be used to optimize the sample of galaxies for the specific experiment (e.g. weak lensing, baryon acoustic oscillations, supernovae), trading off between completeness and reliability in the galaxy sample. The various sources of uncertainty in measurements of the photometry and redshifts put a lower bound on the accuracy that any model can hope to achieve. The intrinsic uncertainty associated with estimates is often non-uniform and input-dependent, commonly known in statistics as heteroscedastic noise. However, existing approaches are susceptible to outliers, do not take into account variance induced by non-uniform data density, and in most cases require manual tuning of many parameters. In this paper, we present a Bayesian machine learning approach that jointly optimizes the model with respect to both the predictive mean and variance, which we refer to as Gaussian processes for photometric redshifts (GPz). The predictive variance of the model takes into account both the variance due to data density and photometric noise. Using the Sloan Digital Sky Survey (SDSS) DR12 data, we show that our approach substantially outperforms other machine learning methods for photo-z estimation and their associated variance, such as TPZ and ANNz2. We provide MATLAB and Python implementations that are available for download at https://github.com/OxfordML/GPz.
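How input-dependent (heteroscedastic) noise enters a Gaussian process can be shown in a toy example (my own minimal sketch, not the GPz algorithm): per-point noise variances sit on the kernel matrix diagonal, so predictive variance reflects both local noise and data density.

```python
import numpy as np

# Exact GP regression with a per-point noise variance on the diagonal.
# Illustrative 1-D data; GPz learns the noise model rather than
# assuming it known as done here.
def rbf(a, b, ell=1.0, sf2=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return sf2 * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 40))
noise_var = 0.01 + 0.2 * (x / 10) ** 2          # input-dependent noise
y = np.sin(x) + rng.normal(scale=np.sqrt(noise_var))

xs = np.linspace(0, 10, 5)                      # test inputs
K = rbf(x, x) + np.diag(noise_var)              # noisy train covariance
Ks = rbf(xs, x)
mean = Ks @ np.linalg.solve(K, y)               # predictive mean
var = np.diag(rbf(xs, xs) - Ks @ np.linalg.solve(K, Ks.T))
```

Regions with noisier or sparser data end up with larger entries in `var`, which is the well-characterized uncertainty the survey-optimization use case needs.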
Pare, Guillaume; Mao, Shihong; Deng, Wei Q
2016-06-08
Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.
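A back-of-envelope version of regional variance explained from summary statistics, accounting for LD, looks like this (an illustration of the general quadratic-form idea, not the exact LARGV estimator):

```python
import numpy as np

# With standardized marginal effects b = R @ beta (summary statistics)
# and LD correlation matrix R, the variance explained by the region is
# b' R^{-1} b = beta' R beta. Simulated LD and one causal variant.
rng = np.random.default_rng(5)
m = 20
L = rng.normal(size=(m, m)) * 0.1 + np.eye(m)
R = L @ L.T
d = np.sqrt(np.diag(R))
R = R / d[:, None] / d[None, :]                 # LD (correlation) matrix

beta_joint = np.zeros(m)
beta_joint[3] = 0.2                             # one causal variant
b_marginal = R @ beta_joint                     # marginal (GWAS) effects
var_explained = b_marginal @ np.linalg.solve(R, b_marginal)
# equals beta_joint' R beta_joint = 0.04 here
```

Without the R⁻¹ correction, summing squared marginal effects over a region in strong LD would double-count the same causal signal.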
Genetic control of residual variance of yearling weight in Nellore beef cattle.
Iung, L H S; Neves, H H R; Mulder, H A; Carvalheiro, R
2017-04-01
There is evidence for genetic variability in residual variance of livestock traits, which offers the potential for selection for increased uniformity of production. Different statistical approaches have been employed to study this topic; however, little is known about the concordance between them. The aim of our study was to investigate the genetic heterogeneity of residual variance on yearling weight (YW; 291.15 ± 46.67) in a Nellore beef cattle population; to compare the results of the statistical approaches, the two-step approach and the double hierarchical generalized linear model (DHGLM); and to evaluate the effectiveness of power transformation to accommodate scale differences. The comparison was based on genetic parameters, accuracy of EBV for residual variance, and cross-validation to assess predictive performance of both approaches. A total of 194,628 yearling weight records from 625 sires were used in the analysis. The results supported the hypothesis of genetic heterogeneity of residual variance on YW in Nellore beef cattle and the opportunity of selection, measured through the genetic coefficient of variation of residual variance (0.10 to 0.12 for the two-step approach and 0.17 for DHGLM, using an untransformed data set). However, low estimates of genetic variance associated with positive genetic correlations between mean and residual variance (about 0.20 for two-step and 0.76 for DHGLM for an untransformed data set) limit the genetic response to selection for uniformity of production while simultaneously increasing YW itself. Moreover, large sire families are needed to obtain accurate estimates of genetic merit for residual variance, as indicated by the low heritability estimates (<0.007). Box-Cox transformation was able to decrease the dependence of the variance on the mean and decreased the estimates of genetic parameters for residual variance. 
The transformation reduced but did not eliminate all the genetic heterogeneity of residual variance, highlighting its presence beyond the scale effect. The DHGLM showed higher predictive ability of EBV for residual variance and therefore should be preferred over the two-step approach.
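The first of the two compared approaches can be caricatured briefly (a crude illustration with simulated sires, not the study's exact two-step model): fit the mean model, then treat a function of the squared residuals as a new "dispersion" trait per sire.

```python
import numpy as np

# Two-step idea: step 1 removes the mean, step 2 summarizes log squared
# residuals by sire as a proxy for genetic differences in residual
# variance. Simulated sire effects on the log residual SD.
rng = np.random.default_rng(6)
sires, per = 50, 100
sire_id = np.repeat(np.arange(sires), per)
log_sd = rng.normal(0.0, 0.1, size=sires)        # sire dispersion effects
y = 291.0 + rng.normal(scale=np.exp(log_sd[sire_id]) * 46.0)

resid = y - y.mean()                             # step 1: mean model
z = np.log(resid**2 + 1e-12)                     # step 2: dispersion trait
z_sire = np.array([z[sire_id == s].mean() for s in range(sires)])
# spread in z_sire reflects heterogeneity of residual variance
```

The DHGLM instead fits mean and dispersion models jointly, which is one reason its estimates (and EBV accuracy for residual variance) differ from this two-step proxy.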
Steen Magnussen; Ronald E. McRoberts; Erkki O. Tomppo
2009-01-01
New model-based estimators of the uncertainty of pixel-level and areal k-nearest neighbour (knn) predictions of attribute Y from remotely-sensed ancillary data X are presented. Non-parametric functions predict Y from scalar 'Single Index Model' transformations of X. Variance functions generated...
Software for the grouped optimal aggregation technique
NASA Technical Reports Server (NTRS)
Brown, P. M.; Shaw, G. W. (Principal Investigator)
1982-01-01
The grouped optimal aggregation technique produces minimum-variance, unbiased estimates of acreage and production for countries, zones (states), or any designated collection of acreage strata. It uses yield predictions, historical acreage information, and direct acreage estimates from satellite data. The acreage strata are grouped in such a way that the ratio model over historical acreage provides a smaller variance than if the model were applied to each individual stratum. An optimal weighting matrix based on historical acreages provides the link between incomplete direct acreage estimates and the total, current acreage estimate.
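The minimum-variance combination principle underlying such aggregation can be shown generically (illustrative numbers; the actual technique groups strata and weights through historical acreages rather than known variances):

```python
import numpy as np

# Combining unbiased stratum estimates with inverse-variance weights
# gives the minimum-variance unbiased linear combination.
est = np.array([120.0, 95.0, 210.0])      # stratum acreage estimates
var = np.array([16.0, 9.0, 25.0])         # their (assumed) variances

w = (1.0 / var) / np.sum(1.0 / var)       # inverse-variance weights
combined = np.sum(w * est)                # combined estimate
combined_var = 1.0 / np.sum(1.0 / var)    # always <= min(var)
```

Grouping strata before applying the ratio model is what drives each stratum-level variance down before this weighting step.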
Small Area Variance Estimation for the Siuslaw NF in Oregon and Some Results
S. Lin; D. Boes; H.T. Schreuder
2006-01-01
The results of a small area prediction study for the Siuslaw National Forest in Oregon are presented. Predictions were made for total basal area, number of trees and mortality per ha on a 0.85 mile grid using data on a 1.7 mile grid and additional ancillary information from TM. A reliable method of estimating prediction errors for individual plot predictions called the...
Model estimation of claim risk and premium for motor vehicle insurance by using Bayesian method
NASA Astrophysics Data System (ADS)
Sukono; Riaman; Lesmana, E.; Wulandari, R.; Napitupulu, H.; Supian, S.
2018-01-01
Risk models need to be estimated by the insurance company in order to predict the magnitude of claims and determine the premiums charged to the insured. This is intended to prevent losses in the future. In this paper, we discuss the estimation of claim risk models and motor vehicle insurance premiums using a Bayesian approach. It is assumed that the frequency of claims follows a Poisson distribution, while claim amounts are assumed to follow a Gamma distribution. The parameters of the distributions of claim frequency and claim amount are estimated using Bayesian methods. Furthermore, the estimated distributions of claim frequency and claim amount are used to estimate the aggregate risk model as well as its mean and variance. The mean and variance estimates of the aggregate risk are then used to predict the premium to be charged to the insured. Based on the analysis, the frequency of claims follows a Poisson distribution with parameter value λ = 5.827, while claim amounts follow a Gamma distribution with parameter values p = 7.922 and θ = 1.414. The resulting mean and variance of the aggregate claims are IDR 32,667,489.88 and IDR 38,453,900,000,000.00, respectively. The predicted pure premium to be charged to the insured is IDR 2,722,290.82. The predicted aggregate claims and premiums can serve as a reference for the insurance company's decision-making in the management of reserves and premiums for motor vehicle insurance.
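The aggregate-claims moments behind such a pure premium follow from standard compound Poisson formulas, sketched here with the reported parameter values (the currency scaling of the severity distribution is omitted, so the magnitudes are illustrative only):

```python
# Compound Poisson aggregate claims S = X_1 + ... + X_N:
# E[S] = lambda * E[X],  Var[S] = lambda * E[X^2].
# Poisson frequency lambda; Gamma severity with shape p, scale theta.
lam, p, theta = 5.827, 7.922, 1.414

mean_claim = p * theta                    # Gamma mean E[X]
var_claim = p * theta**2                  # Gamma variance Var[X]

mean_aggregate = lam * mean_claim
var_aggregate = lam * (var_claim + mean_claim**2)   # lambda * E[X^2]
```

A premium principle then loads the mean with some multiple of the standard deviation or variance, which is how the mean and variance estimates feed the pure premium.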
Estimation of lipids and lean mass of migrating sandpipers
Skagen, Susan K.; Knopf, Fritz L.; Cade, Brian S.
1993-01-01
Estimation of lean mass and lipid levels in birds involves the derivation of predictive equations that relate morphological measurements and, more recently, total body electrical conductivity (TOBEC) indices to known lean and lipid masses. Using cross-validation techniques, we evaluated the ability of several published and new predictive equations to estimate lean and lipid mass of Semipalmated Sandpipers (Calidris pusilla) and White-rumped Sandpipers (C. fuscicollis). We also tested ideas of Morton et al. (1991), who stated that current statistical approaches to TOBEC methodology misrepresent precision in estimating body fat. Three published interspecific equations using TOBEC indices predicted lean and lipid masses of our sample of birds with average errors of 8-28% and 53-155%, respectively. A new two-species equation relating lean mass and TOBEC indices revealed average errors of 4.6% and 23.2% in predicting lean and lipid mass, respectively. New intraspecific equations that estimate lipid mass directly from body mass, morphological measurements, and TOBEC indices yielded about a 13% error in lipid estimates. Body mass and morphological measurements explained a substantial portion of the variance (about 90%) in fat mass of both species. Addition of TOBEC indices improved the predictive model more for the smaller than for the larger sandpiper. TOBEC indices explained an additional 7.8% and 2.6% of the variance in fat mass and reduced the minimum breadth of prediction intervals by 0.95 g (32%) and 0.39 g (13%) for Semipalmated and White-rumped Sandpipers, respectively. The breadth of prediction intervals for models used to predict fat levels of individual birds must be considered when interpreting the resultant lipid estimates.
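Evaluating a predictive equation by cross-validation, as done above, can be sketched with made-up data (illustrative variable names; the study's equations combine body mass, morphology and TOBEC indices):

```python
import numpy as np

# Leave-one-out cross-validation of a simple lean-mass equation,
# summarizing performance as average absolute percent error.
rng = np.random.default_rng(10)
n = 60
tobec = rng.uniform(20, 60, size=n)                 # TOBEC index
lean = 5.0 + 0.35 * tobec + rng.normal(scale=0.8, size=n)   # lean mass (g)

errors = []
for i in range(n):
    mask = np.arange(n) != i
    A = np.column_stack([np.ones(n - 1), tobec[mask]])
    b0, b1 = np.linalg.lstsq(A, lean[mask], rcond=None)[0]
    pred = b0 + b1 * tobec[i]
    errors.append(abs(pred - lean[i]) / lean[i] * 100.0)

avg_pct_error = float(np.mean(errors))
```

Because lipid mass is the small difference between body mass and lean mass, modest percent errors in lean mass translate into the much larger percent errors in lipid estimates that the abstract reports.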
[Analytic methods for seed models with genotype x environment interactions].
Zhu, J
1996-01-01
Genetic models with genotype effects (G) and genotype x environment interaction effects (GE) are proposed for analyzing generation means of seed quantitative traits in crops. The total genetic effect (G) is partitioned into seed direct genetic effect (G0), cytoplasm genetic effect (C), and maternal plant genetic effect (Gm). Seed direct genetic effect (G0) can be further partitioned into direct additive (A) and direct dominance (D) genetic components. Maternal genetic effect (Gm) can also be partitioned into maternal additive (Am) and maternal dominance (Dm) genetic components. The total genotype x environment interaction effect (GE) can also be partitioned into direct genetic by environment interaction effect (G0E), cytoplasm genetic by environment interaction effect (CE), and maternal genetic by environment interaction effect (GmE). G0E can be partitioned into direct additive by environment interaction (AE) and direct dominance by environment interaction (DE) genetic components. GmE can also be partitioned into maternal additive by environment interaction (AmE) and maternal dominance by environment interaction (DmE) genetic components. Partitions of genetic components are listed for parents, F1, F2 and backcrosses. A set of parents and their reciprocal F1 and F2 seeds is applicable for efficient analysis of seed quantitative traits. The MINQUE(0/1) method can be used for estimating variance and covariance components. Unbiased estimation of covariance components between two traits can also be obtained by the MINQUE(0/1) method. Random genetic effects in the seed models are predictable by the Adjusted Unbiased Prediction (AUP) approach with the MINQUE(0/1) method. The jackknife procedure is suggested for estimating sampling variances of estimated variance and covariance components and of predicted genetic effects, which can be further used in t-tests for parameters.
Unbiasedness and efficiency for estimating variance components and predicting genetic effects are tested by Monte Carlo simulations.
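The jackknife procedure recommended here (and in several of the other records) is generic, so a minimal sketch suffices (illustrative statistic and data, not the seed-model estimators themselves):

```python
import numpy as np

# Delete-one jackknife: recompute the statistic with each observation
# left out, then combine the leave-one-out values into a variance
# estimate usable in a t-test.
rng = np.random.default_rng(7)
x = rng.normal(loc=10.0, scale=2.0, size=30)

def stat(sample):
    return sample.var(ddof=1)               # statistic of interest

n = len(x)
loo = np.array([stat(np.delete(x, i)) for i in range(n)])
jack_var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
jack_se = np.sqrt(jack_var)
# a t-test for the statistic can use stat(x) / jack_se
```

In the seed-model setting the "observations" deleted are typically whole blocks or families rather than single records, but the combination formula is the same.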
Su, Guosheng; Christensen, Ole F.; Ostersen, Tage; Henryon, Mark; Lund, Mogens S.
2012-01-01
Non-additive genetic variation is usually ignored when genome-wide markers are used to study the genetic architecture and genomic prediction of complex traits in human, wild life, model organisms or farm animals. However, non-additive genetic effects may have an important contribution to total genetic variation of complex traits. This study presented a genomic BLUP model including additive and non-additive genetic effects, in which additive and non-additive genetic relation matrices were constructed from information of genome-wide dense single nucleotide polymorphism (SNP) markers. In addition, this study for the first time proposed a method to construct dominance relationship matrix using SNP markers and demonstrated it in detail. The proposed model was implemented to investigate the amounts of additive genetic, dominance and epistatic variations, and assessed the accuracy and unbiasedness of genomic predictions for daily gain in pigs. In the analysis of daily gain, four linear models were used: 1) a simple additive genetic model (MA), 2) a model including both additive and additive by additive epistatic genetic effects (MAE), 3) a model including both additive and dominance genetic effects (MAD), and 4) a full model including all three genetic components (MAED). Estimates of narrow-sense heritability were 0.397, 0.373, 0.379 and 0.357 for models MA, MAE, MAD and MAED, respectively. Estimated dominance variance and additive by additive epistatic variance accounted for 5.6% and 9.5% of the total phenotypic variance, respectively. Based on model MAED, the estimate of broad-sense heritability was 0.506. Reliabilities of genomic predicted breeding values for the animals without performance records were 28.5%, 28.8%, 29.2% and 29.5% for models MA, MAE, MAD and MAED, respectively. In addition, models including non-additive genetic effects improved unbiasedness of genomic predictions. PMID:23028912
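A common way to build the additive and dominance genomic relationship matrices from 0/1/2 SNP genotypes is sketched below (an illustration in the spirit of the construction the study proposes; the paper's exact coding and scaling may differ):

```python
import numpy as np

# Additive coding centers allele counts; dominance coding centers a
# heterozygosity indicator. Simulated genotypes under HWE.
rng = np.random.default_rng(8)
n, m = 100, 500
freq = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freq, size=(n, m)).astype(float)

p = M.mean(axis=0) / 2.0                            # allele frequencies
W_a = M - 2.0 * p                                   # additive coding
G_a = W_a @ W_a.T / np.sum(2.0 * p * (1.0 - p))     # additive G matrix

H = (M == 1).astype(float)                          # heterozygote flag
W_d = H - 2.0 * p * (1.0 - p)                       # dominance coding
G_d = W_d @ W_d.T / np.sum((2.0 * p * (1.0 - p)) ** 2)   # dominance G
```

Fitting G_a and G_d (and, for epistasis, element-wise products of G_a) as separate random-effect covariances is what lets the model split the variance the way models MA through MAED do.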
Gebreyesus, Grum; Lund, Mogens S; Buitenhuis, Bart; Bovenhuis, Henk; Poulsen, Nina A; Janss, Luc G
2017-12-05
Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci of large effect. The amount of variation explained may vary between regions leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls. Single-nucleotide polymorphisms (SNPs), from 50K SNP arrays, were grouped into non-overlapping genome segments. A segment was defined as one SNP, or a group of 50, 100, or 200 adjacent SNPs, or one chromosome, or the whole genome. Traditional univariate and bivariate genomic best linear unbiased prediction (GBLUP) models were also run for comparison. Reliabilities were calculated through a resampling strategy and using deterministic formula. BayesAS models improved prediction reliability for most of the traits compared to GBLUP models and this gain depended on segment size and genetic architecture of the traits. The gain in prediction reliability was especially marked for the protein composition traits β-CN, κ-CN and β-LG, for which prediction reliabilities were improved by 49 percentage points on average using the MT-BayesAS model with a 100-SNP segment size compared to the bivariate GBLUP. Prediction reliabilities were highest with the BayesAS model that uses a 100-SNP segment size. 
The bivariate versions of our BayesAS models resulted in extra gains of up to 6% in prediction reliability compared to the univariate versions. Substantial improvement in prediction reliability was possible for most of the traits related to milk protein composition using our novel BayesAS models. Grouping adjacent SNPs into segments provided enhanced information to estimate parameters and allowing the segments to have different (co)variances helped disentangle heterogeneous (co)variances across the genome.
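The segment definition step is simple to make concrete (an illustrative sketch; estimating the segment-specific (co)variances, the substance of BayesAS, is omitted):

```python
import numpy as np

# Group adjacent SNPs into non-overlapping genome segments so each
# segment can later receive its own (co)variance.
n_snps, segment_size = 1050, 100
boundaries = np.arange(0, n_snps, segment_size)
segments = [np.arange(start, min(start + segment_size, n_snps))
            for start in boundaries]
n_segments = len(segments)
# 10 full segments of 100 SNPs plus one final segment of 50
```

With segment sizes of 1 SNP, one chromosome, or the whole genome, the same scheme spans the range of models compared in the study, the whole-genome case reducing to GBLUP-like homogeneity.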
Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.
Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter
2013-12-06
In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. 
Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.
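The indirect estimators compared above all share one computation: divide cross-validated predictive ability, the correlation between predictions and phenotypes, by the square root of an estimated heritability. A minimal sketch in Python, with simulated breeding values and arbitrary parameter values (none taken from the study):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 500
h2 = 0.4                                      # true heritability (illustrative)
g = rng.normal(0.0, np.sqrt(h2), n)           # true breeding values
y = g + rng.normal(0.0, np.sqrt(1 - h2), n)   # phenotypes

# Stand-in for cross-validated genomic predictions of g
# (shrunken true values plus noise, for illustration only).
g_hat = 0.8 * g + rng.normal(0.0, 0.3, n)

predictive_ability = np.corrcoef(g_hat, y)[0, 1]        # r(prediction, phenotype)
predictive_accuracy = predictive_ability / np.sqrt(h2)  # indirect estimate
true_accuracy = np.corrcoef(g_hat, g)[0, 1]             # needs true g: simulation only

print(round(predictive_accuracy, 2), round(true_accuracy, 2))
```

In simulation the indirect estimate can be checked against the true accuracy, which is exactly the comparison the study performs at scale.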
USDA-ARS's Scientific Manuscript database
Transformations to multiple trait mixed model equations (MME), which are intended to improve computational efficiency in best linear unbiased prediction (BLUP) and restricted maximum likelihood (REML), are described. It is shown that traits that are expected or estimated to have zero residual variance...
Da, Yang
2015-12-18
The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements, such as all genes of the genome, can be an approach to integrate functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h - 1 additive effects, where h = number of alleles or haplotypes, and each dominance value is expressed as a function of h(h - 1)/2 dominance effects. For a sample of q individuals, the limit number of effects is 2q - 1 for additive effects and is the number of heterozygous genotypes for dominance effects. Additive values are factorized as a product between the additive model matrix and the h - 1 additive effects, and dominance values are factorized as a product between the dominance model matrix and the h(h - 1)/2 dominance effects. The genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and the genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly uses haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation with identical results.
The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.
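As a rough illustration of the multi-allelic idea, the sketch below codes each individual's haplotype pair at one genomic block as "allele" counts and builds an additive relationship matrix from the centered counts. The VanRaden-style scaling is an assumption made for the example, not the article's exact construction:

```python
import numpy as np

# Haplotype pairs for q = 4 individuals at one genomic block;
# haplotypes act as alleles of a multi-allelic locus (h = 3 here).
haplos = [(0, 1), (1, 1), (0, 2), (2, 2)]
h = 3
q = len(haplos)

# Count matrix X: individuals x haplotypes, entries 0/1/2.
# Rows sum to 2, so only h - 1 = 2 counts are independent,
# matching the h - 1 additive effects in the model.
X = np.zeros((q, h))
for i, (a, b) in enumerate(haplos):
    X[i, a] += 1
    X[i, b] += 1

p = X.mean(axis=0) / 2.0           # haplotype ("allele") frequencies
Z = X - 2.0 * p                    # centered counts
denom = 2.0 * np.sum(p * (1 - p))  # VanRaden-style scaling (assumed)
G = Z @ Z.T / denom                # additive genomic relationship matrix

print(np.round(G, 2))
```

The same pattern extends to many blocks by concatenating the per-block Z matrices before forming G.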
Vitezica, Zulma G; Varona, Luis; Legarra, Andres
2013-12-01
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the "genotypic" value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
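A hedged sketch of the two relationship matrices for biallelic SNPs under the breeding-value parameterization: additive coding centered by allele frequency, and the dominance-deviation coding commonly attributed to this work. The genotypes are simulated, so treat this as an illustration of the coding rather than the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

n, m = 6, 200                        # individuals, SNPs (toy sizes)
p = rng.uniform(0.1, 0.9, m)         # allele frequencies
M = rng.binomial(2, p, size=(n, m))  # genotype counts 0/1/2
q = 1.0 - p

# Additive coding: counts centered by 2p (breeding-value parameterization).
Z = M - 2.0 * p
G = Z @ Z.T / np.sum(2.0 * p * q)

# Dominance-deviation coding:
#   genotype 0 -> -2p^2, 1 -> 2pq, 2 -> -2q^2
W = np.where(M == 0, -2.0 * p**2,
             np.where(M == 1, 2.0 * p * q, -2.0 * q**2))
D = W @ W.T / np.sum((2.0 * p * q) ** 2)

# Under Hardy-Weinberg sampling, both diagonals average near 1.
print(np.round(np.diag(G), 2), np.round(np.diag(D), 2))
```

With this parameterization, variance components estimated with G and D correspond to additive (breeding-value) and dominance-deviation variances, which is the partition the abstract argues for.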
Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Porth, Ilga; Chen, Charles; El-Kassaby, Yousry A.
2016-01-01
Open-pollinated (OP) family testing combines the simplest known progeny evaluation and quantitative genetics analyses, as candidates’ offspring are assumed to represent independent half-sib families. The accuracy of genetic parameter estimates is often questioned, as the assumption of “half-sibling” in OP families may often be violated. We compared pedigree- vs. marker-based genetic models by analysing 22-yr height and 30-yr wood density for 214 white spruce [Picea glauca (Moench) Voss] OP families represented by 1694 individuals growing on one site in Quebec, Canada. Assuming half-sibling, the pedigree-based model was limited to estimating the additive genetic variances, which, in turn, were grossly overestimated as they were confounded by very minor dominance and major additive-by-additive epistatic genetic variances. In contrast, the implemented genomic pairwise realized relationship models allowed the disentanglement of additive from all nonadditive factors through genetic variance decomposition. The marker-based models produced more realistic narrow-sense heritability estimates and, for the first time, allowed estimating the dominance and epistatic genetic variances from OP testing. In addition, the genomic models showed better prediction accuracies compared to pedigree models and were able to predict individual breeding values for new individuals from untested families, which was not possible using the pedigree-based model. Clearly, the use of a marker-based relationship approach is effective in estimating the quantitative genetic parameters of complex traits even under simple and shallow pedigree structure. PMID:26801647
Deterministic theory of Monte Carlo variance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ueki, T.; Larsen, E.W.
1996-12-31
The theoretical estimation of variance in Monte Carlo transport simulations, particularly those using variance reduction techniques, is a substantially unsolved problem. In this paper, the authors describe a theory that predicts the variance in a variance reduction method proposed by Dwivedi. Dwivedi's method combines the exponential transform with angular biasing. The key element of this theory is a new modified transport problem, containing the Monte Carlo weight w as an extra independent variable, which simulates Dwivedi's Monte Carlo scheme. The (deterministic) solution of this modified transport problem yields an expression for the variance. The authors give computational results that validate this theory.
Abdollahi-Arpanahi, Rostam; Morota, Gota; Valente, Bruno D; Kranis, Andreas; Rosa, Guilherme J M; Gianola, Daniel
2016-02-03
Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia of DNA elements showed that ~80 % of the human genome has a biochemical function. Similar studies on the chicken genome are lacking, thus assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5' and 3' untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability. 
All genic and non-genic regions contributed to phenotypic variation for the three traits studied. Overall, the contribution of additive genetic variance to the total genetic variance was much greater than that of dominance variance. Our results show that all genomic regions are important for the prediction of the targeted traits, and the whole-genome approach was reaffirmed as the best tool for genome-enabled prediction of quantitative traits.
Gap-filling methods to impute eddy covariance flux data by preserving variance.
NASA Astrophysics Data System (ADS)
Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.
2015-12-01
To represent carbon dynamics, in terms of the exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data have been collected using eddy flux towers at sites across the globe for more than two decades. However, EC measurements are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to obtain estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60 days) to estimate non-linear regression parameters for one year of flux data from a longleaf pine site at the Joseph Jones Ecological Research Center. We used the Michaelis-Menten and Van't Hoff functions as our base. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This provides an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC), into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by lower root mean square error (RMSE) of the predicted values of NEE).
Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables of the parameters traditionally used. Then, comparisons of the predicted values from these methods and 'traditional' gap-filling methods (using 12 fixed monthly windows) will be assessed to show the scale of preserving variance. Further, this method will be applied to impute artificially created gaps for analyzing if variance is preserved.
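The light-response part of such a gap-filling step can be sketched as a Michaelis-Menten fit. The data below are synthetic, and the parameterization (quantum yield alpha, asymptotic uptake Pmax, respiration Rd) is one common form, assumed here rather than taken from the abstract:

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(par, alpha, pmax, rd):
    """Light response: NEE = -(alpha*PAR*Pmax)/(alpha*PAR + Pmax) + Rd."""
    return -(alpha * par * pmax) / (alpha * par + pmax) + rd

rng = np.random.default_rng(0)
par = rng.uniform(0, 2000, 300)            # PAR, umol m-2 s-1 (synthetic)
true = dict(alpha=0.05, pmax=25.0, rd=3.0)
nee = michaelis_menten(par, **true) + rng.normal(0, 1.0, par.size)

# Fit within one moving window, then reuse the parameters to fill gaps.
popt, _ = curve_fit(michaelis_menten, par, nee, p0=(0.01, 10.0, 1.0))
alpha_hat, pmax_hat, rd_hat = popt

# Gap-fill: predict NEE for missing records from observed PAR.
nee_filled = michaelis_menten(np.array([500.0, 1500.0]), *popt)
print(np.round(popt, 2), np.round(nee_filled, 1))
```

The parameter-prediction refinement described in the abstract would then model alpha_hat, pmax_hat, and rd_hat as functions of drivers such as VPD and VWC instead of holding them fixed per window.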
Prediction-error variance in Bayesian model updating: a comparative study
NASA Astrophysics Data System (ADS)
Asadollahi, Parisa; Li, Jian; Huang, Yong
2017-04-01
In Bayesian model updating, the likelihood function is commonly formulated by stochastic embedding, in which the maximum information entropy probability model of the prediction error variances plays an important role; it is a Gaussian distribution subject to the first two moments as constraints. The selection of prediction error variances can be formulated as a model class selection problem, which automatically involves a trade-off between the average data-fit of the model class and the information it extracts from the data. The treatment of prediction error variances is therefore critical for robust updating of the structural model, especially in the presence of modeling errors. To date, three ways of considering prediction error variances have been seen in the literature: 1) setting constant values empirically, 2) estimating them based on the goodness-of-fit of the measured data, and 3) updating them as uncertain parameters by applying Bayes' Theorem at the model class level. In this paper, the effect of different strategies for handling the prediction error variances on model updating performance is investigated explicitly. A six-story shear building model with six uncertain stiffness parameters is employed as an illustrative example. Transitional Markov Chain Monte Carlo is used to draw samples of the posterior probability density function of the structural model parameters as well as the uncertain prediction error variances. Different levels of modeling uncertainty and complexity are represented by three FE models: a true model, a model with more complexity, and a model with modeling error. Bayesian updating is performed for the three FE models considering the three aforementioned treatments of the prediction error variances. The effect of the number of measurements on model updating performance is also examined in the study.
The results are compared based on model class assessment and indicate that updating the prediction error variances as uncertain parameters at the model class level produces more robust results, especially when the number of measurements is small.
On the internal target model in a tracking task
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Baron, S.
1981-01-01
An optimal control model for predicting an operator's dynamic responses and errors in a target tracking task is summarized. The model, which predicts asymmetry in the tracking data, is dependent on target maneuvers and trajectories. The gunner's perception, decision making, control, and estimates of target positions and velocities related to crossover intervals are discussed. The model provides estimates for means, standard deviations, and variances for the variables investigated and for operator estimates of future target positions and velocities.
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
Normative morphometric data for cerebral cortical areas over the lifetime of the adult human brain.
Potvin, Olivier; Dieumegarde, Louis; Duchesne, Simon
2017-08-01
Proper normative data for anatomical measurements of cortical regions, allowing brain abnormalities to be quantified, are lacking. We developed norms for regional cortical surface areas, thicknesses, and volumes based on cross-sectional MRI scans from 2713 healthy individuals aged 18 to 94 years, using 23 samples provided by 21 independent research groups. The segmentation was conducted using FreeSurfer, a widely used and freely available automated segmentation software. Models predicting regional cortical estimates of each hemisphere were produced using age, sex, estimated total intracranial volume (eTIV), scanner manufacturer, magnetic field strength, and interactions as predictors. The explained variance for the left/right cortex was 76%/76% for surface area, 43%/42% for thickness, and 80%/80% for volume. The mean explained variance for all regions was 41% for surface areas, 27% for thicknesses, and 46% for volumes. Age, sex, and eTIV predicted most of the explained variance for surface areas and volumes, while age was the main predictor for thicknesses. Scanner characteristics generally predicted a limited amount of variance, but this effect was stronger for thicknesses than for surface areas and volumes. For new individuals, estimates of their expected surface area, thickness, and volume based on their characteristics and the scanner characteristics can be obtained using the derived formulas, as well as Z-score effect sizes denoting the extent of the deviation from the normative sample. Models predicting normative values were validated in independent samples of healthy adults, showing satisfactory validation R2. Deviations from the normative sample were measured in individuals with mild Alzheimer's disease and schizophrenia, and expected patterns of deviations were observed. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
Albin, Thomas J
2017-07-01
Occasionally practitioners must work with single dimensions defined as combinations (sums or differences) of percentile values, but lack information (e.g. variances) to estimate the accommodation achieved. This paper describes methods to predict accommodation proportions for such combinations of percentile values, e.g. two 90th percentile values. Kreifeldt and Nah z-score multipliers were used to estimate the proportions accommodated by combinations of percentile values of 2-15 variables; two simplified versions required less information about variance and/or correlation. The estimates were compared to actual observed proportions; for combinations of 2-15 percentile values the average absolute differences ranged between 0.5 and 1.5 percentage points. The multipliers were also used to estimate adjusted percentile values, that, when combined, estimate a desired proportion of the combined measurements. For combinations of two and three adjusted variables, the average absolute difference between predicted and observed proportions ranged between 0.5 and 3.0 percentage points. Copyright © 2017 Elsevier Ltd. All rights reserved.
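The central point, that combining two 90th-percentile values does not accommodate 90% of users, can be illustrated with plain normal theory; the standard deviations and correlation below are invented for the example, and the rescaling by the standard deviation of the sum is the idea behind z-score multipliers:

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
z90 = nd.inv_cdf(0.90)          # ~1.2816

# Two body dimensions with standard deviations s1, s2 and correlation rho.
s1, s2, rho = 30.0, 20.0, 0.3
cutoff = z90 * s1 + z90 * s2    # sum of the two 90th-percentile offsets
sd_sum = sqrt(s1**2 + s2**2 + 2 * rho * s1 * s2)

# Proportion of the population whose *combined* dimension fits the cutoff.
prop = nd.cdf(cutoff / sd_sum)
print(round(prop, 3))           # exceeds 0.90: the combination over-accommodates

# Adjusted percentile: the single-variable percentile that, when the two
# adjusted values are combined, accommodates exactly 90% of the sum.
z_adj = z90 * sd_sum / (s1 + s2)
print(round(nd.cdf(z_adj), 3))
```

Summing percentile offsets ignores that variances, not standard deviations, add; the correlation term shows why variance and correlation information both matter for the estimate.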
Heidaritabar, M; Wolc, A; Arango, J; Zeng, J; Settar, P; Fulton, J E; O'Sullivan, N P; Bastiaansen, J W M; Fernando, R L; Garrick, D J; Dekkers, J C M
2016-10-01
Most genomic prediction studies fit only additive effects in models to estimate genomic breeding values (GEBV). However, if dominance genetic effects are an important source of variation for complex traits, accounting for them may improve the accuracy of GEBV. We investigated the effect of fitting dominance and additive effects on the accuracy of GEBV for eight egg production and quality traits in a purebred line of brown layers using pedigree or genomic information (42K single-nucleotide polymorphism (SNP) panel). Phenotypes were corrected for the effect of hatch date. Additive and dominance genetic variances were estimated using genomic-based [genomic best linear unbiased prediction (GBLUP)-REML and BayesC] and pedigree-based (PBLUP-REML) methods. Breeding values were predicted using a model that included both additive and dominance effects and a model that included only additive effects. The reference population consisted of approximately 1800 animals hatched between 2004 and 2009, while approximately 300 young animals hatched in 2010 were used for validation. Accuracy of prediction was computed as the correlation between phenotypes and estimated breeding values of the validation animals divided by the square root of the estimate of heritability in the whole population. The proportion of dominance variance to total phenotypic variance ranged from 0.03 to 0.22 with PBLUP-REML across traits, from 0 to 0.03 with GBLUP-REML and from 0.01 to 0.05 with BayesC. Accuracies of GEBV ranged from 0.28 to 0.60 across traits. Inclusion of dominance effects did not improve the accuracy of GEBV, and differences in their accuracies between genomic-based methods were small (0.01-0.05), with GBLUP-REML yielding higher prediction accuracies than BayesC for egg production, egg colour and yolk weight, while BayesC yielded higher accuracies than GBLUP-REML for the other traits. 
In conclusion, fitting dominance effects did not impact accuracy of genomic prediction of breeding values in this population. © 2016 Blackwell Verlag GmbH.
Sequential causal inference: Application to randomized trials of adaptive treatment strategies
Dawson, Ree; Lavori, Philip W.
2009-01-01
Clinical trials that randomize subjects to decision algorithms, which adapt treatments over time according to individual response, have gained considerable interest as investigators seek designs that directly inform clinical decision making. We consider designs in which subjects are randomized sequentially at decision points, among adaptive treatment options under evaluation. We present a sequential method to estimate the comparative effects of the randomized adaptive treatments, which are formalized as adaptive treatment strategies. Our causal estimators are derived using Bayesian predictive inference. We use analytical and empirical calculations to compare the predictive estimators to (i) the ‘standard’ approach that allocates the sequentially obtained data to separate strategy-specific groups as would arise from randomizing subjects at baseline; (ii) the semi-parametric approach of marginal mean models that, under appropriate experimental conditions, provides the same sequential estimator of causal differences as the proposed approach. Simulation studies demonstrate that sequential causal inference offers substantial efficiency gains over the standard approach to comparing treatments, because the predictive estimators can take advantage of the monotone structure of shared data among adaptive strategies. We further demonstrate that the semi-parametric asymptotic variances, which are marginal ‘one-step’ estimators, may exhibit significant bias, in contrast to the predictive variances. We show that the conditions under which the sequential method is attractive relative to the other two approaches are those most likely to occur in real studies. PMID:17914714
Thermospheric mass density model error variance as a function of time scale
NASA Astrophysics Data System (ADS)
Emmert, J. T.; Sutton, E. K.
2017-12-01
In the increasingly crowded low-Earth orbit environment, accurate estimation of orbit prediction uncertainties is essential for collision avoidance. Poor characterization of such uncertainty can result in unnecessary and costly avoidance maneuvers (false positives) or disregard of a collision risk (false negatives). Atmospheric drag is a major source of orbit prediction uncertainty, and is particularly challenging to account for because it exerts a cumulative influence on orbital trajectories and is therefore not amenable to representation by a single uncertainty parameter. To address this challenge, we examine the variance of measured accelerometer-derived and orbit-derived mass densities with respect to predictions by thermospheric empirical models, using the data-minus-model variance as a proxy for model uncertainty. Our analysis focuses mainly on the power spectrum of the residuals, and we construct an empirical model of the variance as a function of time scale (from 1 hour to 10 years), altitude, and solar activity. We find that the power spectral density approximately follows a power-law process but with an enhancement near the 27-day solar rotation period. The residual variance increases monotonically with altitude between 250 and 550 km. There are two components to the variance dependence on solar activity: one component is 180 degrees out of phase (largest variance at solar minimum), and the other component lags 2 years behind solar maximum (largest variance in the descending phase of the solar cycle).
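The spectral part of such an analysis can be sketched as: form data-minus-model residuals, compute their periodogram, and fit a power-law slope on log-log axes. The residuals below are synthetic red noise, not thermospheric data, and the true exponent is chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic hourly "data minus model" residuals: a power-law (red-noise)
# process built by shaping white noise in the frequency domain.
n = 2**14
freqs = np.fft.rfftfreq(n, d=1.0)     # cycles per hour
spec = np.zeros(freqs.size, dtype=complex)
spec[1:] = (freqs[1:] ** -0.8) * (rng.normal(size=freqs.size - 1)
                                  + 1j * rng.normal(size=freqs.size - 1))
resid = np.fft.irfft(spec, n)

# Periodogram of the residuals and a power-law fit on log-log axes.
psd = np.abs(np.fft.rfft(resid)) ** 2 / n
mask = freqs > 0
slope, intercept = np.polyfit(np.log(freqs[mask]), np.log(psd[mask]), 1)
print(round(slope, 2))  # recovers roughly -1.6, twice the amplitude exponent
```

A feature such as the 27-day solar-rotation enhancement would appear as a bump above this fitted power law near the corresponding frequency.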
Properties of the endogenous post-stratified estimator using a random forests model
John Tipton; Jean Opsomer; Gretchen G. Moisen
2012-01-01
Post-stratification is used in survey statistics as a method to improve variance estimates. In traditional post-stratification methods, the variable on which the data is being stratified must be known at the population level. In many cases this is not possible, but it is possible to use a model to predict values using covariates, and then stratify on these predicted...
Minimum number of measurements for evaluating soursop (Annona muricata L.) yield.
Sánchez, C F B; Teodoro, P E; Londoño, S; Silva, L A; Peixoto, L A; Bhering, L L
2017-05-31
Repeatability studies on fruit species are of great importance to identify the minimum number of measurements necessary to accurately select superior genotypes. This study aimed to identify the most efficient method to estimate the repeatability coefficient (r) and predict the minimum number of measurements needed for a more accurate evaluation of soursop (Annona muricata L.) genotypes based on fruit yield. Sixteen measurements of fruit yield from 71 soursop genotypes were carried out between 2000 and 2016. In order to estimate r with the best accuracy, four procedures were used: analysis of variance, principal component analysis based on the correlation matrix, principal component analysis based on the phenotypic variance and covariance matrix, and structural analysis based on the correlation matrix. The minimum number of measurements needed to predict the actual value of individuals was estimated. Principal component analysis using the phenotypic variance and covariance matrix provided the most accurate estimates of both r and the number of measurements required for accurate evaluation of fruit yield in soursop. Our results indicate that selection of soursop genotypes with high fruit yield can be performed based on the third and fourth measurements in the early years and/or based on the eighth and ninth measurements at more advanced stages.
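Once the repeatability coefficient r is estimated, the reliability of the mean of m measurements, and hence the minimum m for a target reliability, follows from a Spearman-Brown-type formula. The r value below is illustrative, not the study's estimate:

```python
from math import ceil

def reliability_of_mean(r, m):
    """Coefficient of determination of the mean of m measurements."""
    return m * r / (1 + (m - 1) * r)

def min_measurements(r, target):
    """Smallest m whose mean reaches the target reliability."""
    return ceil(target * (1 - r) / (r * (1 - target)))

r = 0.35  # assumed repeatability of fruit yield (illustrative)
m = min_measurements(r, 0.80)
print(m, round(reliability_of_mean(r, m), 3))
```

Because m depends strongly on r, the choice among the four estimation procedures (ANOVA vs. the principal-component variants) directly changes the recommended number of measurements.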
Lin, P.-S.; Chiou, B.; Abrahamson, N.; Walling, M.; Lee, C.-T.; Cheng, C.-T.
2011-01-01
In this study, we quantify the reduction in the standard deviation for empirical ground-motion prediction models obtained by removing the ergodic assumption. We partition the modeling error (residual) into five components, three of which represent the repeatable source-location-specific, site-specific, and path-specific deviations from the population mean. A variance estimation procedure for these error components is developed for use with a set of recordings from earthquakes not heavily clustered in space. With most source locations and propagation paths sampled only once, we opt to exploit the spatial correlation of residuals to estimate the variances associated with the path-specific and the source-location-specific deviations. The estimation procedure is applied to ground-motion amplitudes from 64 shallow earthquakes in Taiwan recorded at 285 sites with at least 10 recordings per site. The estimated variance components are used to quantify the reduction in aleatory variability that can be used in hazard analysis for a single site and for a single path. For peak ground acceleration and spectral accelerations at periods of 0.1, 0.3, 0.5, 1.0, and 3.0 s, we find that the single-site standard deviations are 9%-14% smaller than the total standard deviation, whereas the single-path standard deviations are 39%-47% smaller.
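The reported reductions follow directly from dropping repeatable variance components from the total standard deviation. A sketch with invented component values (not those estimated for Taiwan):

```python
from math import sqrt

# Illustrative variance components (natural-log ground-motion units):
tau2    = 0.10  # between-event
phi_s2s = 0.12  # repeatable site-to-site
phi_p2p = 0.20  # repeatable path-to-path
phi_0   = 0.08  # remaining single-site, single-path variance

sigma_total = sqrt(tau2 + phi_s2s + phi_p2p + phi_0)

# Single-site sigma: drop the repeatable site term (site response known).
sigma_ss = sqrt(tau2 + phi_p2p + phi_0)
# Single-path sigma: drop both repeatable site and path terms.
sigma_sp = sqrt(tau2 + phi_0)

print(f"site reduction: {1 - sigma_ss / sigma_total:.0%}")
print(f"path reduction: {1 - sigma_sp / sigma_total:.0%}")
```

Because standard deviations combine in quadrature, removing the large path-to-path component yields a much bigger percentage reduction than removing the site term, consistent with the pattern the abstract reports.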
NASA Astrophysics Data System (ADS)
Chang, Guobin; Xu, Tianhe; Yao, Yifei; Wang, Qianxin
2018-01-01
In order to incorporate the time smoothness of the ionospheric delay to aid cycle slip detection, an adaptive Kalman filter is developed based on variance component estimation. The correlations between measurements at neighboring epochs are fully considered in developing a filtering algorithm for colored measurement noise. Within this filtering framework, epoch-differenced ionospheric delays are predicted. Using this prediction, potential cycle slips are repaired for triple-frequency signals of global navigation satellite systems. Cycle slips are repaired in a stepwise manner, i.e., first for two extra-wide-lane combinations and then for the third frequency. In the estimation for the third frequency, a stochastic model is followed in which the correlations between the ionospheric delay prediction errors and the errors in the epoch-differenced phase measurements are considered. The implementation details of the proposed method are tabulated. A real BeiDou Navigation Satellite System data set is used to check the performance of the proposed method. Most cycle slips, whether trivial or nontrivial, can be estimated as float values with satisfactorily high accuracy, and their integer values can hence be correctly obtained by simple rounding. To be more specific, all manually introduced nontrivial cycle slips are correctly repaired.
Estimating Model Prediction Error: Should You Treat Predictions as Fixed or Random?
NASA Technical Reports Server (NTRS)
Wallach, Daniel; Thorburn, Peter; Asseng, Senthold; Challinor, Andrew J.; Ewert, Frank; Jones, James W.; Rotter, Reimund; Ruane, Alexander
2016-01-01
Crop models are important tools for impact assessment of climate change, as well as for exploring management options under the current climate. It is essential to evaluate the uncertainty associated with predictions of these models. We compare two criteria of prediction error: MSEP_fixed, which evaluates the mean squared error of prediction for a model with fixed structure, parameters, and inputs, and MSEP_uncertain(X), which evaluates the mean squared error averaged over the distributions of model structure, inputs, and parameters. Comparison of model outputs with data can be used to estimate the former. The latter has a squared bias term, which can be estimated using hindcasts, and a model variance term, which can be estimated from a simulation experiment. The separate contributions to MSEP_uncertain(X) can be estimated using a random-effects ANOVA. It is argued that MSEP_uncertain(X) is the more informative uncertainty criterion, because it is specific to each prediction situation.
Performance of chromatographic systems to model soil-water sorption.
Hidalgo-Rodríguez, Marta; Fuguet, Elisabet; Ràfols, Clara; Rosés, Martí
2012-08-24
A systematic approach for evaluating the goodness of chromatographic systems to model the sorption of neutral organic compounds by soil from water is presented in this work. It is based on the examination of the three sources of error that determine the overall variance obtained when soil-water partition coefficients are correlated against chromatographic retention factors: the variance of the soil-water sorption data, the variance of the chromatographic data, and the variance attributed to the dissimilarity between the two systems. These contributions of variance are easily predicted through the characterization of the systems by the solvation parameter model. According to this method, several chromatographic systems besides the reference octanol-water partition system have been selected to test their performance in the emulation of soil-water sorption. The results from the experimental correlations agree with the predicted variances. The high-performance liquid chromatography system based on an immobilized artificial membrane and the micellar electrokinetic chromatography systems of sodium dodecylsulfate and sodium taurocholate provide the most precise correlation models. They have been shown to predict soil-water sorption coefficients of several tested herbicides well. Octanol-water partitions and high-performance liquid chromatography measurements using C18 columns are less suited for the estimation of soil-water partition coefficients. Copyright © 2012 Elsevier B.V. All rights reserved.
Estimating Slash Quantity from Standing Loblolly Pine
Dale D. Wade
1969-01-01
No significant differences were found between the variances of two prediction equations for estimating loblolly pine crown weight from diameter at breast height (d.b.h.). One equation was developed from trees on the Georgia Piedmont and the other from trees on the South Carolina Coastal Plain. An equation and table are presented for estimating loblolly pine slash weights from...
Knopman, Debra S.; Voss, Clifford I.
1987-01-01
The spatial and temporal variability of sensitivities has a significant impact on parameter estimation and sampling design for studies of solute transport in porous media. Physical insight into the behavior of sensitivities is offered through an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation. When parameters are estimated in regression models of one-dimensional transport, the spatial and temporal variability in sensitivities influences variance and covariance of parameter estimates. Several principles account for the observed influence of sensitivities on parameter uncertainty. (1) Information about a physical parameter may be most accurately gained at points in space and time with a high sensitivity to the parameter. (2) As the distance of observation points from the upstream boundary increases, maximum sensitivity to velocity during passage of the solute front increases and the consequent estimate of velocity tends to have lower variance. (3) The frequency of sampling must be “in phase” with the S shape of the dispersion sensitivity curve to yield the most information on dispersion. (4) The sensitivity to the dispersion coefficient is usually at least an order of magnitude less than the sensitivity to velocity. (5) The assumed probability distribution of random error in observations of solute concentration determines the form of the sensitivities. (6) If variance in random error in observations is large, trends in sensitivities of observation points may be obscured by noise and thus have limited value in predicting variance in parameter estimates among designs. (7) Designs that minimize the variance of one parameter may not necessarily minimize the variance of other parameters. (8) The time and space interval over which an observation point is sensitive to a given parameter depends on the actual values of the parameters in the underlying physical system.
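A brief sketch of how such sensitivities behave, using the unit-source Gaussian-pulse solution of the 1-D advection-dispersion equation and central finite differences (illustrative parameter values only, not the study's designs):

```python
import numpy as np

def conc(x, t, v, D):
    """Unit instantaneous-source solution of the 1-D advection-
    dispersion equation (Gaussian pulse): a convenient closed form
    for examining parameter sensitivities."""
    return np.exp(-(x - v * t) ** 2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

def sensitivities(x, t, v, D, h=1e-6):
    """Central-difference sensitivities of concentration to the
    velocity v and the dispersion coefficient D."""
    s_v = (conc(x, t, v + h, D) - conc(x, t, v - h, D)) / (2 * h)
    s_D = (conc(x, t, v, D + h) - conc(x, t, v, D - h)) / (2 * h)
    return s_v, s_D
```

For this solution the velocity sensitivity is ∂C/∂v = C·(x − vt)/(2D), so it peaks during passage of the solute front, consistent with principles (1) and (2).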
Logistics for Working Together to Facilitate Genomic/Quantitative Genetic Prediction
USDA-ARS?s Scientific Manuscript database
The incorporation of DNA tests into the national cattle evaluation system will require estimation of variances of and covariances among the additive genetic components of the DNA tests and the phenotypic traits they are intended to predict. Populations with both DNA test results and phenotypes will ...
NASA Astrophysics Data System (ADS)
Graham, Wendy; Destouni, Georgia; Demmy, George; Foussereau, Xavier
1998-07-01
The methodology developed in Destouni and Graham [Destouni, G., Graham, W.D., 1997. The influence of observation method on local concentration statistics in the subsurface. Water Resour. Res. 33 (4) 663-676.] for predicting locally measured concentration statistics for solute transport in heterogeneous porous media under saturated flow conditions is applied to the prediction of conservative nonreactive solute transport in the vadose zone where observations are obtained by soil coring. Exact analytical solutions are developed for both the mean and variance of solute concentrations measured in discrete soil cores using a simplified physical model for vadose-zone flow and solute transport. Theoretical results show that while the ensemble mean concentration is relatively insensitive to the length-scale of the measurement, predictions of the concentration variance are significantly impacted by the sampling interval. Results also show that accounting for vertical heterogeneity in the soil profile results in significantly less spreading in the mean and variance of the measured solute breakthrough curves, indicating that it is important to account for vertical heterogeneity even for relatively small travel distances. Model predictions for both the mean and variance of locally measured solute concentration, based on independently estimated model parameters, agree well with data from a field tracer test conducted in Manatee County, Florida.
Schmutz, Joel A.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
Stochastic variation in survival rates is expected to decrease long-term population growth rates. This expectation influences both life-history theory and the conservation of species. From this expectation, Pfister (1998) developed the important life-history prediction that natural selection will have minimized variability in those elements of the annual life cycle (such as adult survival rate) with high sensitivity. This prediction has not been rigorously evaluated for bird populations, in part due to statistical difficulties related to variance estimation. I here overcome these difficulties, and in an analysis of 62 populations, I confirm her prediction by showing a negative relationship between the proportional sensitivity (elasticity) of adult survival and the proportional variance (CV) of adult survival. However, several species deviated significantly from this expectation, with more process variance in survival than predicted. For instance, projecting the magnitude of process variance in annual survival for American redstarts (Setophaga ruticilla) for 25 years resulted in a 44% decline in abundance without assuming any change in mean survival rate. For most of these species with high process variance, recent changes in harvest, habitats, or changes in climate patterns are the likely sources of environmental variability causing this variability in survival. Because of climate change, environmental variability is increasing on regional and global scales, which is expected to increase stochasticity in vital rates of species. Increased stochasticity in survival will depress population growth rates, and this result will magnify the conservation challenges we face.
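The depression of long-term growth by process variance in survival can be illustrated with a toy projection (all rates below are hypothetical, not the redstart estimates):

```python
import numpy as np

def mean_log_growth(mean_s=0.55, sd_s=0.10, recruit=0.45,
                    years=25, reps=10000, seed=1):
    """Toy annual projection lambda_t = recruit + s_t with random
    survival s_t; returns the mean log growth rate per year across
    Monte Carlo replicates."""
    rng = np.random.default_rng(seed)
    s = rng.normal(mean_s, sd_s, size=(reps, years)).clip(0.0, 1.0)
    lam = recruit + s                  # mean annual multiplier is 1.0
    return np.log(lam).sum(axis=1).mean() / years
```

With these numbers the deterministic growth rate is log(1.0) = 0, yet random variation in survival drags the realised log growth below zero (roughly -variance/2 per year), which is the depression of long-term growth rates the abstract builds on.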
NASA Astrophysics Data System (ADS)
Moster, Benjamin P.; Somerville, Rachel S.; Newman, Jeffrey A.; Rix, Hans-Walter
2011-04-01
Deep pencil beam surveys (<1 deg²) are of fundamental importance for studying the high-redshift universe. However, inferences about galaxy population properties (e.g., the abundance of objects) are in practice limited by "cosmic variance." This is the uncertainty in observational estimates of the number density of galaxies arising from the underlying large-scale density fluctuations. This source of uncertainty can be significant, especially for surveys which cover only small areas and for massive high-redshift galaxies. Cosmic variance for a given galaxy population can be determined using predictions from cold dark matter theory and the galaxy bias. In this paper, we provide tools for experiment design and interpretation. For a given survey geometry, we present the cosmic variance of dark matter as a function of mean redshift z̄ and redshift bin size Δz. Using a halo occupation model to predict galaxy clustering, we derive the galaxy bias as a function of mean redshift for galaxy samples of a given stellar mass range. In the linear regime, the cosmic variance of these galaxy samples is the product of the galaxy bias and the dark matter cosmic variance. We present a simple recipe using a fitting function to compute cosmic variance as a function of the angular dimensions of the field, z̄, Δz, and stellar mass m*. We also provide tabulated values and a software tool. The accuracy of the resulting cosmic variance estimates (δσ_v/σ_v) is shown to be better than 20%. We find that for GOODS at z̄ = 2 and with Δz = 0.5, the relative cosmic variance of galaxies with m* > 10^11 M_sun is ~38%, while it is ~27% for GEMS and ~12% for COSMOS. For galaxies of m* ~ 10^10 M_sun, the relative cosmic variance is ~19% for GOODS, ~13% for GEMS, and ~6% for COSMOS. This implies that cosmic variance is a significant source of uncertainty at z̄ = 2 for small fields and massive galaxies, while for larger fields and intermediate-mass galaxies, cosmic variance is less serious.
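The linear-regime recipe, cosmic variance as galaxy bias times dark-matter variance, reduces to a one-line calculation; the quadrature combination with Poisson noise shown alongside is standard survey practice rather than a formula quoted from this paper:

```python
import math

def galaxy_cosmic_variance(sigma_dm, bias):
    """Linear-regime relative cosmic variance of a galaxy sample:
    the product of the galaxy bias and the dark-matter cosmic
    variance for the survey geometry and redshift bin."""
    return bias * sigma_dm

def count_fractional_error(n_gal, sigma_v):
    """Total fractional uncertainty on a galaxy count, adding Poisson
    noise and cosmic variance in quadrature (standard practice, not a
    result taken from the paper)."""
    return math.sqrt(1.0 / n_gal + sigma_v ** 2)
```

For small fields the cosmic-variance term typically dominates the Poisson term, which is the regime the abstract flags as problematic for pencil-beam surveys.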
Greenbaum, Gili; Renan, Sharon; Templeton, Alan R; Bouskila, Amos; Saltz, David; Rubenstein, Daniel I; Bar-David, Shirli
2017-12-22
Effective population size, a central concept in conservation biology, is now routinely estimated from genetic surveys and can also be theoretically predicted from demographic, life-history, and mating-system data. By evaluating the consistency of theoretical predictions with empirically estimated effective size, insights can be gained regarding life-history characteristics and the relative impact of different life-history traits on genetic drift. These insights can be used to design and inform management strategies aimed at increasing effective population size. We demonstrated this approach by addressing the conservation of a reintroduced population of Asiatic wild ass (Equus hemionus). We estimated the variance effective size (N_ev) from genetic data (N_ev = 24.3) and formulated predictions for the impacts on N_ev of demography, polygyny, female variance in lifetime reproductive success (RS), and heritability of female RS. By contrasting the genetic estimation with theoretical predictions, we found that polygyny was the strongest factor affecting genetic drift, because only when accounting for polygyny were predictions consistent with the genetically measured N_ev. The comparison of effective-size estimation and predictions indicated that 10.6% of the males mated per generation when heritability of female RS was unaccounted for (polygyny responsible for an 81% decrease in N_ev) and 19.5% mated when it was accounted for (polygyny responsible for a 67% decrease in N_ev). Heritability of female RS also affected N_ev (h_f² = 0.91; heritability responsible for a 41% decrease in N_ev). The low effective size is of concern, and we suggest that management actions focus on factors identified as strongly affecting N_ev, namely, increasing the availability of artificial water sources to increase the number of dominant males contributing to the gene pool. This approach, evaluating life-history hypotheses in light of their impact on effective population size and contrasting predictions with genetic measurements, is a general, applicable strategy that can be used to inform conservation practice. © 2017 Society for Conservation Biology.
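The polygyny mechanism can be sketched with Wright's classical unequal-sex-ratio formula (a textbook approximation, not the authors' full demographic model):

```python
def ne_unequal_sexes(n_m, n_f):
    """Wright's effective size for unequal numbers of breeding males
    and females: Ne = 4*Nm*Nf / (Nm + Nf)."""
    return 4.0 * n_m * n_f / (n_m + n_f)

def polygyny_reduction(n_m_total, frac_males_mating, n_f):
    """Fractional drop in Ne when only a fraction of males breed:
    the mechanism the study identifies as dominating genetic drift.
    Census numbers here are hypothetical."""
    ne_all = ne_unequal_sexes(n_m_total, n_f)
    ne_poly = ne_unequal_sexes(n_m_total * frac_males_mating, n_f)
    return 1.0 - ne_poly / ne_all
```

With, say, 10 males and 10 females and only 10% of males mating, this sketch already yields a reduction of about 82%, of the same order as the polygyny effect reported in the abstract.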
NASA Technical Reports Server (NTRS)
Tomaine, R. L.
1976-01-01
Flight test data from a large 'crane' type helicopter were collected and processed for the purpose of identifying vehicle rigid body stability and control derivatives. The process consisted of using digital and Kalman filtering techniques for state estimation and Extended Kalman filtering for parameter identification, utilizing a least squares algorithm for initial derivative and variance estimates. Data were processed for indicated airspeeds from 0 m/sec to 152 m/sec. Pulse, doublet and step control inputs were investigated. Digital filter frequency did not have a major effect on the identification process, while the initial derivative estimates and the estimated variances had an appreciable effect on many derivative estimates. The major derivatives identified agreed fairly well with analytical predictions and engineering experience. Doublet control inputs provided better results than pulse or step inputs.
Assessing non-additive effects in GBLUP model.
Vieira, I C; Dos Santos, J P R; Pires, L P M; Lima, B M; Gonçalves, F M A; Balestre, M
2017-05-10
Understanding non-additive effects in the expression of quantitative traits is very important in genotype selection, especially in species where the commercial products are clones or hybrids. The use of molecular markers has allowed the study of non-additive genetic effects on a genomic level, in addition to a better understanding of its importance in quantitative traits. Thus, the purpose of this study was to evaluate the behavior of the GBLUP model in different genetic models and relationship matrices and their influence on the estimates of genetic parameters. We used real data of the circumference at breast height in Eucalyptus spp. and simulated data from an F2 population. Three kinship structures commonly reported in the literature were adopted. The simulation results showed that the inclusion of epistatic kinship improved prediction estimates of genomic breeding values. However, the non-additive effects were not accurately recovered. The Fisher information matrix for the real dataset showed high collinearity in estimates of additive, dominance, and epistatic variance, causing no gain in the prediction of the unobserved data and convergence problems. Estimates of genetic parameters and correlations differed across the kinship structures. Our results show that the inclusion of non-additive effects can improve the predictive ability or even the prediction of additive effects. However, the high distortions observed in the variance estimates when the Hardy-Weinberg equilibrium assumption is violated, due to the presence of selection or inbreeding, can result in zero gains in models that consider epistasis in genomic kinship.
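The kinship structures at issue can be sketched as follows. The additive matrix is VanRaden's; the dominance parameterisation shown is one common choice among several and may differ from the one used in this study:

```python
import numpy as np

def additive_G(M):
    """VanRaden's additive genomic relationship matrix from a 0/1/2
    genotype matrix M (individuals x SNPs)."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    Z = M - 2.0 * p                          # frequency-centred genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def dominance_D(M):
    """A commonly used dominance relationship matrix built from
    heterozygosity deviations; other parameterisations exist."""
    p = M.mean(axis=0) / 2.0
    H = (M == 1).astype(float) - 2.0 * p * (1.0 - p)
    return H @ H.T / np.sum((2.0 * p * (1.0 - p)) ** 2)
```

Note that both constructions assume Hardy-Weinberg frequencies; under selection or inbreeding the centring terms are wrong, which is one route to the variance distortions the abstract describes.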
Adjusting for Health Status in Non-Linear Models of Health Care Disparities
Cook, Benjamin L.; McGuire, Thomas G.; Meara, Ellen; Zaslavsky, Alan M.
2009-01-01
This article compared conceptual and empirical strengths of alternative methods for estimating racial disparities using non-linear models of health care access. Three methods were presented (propensity score, rank and replace, and a combined method) that adjust for health status while allowing SES variables to mediate the relationship between race and access to care. Applying these methods to a nationally representative sample of blacks and non-Hispanic whites surveyed in the 2003 and 2004 Medical Expenditure Panel Surveys (MEPS), we assessed the concordance of each of these methods with the Institute of Medicine (IOM) definition of racial disparities, and empirically compared the methods' predicted disparity estimates, the variance of the estimates, and the sensitivity of the estimates to limitations of available data. The rank and replace and combined methods (but not the propensity score method) are concordant with the IOM definition of racial disparities in that each creates a comparison group with the appropriate marginal distributions of health status and SES variables. Predicted disparities and prediction variances were similar for the rank and replace and combined methods, but the rank and replace method was sensitive to limitations on SES information. For all methods, limiting health status information significantly reduced estimates of disparities compared to a more comprehensive dataset. We conclude that the two IOM-concordant methods were similar enough that either could be considered in disparity predictions. In datasets with limited SES information, the combined method is the better choice. PMID:20352070
Distribution of kriging errors, the implications and how to communicate them
NASA Astrophysics Data System (ADS)
Li, Hong Yi; Milne, Alice; Webster, Richard
2016-04-01
Kriging in one form or another has become perhaps the most popular method for spatial prediction in environmental science. Each prediction is unbiased and of minimum variance, which itself is estimated. The kriging variances depend on the mathematical model chosen to describe the spatial variation; different models, however plausible, give rise to different minimized variances. Practitioners often compare models by so-called cross-validation before finally choosing the most appropriate for their kriging. One proceeds as follows. One removes a unit (a sampling point) from the whole set, kriges the value there and compares the kriged value with the value observed to obtain the deviation or error. One repeats the process for each and every point in turn and for all plausible models. One then computes the mean errors (MEs) and the mean of the squared errors (MSEs). Ideally a squared error should equal the corresponding kriging variance (σ_K²), and so one is advised to choose the model for which on average the squared errors most nearly equal the kriging variances, i.e. the ratio MSDR = MSE/σ_K² ≈ 1. Maximum likelihood estimation of models almost guarantees that the MSDR equals 1, and so the kriging variances are unbiased predictors of the squared error across the region. The method is based on the assumption that the errors have a normal distribution. The squared deviation ratio (SDR) should therefore be distributed as χ² with one degree of freedom, with a median of 0.455. We have found that often the median of the SDR (MedSDR) is less, in some instances much less, than 0.455 even though the mean of the SDR is close to 1. It seems that in these cases the distributions of the errors are leptokurtic, i.e. they have an excess of predictions close to the true values, excesses near the extremes and a dearth of predictions in between. In these cases the kriging variances are poor measures of the uncertainty at individual sites. The uncertainty is typically under-estimated for the extreme observations and compensated for by over-estimating it for other observations. Statisticians must tell users of this when they present maps of predictions. We illustrate the situation with results from mapping salinity in land reclaimed from the Yangtze delta in the Gulf of Hangzhou, China. There the apparent electrical conductivity (ECa) of the topsoil was measured at 525 points in a field of 2.3 ha. The marginal distribution of the observations was strongly positively skewed, and so the observed ECas were transformed to their logarithms to give an approximately symmetric distribution. That distribution was strongly platykurtic, with short tails and no evident outliers. The logarithms were analysed as a mixed model of quadratic drift plus correlated random residuals with a spherical variogram. The kriged predictions deviated from their true values with an MSDR of 0.993 but a MedSDR of 0.324. The coefficient of kurtosis of the deviations was 1.45, i.e. substantially larger than the 0 of a normal distribution. The reasons for this behaviour are being sought. The most likely explanation is that there are spatial outliers, i.e. points at which the observed values differ markedly from those at their closest neighbours.
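The cross-validation diagnostics described above amount to a few lines of code (a sketch: the test data are simulated errors, not the salinity observations):

```python
import numpy as np

def sdr_diagnostics(errors, kriging_vars):
    """Mean and median squared-deviation ratio from cross-validation:
    SDR_i = e_i^2 / sigma_Ki^2. Under normally distributed errors the
    SDRs are chi-squared(1), so MSDR ~ 1 and MedSDR ~ 0.455."""
    sdr = np.asarray(errors, float) ** 2 / np.asarray(kriging_vars, float)
    return sdr.mean(), np.median(sdr)
```

A MedSDR well below 0.455 alongside an MSDR near 1 is the leptokurtic signature the authors describe.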
Ishwaran, Hemant; Lu, Min
2018-06-04
Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.
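One common form of the delete-d jackknife variance estimator can be sketched as follows, illustrated on the sample mean (whose variance is known) rather than on VIMP itself:

```python
import numpy as np

def delete_d_jackknife_var(theta_subs, n, d):
    """Delete-d jackknife variance (Shao-Wu form): scale the spread of
    estimates computed on random (n - d)-subsets of the data.
    theta_subs: the estimate recomputed on many delete-d subsamples."""
    theta_subs = np.asarray(theta_subs, float)
    dev = theta_subs - theta_subs.mean()
    return (n - d) / d * np.mean(dev ** 2)

# Illustration on the sample mean, whose sampling variance is known:
rng = np.random.default_rng(7)
data = rng.normal(0.0, 1.0, 100)
n, d = data.size, 50
subs = [rng.choice(data, size=n - d, replace=False).mean()
        for _ in range(4000)]
v_jack = delete_d_jackknife_var(subs, n, d)
v_true = data.var(ddof=1) / n       # target: variance of the sample mean
```

For a general estimator such as VIMP, `data` would be the training cases and each subsample estimate a refit of the forest, which is why low subsampling rates matter computationally.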
Jeran, S; Steinbrecher, A; Pischon, T
2016-08-01
Activity-related energy expenditure (AEE) might be an important factor in the etiology of chronic diseases. However, measurement of free-living AEE is usually not feasible in large-scale epidemiological studies but instead has traditionally been estimated based on self-reported physical activity. Recently, accelerometry has been proposed for objective assessment of physical activity, but it is unclear to what extent this method explains the variance in AEE. We conducted a systematic review searching the MEDLINE database (until 2014) for studies that estimated AEE based on accelerometry-assessed physical activity in adults under free-living conditions (with AEE measured by the doubly labeled water method). Extracted study characteristics were sample size, accelerometer (type (uniaxial, triaxial), metrics (for example, activity counts, steps, acceleration), recording period, body position, wear time), explained variance of AEE (R²) and number of additional predictors. The relation of univariate and multivariate R² with study characteristics was analyzed using nonparametric tests. Nineteen articles were identified. Examination of various accelerometers or subpopulations in one article was treated separately, resulting in 28 studies. Sample sizes ranged from 10 to 149. In most studies the accelerometer was triaxial, worn at the trunk, during waking hours and reported activity counts as output metric. Recording periods ranged from 5 to 15 days. The variance of AEE explained by accelerometer-assessed physical activity ranged from 4 to 80% (median crude R² = 26%). Sample size was inversely related to the explained variance. Inclusion of 1 to 3 other predictors in addition to accelerometer output significantly increased the explained variance to a range of 12.5-86% (median total R² = 41%). The increase did not depend on the number of added predictors. We conclude that there is large heterogeneity across studies in the explained variance of AEE when estimated based on accelerometry.
Thus, data on predicted AEE based on accelerometry-assessed physical activity need to be interpreted cautiously.
The scope and control of attention: Sources of variance in working memory capacity.
Chow, Michael; Conway, Andrew R A
2015-04-01
Working memory capacity is a strong positive predictor of many cognitive abilities, across various domains. The pattern of positive correlations across domains has been interpreted as evidence for a unitary source of inter-individual differences in behavior. However, recent work suggests that there are multiple sources of variance contributing to working memory capacity. The current study (N = 71) investigates individual differences in the scope and control of attention, in addition to the number and resolution of items maintained in working memory. Latent variable analyses indicate that the scope and control of attention reflect independent sources of variance and each account for unique variance in general intelligence. Also, estimates of the number of items maintained in working memory are consistent across tasks and related to general intelligence whereas estimates of resolution are task-dependent and not predictive of intelligence. These results provide insight into the structure of working memory, as well as intelligence, and raise new questions about the distinction between number and resolution in visual short-term memory.
Husby, Arild; Gustafsson, Lars; Qvarnström, Anna
2012-01-01
The avian incubation period is associated with high energetic costs and mortality risks suggesting that there should be strong selection to reduce the duration to the minimum required for normal offspring development. Although there is much variation in the duration of the incubation period across species, there is also variation within species. It is necessary to estimate to what extent this variation is genetically determined if we want to predict the evolutionary potential of this trait. Here we use a long-term study of collared flycatchers to examine the genetic basis of variation in incubation duration. We demonstrate limited genetic variance as reflected in the low and nonsignificant additive genetic variance, with a corresponding heritability of 0.04 and coefficient of additive genetic variance of 2.16. Any selection acting on incubation duration will therefore be inefficient. To our knowledge, this is the first time heritability of incubation duration has been estimated in a natural bird population. © 2011 by The University of Chicago.
Measurement System Characterization in the Presence of Measurement Errors
NASA Technical Reports Server (NTRS)
Commo, Sean A.
2012-01-01
In the calibration of a measurement system, data are collected in order to estimate a mathematical model between one or more factors of interest and a response. Ordinary least squares is a method employed to estimate the regression coefficients in the model. The method assumes that the factors are known without error; yet, it is implicitly known that the factors contain some uncertainty. In the literature, this uncertainty is known as measurement error. The measurement error affects both the estimates of the model coefficients and the prediction, or residual, errors. There are some methods, such as orthogonal least squares, that are employed in situations where measurement errors exist, but these methods do not directly incorporate the magnitude of the measurement errors. This research proposes a new method, known as modified least squares, that combines the principles of least squares with knowledge about the measurement errors. This knowledge is expressed in terms of the variance ratio - the ratio of response error variance to measurement error variance.
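As a sketch of how a known variance ratio can enter the fit, the classical Deming (errors-in-variables) slope is shown below; this is a stand-in illustration of the principle, not the modified least squares method proposed in the work:

```python
import numpy as np

def deming_slope(x, y, delta):
    """Deming regression slope for a known variance ratio
    delta = var(response error) / var(measurement error): a classical
    errors-in-variables estimator that, like the proposed method,
    incorporates the magnitude of the measurement errors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.mean((x - x.mean()) ** 2)
    syy = np.mean((y - y.mean()) ** 2)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    t = syy - delta * sxx
    return (t + np.sqrt(t * t + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
```

Unlike ordinary least squares, this slope is not attenuated toward zero when the factor carries measurement error, which is the shortcoming the abstract identifies.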
Ercanli, İlker; Kahriman, Aydın
2015-03-01
We assessed the effect of stand structural diversity, including the Shannon, improved Shannon, Simpson, McIntosh, Margalef, and Berger-Parker indices, on stand aboveground biomass (AGB) and developed statistical prediction models for the stand AGB values, including stand structural diversity indices and some stand attributes. The AGB prediction model including only stand attributes accounted for 85% of the total variance in AGB (R²) with an Akaike's information criterion (AIC) of 807.2407, Bayesian information criterion (BIC) of 809.5397, Schwarz Bayesian criterion (SBC) of 818.0426, and root mean square error (RMSE) of 38.529 Mg. After inclusion of the stand structural diversity in the model structure, considerable improvement was observed in statistical accuracy, with the model accounting for 97.5% of the total variance in AGB, with an AIC of 614.1819, BIC of 617.1242, SBC of 633.0853, and RMSE of 15.8153 Mg. The predictive fitting results indicate that some indices describing the stand structural diversity can be employed as significant independent variables to predict the AGB production of the Scotch pine stand. Further, including the stand diversity indices in the AGB prediction model with the stand attributes provided important predictive contributions in estimating the total variance in AGB.
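The diversity indices used as predictors are simple functions of class proportions; two of them are sketched here with the standard textbook definitions:

```python
import numpy as np

def shannon(p):
    """Shannon diversity H' = -sum p_i ln p_i over the proportions p
    of species (or size classes) in the stand."""
    p = np.asarray(p, float)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log(p)))

def simpson(p):
    """Simpson diversity 1 - sum p_i^2."""
    p = np.asarray(p, float)
    return float(1.0 - np.sum(p ** 2))
```

In a biomass model these values simply enter as additional regressors alongside the stand attributes.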
Predictability Experiments With the Navy Operational Global Atmospheric Prediction System
NASA Astrophysics Data System (ADS)
Reynolds, C. A.; Gelaro, R.; Rosmond, T. E.
2003-12-01
There are several areas of research in numerical weather prediction and atmospheric predictability, such as targeted observations and ensemble perturbation generation, where it is desirable to combine information about the uncertainty of the initial state with information about potential rapid perturbation growth. Singular vectors (SVs) provide a framework to accomplish this task in a mathematically rigorous and computationally feasible manner. In this study, SVs are calculated using the tangent and adjoint models of the Navy Operational Global Atmospheric Prediction System (NOGAPS). The analysis error variance information produced by the NRL Atmospheric Variational Data Assimilation System is used as the initial-time SV norm. These VAR SVs are compared to SVs for which total energy is used as both the initial- and final-time norm (TE SVs). The incorporation of analysis error variance information has a significant impact on the structure and location of the SVs, which in turn strongly affects targeted observing applications. The utility and implications of such experiments in assessing the analysis error variance estimates will be explored. Computing support has been provided by the Department of Defense High Performance Computing Center at the Naval Oceanographic Office Major Shared Resource Center at Stennis, Mississippi.
Evaluation of non-additive genetic variation in feed-related traits of broiler chickens.
Li, Y; Hawken, R; Sapp, R; George, A; Lehnert, S A; Henshall, J M; Reverter, A
2017-03-01
Genome-wide association mapping and genomic predictions of phenotype of individuals in livestock are predominantly based on the detection and estimation of additive genetic effects; non-additive genetic effects are largely ignored. Studies in animals, plants, and humans to assess the impact of non-additive genetic effects in genetic analyses have led to differing conclusions. In this paper, we examined the consequences of including non-additive genetic effects in genome-wide association mapping and genomic prediction of total genetic values in a commercial population of 5,658 broiler chickens genotyped for 45,176 single nucleotide polymorphism (SNP) markers. We employed mixed-model equations and restricted maximum likelihood to analyze 7 feed-related traits (TRT1 to TRT7). Dominance variance accounted for a significant proportion of the total genetic variance in all 7 traits, ranging from 29.5% for TRT1 to 58.4% for TRT7. Using a 5-fold cross-validation schema, we found that in spite of the large dominance component, including the estimated dominance effects in the prediction of total genetic values did not improve the accuracy of the predictions for any of the phenotypes. We offer some possible explanations for this counter-intuitive result, including possible confounding of dominance deviations with common environmental effects such as hatch, opposing directional effects of SNP additive and dominance variation, and a failure of gene-gene interactions to contribute to the genetic variance. © 2016 Poultry Science Association Inc.
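A common way to set up such an analysis, though not necessarily the authors' exact parameterization, is to build separate additive and dominance genomic relationship matrices from the SNP codes. The centering and scaling below follow the widely used VanRaden/Vitezica-style conventions, assumed here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 500
p = rng.uniform(0.1, 0.9, m)                         # allele frequencies
X = rng.binomial(2, p, size=(n, m)).astype(float)    # SNP genotypes coded 0/1/2

# Additive coding: genotype centred by its expectation 2p.
# Dominance coding: heterozygote indicator centred by its expectation 2p(1-p).
Za = X - 2 * p
Zd = (X == 1).astype(float) - 2 * p * (1 - p)

# Additive and dominance genomic relationship matrices
Ga = Za @ Za.T / np.sum(2 * p * (1 - p))
Gd = Zd @ Zd.T / np.sum((2 * p * (1 - p)) ** 2)
```

These two matrices would then define the covariance structures of additive and dominance random effects in the mixed-model equations, with their variance components estimated by REML.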
Milliren, Carly E; Evans, Clare R; Richmond, Tracy K; Dunn, Erin C
2018-06-06
Recent advances in multilevel modeling allow for modeling non-hierarchical levels (e.g., youth in non-nested schools and neighborhoods) using cross-classified multilevel models (CCMM). Current practice is to cluster samples from one context (e.g., schools) and utilize the observations however they are distributed from the second context (e.g., neighborhoods). However, it is unknown whether an uneven distribution of sample size across these contexts leads to incorrect estimates of random effects in CCMMs. Using the school and neighborhood data structure in Add Health, we examined the effect of neighborhood sample size imbalance on the estimation of variance parameters in models predicting BMI. We differentially assigned students from a given school to neighborhoods within that school's catchment area using three scenarios of (im)balance. 1000 random datasets were simulated for each of five combinations of school- and neighborhood-level variance and imbalance scenarios, for a total of 15,000 simulated data sets. For each simulation, we calculated 95% CIs for the variance parameters to determine whether the true simulated variance fell within the interval. Across all simulations, the "true" school and neighborhood variance parameters were estimated 93-96% of the time. Only 5% of models failed to capture neighborhood variance; 6% failed to capture school variance. These results suggest that there is no systematic bias in the ability of CCMM to capture the true variance parameters regardless of the distribution of students across neighborhoods. Ongoing efforts to use CCMM are warranted and can proceed without concern for the sample imbalance across contexts. Copyright © 2018 Elsevier Ltd. All rights reserved.
Assumption-free estimation of the genetic contribution to refractive error across childhood.
Guggenheim, Jeremy A; St Pourcain, Beate; McMahon, George; Timpson, Nicholas J; Evans, David M; Williams, Cathy
2015-01-01
Studies in relatives have generally yielded high heritability estimates for refractive error: twins 75-90%, families 15-70%. However, because related individuals often share a common environment, these estimates are inflated (via misallocation of unique/common environment variance). We calculated a lower-bound heritability estimate for refractive error free from such bias. Between the ages of 7 and 15 years, participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) underwent non-cycloplegic autorefraction at regular research clinics. At each age, an estimate of the variance in refractive error explained by single nucleotide polymorphism (SNP) genetic variants was obtained with genome-wide complex trait analysis (GCTA), using high-density genome-wide SNP genotype information (minimum N at each age=3,404). The variance in refractive error explained by the SNPs ("SNP heritability") was stable over childhood: across ages 7-15 years, SNP heritability averaged 0.28 (SE=0.08, p<0.001). The genetic correlation for refractive error between visits varied from 0.77 to 1.00 (all p<0.001), demonstrating that a common set of SNPs was responsible for the genetic contribution to refractive error across this period of childhood. Simulations suggested that lack of cycloplegia during autorefraction led to a small underestimation of SNP heritability (adjusted SNP heritability=0.35; SE=0.09). To put these results in context, the variance in refractive error explained (or predicted) by the time participants spent outdoors was <0.005 and by the time spent reading was <0.01, based on a parental questionnaire completed when the child was aged 8-9 years old. Genetic variation captured by common SNPs explained approximately 35% of the variation in refractive error between unrelated subjects.
This value sets an upper limit for predicting refractive error using existing SNP genotyping arrays, although higher-density genotyping in larger samples and inclusion of interaction effects is expected to raise this figure toward twin- and family-based heritability estimates. The same SNPs influenced refractive error across much of childhood. Notwithstanding the strong evidence of association between time outdoors and myopia, and time reading and myopia, less than 1% of the variance in myopia at age 15 was explained by crude measures of these two risk factors, indicating that their effects may be limited, at least when averaged over the whole population.
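GCTA's REML machinery is involved, but the core idea of SNP heritability can be illustrated with the simpler Haseman-Elston regression on a genomic relationship matrix. This is a sketch on simulated data, not the study's GCTA pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, h2 = 400, 1000, 0.5

# Simulated genotypes, standardized per SNP
X = rng.binomial(2, rng.uniform(0.1, 0.9, m), size=(n, m)).astype(float)
Z = (X - X.mean(0)) / X.std(0)
G = Z @ Z.T / m                                   # genomic relationship matrix (GRM)

# Phenotype: every SNP carries a small effect (infinitesimal-style simulation)
b = rng.normal(0.0, np.sqrt(h2 / m), m)
y = Z @ b + rng.normal(0.0, np.sqrt(1 - h2), n)
y = (y - y.mean()) / y.std()

# Haseman-Elston: regress phenotype products on GRM off-diagonal entries;
# with a standardized phenotype the slope estimates the SNP heritability.
iu = np.triu_indices(n, k=1)
h2_hat = (G[iu] @ np.outer(y, y)[iu]) / (G[iu] @ G[iu])
```

The moment-based estimate is noisier than REML but captures the same quantity: the share of phenotypic variance tagged by the genotyped SNPs among (near-)unrelated individuals.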
Efficient prediction designs for random fields.
Müller, Werner G; Pronzato, Luc; Rendas, Joao; Waldl, Helmut
2015-03-01
For estimation and prediction of random fields, it is increasingly acknowledged that the kriging variance may be a poor representative of true uncertainty. Experimental designs based on more elaborate criteria that are appropriate for empirical kriging (EK) are then often non-space-filling and very costly to determine. In this paper, we investigate the possibility of using a compound criterion inspired by an equivalence-theorem type relation to build designs quasi-optimal for the EK variance when space-filling designs become unsuitable. Two algorithms are proposed: the first relies on stochastic optimization to explicitly identify the Pareto front, whereas the second uses the surrogate criterion as a local heuristic to choose the points at which the (costly) true EK variance is effectively computed. We illustrate the performance of the algorithms presented on both a simple simulated example and a real oceanographic dataset. © 2014 The Authors. Applied Stochastic Models in Business and Industry published by John Wiley & Sons, Ltd.
Harris, Alexandre M.; DeGiorgio, Michael
2016-01-01
Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator's variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator (BLUE) of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H̃_BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H̃_BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H̃_BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
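The original unbiased estimator referred to above, which corrects only for sample size, can be written in a few lines. H̃_BLUE itself additionally requires kinship and inbreeding information, which this sketch omits:

```python
import numpy as np

def unbiased_heterozygosity(allele_counts):
    """Nei's unbiased gene diversity at one locus:
    H = (n / (n - 1)) * (1 - sum(p_i^2)),
    where n is the number of sampled allele copies and p_i are
    sample allele frequencies."""
    c = np.asarray(allele_counts, dtype=float)
    n = c.sum()
    p = c / n
    return (n / (n - 1.0)) * (1.0 - (p ** 2).sum())

# 10 allele copies: 6 of allele A, 4 of allele B
# 1 - (0.6^2 + 0.4^2) = 0.48, corrected by 10/9 -> 8/15
H = unbiased_heterozygosity([6, 4])
```

Replacing the sample proportions p with BLUE allele frequencies (weighting allele copies by the inverse kinship structure) yields the H̃_BLUE variant discussed in the abstract.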
Azevedo Peixoto, Leonardo de; Laviola, Bruno Galvêas; Alves, Alexandre Alonso; Rosado, Tatiana Barbosa; Bhering, Leonardo Lopes
2017-01-01
Genome-wide selection (GWS) is a promising approach for improving selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods in predicting GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain the maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact of marker density on predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance between evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. A training model fitted using 1,000 and 800 markers is sufficient to capture the maximum genetic variance and, consequently, the maximum prediction ability of GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.
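The dependence of predictive ability on marker density can be sketched with a plain ridge-regression (SNP-BLUP-style) learner on simulated genotypes. The sample sizes, train/test split, and shrinkage value below are illustrative assumptions, not the study's design:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, h2 = 600, 500, 0.7
Z = rng.normal(size=(n, m))                    # standardized marker genotypes (simulated)
b = rng.normal(0.0, np.sqrt(h2 / m), m)        # every marker carries a small effect
y = Z @ b + rng.normal(0.0, np.sqrt(1 - h2), n)

train, test = np.arange(400), np.arange(400, 600)

def ridge_ability(k, lam=200.0):
    """Predictive ability (correlation of prediction with phenotype in the
    test set) for a ridge / SNP-BLUP model trained on the first k markers."""
    Zt = Z[train][:, :k]
    beta = np.linalg.solve(Zt.T @ Zt + lam * np.eye(k), Zt.T @ y[train])
    pred = Z[test][:, :k] @ beta
    return np.corrcoef(pred, y[test])[0, 1]

# Too few markers cannot tag the genetic variance; a dense panel can.
r_few, r_many = ridge_ability(2), ridge_ability(m)
```

Sweeping k from small to large traces out the saturation curve the abstract describes: beyond the density that captures the genetic variance, extra markers add little.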
Lee, Yoojin; Callaghan, Martina F; Nagy, Zoltan
2017-01-01
In magnetic resonance imaging, precise measurement of the longitudinal relaxation time (T1) is crucial for acquiring information that is applicable to numerous clinical and neuroscience applications. In this work, we investigated the precision of the T1 relaxation time as measured using the variable flip angle method, with emphasis on the noise propagated from radiofrequency transmit field (B1) measurements. The analytical solution for T1 precision was derived by standard error propagation methods incorporating the noise from the three input sources: two spoiled gradient echo (SPGR) images and a B1 map. Repeated in vivo experiments were performed to estimate the total variance in T1 maps, and we compared these experimentally obtained values with the theoretical predictions to validate the established theoretical framework. Both the analytical and experimental results showed that variance in the B1 map propagated noise levels into the T1 maps comparable to either of the two SPGR images. Improving the precision of the B1 measurements significantly reduced the variance in the estimated T1 map. The variance estimated from the repeatedly measured in vivo T1 maps agreed well with the theoretically calculated variance in T1 estimates, thus validating the analytical framework for realistic in vivo experiments. We conclude that for T1 mapping experiments the error propagated from the B1 map must be considered: optimizing the SPGR signals while neglecting to improve the precision of the B1 map may grossly overestimate the precision of the estimated T1 values.
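The error-propagation argument can be reproduced numerically for the standard two-point variable-flip-angle estimator. The SPGR signal equation is textbook material; the repetition time, flip angles, and input variances below are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

TR, T1_true, M0, b_true = 15.0, 1000.0, 1.0, 1.0   # TR and T1 in ms; b = B1 scale factor
a1, a2 = np.deg2rad([3.0, 17.0])                   # nominal flip angles (assumed values)

def spgr(T1, alpha, b=1.0):
    """Spoiled gradient echo signal with actual flip angle b * alpha."""
    e1 = np.exp(-TR / T1)
    return M0 * np.sin(b * alpha) * (1 - e1) / (1 - e1 * np.cos(b * alpha))

def t1_estimate(s1, s2, b):
    """Two-point VFA fit: S/sin(a) is linear in S/tan(a) with slope exp(-TR/T1)."""
    y = np.array([s1 / np.sin(b * a1), s2 / np.sin(b * a2)])
    x = np.array([s1 / np.tan(b * a1), s2 / np.tan(b * a2)])
    slope = (y[1] - y[0]) / (x[1] - x[0])
    return -TR / np.log(slope)

s1, s2 = spgr(T1_true, a1), spgr(T1_true, a2)

def t1_variance(var_s1, var_s2, var_b, h=1e-6):
    """First-order propagation: var(T1) ~ sum_i (dT1/dx_i)^2 var(x_i),
    with partials taken by central finite differences."""
    base = np.array([s1, s2, b_true])
    steps = np.array([h * s1, h * s2, h])
    variances = np.array([var_s1, var_s2, var_b])
    grads = np.empty(3)
    for i in range(3):
        hi, lo = base.copy(), base.copy()
        hi[i] += steps[i]
        lo[i] -= steps[i]
        grads[i] = (t1_estimate(*hi) - t1_estimate(*lo)) / (2 * steps[i])
    return float(np.sum(grads ** 2 * variances))
```

Comparing the three variance contributions for realistic input noise levels is exactly the exercise that shows the B1 term cannot be neglected.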
Sniegula, Szymon; Golab, Maria J; Drobniak, Szymon M; Johansson, Frank
2018-06-01
Seasonal time constraints are usually stronger at higher than lower latitudes and can exert strong selection on life-history traits and the correlations among these traits. To predict the response of life-history traits to environmental change along a latitudinal gradient, information must be obtained about genetic variance in traits and also genetic correlation between traits, that is, the genetic variance-covariance matrix, G. Here, we estimated G for key life-history traits in an obligate univoltine damselfly that faces seasonal time constraints. We exposed populations to simulated native temperatures and photoperiods and common garden environmental conditions in a laboratory set-up. Despite differences in genetic variance in these traits between populations (lower variance at northern latitudes), there was no evidence for latitude-specific covariance of the life-history traits. At simulated native conditions, all populations showed strong genetic and phenotypic correlations between traits that shaped growth and development. The variance-covariance matrix changed considerably when populations were exposed to common garden conditions compared with the simulated natural conditions, showing the importance of environmentally induced changes in multivariate genetic structure. Our results highlight the importance of estimating variance-covariance matrices in environments that mimic selection pressures, and not only trait variances or mean trait values in common garden conditions, for understanding trait evolution across populations and environments. © 2018 European Society For Evolutionary Biology.
A note on variance estimation in random effects meta-regression.
Sidik, Kurex; Jonkman, Jeffrey N
2005-01-01
For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
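A compact sketch of the Knapp and Hartung (2003) adjustment in random effects meta-regression, with the between-study variance tau² obtained by a method-of-moments step. The data are simulated, not the vaccine example from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
k = 12                                      # number of studies
x = rng.normal(size=k)                      # study-level covariate
v = rng.uniform(0.05, 0.3, k)               # known within-study variances
tau2_true = 0.1
y = 0.5 + 0.3 * x + rng.normal(0.0, np.sqrt(v + tau2_true))

X = np.column_stack([np.ones(k), x])
p = X.shape[1]

# Method-of-moments tau^2 for meta-regression (weights 1/v)
W = np.diag(1 / v)
H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
resid = y - H @ y
Q = resid @ W @ resid
trc = np.trace(W) - np.trace(np.linalg.solve(X.T @ W @ X, X.T @ W @ W @ X))
tau2 = max(0.0, (Q - (k - p)) / trc)

# Random-effects WLS fit with the Knapp-Hartung variance adjustment:
# scale the model-based covariance by the weighted residual mean square,
# and refer t-statistics to k - p degrees of freedom.
Ws = np.diag(1 / (v + tau2))
XtWX_inv = np.linalg.inv(X.T @ Ws @ X)
beta = XtWX_inv @ X.T @ Ws @ y
r = y - X @ beta
q = (r @ Ws @ r) / (k - p)
se_kh = np.sqrt(q * np.diag(XtWX_inv))
```

The adjustment inflates (or occasionally deflates) the naive standard errors to reflect errors in the estimated weights, which is why it compares favourably with the robust sandwich estimator in the abstract's simulations.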
High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis
Daye, Z. John; Chen, Jinbo; Li, Hongzhe
2011-01-01
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in, and apply our method to, an expression quantitative trait loci (eQTL) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis. PMID:22547833
NASA Astrophysics Data System (ADS)
Basu, Nandita B.; Fure, Adrian D.; Jawitz, James W.
2008-07-01
Simulations of nonpartitioning and partitioning tracer tests were used to parameterize the equilibrium stream tube model (ESM) that predicts the dissolution dynamics of dense nonaqueous phase liquids (DNAPLs) as a function of the Lagrangian properties of DNAPL source zones. Lagrangian, or stream-tube-based, approaches characterize source zones with as few as two trajectory-integrated parameters, in contrast to the potentially thousands of parameters required to describe the point-by-point variability in permeability and DNAPL in traditional Eulerian modeling approaches. The spill and subsequent dissolution of DNAPLs were simulated in two-dimensional domains having different hydrologic characteristics (variance of the log conductivity field = 0.2, 1, and 3) using the multiphase flow and transport simulator UTCHEM. Nonpartitioning and partitioning tracers were used to characterize the Lagrangian properties (travel time and trajectory-integrated DNAPL content statistics) of DNAPL source zones, which were in turn shown to be sufficient for accurate prediction of source dissolution behavior using the ESM throughout the relatively broad range of hydraulic conductivity variances tested here. The results were found to be relatively insensitive to travel time variability, suggesting that dissolution could be accurately predicted even if the travel time variance was only coarsely estimated. Estimation of the ESM parameters was also demonstrated using an approximate technique based on Eulerian data in the absence of tracer data; however, determining the minimum amount of such data required remains for future work. Finally, the stream tube model was shown to be a more unique predictor of dissolution behavior than approaches based on the ganglia-to-pool model for source zone characterization.
NASA Astrophysics Data System (ADS)
Mel, Riccardo; Viero, Daniele Pietro; Carniello, Luca; Defina, Andrea; D'Alpaos, Luigi
2014-09-01
Providing reliable and accurate storm surge forecasts is important for a wide range of problems related to coastal environments. In order to adequately support decision-making processes, it has also become increasingly important to be able to estimate the uncertainty associated with the storm surge forecast. The procedure commonly adopted to do this uses the results of a hydrodynamic model forced by a set of different meteorological forecasts; however, this approach requires a considerable, if not prohibitive, computational cost for real-time application. Here we present two simplified methods for estimating the uncertainty affecting storm surge prediction with moderate computational effort. In the first approach we use a computationally fast, statistical tidal model instead of a hydrodynamic numerical model to estimate storm surge uncertainty. The second approach is based on the observation that the uncertainty in the sea level forecast mainly stems from the uncertainty affecting the meteorological fields; this led to the idea of estimating forecast uncertainty via a linear combination of suitable meteorological variances, directly extracted from the meteorological fields. The proposed methods were applied to estimate the uncertainty in the storm surge forecast in the Venice Lagoon. The results clearly show that the uncertainty estimated through a linear combination of suitable meteorological variances closely matches the one obtained using the deterministic approach and overcomes some intrinsic limitations in the use of a statistical tidal model.
New Methods for Estimating Seasonal Potential Climate Predictability
NASA Astrophysics Data System (ADS)
Feng, Xia
This study develops two new statistical approaches to assess the seasonal potential predictability of observed climate variables. One is the univariate analysis of covariance (ANOCOVA) model, a combination of an autoregressive (AR) model and analysis of variance (ANOVA). It has the advantage of taking into account the uncertainty of the estimated parameter due to sampling errors in the statistical test, which is often neglected in AR-based methods, and of accounting for daily autocorrelation, which is not considered in traditional ANOVA. In the ANOCOVA model, the seasonal signals arising from external forcing are tested for equality in order to assess whether any interannual variability that may exist is potentially predictable. The bootstrap is an attractive alternative method that requires no hypothesized model and is applicable no matter how mathematically complicated the parameter estimator is. This method builds up the empirical distribution of the interannual variance from resamplings drawn with replacement from the given sample, in which the only predictability in seasonal means arises from weather noise. These two methods are applied to temperature and water cycle components, including precipitation and evaporation, to measure the extent to which the interannual variance of seasonal means exceeds the unpredictable weather noise, and are compared with previous methods, including Leith-Shukla-Gutzler (LSG), Madden, and Katz. The potential predictability of temperature from the ANOCOVA model, bootstrap, LSG, and Madden exhibits a pronounced tropical-extratropical contrast, with much larger predictability in the tropics, dominated by El Niño/Southern Oscillation (ENSO), than in higher latitudes, where strong internal variability lowers predictability. Bootstrap tends to display the highest predictability of the four methods, ANOCOVA lies in the middle, while LSG and Madden appear to generate lower predictability.
Seasonal precipitation predictability from ANOCOVA, bootstrap, and Katz, resembling that for temperature, is higher over the tropical regions and lower in the extratropics. Bootstrap and ANOCOVA are in good agreement with each other, both methods generating larger predictability than Katz. The seasonal predictability of evaporation over land bears considerable similarity to that of temperature using ANOCOVA, bootstrap, LSG, and Madden. The remote SST forcing and soil moisture reveal substantial seasonality in their relations with the potentially predictable seasonal signals. For selected regions, either SST or soil moisture or both show significant relationships with predictable signals, hence providing indirect insight on the slowly varying boundary processes involved and enabling useful seasonal climate prediction. A multivariate analysis of covariance (MANOCOVA) model is established to identify distinctive predictable patterns, which are uncorrelated with each other. Generally speaking, the seasonal predictability from the multivariate model is consistent with that from ANOCOVA. Besides unveiling the spatial variability of predictability, the MANOCOVA model also reveals the temporal variability of each predictable pattern, which could be linked to periodic oscillations.
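The bootstrap approach described above can be sketched as follows: pool the daily data, resample with replacement to build a weather-noise-only null distribution of the interannual variance of seasonal means, and compare the observed variance against it. Dimensions and noise levels are illustrative, and autocorrelation handling is omitted:

```python
import numpy as np

rng = np.random.default_rng(5)
n_years, n_days = 30, 90
year_signal = rng.normal(0.0, 1.0, n_years)            # slow "boundary-forced" signal
daily = year_signal[:, None] + rng.normal(0.0, 1.0, (n_years, n_days))

# Observed interannual variance of seasonal (90-day) means
obs_var = daily.mean(axis=1).var(ddof=1)

# Null hypothesis: only weather noise, so any season is an exchangeable
# draw of days from the pooled record.
pooled = daily.ravel() - daily.ravel().mean()
n_boot = 2000
null_var = np.empty(n_boot)
for i in range(n_boot):
    resampled = rng.choice(pooled, size=(n_years, n_days), replace=True)
    null_var[i] = resampled.mean(axis=1).var(ddof=1)

# Fraction of null draws at least as extreme as the observation
p_value = (null_var >= obs_var).mean()
```

With a genuine year-to-year signal present, the observed variance of seasonal means sits far in the upper tail of the noise-only null distribution, which is the signature of potential predictability.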
Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen
2017-12-27
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 - π). Marker effects and their PEV are estimated by using SVD, and the posterior probability of each marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction.
For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
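The SVD route to ridge-type (SNP-BLUP) marker effects can be sketched directly. The exact equivalence of the direct and SVD solutions is a standard linear-algebra identity; the final variance line is a ridge-sampling-variance simplification flagged as an assumption, not the paper's BayesC update:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, lam = 100, 400, 50.0
Z = rng.normal(size=(n, m))               # centred marker genotype matrix (simulated)
y = rng.normal(size=n)                    # phenotypes (simulated)

# Direct SNP-BLUP / ridge solution for marker effects,
# via the n x n system (cheap when n << m)
beta_direct = Z.T @ np.linalg.solve(Z @ Z.T + lam * np.eye(n), y)

# Same solution through the SVD of Z = U S V'; once the SVD is stored,
# re-solving for new lambdas or new traits is nearly free.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
beta_svd = Vt.T @ ((s / (s ** 2 + lam)) * (U.T @ y))

# Per-marker sampling variance of the ridge estimates from the same
# factorization (up to sigma_e^2): diag(V S^2/(S^2+lam)^2 V')
pev_diag = (Vt.T ** 2) @ (s ** 2 / (s ** 2 + lam) ** 2)
```

Reusing one factorization for marker effects and their variances is what makes the non-iterative BayesC approximation in the abstract computationally comparable to SNP-BLUP.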
Fischer, A; Friggens, N C; Berry, D P; Faverdin, P
2018-07-01
The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both model-fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted: in one, the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation-average net energy intake (NEI) on lactation-average milk energy output, average metabolic BW, and lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow-specific levels on the biological traits and the intercept, using fortnightly repeated measures for the variables. This method split the predicted NEI into two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows, all fed a constant energy-rich diet. Mixed models fitting cow-specific intercepts and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared.
The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
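The lactation-average REI baseline described above is simply the residual of a multiple regression of NEI on the energy sinks. A sketch on simulated cow-level data, with coefficients and scales invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 119                                    # cows, as in the study

# Simulated lactation-average energy sinks (illustrative units and scales)
milk_e = rng.normal(100.0, 10.0, n)        # milk energy output
mbw = rng.normal(130.0, 8.0, n)            # metabolic body weight
bcs_loss = rng.normal(0.3, 0.1, n)         # body condition score loss
bcs_gain = rng.normal(0.2, 0.1, n)         # body condition score gain

# True (unobservable) cow efficiency plus measurement noise, both of which
# end up mixed together in the regression residual -- the study's point.
efficiency = rng.normal(0.0, 3.0, n)
nei = (0.6 * milk_e + 0.5 * mbw - 4.0 * bcs_loss + 5.0 * bcs_gain
       + efficiency + rng.normal(0.0, 3.0, n))

# Lactation-average REI: residuals of NEI regressed on the energy sinks
X = np.column_stack([np.ones(n), milk_e, mbw, bcs_loss, bcs_gain])
coef, *_ = np.linalg.lstsq(X, nei, rcond=None)
rei = nei - X @ coef
```

By construction the residual bundles true efficiency with measurement error; the mixed-model refinement in the abstract aims to separate the repeatable cow-specific part from the rest.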
Calus, Mario PL; Bijma, Piter; Veerkamp, Roel F
2004-01-01
Covariance functions have been proposed to predict breeding values and genetic (co)variances as a function of phenotypic within herd-year averages (environmental parameters) to include genotype by environment interaction. The objective of this paper was to investigate the influence of the definition of environmental parameters and non-random use of sires on expected breeding values and estimated genetic variances across environments. Breeding values were simulated as a linear function of simulated herd effects. The definition of environmental parameters hardly influenced the results. In situations with random use of sires, estimated genetic correlations between the trait expressed in different environments were 0.93, 0.93 and 0.97, while the simulated value was 0.89, and estimated genetic variances deviated up to 30% from the simulated values. Non-random use of sires, poor genetic connectedness and small herd size had a large impact on the estimated covariance functions, expected breeding values and calculated environmental parameters. Estimated genetic correlations between a trait expressed in different environments were biased upwards, and breeding values were more biased when genetic connectedness became poorer and herd composition more diverse. The best possible solution at this stage is to use environmental parameters combining large numbers of animals per herd, while losing some information on genotype by environment interaction in the data. PMID:15339629
Network Structure and Biased Variance Estimation in Respondent Driven Sampling
Verdery, Ashton M.; Mouw, Ted; Bauldry, Shawn; Mucha, Peter J.
2015-01-01
This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network. PMID:26679927
Evolution, mutations, and human longevity: European royal and noble families.
Gavrilova, N S; Gavrilov, L A; Evdokushkina, G N; Semyonova, V G; Gavrilova, A L; Evdokushkina, N N; Kushnareva, Y E; Kroutko, V N; Andreyev, A Yu
1998-08-01
The evolutionary theory of aging predicts that the equilibrium gene frequency for deleterious mutations should increase with age at onset of mutation action because of weaker (postponed) selection against later-acting mutations. According to this mutation accumulation hypothesis, one would expect the genetic variability for survival (additive genetic variance) to increase with age. The ratio of additive genetic variance to the observed phenotypic variance (the heritability of longevity) can be estimated most reliably as the doubled slope of the regression line for offspring life span on paternal age at death. Thus, if longevity is indeed determined by late-acting deleterious mutations, one would expect this slope to become steeper at higher paternal ages. To test this prediction of evolutionary theory of aging, we computerized and analyzed the most reliable and accurate genealogical data on longevity in European royal and noble families. Offspring longevity for each sex (8409 records for males and 3741 records for females) was considered as a dependent variable in the multiple regression model and as a function of three independent predictors: paternal age at death (for estimation of heritability of life span), paternal age at reproduction (control for parental age effects), and cohort life expectancy (control for cohort and secular trends and fluctuations). We found that the regression slope for offspring longevity as a function of paternal longevity increases with paternal longevity, as predicted by the evolutionary theory of aging and by the mutation accumulation hypothesis in particular.
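The doubled-slope estimator described above can be sketched in a short simulation: under a purely additive model, offspring regress on a single parent with slope h²/2, so doubling the fitted slope recovers the heritability. All numbers below are illustrative and not taken from the genealogical data set.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
h2_true = 0.3                          # assumed narrow-sense heritability
father = [random.gauss(70, 10) for _ in range(20000)]
# Offspring resemble a single parent with regression coefficient h2/2,
# plus residual (environmental and segregation) variation:
child = [70 + (h2_true / 2) * (f - 70) + random.gauss(0, 9) for f in father]

b = slope(father, child)
h2_hat = 2 * b                         # doubled slope, as in the abstract
print(h2_hat)                          # close to the simulated 0.3
```

The mutation accumulation prediction then amounts to this slope growing steeper when the regression is restricted to progressively older paternal ages at death.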
Statistical aspects of quantitative real-time PCR experiment design.
Kitchen, Robert R; Kubista, Mikael; Tichopad, Ales
2010-04-01
Experiments using quantitative real-time PCR to test hypotheses are limited by technical and biological variability; we seek to minimise sources of confounding variability through optimum use of biological and technical replicates. The quality of an experiment design is commonly assessed by calculating its prospective power. Such calculations rely on knowledge of the expected variances of the measurements of each group of samples and the magnitude of the treatment effect; the estimation of which is often uninformed and unreliable. Here we introduce a method that exploits a small pilot study to estimate the biological and technical variances in order to improve the design of a subsequent large experiment. We measure the variance contributions at several 'levels' of the experiment design and provide a means of using this information to predict both the total variance and the prospective power of the assay. A validation of the method is provided through a variance analysis of representative genes in several bovine tissue-types. We also discuss the effect of normalisation to a reference gene in terms of the measured variance components of the gene of interest. Finally, we describe a software implementation of these methods, powerNest, that gives the user the opportunity to input data from a pilot study and interactively modify the design of the assay. The software automatically calculates expected variances, statistical power, and optimal design of the larger experiment. powerNest enables the researcher to minimise the total confounding variance and maximise prospective power for a specified maximum cost for the large study. Copyright 2010 Elsevier Inc. All rights reserved.
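The variance partition behind such pilot calculations can be sketched with method-of-moments (one-way ANOVA) estimators on a nested design: biological samples, each measured with technical replicates. This is a generic illustration with invented variances, not the powerNest implementation.

```python
import random
import statistics

random.seed(2)
sigma_bio, sigma_tech = 1.0, 0.5      # assumed true SDs (illustrative)
n_samples, n_reps = 200, 3            # biological samples x technical reps

data = []
for _ in range(n_samples):
    b = random.gauss(0, sigma_bio)                  # biological effect
    data.append([b + random.gauss(0, sigma_tech)    # plus technical noise
                 for _ in range(n_reps)])

# The within-sample mean square estimates the technical variance directly:
var_tech = statistics.mean(statistics.variance(reps) for reps in data)
# The variance of sample means carries the biological variance plus a
# 1/n_reps share of technical variance, which is subtracted off:
means = [statistics.mean(reps) for reps in data]
var_bio = statistics.variance(means) - var_tech / n_reps

# Predicted total variance of a single future measurement:
total = var_bio + var_tech
print(var_bio, var_tech)              # close to 1.0 and 0.25
```

A prospective power calculation would then plug `total` (or the variance of a mean over a chosen replicate structure) into a standard two-sample power formula.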
Aerobic fitness, maturation, and training experience in youth basketball.
Carvalho, Humberto M; Coelho-e-Silva, Manuel J; Eisenmann, Joey C; Malina, Robert M
2013-07-01
Relationships among chronological age (CA), maturation, training experience, and body dimensions with peak oxygen uptake (VO2max) were considered in male basketball players 14-16 y of age. Data for all players included maturity status estimated as percentage of predicted adult height attained at the time of the study (Khamis-Roche protocol), years of training, body dimensions, and VO2max (incremental maximal test on a treadmill). Proportional allometric models derived from stepwise regressions were used to incorporate either CA or maturity status and to incorporate years of formal training in basketball. Estimates for size exponents (95% CI) from the separate allometric models for VO2max were height 2.16 (1.23-3.09), body mass 0.65 (0.37-0.93), and fat-free mass 0.73 (0.46-1.02). Body dimensions explained 39% to 44% of variance. The independent variables in the proportional allometric models explained 47% to 60% of variance in VO2max. Estimated maturity status (11-16% of explained variance) and training experience (7-11% of explained variance) were significant predictors with either body mass or estimated fat-free mass (P ≤ .01) but not with height. Biological maturity status and training experience in basketball had a significant contribution to VO2max via body mass and fat-free fat mass and also had an independent positive relation with aerobic performance. The results highlight the importance of considering variation associated with biological maturation in aerobic performance of late-adolescent boys.
Turner, Rebecca M; Davey, Jonathan; Clarke, Mike J; Thompson, Simon G; Higgins, Julian PT
2012-01-01
Background Many meta-analyses contain only a small number of studies, which makes it difficult to estimate the extent of between-study heterogeneity. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, and offers advantages over conventional random-effects meta-analysis. To assist in this, we provide empirical evidence on the likely extent of heterogeneity in particular areas of health care. Methods Our analyses included 14 886 meta-analyses from the Cochrane Database of Systematic Reviews. We classified each meta-analysis according to the type of outcome, type of intervention comparison and medical specialty. By modelling the study data from all meta-analyses simultaneously, using the log odds ratio scale, we investigated the impact of meta-analysis characteristics on the underlying between-study heterogeneity variance. Predictive distributions were obtained for the heterogeneity expected in future meta-analyses. Results Between-study heterogeneity variances for meta-analyses in which the outcome was all-cause mortality were found to be on average 17% (95% CI 10–26) of variances for other outcomes. In meta-analyses comparing two active pharmacological interventions, heterogeneity was on average 75% (95% CI 58–95) of variances for non-pharmacological interventions. Meta-analysis size was found to have only a small effect on heterogeneity. Predictive distributions are presented for nine different settings, defined by type of outcome and type of intervention comparison. For example, for a planned meta-analysis comparing a pharmacological intervention against placebo or control with a subjectively measured outcome, the predictive distribution for heterogeneity is a log-normal(−2.13, 1.58²) distribution, which has a median value of 0.12. In an example of meta-analysis of six studies, incorporating external evidence led to a smaller heterogeneity estimate and a narrower confidence interval for the combined intervention effect.
Conclusions Meta-analysis characteristics were strongly associated with the degree of between-study heterogeneity, and predictive distributions for heterogeneity differed substantially across settings. The informative priors provided will be very beneficial in future meta-analyses including few studies. PMID:22461129
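The reported predictive distribution can be checked directly: the median of a log-normal(μ, σ²) variable is exp(μ), independent of σ, so the stated parameters imply the stated median.

```python
import math

mu, sigma = -2.13, 1.58   # parameters reported in the abstract (log scale)

# Median of log-normal(mu, sigma^2) is exp(mu); sigma does not enter.
median = math.exp(mu)
print(round(median, 2))   # 0.12, matching the reported median heterogeneity
```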
Predicting Explosion-Generated Sn and Lg Coda Using Synthetic Seismograms
2008-09-01
velocities in the upper crust are based on borehole data, geologic and gravity data, refraction studies and seismic experiments (McLaughlin et al. 1983...realizations of random media. We have estimated the heterogeneity parameters for the NTS using available seismic and geologic data. Lateral correlation...variance and coherence measures between seismic traces are estimated from clusters of nuclear explosions and well- log data. The horizontal von Karman
Streamflow record extension using power transformations and application to sediment transport
NASA Astrophysics Data System (ADS)
Moog, Douglas B.; Whiting, Peter J.; Thomas, Robert B.
1999-01-01
To obtain a representative set of flow rates for a stream, it is often desirable to fill in missing data or extend measurements to a longer time period by correlation to a nearby gage with a longer record. Linear least squares regression of the logarithms of the flows is a traditional and still common technique. However, its purpose is to generate optimal estimates of each day's discharge, rather than the population of discharges, for which it tends to underestimate variance. Maintenance-of-variance-extension (MOVE) equations [Hirsch, 1982] were developed to correct this bias. This study replaces the logarithmic transformation by the more general Box-Cox scaled power transformation, generating a more linear, constant-variance relationship for the MOVE extension. Combining the Box-Cox transformation with the MOVE extension is shown to improve accuracy in estimating order statistics of flow rate, particularly for the nonextreme discharges which generally govern cumulative transport over time. This advantage is illustrated by prediction of cumulative fractions of total bed load transport.
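The MOVE idea can be sketched directly: ordinary least squares uses slope r·s_y/s_x and so shrinks the variance of the filled-in record by r², while MOVE.1 uses slope s_y/s_x and preserves it. The simulated log-flows below are illustrative, not real gauge records.

```python
import random
import statistics

random.seed(3)
# Hypothetical log-flows at a long-record gauge (x) and a correlated
# short-record gauge (y); coefficients are invented for illustration.
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
y = [0.8 * xi + random.gauss(0.0, 0.6) for xi in x]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

# OLS is optimal per-day but shrinks the variance of the estimates by r^2:
ols = [my + r * (sy / sx) * (xi - mx) for xi in x]
# MOVE.1 (Hirsch, 1982) drops r from the slope to maintain the variance:
move = [my + (sy / sx) * (xi - mx) for xi in x]

var_ratio_ols = statistics.variance(ols) / statistics.variance(y)
var_ratio_move = statistics.variance(move) / statistics.variance(y)
print(var_ratio_ols, var_ratio_move)   # ~r^2 (about 0.64) versus 1.0
```

The Box-Cox step described in the abstract would simply replace the log transform of x and y with a scaled power transform chosen to make this relationship linear with constant variance.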
Estimating the encounter rate variance in distance sampling
Fewster, R.M.; Buckland, S.T.; Burnham, K.P.; Borchers, D.L.; Jupp, P.E.; Laake, J.L.; Thomas, L.
2009-01-01
The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias. ?? 2008, The International Biometric Society.
Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.
2014-01-01
The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736
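For a binary, uncensored outcome, Harrell's C reduces to the Mann-Whitney U-statistic estimate of AUC: the fraction of diseased/healthy pairs that the biomarker ranks concordantly, with ties counted as one half. A minimal sketch with toy biomarker values (not the Framingham data):

```python
from itertools import product

def concordance(scores_pos, scores_neg):
    """U-statistic (Mann-Whitney) estimate of AUC: fraction of
    positive/negative pairs ranked concordantly, ties scored 1/2."""
    num = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            num += 1.0
        elif p == n:
            num += 0.5
    return num / (len(scores_pos) * len(scores_neg))

# Toy biomarker values (hypothetical):
diseased = [0.9, 0.8, 0.7, 0.55]
healthy = [0.6, 0.4, 0.3, 0.2]
print(concordance(diseased, healthy))   # 0.9375
```

With right-censored survival times, the same pairwise counting is restricted to usable (comparable) pairs, which is the extension the paper's variance and covariance estimators are built around.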
Li, Xiujin; Lund, Mogens Sandø; Janss, Luc; Wang, Chonglong; Ding, Xiangdong; Zhang, Qin; Su, Guosheng
2017-03-15
With the development of SNP chips, SNP information provides an efficient approach to further disentangle different patterns of genomic variances and covariances across the genome for traits of interest. Due to the interaction between genotype and environment as well as possible differences in genetic background, it is reasonable to treat the performances of a biological trait in different populations as different but genetic correlated traits. In the present study, we performed an investigation on the patterns of region-specific genomic variances, covariances and correlations between Chinese and Nordic Holstein populations for three milk production traits. Variances and covariances between Chinese and Nordic Holstein populations were estimated for genomic regions at three different levels of genome region (all SNP as one region, each chromosome as one region and every 100 SNP as one region) using a novel multi-trait random regression model which uses latent variables to model heterogeneous variance and covariance. In the scenario of the whole genome as one region, the genomic variances, covariances and correlations obtained from the new multi-trait Bayesian method were comparable to those obtained from a multi-trait GBLUP for all the three milk production traits. In the scenario of each chromosome as one region, BTA 14 and BTA 5 accounted for very large genomic variance, covariance and correlation for milk yield and fat yield, whereas no specific chromosome showed very large genomic variance, covariance and correlation for protein yield. In the scenario of every 100 SNP as one region, most regions explained <0.50% of genomic variance and covariance for milk yield and fat yield, and explained <0.30% for protein yield, while some regions could present large variance and covariance. 
Although overall correlations between two populations for the three traits were positive and high, a few regions still showed weakly positive or highly negative genomic correlations for milk yield and fat yield. The new multi-trait Bayesian method using latent variables to model heterogeneous variance and covariance could work well for estimating the genomic variances and covariances for all genome regions simultaneously. Those estimated genomic parameters could be useful to improve the genomic prediction accuracy for Chinese and Nordic Holstein populations using a joint reference data in the future.
Assessing Multivariate Constraints to Evolution across Ten Long-Term Avian Studies
Teplitsky, Celine; Tarka, Maja; Møller, Anders P.; Nakagawa, Shinichi; Balbontín, Javier; Burke, Terry A.; Doutrelant, Claire; Gregoire, Arnaud; Hansson, Bengt; Hasselquist, Dennis; Gustafsson, Lars; de Lope, Florentino; Marzal, Alfonso; Mills, James A.; Wheelwright, Nathaniel T.; Yarrall, John W.; Charmantier, Anne
2014-01-01
Background In a rapidly changing world, it is of fundamental importance to understand processes constraining or facilitating adaptation through microevolution. As different traits of an organism covary, genetic correlations are expected to affect evolutionary trajectories. However, only limited empirical data are available. Methodology/Principal Findings We investigate the extent to which multivariate constraints affect the rate of adaptation, focusing on four morphological traits often shown to harbour large amounts of genetic variance and considered to be subject to limited evolutionary constraints. Our data set includes unique long-term data for seven bird species and a total of 10 populations. We estimate population-specific matrices of genetic correlations and multivariate selection coefficients to predict evolutionary responses to selection. Using Bayesian methods that facilitate the propagation of errors in estimates, we compare (1) the rate of adaptation based on predicted response to selection when including genetic correlations with predictions from models where these genetic correlations were set to zero and (2) the multivariate evolvability in the direction of current selection to the average evolvability in random directions of the phenotypic space. We show that genetic correlations on average decrease the predicted rate of adaptation by 28%. Multivariate evolvability in the direction of current selection was systematically lower than average evolvability in random directions of space. These significant reductions in the rate of adaptation and reduced evolvability were due to a general nonalignment of selection and genetic variance, notably orthogonality of directional selection with the size axis along which most (60%) of the genetic variance is found. Conclusions These results suggest that genetic correlations can impose significant constraints on the evolution of avian morphology in wild populations. 
This could have important impacts on evolutionary dynamics and hence population persistence in the face of rapid environmental change. PMID:24608111
A new statistic to express the uncertainty of kriging predictions for purposes of survey planning.
NASA Astrophysics Data System (ADS)
Lark, R. M.; Lapworth, D. J.
2014-05-01
It is well-known that one advantage of kriging for spatial prediction is that, given the random effects model, the prediction error variance can be computed a priori for alternative sampling designs. This allows one to compare sampling schemes, in particular sampling at different densities, and so to decide on one which meets requirements in terms of the uncertainty of the resulting predictions. However, the planning of sampling schemes must account not only for statistical considerations, but also logistics and cost. This requires effective communication between statisticians, soil scientists and data users/sponsors such as managers, regulators or civil servants. In our experience the latter parties are not necessarily able to interpret the prediction error variance as a measure of uncertainty for decision making. In some contexts (particularly the solution of very specific problems at large cartographic scales, e.g. site remediation and precision farming) it is possible to translate uncertainty of predictions into a loss function directly comparable with the cost incurred in increasing precision. Often, however, sampling must be planned for more generic purposes (e.g. baseline or exploratory geochemical surveys). In this latter context the prediction error variance may be of limited value to a non-statistician who has to make a decision on sample intensity and associated cost. We propose an alternative criterion for these circumstances to aid communication between statisticians and data users about the uncertainty of geostatistical surveys based on different sampling intensities. The criterion is the consistency of estimates made from two non-coincident instantiations of a proposed sample design. We consider square sample grids, one instantiation is offset from the second by half the grid spacing along the rows and along the columns. 
If a sample grid is coarse relative to the important scales of variation in the target property then the consistency of predictions from two instantiations is expected to be small, and can be increased by reducing the grid spacing. The measure of consistency is the correlation between estimates from the two instantiations of the sample grid, averaged over a grid cell. We call this the offset correlation; it can be calculated from the variogram. We propose that this measure is easier to grasp intuitively than the prediction error variance, and it has the advantage of an upper bound (1.0), which will aid its interpretation. This quality measure is illustrated for some hypothetical examples, considering both ordinary kriging and factorial kriging of the variable of interest. It is also illustrated using data on metal concentrations in the soil of north-east England.
Chan, Kelvin K W; Xie, Feng; Willan, Andrew R; Pullenayegum, Eleanor M
2017-04-01
Parameter uncertainty in value sets of multiattribute utility-based instruments (MAUIs) has received little attention previously. This false precision leads to underestimation of the uncertainty of the results of cost-effectiveness analyses. The aim of this study is to examine the use of multiple imputation as a method to account for this uncertainty of MAUI scoring algorithms. We fitted a Bayesian model with random effects for respondents and health states to the data from the original US EQ-5D-3L valuation study, thereby estimating the uncertainty in the EQ-5D-3L scoring algorithm. We applied these results to EQ-5D-3L data from the Commonwealth Fund (CWF) Survey for Sick Adults ( n = 3958), comparing the standard error of the estimated mean utility in the CWF population using the predictive distribution from the Bayesian mixed-effect model (i.e., incorporating parameter uncertainty in the value set) with the standard error of the estimated mean utilities based on multiple imputation and the standard error using the conventional approach of using MAUI (i.e., ignoring uncertainty in the value set). The mean utility in the CWF population based on the predictive distribution of the Bayesian model was 0.827 with a standard error (SE) of 0.011. When utilities were derived using the conventional approach, the estimated mean utility was 0.827 with an SE of 0.003, which is only 25% of the SE based on the full predictive distribution of the mixed-effect model. Using multiple imputation with 20 imputed sets, the mean utility was 0.828 with an SE of 0.011, which is similar to the SE based on the full predictive distribution. Ignoring uncertainty of the predicted health utilities derived from MAUIs could lead to substantial underestimation of the variance of mean utilities. Multiple imputation corrects for this underestimation so that the results of cost-effectiveness analyses using MAUIs can report the correct degree of uncertainty.
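The multiple-imputation correction rests on Rubin's rules: the total variance of the combined estimate is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance, so value-set uncertainty enters through the between-imputation term. A sketch with invented numbers, chosen only to echo the magnitudes in the abstract:

```python
import statistics

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates by Rubin's rules:
    total variance = within + (1 + 1/m) * between."""
    m = len(estimates)
    qbar = statistics.mean(estimates)          # combined point estimate
    w = statistics.mean(variances)             # within-imputation variance
    b = statistics.variance(estimates)         # between-imputation variance
    t = w + (1 + 1 / m) * b
    return qbar, t

# Hypothetical mean-utility estimates from m = 5 imputed value sets:
est = [0.826, 0.829, 0.824, 0.831, 0.828]
var = [9e-6] * 5                  # (SE 0.003)^2 within each imputation
qbar, t = rubin_combine(est, var)
print(qbar, t ** 0.5)             # combined estimate and its corrected SE
```

The corrected standard error exceeds the conventional one whenever the imputed value sets disagree, which is exactly the underestimation the abstract describes.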
Efficiently estimating salmon escapement uncertainty using systematically sampled data
Reynolds, Joel H.; Woody, Carol Ann; Gove, Nancy E.; Fair, Lowell F.
2007-01-01
Fish escapement is generally monitored using nonreplicated systematic sampling designs (e.g., via visual counts from towers or hydroacoustic counts). These sampling designs support a variety of methods for estimating the variance of the total escapement. Unfortunately, all the methods give biased results, with the magnitude of the bias being determined by the underlying process patterns. Fish escapement commonly exhibits positive autocorrelation and nonlinear patterns, such as diurnal and seasonal patterns. For these patterns, poor choice of variance estimator can needlessly increase the uncertainty managers have to deal with in sustaining fish populations. We illustrate the effect of sampling design and variance estimator choice on variance estimates of total escapement for anadromous salmonids from systematic samples of fish passage. Using simulated tower counts of sockeye salmon Oncorhynchus nerka escapement on the Kvichak River, Alaska, five variance estimators for nonreplicated systematic samples were compared to determine the least biased. Using the least biased variance estimator, four confidence interval estimators were compared for expected coverage and mean interval width. Finally, five systematic sampling designs were compared to determine the design giving the smallest average variance estimate for total annual escapement. For nonreplicated systematic samples of fish escapement, all variance estimators were positively biased. Compared with the other estimators, the least biased estimator reduced bias by 12% to 98% on average. All confidence intervals gave effectively identical results. Replicated systematic sampling designs consistently provided the smallest average estimated variance among those compared.
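The estimator-choice issue can be illustrated on a deterministic toy series: for a passage record with a smooth seasonal ramp, treating a systematic sample as a simple random sample grossly overstates the variance of the estimated mean, while a successive-difference estimator discounts the trend. All values below are invented for illustration, not Kvichak River counts.

```python
# Hypothetical hourly passage counts rising smoothly over the season:
counts = [200 + 5 * h for h in range(240)]
k = 6
sample = counts[0::k]                 # nonreplicated systematic sample
n = len(sample)
mean = sum(sample) / n

# (1) Variance of the mean if the sample were a simple random sample:
v_srs = sum((y - mean) ** 2 for y in sample) / (n - 1) / n

# (2) Successive-difference estimator, which removes smooth trends:
v_sd = sum((sample[i + 1] - sample[i]) ** 2
           for i in range(n - 1)) / (2 * (n - 1)) / n

print(v_srs, v_sd)   # the SRS formula is badly inflated on this series
```

Both estimators still overstate the true design variance of a systematic sample here, consistent with the abstract's finding that all candidates were positively biased; the successive-difference form is simply far less so.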
Input-variable sensitivity assessment for sediment transport relations
NASA Astrophysics Data System (ADS)
Fernández, Roberto; Garcia, Marcelo H.
2017-09-01
A methodology to assess input-variable sensitivity for sediment transport relations is presented. The Mean Value First Order Second Moment Method (MVFOSM) is applied to two bed load transport equations showing that it may be used to rank all input variables in terms of how their specific variance affects the overall variance of the sediment transport estimation. In sites where data are scarce or nonexistent, the results obtained may be used to (i) determine what variables would have the largest impact when estimating sediment loads in the absence of field observations and (ii) design field campaigns to specifically measure those variables for which a given transport equation is most sensitive; in sites where data are readily available, the results would allow quantifying the effect that the variance associated with each input variable has on the variance of the sediment transport estimates. An application of the method to two transport relations using data from a tropical mountain river in Costa Rica is implemented to exemplify the potential of the method in places where input data are limited. Results are compared against Monte Carlo simulations to assess the reliability of the method and validate its results. For both of the sediment transport relations used in the sensitivity analysis, accurate knowledge of sediment size was found to have more impact on sediment transport predictions than precise knowledge of other input variables such as channel slope and flow discharge.
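MVFOSM itself is compact enough to sketch: propagate each input's variance through the squared partial derivative at the mean point and rank inputs by their contribution to the output variance. The power-law relation, exponents, means, and standard deviations below are all invented placeholders, not a specific bed load equation:

```python
# Illustrative power-law transport relation (placeholder exponents):
def q(qw, S, D):
    return 0.05 * qw ** 1.5 * S ** 1.6 / D ** 1.5

x0 = {"qw": 10.0, "S": 0.01, "D": 0.02}    # input means (invented)
sd = {"qw": 1.0, "S": 0.002, "D": 0.005}   # input SDs (invented)

# MVFOSM: Var(q) ~= sum_i (dq/dx_i)^2 * Var(x_i), with the partials
# evaluated at the mean point (central differences here):
contrib = {}
for name in x0:
    h = 1e-5 * x0[name]
    hi, lo = dict(x0), dict(x0)
    hi[name] += h
    lo[name] -= h
    dqdx = (q(**hi) - q(**lo)) / (2 * h)
    contrib[name] = (dqdx * sd[name]) ** 2

total_var = sum(contrib.values())
ranking = sorted(contrib, key=contrib.get, reverse=True)
print(ranking)   # inputs ranked by their share of the output variance
```

With these invented numbers the grain size D dominates the output variance, echoing the paper's conclusion that sediment size mattered more than slope or discharge; with field-estimated means and variances the same loop yields the site-specific ranking.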
NASA Astrophysics Data System (ADS)
El-Diasty, M.; El-Rabbany, A.; Pagiatakis, S.
2007-11-01
We examine the effect of varying the temperature points on MEMS inertial sensors' noise models using Allan variance and least-squares spectral analysis (LSSA). Allan variance is a method of representing root-mean-square random drift error as a function of averaging times. LSSA is an alternative to the classical Fourier methods and has been applied successfully by a number of researchers in the study of the noise characteristics of experimental series. Static data sets are collected at different temperature points using two MEMS-based IMUs, namely MotionPakII and Crossbow AHRS300CC. The performance of the two MEMS inertial sensors is predicted from the Allan variance estimation results at different temperature points and the LSSA is used to study the noise characteristics and define the sensors' stochastic model parameters. It is shown that the stochastic characteristics of MEMS-based inertial sensors can be identified using Allan variance estimation and LSSA and the sensors' stochastic model parameters are temperature dependent. Also, the Kaiser window FIR low-pass filter is used to investigate the effect of de-noising stage on the stochastic model. It is shown that the stochastic model is also dependent on the chosen cut-off frequency.
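The Allan variance computation itself is short: average the signal over clusters of m samples and take half the mean squared difference of cluster means spaced m samples apart. A sketch on simulated white noise (for which the Allan variance falls as 1/m); this is not the MotionPakII or AHRS300CC data.

```python
import random

def allan_variance(y, m):
    """Overlapping Allan variance at cluster size m: half the mean
    squared difference between cluster means m samples apart."""
    means = [sum(y[i:i + m]) / m for i in range(len(y) - m + 1)]
    diffs = [means[i + m] - means[i] for i in range(len(means) - m)]
    return 0.5 * sum(d * d for d in diffs) / len(diffs)

random.seed(4)
y = [random.gauss(0.0, 1.0) for _ in range(20000)]   # unit white noise

av1, av10 = allan_variance(y, 1), allan_variance(y, 10)
print(av1, av1 / av10)   # near 1.0 and near 10: white noise scales as 1/m
```

Plotting the Allan deviation against averaging time on log-log axes and reading off the characteristic slopes (e.g., -1/2 for white noise) is how the sensor's stochastic model parameters are identified at each temperature point.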
Validity of Futrex-5000 for body composition determination.
McLean, K P; Skinner, J S
1992-02-01
Underwater weighing (UWW), skinfolds (SKF), and the Futrex-5000 (FTX) were compared by using UWW as the criterion measure of body fat in 30 male and 31 female Caucasians. Estimates of body fat (% fat) were obtained using The Y's Way to Fitness SKF equations and the standard FTX technique with near-infrared interactance (NIR) measured at the biceps, plus six sites for men and five sites for women. SKF correlated significantly higher with UWW than did FTX with UWW for males (0.95 vs 0.80), females (0.88 vs 0.63), and the whole group (0.94 vs 0.81). Fewer subjects (52%) were within +/- 4% of the UWW value using FTX, compared with 87% with SKF. FTX overestimated body fat in lean subjects with less than 8% fat and underestimated it in subjects with greater than 30% fat. Measuring NIR at additional sites did not improve the predicted variance. Partial F-tests indicate that using body mass index, instead of height and weight, in the FTX equation improved body fat prediction for females. Biceps NIR predicted additional variance in body fat beyond height, weight, frame size, and activity level but little variance above that predicted by these four variables plus SKF (2% more in males and less than 1% in females). Thus, SKF give more information and more accurately predict body fat, especially at the extremes of the body fat continuum.
Lopes, Fernando B; da Silva, Marcelo C; Marques, Ednira G; McManus, Concepta M
2012-12-01
This study was undertaken with the aim of estimating genetic parameters and trends for asymptotic weight (A) and maturity rate (k) of Nellore cattle from northern Brazil. The data set was made available by the Brazilian Association of Zebu Breeders and collected between the years 1997 and 2007. The Von Bertalanffy, Brody, Gompertz, and logistic nonlinear models were fitted by the Gauss-Newton method to weight-age data of 45,895 animals collected quarterly from birth to 750 days of age. The curve parameters were analyzed using the GLM and CORR procedures. The estimation of (co)variance components and genetic parameters was obtained using the MTDFREML software. The estimated heritability coefficients were 0.21 ± 0.013 and 0.25 ± 0.014 for asymptotic weight and maturity rate, respectively. This indicates that selection for either trait will result in genetic progress in the herd. The genetic correlation between A and k was negative (-0.57 ± 0.03), indicating that animals selected for high maturity rate will tend to have low asymptotic weight. The Von Bertalanffy function is adequate to establish the mean growth patterns and to predict the adult weight of Nellore cattle. This model is more accurate in predicting the birth weight of these animals and has better overall fit. The prediction of adult weight using nonlinear functions can be accurate when growth curve parameters and their (co)variance components are estimated jointly. The model used in this study can be applied to the prediction of mature weight in herds where a portion of the animals are culled before they reach adult age.
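The Von Bertalanffy curve and its least-squares fitting can be sketched as follows. The Gauss-Newton fit and the MTDFREML variance-component step are not reproduced here; instead a coarse grid search recovers A and k from noise-free illustrative records, with the shape parameter b held fixed for brevity. All weights and parameter values are hypothetical.

```python
import math

def vonbert(t, A, b, k):
    """Von Bertalanffy growth curve: W(t) = A * (1 - b*exp(-k*t))^3."""
    return A * (1 - b * math.exp(-k * t)) ** 3

# Quarterly weight-age records for one animal (hypothetical values, kg):
ages = [0, 90, 180, 270, 360, 450, 540, 630, 750]
truth = dict(A=480.0, b=0.6, k=0.004)
weights = [vonbert(t, **truth) for t in ages]

# Coarse grid-search least squares for A and k (b fixed for brevity):
best = None
for A in range(400, 561, 5):
    for k1000 in range(2, 9):              # k from 0.002 to 0.008
        k = k1000 / 1000
        sse = sum((w - vonbert(t, A, 0.6, k)) ** 2
                  for t, w in zip(ages, weights))
        if best is None or sse < best[0]:
            best = (sse, A, k)

_, A_hat, k_hat = best
print(A_hat, k_hat)   # recovers A = 480, k = 0.004 on this noise-free data
```

In the study's setting the per-animal (A, k) estimates from such fits become the phenotypes whose (co)variance components and heritabilities are then estimated jointly.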
Gross, Alden L; Rebok, George W; Unverzagt, Frederick W; Willis, Sherry L; Brandt, Jason
2011-09-01
The present study sought to predict changes in everyday functioning using cognitive tests. Data from the Advanced Cognitive Training for Independent and Vital Elderly trial were used to examine the extent to which competence in different cognitive domains--memory, inductive reasoning, processing speed, and global mental status--predicts prospectively measured everyday functioning among older adults. Coefficients of determination for baseline levels and trajectories of everyday functioning were estimated using parallel process latent growth models. Each cognitive domain independently predicts a significant proportion of the variance in baseline and trajectory change of everyday functioning, with inductive reasoning explaining the most variance (R2 = .175) in baseline functioning and memory explaining the most variance (R2 = .057) in changes in everyday functioning. Inductive reasoning is an important determinant of current everyday functioning in community-dwelling older adults, suggesting that successful performance in daily tasks is critically dependent on executive cognitive function. On the other hand, baseline memory function is more important in determining change over time in everyday functioning, suggesting that some participants with low baseline memory function may reflect a subgroup with incipient progressive neurologic disease.
Robust versus consistent variance estimators in marginal structural Cox models.
Enders, Dirk; Engel, Susanne; Linder, Roland; Pigeot, Iris
2018-06-11
In survival analyses, inverse-probability-of-treatment (IPT) and inverse-probability-of-censoring (IPC) weighted estimators of parameters in marginal structural Cox models are often used to estimate treatment effects in the presence of time-dependent confounding and censoring. In most applications, a robust variance estimator of the IPT and IPC weighted estimator is calculated leading to conservative confidence intervals. This estimator assumes that the weights are known rather than estimated from the data. Although a consistent estimator of the asymptotic variance of the IPT and IPC weighted estimator is generally available, applications and thus information on the performance of the consistent estimator are lacking. Reasons might be a cumbersome implementation in statistical software, which is further complicated by missing details on the variance formula. In this paper, we therefore provide a detailed derivation of the variance of the asymptotic distribution of the IPT and IPC weighted estimator and explicitly state the necessary terms to calculate a consistent estimator of this variance. We compare the performance of the robust and consistent variance estimators in an application based on routine health care data and in a simulation study. The simulation reveals no substantial differences between the 2 estimators in medium and large data sets with no unmeasured confounding, but the consistent variance estimator performs poorly in small samples or under unmeasured confounding, if the number of confounders is large. We thus conclude that the robust estimator is more appropriate for all practical purposes. Copyright © 2018 John Wiley & Sons, Ltd.
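As a minimal illustration of the weighting step, the sketch below computes stabilized inverse-probability-of-treatment weights from a logistic propensity model on simulated data. The single time-fixed confounder and all coefficients are invented for the example; the time-dependent confounding and censoring-weight machinery of the paper is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
conf = rng.normal(size=n)                          # invented time-fixed confounder
p_treat = 1 / (1 + np.exp(-0.8 * conf))            # treatment depends on confounder
treated = rng.binomial(1, p_treat)

# Propensity model: P(treatment | confounder)
ps = LogisticRegression().fit(conf[:, None], treated).predict_proba(conf[:, None])[:, 1]

# Stabilized IPT weights: marginal treatment probability / conditional probability
p_marg = treated.mean()
w = np.where(treated == 1, p_marg / ps, (1 - p_marg) / (1 - ps))
print("mean stabilized weight:", round(w.mean(), 3))
```

These weights would then enter a weighted Cox partial likelihood; the choice between the robust and the consistent variance estimator concerns how the uncertainty from estimating `ps` is (or is not) propagated into the standard errors.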
A Variance Distribution Model of Surface EMG Signals Based on Inverse Gamma Distribution.
Hayashi, Hideaki; Furui, Akira; Kurita, Yuichi; Tsuji, Toshio
2017-11-01
Objective: This paper describes the formulation of a surface electromyogram (EMG) model capable of representing the variance distribution of EMG signals. Methods: In the model, EMG signals are handled based on a Gaussian white noise process with a mean of zero for each variance value. EMG signal variance is taken as a random variable that follows inverse gamma distribution, allowing the representation of noise superimposed onto this variance. Variance distribution estimation based on marginal likelihood maximization is also outlined in this paper. The procedure can be approximated using rectified and smoothed EMG signals, thereby allowing the determination of distribution parameters in real time at low computational cost. Results: A simulation experiment was performed to evaluate the accuracy of distribution estimation using artificially generated EMG signals, with results demonstrating that the proposed model's accuracy is higher than that of maximum-likelihood-based estimation. Analysis of variance distribution using real EMG data also suggested a relationship between variance distribution and signal-dependent noise. Conclusion: The study reported here was conducted to examine the performance of a proposed surface EMG model capable of representing variance distribution and a related distribution parameter estimation method. Experiments using artificial and real EMG data demonstrated the validity of the model. Significance: Variance distribution estimated using the proposed model exhibits potential in the estimation of muscle force.
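A toy version of the model can be simulated directly: draw a variance for each window from an inverse gamma distribution, generate zero-mean Gaussian "EMG" samples with that variance, and recover the distribution parameters from rectified-and-smoothed (windowed mean-square) estimates. The shape/scale values and window size are arbitrary, and plain maximum-likelihood fitting is used here in place of the paper's marginal-likelihood procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, beta = 6.0, 5.0                 # hypothetical inverse-gamma shape and scale
n_win, win = 4000, 50                  # windows and samples per window

# Each window: draw a variance, then Gaussian EMG samples with that variance
v = stats.invgamma.rvs(alpha, scale=beta, size=n_win, random_state=rng)
x = rng.normal(0.0, np.sqrt(np.repeat(v, win)))

# Rectified/smoothed proxy for local variance: windowed mean square
v_hat = (x ** 2).reshape(n_win, win).mean(axis=1)

# Fit an inverse gamma to the windowed variance estimates (location fixed at 0)
a_hat, _, b_hat = stats.invgamma.fit(v_hat, floc=0)
print(f"alpha ~ {a_hat:.2f}, beta ~ {b_hat:.2f}")
```

The windowed estimates add chi-square noise on top of the latent variances, so the recovered shape parameter is somewhat attenuated relative to the generating value, which is part of what a marginal-likelihood treatment accounts for.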
NASA Astrophysics Data System (ADS)
Salvucci, G.; Rigden, A. J.; Gentine, P.; Lintner, B. R.
2013-12-01
A new method was recently proposed for estimating evapotranspiration (ET) from weather station data without requiring measurements of surface limiting factors (e.g. soil moisture, leaf area, canopy conductance) [Salvucci and Gentine, 2013, PNAS, 110(16): 6287-6291]. Required measurements include diurnal air temperature, specific humidity, wind speed, net shortwave radiation, and either measured or estimated incoming longwave radiation and ground heat flux. The approach is built around the idea that the key, rate-limiting, parameter of typical ET models, the land-surface resistance to water vapor transport, can be estimated from an emergent relationship between the diurnal cycle of the relative humidity profile and ET. The emergent relation is that the vertical variance of the relative humidity profile is less than what would occur for increased or decreased evaporation rates, suggesting that land-atmosphere feedback processes minimize this variance. This relation was found to hold over a wide range of climate conditions (arid to humid) and limiting factors (soil moisture, leaf area, energy) at a set of Ameriflux field sites. While the field tests in Salvucci and Gentine (2013) supported the minimum variance hypothesis, the analysis did not reveal the mechanisms responsible for the behavior. Instead the paper suggested, heuristically, that the results were due to an equilibration of the relative humidity between the land surface and the surface layer of the boundary layer. Here we apply this method using surface meteorological fields simulated by a global climate model (GCM), and compare the predicted ET to that simulated by the climate model. Similar to the field tests, the GCM simulated ET is in agreement with that predicted by minimizing the profile relative humidity variance. A reasonable interpretation of these results is that the feedbacks responsible for the minimization of the profile relative humidity variance in nature are represented in the climate model. 
The climate model components, in particular the land surface model and boundary layer representation, can thus be analyzed in controlled numerical experiments to discern the specific processes leading to the observed behavior. Results of this analysis will be presented.
Thorlund, Kristian; Thabane, Lehana; Mills, Edward J
2013-01-11
Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data-driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variances for all involved treatment comparisons are equal (i.e., the 'common variance' assumption). This approach 'borrows strength' for heterogeneity estimation across treatment comparisons and thus adds valuable precision when data are sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently, 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary. In this paper we describe four novel approaches to modeling heterogeneity variance: two novel model structures, and two approaches for the use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. In particular, we compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities. In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances.
Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach. MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability, or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice.
Analytic variance estimates of Swank and Fano factors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gutierrez, Benjamin; Badano, Aldo; Samuelson, Frank, E-mail: frank.samuelson@fda.hhs.gov
Purpose: Variance estimates for detector energy resolution metrics can be used as stopping criteria in Monte Carlo simulations for the purpose of ensuring a small uncertainty of those metrics and for the design of variance reduction techniques. Methods: The authors derive an estimate for the variance of two energy resolution metrics, the Swank factor and the Fano factor, in terms of statistical moments that can be accumulated without significant computational overhead. The authors examine the accuracy of these two estimators and demonstrate how the estimates of the coefficient of variation of the Swank and Fano factors behave with data from a Monte Carlo simulation of an indirect x-ray imaging detector. Results: The authors' analyses suggest that the accuracy of their variance estimators is appropriate for estimating the actual variances of the Swank and Fano factors for a variety of distributions of detector outputs. Conclusions: The variance estimators derived in this work provide a computationally convenient way to estimate the error or coefficient of variation of the Swank and Fano factors during Monte Carlo simulations of radiation imaging systems.
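Both metrics are simple functions of the moments of the detector-output distribution, which is why they can be accumulated during a simulation at little cost. The sketch below computes them for a hypothetical Poisson-distributed output (for which the Fano factor is 1 by construction), using the standard Swank-factor definition I = M1^2 / (M0 * M2) in terms of the zeroth, first, and second moments.

```python
import numpy as np

rng = np.random.default_rng(3)
# Invented detector outputs: optical quanta emitted per absorbed x ray
out = rng.poisson(lam=300, size=100_000).astype(float)

m0 = 1.0              # zeroth moment of the normalized pulse-height distribution
m1 = out.mean()       # first moment
m2 = (out ** 2).mean()  # second moment

swank = m1 ** 2 / (m0 * m2)        # Swank (information) factor, at most 1
fano = out.var() / out.mean()      # Fano factor; 1 for a Poisson process

print(f"Swank = {swank:.4f}, Fano = {fano:.3f}")
```

In a Monte Carlo run these moments can be updated incrementally per history, and the paper's contribution is an analytic variance for the resulting `swank` and `fano` estimates, usable as a stopping rule.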
Abbreviated neuropsychological assessment in schizophrenia
Harvey, Philip D.; Keefe, Richard S. E.; Patterson, Thomas L.; Heaton, Robert K.; Bowie, Christopher R.
2008-01-01
The aim of this study was to identify the best subset of neuropsychological tests for prediction of several different aspects of functioning in a large (n = 236) sample of older people with schizophrenia. While the validity of abbreviated assessment methods has been examined before, there has never been a comparative study of the prediction of different elements of cognitive impairment, real-world outcomes, and performance-based measures of functional capacity. Scores on 10 different tests from a neuropsychological assessment battery were used to predict global neuropsychological (NP) performance (indexed with averaged scores or calculated general deficit scores), performance-based indices of everyday-living skills and social competence, and case-manager ratings of real-world functioning. Forward-entry stepwise regression analyses were used to identify the best predictors for each of the outcome measures. The analyses were then adjusted for estimated premorbid IQ, which reduced the magnitude, but not the structure, of the correlations. Substantial amounts (over 70%) of the variance in overall NP performance were accounted for by a limited number of NP tests. Considerable variance in measures of functional capacity was also accounted for by a limited number of tests. Different tests constituted the best predictor set for each outcome measure. A substantial proportion of the variance in several different NP and functional outcomes can be accounted for by a small number of NP tests that can be completed in a few minutes, although there is considerable unexplained variance. However, the abbreviated assessments that best predict different outcomes vary across outcomes. Future studies should determine whether responses to pharmacological and remediation treatments can be captured with brief assessments as well. PMID:18720182
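The forward-entry stepwise regression used to pick predictor subsets can be sketched with scikit-learn's `SequentialFeatureSelector` on simulated data. The ten "tests", the two truly predictive ones, and all weights below are invented for illustration and are not the study's battery.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(6)
n, p = 236, 10                          # sample size and test count from the study
X = rng.normal(size=(n, p))             # standardized scores on 10 hypothetical tests
# Invented outcome driven mainly by tests 0 and 3
y = 0.8 * X[:, 0] + 0.6 * X[:, 3] + rng.normal(0, 0.5, n)

# Forward selection: greedily add the test that most improves cross-validated fit
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction='forward', cv=5).fit(X, y)
picked = np.flatnonzero(sfs.get_support())
r2 = LinearRegression().fit(X[:, picked], y).score(X[:, picked], y)
print("selected tests:", picked, "R^2 =", round(r2, 2))
```

The same selection would be repeated per outcome measure, matching the study's finding that different outcomes pick different short batteries.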
Siren, J; Ovaskainen, O; Merilä, J
2017-10-01
The genetic variance-covariance matrix (G) is a quantity of central importance in evolutionary biology due to its influence on the rate and direction of multivariate evolution. However, the predictive power of empirically estimated G-matrices is limited for two reasons. First, phenotypes are high-dimensional, whereas traditional statistical methods are tuned to estimate and analyse low-dimensional matrices. Second, the stability of G to environmental effects and over time remains poorly understood. Using Bayesian sparse factor analysis (BSFG), designed to estimate high-dimensional G-matrices, we analysed levels of variation and covariation in 10,527 expressed genes in a large (n = 563) half-sib breeding design of three-spined sticklebacks subject to two temperature treatments. We found significant differences in the structure of G between the treatments: heritabilities and evolvabilities were higher in the warm than in the low-temperature treatment, suggesting greater opportunity for, and a faster rate of, evolution in warm (stressful) conditions. Furthermore, comparison of G and its phenotypic equivalent P revealed that the latter is a poor substitute for the former. Most strikingly, the results suggest that the expected impact of G on evolvability, as well as the similarity among G-matrices, may depend strongly on the number of traits included in the analyses. In our results, the inclusion of only a few traits in the analyses leads to underestimation of the differences between the G-matrices and their predicted impacts on evolution. While the results highlight the challenges involved in estimating G, they also illustrate that by enabling the estimation of large G-matrices, the BSFG method can improve predicted evolutionary responses to selection. © 2017 John Wiley & Sons Ltd.
Estimating integrated variance in the presence of microstructure noise using linear regression
NASA Astrophysics Data System (ADS)
Holý, Vladimír
2017-07-01
Using financial high-frequency data to estimate the integrated variance of asset prices is beneficial, but as the number of observations increases, so-called microstructure noise arises. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as a test for the presence of the noise. Our method uses a linear regression in which realized variances estimated from different data subsamples act as the dependent variable while the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
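The regression idea rests on the fact that, under iid noise with variance omega^2, the expected realized variance computed from n noisy returns is approximately IV + 2 * n * omega^2: regressing subsampled realized variances on the number of observations then yields the integrated variance as the intercept and the noise variance from the slope. A minimal simulation sketch, with all parameter values invented:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 23400                                  # one trading day of 1-second prices
iv = 1e-4                                  # integrated variance over the day
p = np.cumsum(rng.normal(0, np.sqrt(iv / N), N))   # efficient log-price
omega = 5e-4
y = p + rng.normal(0, omega, N)            # observed price with microstructure noise

ns, rvs = [], []
for step in range(10, 101, 10):            # realized variance at coarser subsamples
    r = np.diff(y[::step])
    ns.append(len(r))
    rvs.append(np.sum(r ** 2))

# E[RV_n] = IV + 2 * n * omega^2  =>  regress RV on n; intercept estimates IV
slope, intercept = np.polyfit(ns, rvs, 1)
print(f"IV estimate = {intercept:.2e}, noise variance estimate = {slope / 2:.2e}")
```

A significantly positive slope doubles as evidence that noise is present, which is the testing side of the method.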
Genome wide selection in Citrus breeding.
Gois, I B; Borém, A; Cristofani-Yaly, M; de Resende, M D V; Azevedo, C F; Bastianel, M; Novelli, V M; Machado, M A
2016-10-17
Genome wide selection (GWS) is essential for the genetic improvement of perennial species such as Citrus because of its ability to increase gain per unit time and to enable the efficient selection of characteristics with low heritability. This study assessed GWS efficiency in a population of Citrus and compared it with selection based on phenotypic data. A total of 180 individual trees from a cross between Pera sweet orange (Citrus sinensis Osbeck) and Murcott tangor (Citrus sinensis Osbeck x Citrus reticulata Blanco) were evaluated for 10 characteristics related to fruit quality. The hybrids were genotyped using 5287 DArTseq™ (diversity arrays technology) molecular markers, and their effects on phenotypes were predicted using the random regression best linear unbiased prediction (rr-BLUP) method. The predictive ability, prediction bias, and accuracy of GWS were estimated to verify its effectiveness for phenotype prediction. The proportion of genetic variance explained by the markers was also computed. The heritability of the traits, as determined by markers, was 16-28%. The predictive ability of these markers ranged from 0.53 to 0.64, and the regression coefficients between predicted and observed phenotypes were close to unity. Over 35% of the genetic variance was accounted for by the markers. Accuracy estimates with GWS were lower than those obtained by phenotypic analysis; however, GWS was superior in terms of genetic gain per unit time. Thus, GWS may be useful for Citrus breeding as it can predict phenotypes early and accurately, and reduce the length of the selection cycle. This study demonstrates the feasibility of genomic selection in Citrus.
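The rr-BLUP step amounts to ridge regression of phenotypes on marker dosages with a common shrinkage applied to all marker effects. A self-contained toy version, with marker counts, effect sizes, and the variance ratio all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 180, 500                                   # individuals, markers (toy scale)
X = rng.binomial(2, 0.3, (n, m)).astype(float)    # marker dosages 0/1/2
beta = rng.normal(0, 0.05, m)                     # invented true marker effects
y = X @ beta + rng.normal(0, 1.0, n)              # phenotype = genetics + noise

# rr-BLUP: beta_hat = (X'X + lambda I)^-1 X'y with lambda = sigma_e^2 / sigma_beta^2
lam = 1.0 / 0.05 ** 2
Xc = X - X.mean(axis=0)                           # center markers and phenotype
yc = y - y.mean()
beta_hat = np.linalg.solve(Xc.T @ Xc + lam * np.eye(m), Xc.T @ yc)

gebv = Xc @ beta_hat                              # genomic estimated breeding values
r = np.corrcoef(gebv, yc)[0, 1]
print(f"in-sample predictive correlation = {r:.2f}")
```

In practice the variance ratio is estimated (e.g. by REML) rather than assumed, and predictive ability is assessed by cross-validation rather than in-sample correlation; this sketch only shows the shrinkage algebra.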
Fleischhauer, Monika; Enge, Sören; Miller, Robert; Strobel, Alexander; Strobel, Anja
2013-01-01
Meta-analytic data highlight the value of the Implicit Association Test (IAT) as an indirect measure of personality. Based on evidence suggesting that confounding factors such as cognitive abilities contribute to the IAT effect, this study provides a first investigation of whether basic personality traits explain unwanted variance in the IAT. In a gender-balanced sample of 204 volunteers, the Big-Five dimensions were assessed via self-report, peer-report, and IAT. By means of structural equation modeling (SEM), latent Big-Five personality factors (based on self- and peer-report) were estimated and their predictive value for unwanted variance in the IAT was examined. In a first analysis, unwanted variance was defined in the sense of method-specific variance which may result from differences in task demands between the two IAT block conditions and which can be mirrored by the absolute size of the IAT effects. In a second analysis, unwanted variance was examined in a broader sense defined as those systematic variance components in the raw IAT scores that are not explained by the latent implicit personality factors. In contrast to the absolute IAT scores, this also considers biases associated with the direction of IAT effects (i.e., whether they are positive or negative in sign), biases that might result, for example, from the IAT's stimulus or category features. None of the explicit Big-Five factors was predictive for method-specific variance in the IATs (first analysis). However, when considering unwanted variance that goes beyond pure method-specific variance (second analysis), a substantial effect of neuroticism occurred that may have been driven by the affective valence of IAT attribute categories and the facilitated processing of negative stimuli, typically associated with neuroticism. The findings thus point to the necessity of using attribute category labels and stimuli of similar affective valence in personality IATs to avoid confounding due to recoding.
Guenole, Nigel
2016-01-01
We describe a Monte Carlo study examining the impact of assuming item isomorphism (i.e., equivalent construct meaning across levels of analysis) on conclusions about homology (i.e., equivalent structural relations across levels of analysis) under varying degrees of non-isomorphism in the context of ordinal indicator multilevel structural equation models (MSEMs). We focus on the condition where one or more loadings are higher on the between level than on the within level to show that while much past research on homology has ignored the issue of psychometric isomorphism, psychometric isomorphism is in fact critical to valid conclusions about homology. More specifically, when a measurement model with non-isomorphic items occupies an exogenous position in a multilevel structural model and the non-isomorphism of these items is not modeled, the within level exogenous latent variance is under-estimated leading to over-estimation of the within level structural coefficient, while the between level exogenous latent variance is overestimated leading to underestimation of the between structural coefficient. When a measurement model with non-isomorphic items occupies an endogenous position in a multilevel structural model and the non-isomorphism of these items is not modeled, the endogenous within level latent variance is under-estimated leading to under-estimation of the within level structural coefficient while the endogenous between level latent variance is over-estimated leading to over-estimation of the between level structural coefficient. The innovative aspect of this article is demonstrating that even minor violations of psychometric isomorphism render claims of homology untenable. We also show that posterior predictive p-values for ordinal indicator Bayesian MSEMs are insensitive to violations of isomorphism even when they lead to severely biased within and between level structural parameters. 
We highlight conditions where poor estimation of even correctly specified models rules out empirical examination of isomorphism and homology without taking precautions, for instance, larger Level-2 sample sizes, or using informative priors.
PMID:26973580
Comparing Mapped Plot Estimators
Paul C. Van Deusen
2006-01-01
Two alternative derivations of estimators for mean and variance from mapped plots are compared by considering the models that support the estimators and by simulation. It turns out that both models lead to the same estimator for the mean but lead to very different variance estimators. The variance estimators based on the least valid model assumptions are shown to...
Coupled Modes over Indian Ocean at Sub-seasonal time Scales and its Prediction
NASA Astrophysics Data System (ADS)
Jung, E.; Kirtman, B. P.
2014-12-01
Sub-seasonal variability over the Indian Ocean, such as the Madden-Julian Oscillation, impacts weather and climate globally. However, the prediction of tropical sub-seasonal variability (TSV) remains a challenge, and understanding air-sea interactions on TSV time-scales is likely to be an important part of the prediction problem. The purpose of this paper is to examine the predictability of sub-seasonal variability in the tropical Indo-Pacific region. The analysis emphasizes variability associated with coupled air-sea interactions in observational estimates, and how well these coupled modes are simulated and predicted within the context of a 30-year retrospective forecast experiment with a state-of-the-art atmosphere-ocean coupled model. The analysis shows that sea surface temperature anomalies (SSTA) over the Indian Ocean tend to precede precipitation anomalies by 7-11 days, with maximum amplitude over the Arabian Sea and the Bay of Bengal in summer and along the Seychelles-Chagos Thermocline Ridge (SCTR) region in winter. Though these coupled modes are captured by the models, the forecasts fail to predict their evolution. Based on the diagnosis of these coupled modes, we introduce an SCTR-SST index and an index that measures the modulation of the low-frequency amplitude (LFAM) of sub-seasonal SSTA variability over the SCTR as a way to predict the coupled modes. Based on correlation with the observed variability, SCTR-SST has forecast skill of about 45 days over the Indian Ocean. However, the sub-seasonal SSTAs in the predictions and the observational estimates do not have any direct ENSO tele-connection. In contrast, the LFAM of the sub-seasonal SSTA variance over the SCTR is strongly correlated with ENSO, suggesting that enhanced sub-seasonal variance on seasonal time-scales is potentially predictable.
College Influence on Student Intentions toward International Competence. ASHE Annual Meeting Paper.
ERIC Educational Resources Information Center
English, Susan Lewis
This study attempted to test the concept of international competence as a construct and to estimate the extent to which college experience predicts variance on student intentions toward international competence. Relying on Lambert's model of global competence, the study tested five components of international competence for validity and…
Post-Modeling Histogram Matching of Maps Produced Using Regression Trees
Andrew J. Lister; Tonya W. Lister
2006-01-01
Spatial predictive models often use statistical techniques that in some way rely on averaging of values. Estimates from linear modeling are known to be susceptible to truncation of variance when the independent (predictor) variables are measured with error. A straightforward post-processing technique (histogram matching) for attempting to mitigate this effect is...
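Rank-based quantile mapping is one common way to implement the histogram matching described here: each predicted value is replaced by the reference-distribution value at the same quantile, restoring the variance that regression averaging truncated. The distributions below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
reference = rng.gamma(4.0, 25.0, 5000)          # e.g. plot-level observed values
predicted = rng.normal(100, 15, 5000)           # model output with truncated variance

# Histogram matching: replace each predicted value with the reference value
# at the same quantile (rank-based quantile mapping).
ranks = predicted.argsort().argsort()           # 0..n-1 rank of each pixel
quantiles = (ranks + 0.5) / predicted.size
matched = np.quantile(reference, quantiles)

print("std before:", round(predicted.std(), 1), "after:", round(matched.std(), 1))
```

The mapping preserves the spatial rank ordering of the predictions while forcing their marginal distribution to match the reference, which is exactly the post-processing step the abstract motivates.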
Dyadic Short Forms of the Wechsler Adult Intelligence Scale-IV.
Denney, David A; Ringe, Wendy K; Lacritz, Laura H
2015-08-01
Full Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) administration can be time-consuming and may not be necessary when intelligence quotient estimates will suffice. Estimated Full Scale Intelligence Quotient (FSIQ) and General Ability Index (GAI) scores were derived from nine dyadic short forms using individual regression equations based on data from a clinical sample (n = 113) that was then cross validated in a separate clinical sample (n = 50). Derived scores accounted for 70%-83% of the variance in FSIQ and 77%-88% of the variance in GAI. Predicted FSIQs were strongly associated with actual FSIQ (rs = .73-.88), as were predicted and actual GAIs (rs = .80-.93). Each of the nine dyadic short forms of the WAIS-IV was a good predictor of FSIQ and GAI in the validation sample. These data support the validity of WAIS-IV short forms when time is limited or lengthier batteries cannot be tolerated by patients. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
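The derive-then-cross-validate procedure can be sketched with a simple simulation: fit a regression equation from two hypothetical subtest scores in one sample, then score it in a held-out sample. All loadings and sample splits below are invented, and the subtest names are placeholders rather than the dyads used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 163                                   # combined clinical samples, for scale
g = rng.normal(0, 1, n)                   # latent general ability
sub_a = 10 + 3 * (0.8 * g + rng.normal(0, 0.6, n))   # hypothetical subtest A
sub_b = 10 + 3 * (0.7 * g + rng.normal(0, 0.7, n))   # hypothetical subtest B
fsiq = 100 + 15 * g + rng.normal(0, 3, n)

X = np.column_stack([sub_a, sub_b])
Xtr, Xte, ytr, yte = train_test_split(X, fsiq, test_size=50, random_state=0)
model = LinearRegression().fit(Xtr, ytr)          # derive equation in one sample
r2 = model.score(Xte, yte)                        # cross-validate in the other
print(f"cross-validated variance explained: R^2 = {r2:.2f}")
```

The study's reported 70%-88% variance-explained figures correspond to this held-out R^2, computed per dyad with the actual WAIS-IV subtests.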
Machine Learning Estimates of Natural Product Conformational Energies
Rupp, Matthias; Bauer, Matthias R.; Wilcken, Rainer; Lange, Andreas; Reutlinger, Michael; Boeckler, Frank M.; Schneider, Gisbert
2014-01-01
Machine learning has been used for estimation of potential energy surfaces to speed up molecular dynamics simulations of small systems. We demonstrate that this approach is feasible for significantly larger, structurally complex molecules, taking the natural product Archazolid A, a potent inhibitor of vacuolar-type ATPase, from the myxobacterium Archangium gephyra as an example. Our model estimates energies of new conformations by exploiting information from previous calculations via Gaussian process regression. Predictive variance is used to assess whether a conformation is in the interpolation region, allowing a controlled trade-off between prediction accuracy and computational speed-up. For energies of relaxed conformations at the density functional level of theory (implicit solvent, DFT/BLYP-disp3/def2-TZVP), mean absolute errors of less than 1 kcal/mol were achieved. The study demonstrates that predictive machine learning models can be developed for structurally complex, pharmaceutically relevant compounds, potentially enabling considerable speed-ups in simulations of larger molecular structures. PMID:24453952
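The core mechanism, Gaussian process regression with its predictive standard deviation used to decide when a conformation lies outside the interpolation region, can be sketched on a toy one-dimensional surface. The surface, kernel, and the 0.2 threshold below are arbitrary choices for illustration, not the study's molecular descriptors or settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(9)
# Toy 1-D "conformational energy" surface standing in for a DFT calculation
def energy(x):
    return np.sin(3 * x) + 0.5 * x ** 2

X_train = rng.uniform(-2, 2, 25)[:, None]
y_train = energy(X_train.ravel())

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-3),
                              normalize_y=True).fit(X_train, y_train)

X_new = np.array([[0.3], [5.0]])          # one inside, one far outside training data
mean, std = gp.predict(X_new, return_std=True)

# Gate on predictive variance: fall back to the expensive calculation when unsure
for x, m, s in zip(X_new.ravel(), mean, std):
    verdict = "trust GP" if s < 0.2 else "recompute explicitly"
    print(f"x = {x:+.1f}: E ~ {m:.3f} +/- {s:.3f} ({verdict})")
```

This is the controlled accuracy/speed trade-off the abstract describes: cheap GP predictions where the variance is small, full quantum-chemical evaluations elsewhere.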
How Much Can Remotely-Sensed Natural Resource Inventories Benefit from Finer Spatial Resolutions?
NASA Astrophysics Data System (ADS)
Hou, Z.; Xu, Q.; McRoberts, R. E.; Ståhl, G.; Greenberg, J. A.
2017-12-01
For remote sensing facilitated natural resource inventories, the effects of spatial resolution in the form of pixel size and the effects of subpixel information on estimates of population parameters were evaluated by comparing results obtained using Landsat 8 and RapidEye auxiliary imagery. The study area was in Burkina Faso, and the variable of interest was the stem volume (m3/ha) convertible to the woodland aboveground biomass. A sample consisting of 160 field plots was selected and measured from the population following a two-stage sampling design. Models were fit using weighted least squares; the population mean, mu, and the variance of the estimator of the population mean, Var(mu.hat), were estimated in two inferential frameworks, model-based and model-assisted, and compared; for each framework, Var(mu.hat) was estimated both analytically and empirically. Empirical variances were estimated with bootstrapping that takes clustering effects into account in resampling. The primary results were twofold. First, for the effects of spatial resolution and subpixel information, four conclusions are relevant: (1) finer spatial resolution imagery indeed contributes to greater precision for estimators of population parameters, but this increase is slight, at a maximum rate of 20%, considering that RapidEye data are of 36 times finer resolution than Landsat 8 data; (2) subpixel information on texture is marginally beneficial when it comes to making inferences for populations of large areas; (3) cost-effectiveness is more favorable for the free-of-charge Landsat 8 imagery than for RapidEye imagery; and (4) for a given plot size, candidate remote sensing auxiliary datasets are more cost-effective when their spatial resolutions are similar to the plot size than with much finer alternatives.
Second, for the comparison between estimators, three conclusions are relevant: (1) model-based variance estimates are consistent with each other and about half as large as stabilized model-assisted estimates, suggesting superior effectiveness of model-based inference to model-assisted inference; (2) bootstrapping is an effective alternative to analytical variance estimators; and (3) prediction accuracy expressed by RMSE is useful for screening candidate models to be used for population inferences.
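The empirical (bootstrap) variance estimation with cluster-level resampling described above can be sketched as follows; the data are simulated stand-ins, not the Burkina Faso plots:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: 20 clusters of 8 plots each from a two-stage sample
clusters = [rng.normal(100.0, 15.0, size=8) for _ in range(20)]

def boot_var_of_mean(clusters, n_boot=2000):
    """Empirical variance of the estimated population mean, resampling
    whole clusters with replacement to respect clustering effects."""
    n = len(clusters)
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample clusters, not plots
        means[b] = np.concatenate([clusters[i] for i in idx]).mean()
    return means.var(ddof=1)

var_hat = boot_var_of_mean(clusters)
```

Resampling intact clusters (rather than individual plots) is what keeps the bootstrap honest about within-cluster correlation.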
Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D
2015-02-20
The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, which could be either two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimates and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. Copyright © 2014 John Wiley & Sons, Ltd.
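For readers unfamiliar with the C index, a minimal O(n²) implementation of Harrell's concordance estimator on toy right-censored data is sketched below; the paper's actual contribution, the analytical variance/covariance of such U-statistic estimators and the z-score test, is not reproduced here:

```python
import numpy as np

def harrell_c(risk, time, event):
    """Harrell's C: among comparable pairs (the earlier observed time is
    an uncensored event), the fraction where the higher-risk subject
    fails first; ties in risk count one half."""
    conc = ties = n_pairs = 0
    n = len(time)
    for i in range(n):
        for j in range(i + 1, n):
            # order so subject a has the earlier observed time
            a, b = (i, j) if time[i] < time[j] else (j, i)
            if time[a] == time[b] or not event[a]:
                continue                     # pair not comparable
            n_pairs += 1
            if risk[a] > risk[b]:
                conc += 1
            elif risk[a] == risk[b]:
                ties += 1
    return (conc + 0.5 * ties) / n_pairs

time = np.array([2.0, 4.0, 6.0, 8.0])
event = np.array([True, True, False, True])  # subject 3 is censored
risk = np.array([0.9, 0.7, 0.4, 0.1])        # perfectly concordant marker
c = harrell_c(risk, time, event)
```

A perfectly concordant marker gives C = 1, a perfectly discordant one C = 0, and an uninformative one C ≈ 0.5.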
1984-05-01
By means of the concept of change-of-variance function we investigate the stability properties of the asymptotic variance of R-estimators. This allows us to construct the optimal V-robust R-estimator that minimizes the asymptotic variance at the model, under the side condition of a bounded change-of-variance function. Finally, we discuss the connection between this function and an influence function for two-sample rank tests introduced by Eplett (1980).
Uemoto, Yoshinobu; Sasaki, Shinji; Kojima, Takatoshi; Sugimoto, Yoshikazu; Watanabe, Toshio
2015-11-19
Genetic variance that is not captured by single nucleotide polymorphisms (SNPs) is due to imperfect linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTLs), and the extent of LD between SNPs and QTLs depends on different minor allele frequencies (MAF) between them. To evaluate the impact of the MAF of QTLs on genomic evaluation, we performed a simulation study using real cattle genotype data. In total, 1368 Japanese Black cattle and 592,034 SNPs (Illumina BovineHD BeadChip) were used. We simulated phenotypes using real genotypes under different scenarios, varying the MAF categories, QTL heritability, number of QTLs, and distribution of QTL effects. After generating true breeding values and phenotypes, QTL heritability was estimated and the prediction accuracy of genomic estimated breeding values (GEBV) was assessed under different SNP densities, prediction models, and population sizes using a reference-test validation design. The extent of LD between SNPs and QTLs in this population was higher for QTLs with high MAF than for those with low MAF. The effect of the MAF of QTLs depended on the genetic architecture, evaluation strategy, and population size in genomic evaluation. Regarding genetic architecture, genomic evaluation was affected by the MAF of QTLs in combination with the QTL heritability and the distribution of QTL effects. The number of QTLs did not affect genomic evaluation when it exceeded 50. Regarding evaluation strategy, we showed that different SNP densities and prediction models affect heritability estimation and genomic prediction, and that this dependence varies with the MAF of QTLs. In addition, accurate QTL heritability estimates and GEBV were obtained using denser SNP information and a prediction model that accounted for SNPs with both low and high MAFs. Regarding population size, a large sample size is needed to increase the accuracy of GEBV. The MAF of QTLs had an impact on heritability estimation and prediction accuracy.
Most genetic variance can be captured using denser SNPs and a prediction model that accounts for MAF, but a large sample size is needed to increase the accuracy of GEBV across all QTL MAF categories.
Ausman, Lynne M; Oliver, Lauren M; Goldin, Barry R; Woods, Margo N; Gorbach, Sherwood L; Dwyer, Johanna T
2008-09-01
Diet affects urine pH and acid-base balance. Both excess acid/alkaline ash (EAA) and estimated net acid excretion (NAE) calculations have been used to estimate the effects of diet on urine pH. This study's goal was to determine if free-living vegans, lacto-ovo vegetarians, and omnivores have increasingly acidic urine, and to assess the ability of EAA and estimated NAE calculations to predict urine pH. This study used a cross-sectional design. This study assessed urine samples of 10 vegan, 16 lacto-ovo vegetarian, and 16 healthy omnivorous women in the Boston metropolitan area. Six 3-day food records from each dietary group were analyzed for EAA content and estimated NAE, and correlations with measured urine pH were calculated. The mean (+/- SD) urine pH was 6.15 +/- 0.40 for vegans, 5.90 +/- 0.36 for lacto-ovo vegetarians, and 5.74 +/- 0.21 for omnivores (analysis of variance, P = .013). Calculated EAA values were not significantly different among the three groups, whereas mean estimated NAE values were significantly different: 17.3 +/- 14.5 mEq/day for vegans, 31.3 +/- 8.5 mEq/day for lacto-ovo vegetarians, and 42.6 +/- 13.2 mEq/day for omnivores (analysis of variance, P = .01). The average deattenuated correlation between urine pH and EAA was 0.333; this value was -0.768 for estimated NAE and urine pH, with a regression equation of pH = 6.33 - 0.014 NAE (P = .02, r = -0.54). Habitual diet and estimated NAE calculations indicate the probable ranking of urine pH by dietary groups, and may be used to determine the likely acid-base status of an individual; EAA calculations were not predictive of urine pH.
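The reported regression can be applied directly; a tiny sketch using the equation and the group-mean NAE values given in the abstract:

```python
def predict_urine_ph(nae_meq_per_day):
    """Urine pH from estimated net acid excretion (mEq/day),
    via the abstract's regression: pH = 6.33 - 0.014 * NAE."""
    return 6.33 - 0.014 * nae_meq_per_day

ph_vegan = predict_urine_ph(17.3)   # close to the observed vegan mean of 6.15
ph_omni = predict_urine_ph(42.6)    # close to the observed omnivore mean of 5.74
```

The fitted line reproduces the observed ordering of urine pH across the three dietary groups, which is the abstract's main point about estimated NAE.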
On predicting monitoring system effectiveness
NASA Astrophysics Data System (ADS)
Cappello, Carlo; Sigurdardottir, Dorotea; Glisic, Branko; Zonta, Daniele; Pozzi, Matteo
2015-03-01
While the objective of structural design is to achieve stability with an appropriate level of reliability, the design of systems for structural health monitoring is performed to identify a configuration that enables acquisition of data with an appropriate level of accuracy in order to understand the performance of a structure or its condition state. However, a rational standardized approach for monitoring system design is not fully available. Hence, when engineers design a monitoring system, their approach is often heuristic with performance evaluation based on experience, rather than on quantitative analysis. In this contribution, we propose a probabilistic model for the estimation of monitoring system effectiveness based on information available in prior condition, i.e. before acquiring empirical data. The presented model is developed considering the analogy between structural design and monitoring system design. We assume that the effectiveness can be evaluated based on the prediction of the posterior variance or covariance matrix of the state parameters, which we assume to be defined in a continuous space. Since the empirical measurements are not available in prior condition, the estimation of the posterior variance or covariance matrix is performed considering the measurements as a stochastic variable. Moreover, the model takes into account the effects of nuisance parameters, which are stochastic parameters that affect the observations but cannot be estimated using monitoring data. Finally, we present an application of the proposed model to a real structure. The results show how the model enables engineers to predict whether a sensor configuration satisfies the required performance.
Gao, Zan
2008-10-01
This study investigated the predictive strength of perceived competence and enjoyment on students' physical activity and cardiorespiratory fitness in physical education classes. Participants (N = 307; 101 in Grade 6, 96 in Grade 7, 110 in Grade 8; 149 boys, 158 girls) responded to questionnaires assessing perceived competence and enjoyment of physical education, then their cardiorespiratory fitness was assessed on the Progressive Aerobic Cardiovascular Endurance Run (PACER) test. Physical activity in one class was estimated via pedometers. Regression analyses showed that enjoyment (R2 = 16.5%) and perceived competence (R2 = 4.2%) together accounted for a significant but modest 20.7% of the variance in physical activity, and that perceived competence was the only significant contributor to cardiorespiratory fitness performance (R2 = 19.3%). These small effects leave roughly 80% of the variance unaccounted for. Some educational implications and areas for research are mentioned.
Nakling, Jakob; Buhaug, Harald; Backe, Bjorn
2005-10-01
In a large unselected population of normal spontaneous pregnancies, to estimate the biologic variation of the interval from the first day of the last menstrual period to the start of pregnancy, and the biologic variation of gestational length to delivery; and to estimate the random error of routine ultrasound assessment of gestational age in the mid-second trimester. Cohort study of 11,238 singleton pregnancies, with spontaneous onset of labour and a reliable last menstrual period. The day of delivery was predicted with two independent methods: according to Nägele's rule, and based on ultrasound examination in gestational weeks 17-19. For both methods, the mean difference between the observed and predicted day of delivery was calculated. The variances of the differences were combined to estimate the variances of the two partitions of pregnancy. The biologic variation of the time from last menstrual period to pregnancy start was estimated at 7.0 days (standard deviation), and the standard deviation of the time to spontaneous delivery was estimated at 12.4 days. The estimate of the standard deviation of the random error of ultrasound-assessed foetal age was 5.2 days. Even when the last menstrual period is reliable, the biologic variation of the time from the last menstrual period to the real start of pregnancy is substantial, and must be taken into account. Reliable information about the first day of the last menstrual period is not equivalent to reliable information about the start of pregnancy.
Camarinha-Silva, Amelia; Maushammer, Maria; Wellmann, Robin; Vital, Marius; Preuss, Siegfried; Bennewitz, Jörn
2017-07-01
The aim of the present study was to analyze the interplay between gastrointestinal tract (GIT) microbiota, host genetics, and complex traits in pigs using extended quantitative-genetic methods. The study design consisted of 207 pigs that were housed and slaughtered under standardized conditions, and phenotyped for daily gain, feed intake, and feed conversion rate. The pigs were genotyped with a standard 60 K SNP chip. The GIT microbiota composition was analyzed by 16S rRNA gene amplicon sequencing technology. Eight of the 49 investigated bacterial genera showed a significant narrow sense host heritability, ranging from 0.32 to 0.57. Microbial mixed linear models were applied to estimate the microbiota variance for each complex trait. The fraction of phenotypic variance explained by the microbial variance was 0.28, 0.21, and 0.16 for daily gain, feed conversion, and feed intake, respectively. The SNP data and the microbiota composition were used to predict the complex traits using genomic best linear unbiased prediction (G-BLUP) and microbial best linear unbiased prediction (M-BLUP) methods, respectively. The prediction accuracies of G-BLUP were 0.35, 0.23, and 0.20 for daily gain, feed conversion, and feed intake, respectively. The corresponding prediction accuracies of M-BLUP were 0.41, 0.33, and 0.33. Thus, in addition to SNP data, microbiota abundances are an informative source of complex trait predictions. Since the pig is a well-suited animal for modeling the human digestive tract, M-BLUP, in addition to G-BLUP, might be beneficial for predicting human predispositions to some diseases, and, consequently, for preventative and personalized medicine. Copyright © 2017 by the Genetics Society of America.
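Both G-BLUP and M-BLUP reduce to the same mixed-model prediction with different similarity matrices (a genomic kinship for G-BLUP, a microbiota-based kernel for M-BLUP). A generic sketch with simulated data; the kernel construction and variance components here are illustrative assumptions, not the study's estimates:

```python
import numpy as np

rng = np.random.default_rng(2)

def blup(K, y, var_g, var_e):
    """Best linear unbiased prediction of random effects for a
    similarity matrix K: u_hat = var_g * K @ V^-1 (y - mean)."""
    V = var_g * K + var_e * np.eye(len(y))
    return var_g * K @ np.linalg.solve(V, y - y.mean())

# Toy kernel from simulated markers (or microbial abundances)
M = rng.standard_normal((30, 200))
K = M @ M.T / M.shape[1]
u_true = rng.multivariate_normal(np.zeros(30), K)
y = 10.0 + u_true + rng.standard_normal(30)

u_hat = blup(K, y, var_g=1.0, var_e=1.0)
```

Swapping a marker-based K for one computed from genus abundances is, in this simplified view, all that separates M-BLUP from G-BLUP.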
Validity of Bioelectrical Impedance Analysis for Estimating Fat-Free Mass in Army Cadets.
Langer, Raquel D; Borges, Juliano H; Pascoa, Mauro A; Cirolini, Vagner X; Guerra-Júnior, Gil; Gonçalves, Ezequiel M
2016-03-11
Bioelectrical Impedance Analysis (BIA) is a fast, practical, non-invasive, and frequently used method for fat-free mass (FFM) estimation. The aims of this study were to validate published predictive BIA equations for FFM estimation in Army cadets and to develop and validate a specific BIA equation for this population. A total of 396 male Brazilian Army cadets, aged 17-24 years, were included. The study used eight published predictive BIA equations, a specific equation for FFM estimation, and dual-energy X-ray absorptiometry (DXA) as the reference method. Student's t-test (for paired samples), linear regression analysis, and the Bland-Altman method were used to test the validity of the BIA equations. The published predictive BIA equations showed significant differences in FFM compared to DXA (p < 0.05) and large limits of agreement by Bland-Altman. Predictive BIA equations explained 68% to 88% of the FFM variance. The specific BIA equation showed no significant differences in FFM compared to DXA values. Published BIA predictive equations showed poor accuracy in this sample. The specific BIA equation developed in this study demonstrated validity for this sample, although it should be used with caution in samples with a large range of FFM.
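The Bland-Altman comparison used for validation can be sketched as below; the numbers are made-up FFM values, not the cadet data:

```python
import numpy as np

def bland_altman(reference, method):
    """Mean bias and 95% limits of agreement of a field method
    against a reference (e.g. a BIA equation against DXA)."""
    diff = np.asarray(method) - np.asarray(reference)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Made-up FFM values (kg); not the study's measurements
dxa = np.array([55.2, 60.1, 58.4, 62.0, 57.3])
bia = np.array([54.0, 61.5, 57.9, 63.2, 56.8])
bias, (loa_low, loa_high) = bland_altman(dxa, bia)
```

Wide limits of agreement, as the abstract reports for the published equations, mean individual-level predictions can be far off even when the mean bias is small.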
2013-01-01
Background Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variances for all involved treatment comparisons are equal (i.e., the ‘common variance’ assumption). This approach ‘borrows strength’ for heterogeneity estimation across treatment comparisons, and thus adds valuable precision when data are sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently, 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary. Methods In this paper we describe four novel approaches to modeling heterogeneity variance - two novel model structures, and two approaches for use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. We particularly compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities. Results In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances.
Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach. Conclusions MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice. PMID:23311298
Hierarchical Bayesian Model Averaging for Chance Constrained Remediation Designs
NASA Astrophysics Data System (ADS)
Chitsazan, N.; Tsai, F. T.
2012-12-01
Groundwater remediation designs rely heavily on simulation models, which are subject to various sources of uncertainty in their predictions. To develop a robust remediation design, it is crucial to understand the effect of uncertainty sources. In this research, we introduce a hierarchical Bayesian model averaging (HBMA) framework to segregate and prioritize sources of uncertainty in a multi-layer frame, where each layer targets a source of uncertainty. The HBMA framework provides insight into uncertainty priorities and propagation. In addition, HBMA allows evaluating model weights in different hierarchy levels and assessing the relative importance of models in each level. To account for uncertainty, we employ chance-constrained (CC) programming for stochastic remediation design. Chance-constrained programming has traditionally been used to account for parameter uncertainty. Recently, many studies suggested that model structure uncertainty is not negligible compared to parameter uncertainty. Using chance-constrained programming along with HBMA can provide a rigorous tool for groundwater remediation designs under uncertainty. In this research, the HBMA-CC approach was applied to a remediation design in a synthetic aquifer. The design was to develop a scavenger well approach to mitigate saltwater intrusion toward production wells. HBMA was employed to assess uncertainties from model structure, parameter estimation and kriging interpolation. An improved harmony search optimization method was used to find the optimal location of the scavenger well. We evaluated prediction variances of chloride concentration at the production wells through the HBMA framework. The results showed that choosing the single best model may lead to a significant error in evaluating prediction variances, for two reasons. First, considering the single best model, variances that stem from uncertainty in the model structure will be ignored.
Second, considering the best model with a non-dominant model weight may underestimate or overestimate prediction variances by ignoring other plausible propositions. Chance constraints allow developing a remediation design with a desired reliability. However, considering only the single best model, the calculated reliability will differ from the desired reliability. We calculated the reliability of the design for the models at different levels of HBMA. The results showed that, moving toward the top layers of HBMA, the calculated reliability converges to the chosen reliability. We employed chance-constrained optimization along with the HBMA framework to find the optimal location and pumpage for the scavenger well. The results showed that, using models at different levels in the HBMA framework, the optimal location of the scavenger well remained the same, but the optimal extraction rate was altered. Thus, we concluded that the optimal pumping rate was sensitive to the prediction variance. Also, the prediction variance changed with different extraction rates. Using a very high extraction rate causes the prediction variances of chloride concentration at the production wells to approach zero, regardless of which HBMA model is used.
Gorgey, Ashraf S; Dolbow, David R; Gater, David R
2012-07-01
To establish and validate prediction equations by using body weight to predict legs, trunk, and whole-body fat-free mass (FFM) in men with chronic complete spinal cord injury (SCI). Cross-sectional design. Research setting in a large medical center. Individuals with SCI (N=63) divided into prediction (n=42) and cross-validation (n=21) groups. Not applicable. Whole-body FFM and regional FFM were determined by using dual-energy x-ray absorptiometry. Body weight was measured by using a wheelchair weighing scale after subtracting the weight of the chair. Body weight predicted legs FFM (legs FFM=.09×body weight+6.1; R(2)=.25, standard error of the estimate [SEE]=3.1kg, P<.01), trunk FFM (trunk FFM=.21×body weight+8.6; R(2)=.56, SEE=3.6kg, P<.0001), and whole-body FFM (whole-body FFM=.288×body weight+26.3; R(2)=.53, SEE=5.3kg, P<.0001). The whole-body FFM(predicted) (FFM predicted from the derived equations) shared 86% of the variance in whole-body FFM(measured) (FFM measured using dual-energy x-ray absorptiometry scan) (R(2)=.86, SEE=1.8kg, P<.0001), 69% of trunk FFM(measured), and 66% of legs FFM(measured). The trunk FFM(predicted) shared 69% of the variance in trunk FFM(measured) (R(2)=.69, SEE=2.7kg, P<.0001), and legs FFM(predicted) shared 67% of the variance in legs FFM(measured) (R(2)=.67, SEE=2.8kg, P<.0001). Values of FFM did not differ between the prediction and validation groups. Body weight can be used to predict whole-body FFM and regional FFM. The predicted whole-body FFM improved the prediction of trunk FFM and legs FFM. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
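The three reported prediction equations can be wrapped directly; a small sketch using the coefficients from the abstract:

```python
def ffm_from_weight(body_weight_kg):
    """Fat-free mass estimates (kg) from body weight, using the
    regression equations reported in the abstract for men with SCI."""
    return {
        "legs": 0.09 * body_weight_kg + 6.1,        # R2 = .25, SEE = 3.1 kg
        "trunk": 0.21 * body_weight_kg + 8.6,       # R2 = .56, SEE = 3.6 kg
        "whole_body": 0.288 * body_weight_kg + 26.3, # R2 = .53, SEE = 5.3 kg
    }

est = ffm_from_weight(70.0)
```

For a 70 kg individual this gives about 12.4 kg (legs), 23.3 kg (trunk), and 46.5 kg (whole body); the reported SEE values indicate the typical individual-level error of such point estimates.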
Eberhard, Wynn L
2017-04-01
The maximum likelihood estimator (MLE) is derived for retrieving the extinction coefficient and zero-range intercept in the lidar slope method in the presence of random and independent Gaussian noise. Least-squares fitting, weighted by the inverse of the noise variance, is equivalent to the MLE. Monte Carlo simulations demonstrate that two traditional least-squares fitting schemes, which use different weights, are less accurate. Alternative fitting schemes that have some positive attributes are introduced and evaluated. The principal factors governing accuracy of all these schemes are elucidated. Applying these schemes to data with Poisson rather than Gaussian noise alters accuracy little, even when the signal-to-noise ratio is low. Methods to estimate optimum weighting factors in actual data are presented. Even when the weighting estimates are coarse, retrieval accuracy declines only modestly. Mathematical tools are described for predicting retrieval accuracy. Least-squares fitting with inverse variance weighting has optimum accuracy for retrieval of parameters from single-wavelength lidar measurements when noise, errors, and uncertainties are Gaussian distributed, or close to optimum when only approximately Gaussian.
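The equivalence noted above (least-squares fitting weighted by the inverse noise variance equals the MLE under independent Gaussian noise) is easy to implement; a generic sketch with a noiseless toy line standing in for the log range-corrected lidar signal:

```python
import numpy as np

def weighted_line_fit(x, y, noise_var):
    """Straight-line fit weighted by 1/noise_var; equivalent to the MLE
    for independent Gaussian noise. Slope and intercept play the roles
    of the extinction coefficient and zero-range intercept."""
    w = 1.0 / np.asarray(noise_var)
    A = np.column_stack([x, np.ones_like(x)])
    Aw = A * w[:, None]
    # Weighted normal equations: (A^T W A) p = A^T W y
    slope, intercept = np.linalg.solve(A.T @ Aw, Aw.T @ y)
    return slope, intercept

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 5.0 - 2.0 * x                         # noiseless toy signal
slope, intercept = weighted_line_fit(x, y, [1.0, 2.0, 4.0, 8.0])
```

With noiseless data any weighting recovers the line exactly; with real lidar noise, the inverse-variance weights are what make this fit coincide with the MLE the abstract derives.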
Bammann, K; Huybrechts, I; Vicente-Rodriguez, G; Easton, C; De Vriendt, T; Marild, S; Mesana, M I; Peeters, M W; Reilly, J J; Sioen, I; Tubic, B; Wawro, N; Wells, J C; Westerterp, K; Pitsiladis, Y; Moreno, L A
2013-04-01
To compare different field methods for estimating body fat mass with a reference value derived by a three-component (3C) model in pre-school and school children across Europe. Multicentre validation study. Seventy-eight preschool/school children aged 4-10 years from four different European countries. A standard measurement protocol was carried out in all children by trained field workers. A 3C model was used as the reference method. The field methods included height and weight measurement, circumferences measured at four sites, skinfold measured at two-six sites and foot-to-foot bioelectrical resistance (BIA) via TANITA scales. With the exception of height and neck circumference, all single measurements were able to explain at least 74% of the fat-mass variance in the sample. In combination, circumference models were superior to skinfold models and height-weight models. The best predictions were given by trunk models (combining skinfold and circumference measurements) that explained 91% of the observed fat-mass variance. The optimal data-driven model for our sample includes hip circumference, triceps skinfold and total body mass minus resistance index, and explains 94% of the fat-mass variance with 2.44 kg fat mass limits of agreement. In all investigated models, prediction errors were associated with fat mass, although to a lesser degree in the investigated skinfold models, arm models and the data-driven models. When studying total body fat in childhood populations, anthropometric measurements will give biased estimations as compared to gold standard measurements. Nevertheless, our study shows that when combining circumference and skinfold measurements, estimations of fat mass can be obtained with a limit of agreement of 1.91 kg in normal weight children and of 2.94 kg in overweight or obese children.
Examining the Causal Role of Leptin in Alzheimer Disease: A Mendelian Randomization Study.
Romo, Matthew L; Schooling, C Mary
2017-01-01
Observational evidence regarding the role of leptin in Alzheimer disease (AD) is conflicting. We sought to determine the causal role of circulating leptin and soluble plasma leptin receptor (sOB-R) levels in AD using a separate-sample Mendelian randomization study. Single nucleotide polymorphisms (SNPs) independently and solely predictive of log-transformed leptin (rs10487505 [LEP], rs780093 [GCKR], rs900400 [CCNL1], rs6071166 [SLC32A1], and rs6738627 [COBLL1]) and of sOB-R (rs1137101 [LEPR], rs2767485 [LEPR], and rs1751492 [LEPR]) levels (ng/mL) were obtained from 2 previously reported genome-wide association studies. We obtained associations of leptin and sOB-R levels with AD using inverse variance weighting with fixed effects by combining Wald estimates for each SNP. Sensitivity analyses included using weighted median and MR-Egger methods and repeating the analyses using only SNPs of genome-wide significance. Using inverse variance weighting, genetically predicted circulating leptin levels were not associated with AD, albeit with wide confidence intervals (CIs): odds ratio (OR) 0.99 per log-transformed ng/mL; 95% CI 0.55-1.78. Similarly, the association of sOB-R with AD was null using inverse variance weighting (OR 1.08 per log-transformed ng/mL; 95% CI 0.83-1.41). Results from our sensitivity analyses confirmed our findings. In this first Mendelian randomization study estimating the causal effect of leptin on AD, we did not find an effect of genetically predicted circulating leptin and sOB-R levels on AD. As such, this study suggests that leptin is unlikely to be a major contributor to AD, although the wide CIs preclude a definitive assessment. © 2017 S. Karger AG, Basel.
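The fixed-effects inverse variance weighting used to combine per-SNP Wald estimates can be sketched generically; the numbers below are hypothetical, not the study's SNP-level estimates:

```python
import numpy as np

def ivw_fixed(beta, se):
    """Fixed-effect inverse-variance weighted estimate from per-SNP
    Wald estimates and their standard errors."""
    beta, se = np.asarray(beta), np.asarray(se)
    w = 1.0 / se ** 2
    estimate = np.sum(w * beta) / np.sum(w)
    std_error = np.sqrt(1.0 / np.sum(w))
    return estimate, std_error

# Hypothetical per-instrument Wald estimates (e.g. log-OR per unit exposure)
beta_hat, se_hat = ivw_fixed([0.10, -0.05, 0.02], [0.05, 0.08, 0.04])
```

Each SNP contributes in proportion to the precision of its Wald ratio; the combined standard error then yields the kind of CI reported in the abstract.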
Petelle, M B; Martin, J G A; Blumstein, D T
2015-10-01
Describing and quantifying animal personality is now an integral part of behavioural studies because individually distinctive behaviours have ecological and evolutionary consequences. Yet, to fully understand how personality traits may respond to selection, one must understand the underlying heritability and genetic correlations between traits. Previous studies have reported a moderate degree of heritability of personality traits, but few of these studies have either been conducted in the wild or estimated the genetic correlations between personality traits. Estimating the additive genetic variance and covariance in the wild is crucial to understand the evolutionary potential of behavioural traits. Enhanced environmental variation could reduce heritability and genetic correlations, thus leading to different evolutionary predictions. We estimated the additive genetic variance and covariance of docility in the trap, sociability (mirror image stimulation), and exploration and activity in two different contexts (open-field and mirror image stimulation experiments) in a wild population of yellow-bellied marmots (Marmota flaviventris). We estimated the heritability both of behaviours and of personality traits and found nonzero additive genetic variance in these traits. We also found nonzero maternal, permanent environment and year effects. Finally, we found four phenotypic correlations between traits, and one positive genetic correlation between activity in the open-field test and sociability. We also found permanent environment correlations between activity in both tests and docility, and exploration in the MIS test. This is one of a handful of studies to adopt a quantitative genetic approach to explain variation in personality traits in the wild and, thus, provides important insights into the potential variance available for selection. © 2015 European Society For Evolutionary Biology.
Comparing estimates of genetic variance across different relationship models.
Legarra, Andres
2016-02-01
Use of relationships between individuals to estimate genetic variances and heritabilities via mixed models is standard practice in human, plant and livestock genetics. Different models or information for relationships may give different estimates of genetic variances. However, comparing these estimates across different relationship models is not straightforward as the implied base populations differ between relationship models. In this work, I present a method to compare estimates of variance components across different relationship models. I suggest referring genetic variances obtained using different relationship models to the same reference population, usually a set of individuals in the population. Expected genetic variance of this population is the estimated variance component from the mixed model times a statistic, Dk, which is the average self-relationship minus the average (self- and across-) relationship. For most typical models of relationships, Dk is close to 1. However, this is not true for very deep pedigrees, for identity-by-state relationships, or for non-parametric kernels, which tend to overestimate the genetic variance and the heritability. Using mice data, I show that heritabilities from identity-by-state and kernel-based relationships are overestimated. Weighting these estimates by Dk scales them to a base comparable to genomic or pedigree relationships, avoiding wrong comparisons, for instance, "missing heritabilities". Copyright © 2015 Elsevier Inc. All rights reserved.
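The Dk statistic defined in the abstract (average self-relationship minus average relationship, used to scale variance estimates to a common reference population) is straightforward to compute from any relationship matrix; a toy sketch (the matrix here is illustrative, not from the mice data):

```python
import numpy as np

def dk_statistic(K):
    """Dk = average self-relationship (diagonal) minus the average of
    all (self- and across-) relationships. The genetic variance referred
    to the reference population is the estimated component times Dk."""
    K = np.asarray(K)
    return np.mean(np.diag(K)) - np.mean(K)

# Toy identity-by-state-like relationship matrix for three individuals
K = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.1, 0.3],
              [0.1, 0.3, 0.9]])
dk = dk_statistic(K)
scaled_var = 2.5 * dk    # hypothetical variance component times Dk
```

For typical pedigree or genomic relationship matrices Dk is close to 1, so the scaling matters mainly for identity-by-state and kernel relationships, where it corrects the overestimation the abstract describes.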
Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer.
Covarrubias-Pazaran, Giovanny
2016-01-01
Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: average information (AI), expectation-maximization (EM) and efficient mixed model association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to those of other software, but the analysis was faster than Bayesian counterparts by margins of hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms for fitting models in a gentle environment such as R.
Are the Stress Drops of Small Earthquakes Good Predictors of the Stress Drops of Larger Earthquakes?
NASA Astrophysics Data System (ADS)
Hardebeck, J.
2017-12-01
Uncertainty in PSHA could be reduced through better estimates of stress drop for possible future large earthquakes. Studies of small earthquakes find spatial variability in stress drop; if large earthquakes have similar spatial patterns, their stress drops may be better predicted using the stress drops of small local events. This regionalization implies the variance with respect to the local mean stress drop may be smaller than the variance with respect to the global mean. I test this idea using the Shearer et al. (2006) stress drop catalog for M1.5-3.1 events in southern California. I apply quality control (Hauksson, 2015) and remove near-field aftershocks (Wooddell & Abrahamson, 2014). The standard deviation of the distribution of the log10 stress drop is reduced from 0.45 (factor of 3) to 0.31 (factor of 2) by normalizing each event's stress drop by the local mean. I explore whether a similar variance reduction is possible when using the Shearer catalog to predict stress drops of larger southern California events. For catalogs of moderate-sized events (e.g. Kanamori, 1993; Mayeda & Walter, 1996; Boyd, 2017), normalizing by the Shearer catalog's local mean stress drop does not reduce the standard deviation compared to the unmodified stress drops. I compile stress drops of larger events from the literature, and identify 15 M5.5-7.5 earthquakes with at least three estimates. Because of the wide range of stress drop estimates for each event, and the different techniques and assumptions, it is difficult to assign a single stress drop value to each event. Instead, I compare the distributions of stress drop estimates for pairs of events, and test whether the means of the distributions are statistically significantly different. The events divide into 3 categories: low, medium, and high stress drop, with significant differences in mean stress drop between events in the low and the high stress drop categories. 
I test whether the spatial patterns of the Shearer catalog stress drops can predict the categories of the 15 events. I find that they cannot, rather the large event stress drops are uncorrelated with the local mean stress drop from the Shearer catalog. These results imply that the regionalization of stress drops of small events does not extend to the larger events, at least with current standard techniques of stress drop estimation.
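The variance-reduction test above (is the scatter about local mean stress drops smaller than the scatter about the global mean?) can be sketched with synthetic data. This toy example uses made-up regional means, not the Shearer et al. catalog; it only illustrates the comparison of standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic log10 stress drops for small events in 5 hypothetical regions:
# each region has its own mean, mimicking spatial regionalization.
region_means = rng.normal(0.0, 0.35, size=5)
regions = rng.integers(0, 5, size=2000)
log_sd = region_means[regions] + rng.normal(0.0, 0.30, size=2000)

# Scatter about the global mean vs. about each region's local mean.
sd_global = np.std(log_sd)
local_mean = np.array([log_sd[regions == r].mean() for r in range(5)])
sd_local = np.std(log_sd - local_mean[regions])

print(round(sd_global, 3), round(sd_local, 3))
```

If the regionalization carried over to large events, normalizing their stress drops by the local small-event mean would shrink the scatter the same way; the abstract's finding is that, for the 15 large events, it does not.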
NASA Astrophysics Data System (ADS)
de Montera, L.; Mallet, C.; Barthès, L.; Golé, P.
2008-08-01
This paper shows how nonlinear models originally developed in the finance field can be used to predict rain attenuation level and volatility in Earth-to-Satellite links operating at the Extremely High Frequencies band (EHF, 20-50 GHz). A common approach to solving this problem is to consider that the prediction error corresponds only to scintillations, whose variance is assumed to be constant. Nevertheless, this assumption does not seem to be realistic because of the heteroscedasticity of error time series: the variance of the prediction error is found to be time-varying and has to be modeled. Since rain attenuation time series behave similarly to certain stocks or foreign exchange rates, a switching ARIMA/GARCH model was implemented. The originality of this model is that not only the attenuation level, but also the error conditional distribution are predicted. It allows an accurate upper-bound of the future attenuation to be estimated in real time that minimizes the cost of Fade Mitigation Techniques (FMT) and therefore enables the communication system to reach a high percentage of availability. The performance of the switching ARIMA/GARCH model was estimated using a measurement database of the Olympus satellite 20/30 GHz beacons, and the model is shown to significantly outperform other existing models. The model also includes frequency scaling from the downlink frequency to the uplink frequency. The attenuation effects (gases, clouds and rain) are first separated with a neural network and then scaled using specific scaling factors. For the resulting uplink prediction error, the contribution of the frequency scaling step is shown to be larger than that of the downlink prediction, indicating that further study should focus on improving the accuracy of the scaling factor.
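The GARCH part of the model above captures exactly the time-varying error variance the abstract emphasizes. A minimal GARCH(1,1) conditional-variance recursion looks like this; the omega/alpha/beta values are hypothetical, not fitted to any beacon data.

```python
# GARCH(1,1): the conditional variance of the next prediction error is a
# constant plus weighted contributions of the last squared error and the
# last conditional variance.
def garch_variance(residuals, omega=0.1, alpha=0.2, beta=0.7):
    sigma2 = [omega / (1.0 - alpha - beta)]   # start at unconditional variance
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps**2 + beta * sigma2[-1])
    return sigma2

print(garch_variance([1.0, -2.0, 0.5]))
```

A large residual (here -2.0) immediately inflates the predicted variance for the next step, which is what allows a real-time upper bound on future attenuation rather than a fixed scintillation margin.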
ERIC Educational Resources Information Center
Oranje, Andreas
2006-01-01
A multitude of methods has been proposed to estimate the sampling variance of ratio estimates in complex samples (Wolter, 1985). Hansen and Tepping (1985) studied some of those variance estimators and found that a high coefficient of variation (CV) of the denominator of a ratio estimate is indicative of a biased estimate of the standard error of a…
Adaptive estimation of the log fluctuating conductivity from tracer data at the Cape Cod Site
Deng, F.W.; Cushman, J.H.; Delleur, J.W.
1993-01-01
An adaptive estimation scheme is used to obtain the integral scale and variance of the log-fluctuating conductivity at the Cape Cod site based on the fast Fourier transform/stochastic model of Deng et al. (1993) and a Kalman-like filter. The filter incorporates prior estimates of the unknown parameters with tracer moment data to adaptively obtain improved estimates as the tracer evolves. The results show that significant improvement in the prior estimates of the conductivity can lead to substantial improvement in the ability to predict plume movement. The structure of the covariance function of the log-fluctuating conductivity can be identified from the robustness of the estimation. Both the longitudinal and transverse spatial moment data are important to the estimation.
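The Kalman-like updating described above, blending a prior parameter estimate with successive noisy observations, can be sketched for a scalar parameter. All numbers here are hypothetical; this is the generic scalar filter, not the paper's specific FFT/stochastic model.

```python
# Scalar Kalman update: each new moment-derived observation z pulls the
# estimate x toward it, weighted by the gain K, while the uncertainty P shrinks.
def kalman_update(x, P, z, H, R):
    K = P * H / (H * P * H + R)        # gain
    x_new = x + K * (z - H * x)        # updated estimate
    P_new = (1.0 - K * H) * P          # updated uncertainty
    return x_new, P_new

x, P = 0.5, 1.0                        # prior estimate and its variance
for z in [0.9, 1.1, 1.0]:              # successive observations as the tracer evolves
    x, P = kalman_update(x, P, z, H=1.0, R=0.25)
print(round(x, 3), round(P, 3))
```

The steady shrinkage of P with each observation is the sense in which the estimates "adaptively improve as the tracer evolves."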
USDA-ARS?s Scientific Manuscript database
We proposed a method to estimate the error variance among non-replicated genotypes, thus to estimate the genetic parameters by using replicated controls. We derived formulas to estimate sampling variances of the genetic parameters. Computer simulation indicated that the proposed methods of estimatin...
Tanner-Smith, Emily E; Tipton, Elizabeth
2014-03-01
Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding the practical application and implementation of those macros. This paper provides a brief tutorial on the implementation of the Stata and SPSS macros and discusses practical issues meta-analysts should consider when estimating meta-regression models with robust variance estimates. Two example databases are used in the tutorial to illustrate the use of meta-analysis with robust variance estimates. Copyright © 2013 John Wiley & Sons, Ltd.
Energy and variance budgets of a diffusive staircase with implications for heat flux scaling
NASA Astrophysics Data System (ADS)
Hieronymus, M.; Carpenter, J. R.
2016-02-01
Diffusive convection, the mode of double-diffusive convection that occurs when both temperature and salinity increase with increasing depth, is commonplace throughout the high latitude oceans, and diffusive staircases constitute an important heat transport process in the Arctic Ocean. Heat and buoyancy fluxes through these staircases are often estimated using flux laws deduced either from laboratory experiments, or from simplified energy or variance budgets. We have done direct numerical simulations of double-diffusive convection at a range of Rayleigh numbers and quantified the energy and variance budgets in detail. This allows us to compare the fluxes in our simulations to those derived using known flux laws and to quantify how well the simplified energy and variance budgets approximate the full budgets. The fluxes are found to agree well with earlier estimates at high Rayleigh numbers, but we find large deviations at low Rayleigh numbers. The close ties between the heat and buoyancy fluxes and the budgets of thermal variance and energy have been utilized to derive heat flux scaling laws in the field of thermal convection. The result is the so-called GL theory, which has been found to give accurate heat flux scaling laws in a very wide parameter range. Diffusive convection has many similarities to thermal convection, and an extension of the GL theory to diffusive convection is also presented and its predictions are compared to the results from our numerical simulations.
Wildhaber, Mark L.; Albers, Janice; Green, Nicholas; Moran, Edward H.
2017-01-01
We develop a fully-stochasticized, age-structured population model suitable for population viability analysis (PVA) of fish and demonstrate its use with the endangered pallid sturgeon (Scaphirhynchus albus) of the Lower Missouri River as an example. The model incorporates three levels of variance: parameter variance (uncertainty about the value of a parameter itself) applied at the iteration level, temporal variance (uncertainty caused by random environmental fluctuations over time) applied at the time-step level, and implicit individual variance (uncertainty caused by differences between individuals) applied within the time-step level. We found that population dynamics were most sensitive to survival rates, particularly age-2+ survival, and to fecundity-at-length. The inclusion of variance (unpartitioned or partitioned), stocking, or both generally decreased the influence of individual parameters on population growth rate. The partitioning of variance into parameter and temporal components had a strong influence on the importance of individual parameters, uncertainty of model predictions, and quasiextinction risk (i.e., pallid sturgeon population size falling below 50 age-1+ individuals). Our findings show that appropriately applying variance in PVA is important when evaluating the relative importance of parameters, and reinforce the need for better and more precise estimates of crucial life-history parameters for pallid sturgeon.
Austin, Peter C
2016-12-30
Propensity score methods are used to reduce the effects of observed confounding when using observational data to estimate the effects of treatments or exposures. A popular method of using the propensity score is inverse probability of treatment weighting (IPTW). When using this method, a weight is calculated for each subject that is equal to the inverse of the probability of receiving the treatment that was actually received. These weights are then incorporated into the analyses to minimize the effects of observed confounding. Previous research has found that these methods result in unbiased estimation when estimating the effect of treatment on survival outcomes. However, conventional methods of variance estimation were shown to result in biased estimates of standard error. In this study, we conducted an extensive set of Monte Carlo simulations to examine different methods of variance estimation when using a weighted Cox proportional hazards model to estimate the effect of treatment. We considered three variance estimation methods: (i) a naïve model-based variance estimator; (ii) a robust sandwich-type variance estimator; and (iii) a bootstrap variance estimator. We considered estimation of both the average treatment effect and the average treatment effect in the treated. We found that the use of a bootstrap estimator resulted in approximately correct estimates of standard errors and confidence intervals with the correct coverage rates. The other estimators resulted in biased estimates of standard errors and confidence intervals with incorrect coverage rates. Our simulations were informed by a case study examining the effect of statin prescribing on mortality. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
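The IPTW weighting and the bootstrap variance estimation favoured above can be sketched on toy data. This simplified example bootstraps a weighted mean difference rather than a weighted Cox model, and assumes the true propensities are known (in practice they would be re-estimated in each bootstrap sample); all numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy observational data: binary treatment t, continuous outcome y with a
# true treatment effect of 1.0, and known propensity p.
n = 500
p = rng.uniform(0.2, 0.8, size=n)
t = rng.binomial(1, p)
y = 1.0 * t + rng.normal(0, 1, size=n)

# IPTW weights: inverse probability of the treatment actually received.
w = t / p + (1 - t) / (1 - p)

def iptw_effect(idx):
    ti, yi, wi = t[idx], y[idx], w[idx]
    mu1 = np.average(yi[ti == 1], weights=wi[ti == 1])
    mu0 = np.average(yi[ti == 0], weights=wi[ti == 0])
    return mu1 - mu0

# Bootstrap the estimation pipeline to obtain a standard error, as the
# simulations above recommend over naive model-based variance estimators.
boots = [iptw_effect(rng.integers(0, n, size=n)) for _ in range(200)]
se = np.std(boots)
print(round(np.mean(boots), 2), round(se, 3))
```

The key point is that the whole weighted analysis is repeated within each bootstrap sample, so the resulting standard error reflects the weighting step that naive estimators ignore.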
Noh, Wonjung; Seomun, Gyeongae
2015-06-01
This study was conducted to develop key performance indicators (KPIs) for home care nursing (HCN) based on a balanced scorecard, and to construct a performance prediction model of strategic objectives using the Bayesian Belief Network (BBN). This methodological study included four steps: establishment of KPIs, performance prediction modeling, development of a performance prediction model using BBN, and simulation of a suggested nursing management strategy. An HCN expert group and a staff group participated. The content validity index was analyzed using STATA 13.0, and BBN was analyzed using HUGIN 8.0. We generated a list of KPIs composed of 4 perspectives, 10 strategic objectives, and 31 KPIs. In the validity test of the performance prediction model, the factor with the greatest variance for increasing profit was maximum cost reduction of HCN services. The factor with the smallest variance for increasing profit was a minimum image improvement for HCN. During sensitivity analysis, the probability of the expert group did not affect the sensitivity. Furthermore, simulation of a 10% image improvement predicted the most effective way to increase profit. KPIs of HCN can estimate financial and non-financial performance. The performance prediction model for HCN will be useful to improve performance.
Xu, Chonggang; Gertner, George
2013-01-01
Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements. PMID:24143037
ON PREDICTING INFRAGRAVITY ENERGY IN THE SURF ZONE.
Sallenger, Asbury H.; Holman, Robert A.; Edge, Billy L.
1985-01-01
Flow data were obtained in the surf zone across a barred profile during a storm. RMS cross-shore velocities due to waves in the infragravity band (wave periods greater than 20 s) had maxima in excess of 0.5 m/s over the bar crest. For comparison to measured spectra, synthetic spectra of cross-shore flow were computed using measured nearshore profiles. The structure, in the infragravity band, of these synthetic spectra corresponded reasonably well with the structure of the measured spectra. Total variances of measured cross-shore flow within the infragravity band were nondimensionalized by dividing by total infragravity variances of synthetic spectra. These nondimensional variances were independent of distance offshore and increased with the square of the breaker height. Thus, cross-shore flow due to infragravity waves can be estimated with knowledge of the nearshore profile and incident wave conditions.
Physical activity, but not sedentary time, influences bone strength in late adolescence.
Tan, Vina Ps; Macdonald, Heather M; Gabel, Leigh; McKay, Heather A
2018-03-20
Physical activity is essential for optimal bone strength accrual, but we know little about interactions between physical activity, sedentary time, and bone outcomes in older adolescents. Physical activity (by accelerometer and self-report) positively predicted bone strength at the distal and midshaft tibia in 15-year-old boys and girls. Lean body mass mediated the relationship between physical activity and bone strength in adolescents. To examine the influence of physical activity (PA) and sedentary time on bone strength, structure, and density in older adolescents, we used peripheral quantitative computed tomography to estimate bone strength at the distal tibia (8% site; bone strength index, BSI) and tibial midshaft (50% site; polar strength strain index, SSIp) in adolescent boys (n = 86; 15.3 ± 0.4 years) and girls (n = 106; 15.3 ± 0.4 years). Using accelerometers (GT1M, Actigraph), we measured moderate-to-vigorous PA (MVPAAccel), vigorous PA (VPAAccel), and sedentary time in addition to self-reported MVPA (MVPAPAQ-A) and impact PA (ImpactPAPAQ-A). We examined relations between PA and sedentary time and bone outcomes, adjusting for ethnicity, maturity, tibial length, and total body lean mass. At the distal tibia, MVPAAccel and VPAAccel positively predicted BSI (explained 6-7% of the variance, p < 0.05). After adjusting for lean mass, only VPAAccel explained residual variance in BSI. At the tibial midshaft, MVPAAccel, but not VPAAccel, positively predicted SSIp (explained 3% of the variance, p = 0.01). Lean mass attenuated this association. MVPAPAQ-A and ImpactPAPAQ-A also positively predicted BSI and SSIp (explained 2-4% of the variance, p < 0.05), but only ImpactPAPAQ-A explained residual variance in BSI after accounting for lean mass. Sedentary time did not independently predict bone strength at either site. Greater tibial bone strength in active adolescents is mediated, in part, by lean mass.
Although adolescents spend most of their day in sedentary pursuits, sedentary time did not negatively influence their bone strength.
Evaluating mallard adaptive management models with time series
Conn, P.B.; Kendall, W.L.
2004-01-01
Wildlife practitioners concerned with midcontinent mallard (Anas platyrhynchos) management in the United States have instituted a system of adaptive harvest management (AHM) as an objective format for setting harvest regulations. Under the AHM paradigm, predictions from a set of models that reflect key uncertainties about processes underlying population dynamics are used in coordination with optimization software to determine an optimal set of harvest decisions. Managers use comparisons of the predictive abilities of these models to gauge the relative truth of different hypotheses about density-dependent recruitment and survival, with better-predicting models giving more weight to the determination of harvest regulations. We tested the effectiveness of this strategy by examining convergence rates of 'predictor' models when the true model for population dynamics was known a priori. We generated time series for cases when the a priori model was 1 of the predictor models as well as for several cases when the a priori model was not in the model set. We further examined the addition of different levels of uncertainty into the variance structure of predictor models, reflecting different levels of confidence about estimated parameters. We showed that in certain situations, the model-selection process favors a predictor model that incorporates the hypotheses of additive harvest mortality and weakly density-dependent recruitment, even when the model is not used to generate data. Higher levels of predictor model variance led to decreased rates of convergence to the model that generated the data, but model weight trajectories were in general more stable. We suggest that predictive models should incorporate all sources of uncertainty about estimated parameters, that the variance structure should be similar for all predictor models, and that models with different functional forms for population dynamics should be considered for inclusion in predictor model sets.
All of these suggestions should help lower the probability of erroneous learning in mallard AHM and adaptive management in general.
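The model-weight updating at the heart of AHM can be sketched with a Bayesian update: each predictor model's weight is multiplied by the likelihood of the observed population size under that model's prediction, then renormalized. The predictions, observation, and variance below are hypothetical, and a Gaussian likelihood is assumed for illustration.

```python
import math

def update_weights(weights, predictions, observed, sigma):
    # Gaussian likelihood of the observation under each model's prediction.
    like = [math.exp(-0.5 * ((observed - m) / sigma) ** 2) for m in predictions]
    post = [w * l for w, l in zip(weights, like)]
    total = sum(post)
    return [p / total for p in post]

w = [0.25, 0.25, 0.25, 0.25]   # equal prior weights over 4 predictor models
w = update_weights(w, predictions=[7.1, 8.0, 8.4, 9.0], observed=8.2, sigma=0.5)
print([round(x, 3) for x in w])
```

Increasing sigma (more acknowledged predictor variance) flattens the likelihoods, which slows convergence toward the best-predicting model but makes the weight trajectories more stable, mirroring the trade-off reported above.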
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
Methods to estimate the between‐study variance and its uncertainty in meta‐analysis†
Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian PT; Langan, Dean; Salanti, Georgia
2015-01-01
Meta-analyses are typically used to estimate the overall mean of an outcome of interest. However, inference about between-study variability, which is typically modelled using a between-study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between-study variance, has been long challenged. Our aim is to identify known methods for estimation of the between-study variance and its corresponding uncertainty, and to summarise the simulation and empirical evidence that compares them. We identified 16 estimators for the between-study variance, seven methods to calculate confidence intervals, and several comparative studies. Simulation studies suggest that for both dichotomous and continuous data the estimator proposed by Paule and Mandel and for continuous data the restricted maximum likelihood estimator are better alternatives to estimate the between-study variance. Based on the scenarios and results presented in the published studies, we recommend the Q-profile method and the alternative approach based on a 'generalised Cochran between-study variance statistic' to compute corresponding confidence intervals around the resulting estimates. Our recommendations are based on a qualitative evaluation of the existing literature and expert consensus. Evidence-based recommendations require an extensive simulation study where all methods would be compared under the same scenarios. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd. PMID:26332144
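For reference, the DerSimonian and Laird estimator discussed above is a short moment-based calculation: a truncated ratio of Cochran's Q minus its degrees of freedom to a weight-based scaling constant. The study effects and variances below are toy numbers.

```python
# DerSimonian-Laird estimate of the between-study variance tau^2.
# yi: study effect estimates; vi: their within-study variances.
def dersimonian_laird(yi, vi):
    w = [1.0 / v for v in vi]
    ybar = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    Q = sum(wi * (y - ybar) ** 2 for wi, y in zip(w, yi))   # Cochran's Q
    k = len(yi)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (Q - (k - 1)) / c)                      # truncate at 0

tau2 = dersimonian_laird(yi=[0.2, 0.5, 0.8, 0.1], vi=[0.04, 0.05, 0.04, 0.06])
print(round(tau2, 4))
```

The truncation at zero is one reason the estimator's sampling behaviour is awkward, and part of why the review recommends Paule-Mandel or REML alternatives instead.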
Minimum number of measurements for evaluating Bertholletia excelsa.
Baldoni, A B; Tonini, H; Tardin, F D; Botelho, S C C; Teodoro, P E
2017-09-27
Repeatability studies on fruit species are of great importance to identify the minimum number of measurements necessary to accurately select superior genotypes. This study aimed to identify the most efficient method to estimate the repeatability coefficient (r) and predict the minimum number of measurements needed for a more accurate evaluation of Brazil nut tree (Bertholletia excelsa) genotypes based on fruit yield. For this, we assessed the number of fruits and dry mass of seeds of 75 Brazil nut genotypes, from native forest, located in the municipality of Itaúba, MT, for 5 years. To better estimate r, four procedures were used: analysis of variance (ANOVA), principal component analysis based on the correlation matrix (CPCOR), principal component analysis based on the phenotypic variance and covariance matrix (CPCOV), and structural analysis based on the correlation matrix (mean r - AECOR). There was a significant effect of genotypes and measurements, which reveals the need to study the minimum number of measurements for selecting superior Brazil nut genotypes for a production increase. Estimates of r by ANOVA were lower than those observed with the principal component methodology and close to AECOR. The CPCOV methodology provided the highest estimate of r, which resulted in a lower number of measurements needed to identify superior Brazil nut genotypes for the number of fruits and dry mass of seeds. Based on this methodology, three measurements are necessary to predict the true value of the Brazil nut genotypes with a minimum accuracy of 85%.
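The ANOVA route to the repeatability coefficient r, and the back-calculation of the minimum number of measurements for a target determination, can be sketched with the standard formulas. The mean squares below are hypothetical, not the Brazil nut data, and the ANOVA-based formula is only one of the four procedures the study compares.

```python
import math

def repeatability(ms_genotype, ms_error, m):
    # ANOVA estimator of r; m = number of measurements per genotype.
    return (ms_genotype - ms_error) / (ms_genotype + (m - 1) * ms_error)

def min_measurements(r, target_r2=0.85):
    # Solve target_r2 = m*r / (1 + (m-1)*r) for m and round up.
    return math.ceil(target_r2 * (1 - r) / (r * (1 - target_r2)))

r = repeatability(ms_genotype=12.0, ms_error=2.0, m=5)  # hypothetical mean squares
print(round(r, 3), min_measurements(r))
```

A higher estimated r (as the CPCOV procedure gave in the study) feeds directly into a smaller required number of measurements, which is why the choice of estimation procedure matters.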
NASA Astrophysics Data System (ADS)
Martens, H. R.; Simons, M.; Moore, A. W.; Owen, S. E.; Rivera, L. A.
2016-12-01
We explore the contributions of oceanic, atmospheric, and hydrologic mass loading to Global Navigation Satellite System (GNSS)-inferred observations of surface displacements in Japan. Surface mass loading (SML) generates mm- to cm-level deformation of the solid Earth on time scales of hours to years, which exceeds the measurement uncertainties of most GNSS position estimates. By improving the efficiency and accuracy of the prediction and empirical estimation of SML response, we aim to reduce the variance of GNSS time series and therefore enhance the ability to resolve subtle tectonic signals, such as aseismic transients associated with subduction zone processes. Using the GIPSY software in precise point positioning mode, we estimate time series of sub-daily receiver positions for the GNSS Earth Observation Network System (GEONET) in Japan. We also model the Earth's elastic deformation response to a variety of surface mass loads, including loads of atmospheric (e.g., ECMWF) and oceanic (e.g., TPXO8-Atlas, ECCO2) origin. We extract periodic signals, such as the ocean tides and seasonal variations in hydrological loading, using harmonic analysis. Deformation caused by non-periodic loads, such as non-tidal oceanic and atmospheric loads, can be predicted and removed to further reduce the variance. We seek to streamline the workflow for estimating SML-induced surface displacements from a variety of sources in order to account for loading signals in routine GNSS data processing, thereby improving the ability to assess the mechanics of plate boundaries.
Individual differences in emotion word processing: A diffusion model analysis.
Mueller, Christina J; Kuchinke, Lars
2016-06-01
This exploratory study investigated individual differences in implicit processing of emotional words in a lexical decision task. A processing advantage for positive words was observed, and differences between happy and fear-related words in response times were predicted by individual differences in specific variables of emotion processing: whereas more pronounced goal-directed behavior was related to a specific slowdown in processing of fear-related words, the rate of spontaneous eye blinks (indexing brain dopamine levels) was associated with a processing advantage for happy words. Estimating diffusion model parameters revealed that the drift rate (rate of information accumulation) captures unique variance of processing differences between happy and fear-related words, with the highest drift rates observed for happy words. Overall emotion recognition ability predicted individual differences in drift rates between happy and fear-related words. The findings emphasize that a significant amount of variance in emotion processing is explained by individual differences in behavioral data.
Blinded sample size re-estimation in three-arm trials with 'gold standard' design.
Mütze, Tobias; Friede, Tim
2017-10-15
In this article, we study blinded sample size re-estimation in the 'gold standard' design with internal pilot study for normally distributed outcomes. The 'gold standard' design is a three-arm clinical trial design that includes an active and a placebo control in addition to an experimental treatment. We focus on the absolute margin approach to hypothesis testing in three-arm trials at which the non-inferiority of the experimental treatment and the assay sensitivity are assessed by pairwise comparisons. We compare several blinded sample size re-estimation procedures in a simulation study assessing operating characteristics including power and type I error. We find that sample size re-estimation based on the popular one-sample variance estimator results in overpowered trials. Moreover, sample size re-estimation based on unbiased variance estimators such as the Xing-Ganju variance estimator results in underpowered trials, as expected, because an overestimation of the variance and thus the sample size is in general required for the re-estimation procedure to eventually meet the target power. To overcome this problem, we propose an inflation factor for the sample size re-estimation with the Xing-Ganju variance estimator and show that this approach results in adequately powered trials. Because of favorable features of the Xing-Ganju variance estimator such as unbiasedness and a distribution independent of the group means, the inflation factor does not depend on the nuisance parameter and, therefore, can be calculated prior to a trial. Moreover, we prove that the sample size re-estimation based on the Xing-Ganju variance estimator does not bias the effect estimate. Copyright © 2017 John Wiley & Sons, Ltd.
Variance to mean ratio, R(t), for Poisson processes on phylogenetic trees.
Goldman, N
1994-09-01
The ratio of expected variance to mean, R(t), of numbers of DNA base substitutions for contemporary sequences related by a "star" phylogeny is widely seen as a measure of the adherence of the sequences' evolution to a Poisson process with a molecular clock, as predicted by the "neutral theory" of molecular evolution under certain conditions. A number of estimators of R(t) have been proposed, all predicted to have mean 1 and distributions based on the chi-squared distribution. Various genes have previously been analyzed and found to have values of R(t) far in excess of 1, calling into question important aspects of the neutral theory. In this paper, I use Monte Carlo simulation to show that the previously suggested means and distributions of estimators of R(t) are highly inaccurate. The analysis is applied to star phylogenies and to general phylogenetic trees, and well-known gene sequences are reanalyzed. For star phylogenies the results show that Kimura's estimators ("The Neutral Theory of Molecular Evolution," Cambridge Univ. Press, Cambridge, 1983) are unsatisfactory for statistical testing of R(t), but confirm the accuracy of Bulmer's correction factor (Genetics 123: 615-619, 1989). For all three nonstar phylogenies studied, attained values of all three estimators of R(t), although larger than 1, are within their true confidence limits under simple Poisson process models. This shows that lineage effects can be responsible for high estimates of R(t), restoring some limited confidence in the molecular clock and showing that the distinction between lineage and molecular clock effects is vital. (ABSTRACT TRUNCATED AT 250 WORDS)
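The kind of Monte Carlo check described here can be sketched for a star phylogeny: draw Poisson substitution counts for each lineage and inspect the distribution of the variance-to-mean estimator. This is a generic sketch; the lineage count and substitution rate are invented:

```python
import math
import random
from statistics import mean, variance

def poisson(rng, lam):
    """Knuth's algorithm for a Poisson draw (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1

def simulate_rt(n_lineages=20, expected_subs=5.0, n_reps=2000, seed=2):
    """Distribution of the index of dispersion R(t) = Var/Mean of
    substitution counts on a star phylogeny under a strict Poisson clock."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        counts = [poisson(rng, expected_subs) for _ in range(n_lineages)]
        m = mean(counts)
        if m > 0:
            estimates.append(variance(counts) / m)
    return estimates

ests = simulate_rt()
print(round(mean(ests), 2))  # centred near 1 under the clock
```

Comparing this simulated distribution with the chi-squared-based approximation is exactly where discrepancies of the kind reported in the paper would show up.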
Subjective frequency estimates for 2,938 monosyllabic words.
Balota, D A; Pilotti, M; Cortese, M J
2001-06-01
Subjective frequency estimates for a large sample of monosyllabic English words were collected from 574 young adults (undergraduate students) and from a separate group of 1,590 adults of varying ages and educational backgrounds. Estimates from the latter group were collected via the internet. In addition, 90 healthy older adults provided estimates for a random sample of 480 of these words. All groups rated words with respect to the estimated frequency of encounters with each word on a 7-point scale, ranging from never encountered to encountered several times a day. The young and older groups also rated each word with respect to the frequency of encounters in different perceptual domains (e.g., reading, hearing, writing, or speaking). The results of regression analyses indicated that objective log frequency and meaningfulness accounted for most of the variance in subjective frequency estimates, whereas neighborhood size accounted for the least amount of variance in the ratings. The predictive power of log frequency and meaningfulness was dependent on the level of subjective frequency estimates. Meaningfulness was a better predictor of subjective frequency for uncommon words, whereas log frequency was a better predictor of subjective frequency for common words. Our discussion focuses on the utility of subjective frequency estimates compared with other estimates of familiarity. The raw subjective frequency data for all words are available at http://www.artsci.wustl.edu/dbalota/labpub.html.
Planning additional drilling campaign using two-space genetic algorithm: A game theoretical approach
NASA Astrophysics Data System (ADS)
Kumral, Mustafa; Ozer, Umit
2013-03-01
Grade and tonnage are the most important technical uncertainties in mining ventures because of the use of estimations/simulations, which are mostly generated from drill data. Open pit mines are planned and designed on the basis of the blocks representing the entire orebody. Each block has a different estimation/simulation variance, reflecting uncertainty to some extent. The estimation/simulation realizations are submitted to the mine production scheduling process. However, the use of a block model with varying estimation/simulation variances will lead to serious risk in the scheduling. When multiple simulations are available, the dispersion variances of blocks can be taken to reflect technical uncertainty. However, the dispersion variance cannot handle uncertainty associated with varying estimation/simulation variances of blocks. This paper proposes an approach that generates the configuration of the best additional drilling campaign so as to produce more homogeneous estimation/simulation variances of blocks. In other words, the objective is to find the best drilling configuration in such a way as to minimize grade uncertainty under a budget constraint. The uncertainty measure of the optimization process in this paper is the interpolation variance, which considers data locations and grades. The problem is expressed as a minmax problem, which focuses on finding the best worst-case performance, i.e., minimizing the interpolation variance of the block with the maximum interpolation variance. Since the optimization model requires computing the interpolation variances of blocks being simulated/estimated in each iteration, the problem cannot be solved by standard optimization tools. This motivates the use of a two-space genetic algorithm (GA) approach to solve the problem. The technique has two spaces: feasible drill-hole configurations, in which interpolation variance is minimized, and drill-hole simulations, in which it is maximized.
The two spaces interact iteratively to find a minmax solution. A case study was conducted to demonstrate the performance of the approach. The findings showed that the approach could be used to plan a new drilling campaign.
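As a caricature of the minmax formulation, the toy below minimizes the worst block's uncertainty with a plain evolutionary loop. Squared distance to the nearest drill hole stands in for interpolation variance, and the inner maximization over blocks is computed exactly rather than evolved in a second space as in the paper; every name and parameter is invented:

```python
import random

random.seed(3)
BLOCKS = [(x, y) for x in range(10) for y in range(10)]  # block centroids
N_HOLES, POP, GENS = 5, 30, 60

def worst_block_uncertainty(holes):
    """Toy objective: a block's uncertainty grows with its squared distance
    to the nearest drill hole; return the worst (max-variance) block."""
    return max(min((bx - hx) ** 2 + (by - hy) ** 2 for hx, hy in holes)
               for bx, by in BLOCKS)

def mutate(holes):
    """Move one randomly chosen hole to a new random location."""
    holes = list(holes)
    holes[random.randrange(len(holes))] = (random.uniform(0, 9),
                                           random.uniform(0, 9))
    return holes

# (mu + lambda)-style evolutionary minmax search over configurations.
pop = [[(random.uniform(0, 9), random.uniform(0, 9)) for _ in range(N_HOLES)]
       for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=worst_block_uncertainty)
    pop = pop[:POP // 2] + [mutate(p) for p in pop[:POP // 2]]
best = min(pop, key=worst_block_uncertainty)
# A clustered configuration (all holes in one corner) should be much worse.
print(worst_block_uncertainty(best) < worst_block_uncertainty([(0.0, 0.0)] * N_HOLES))
```

The real formulation evaluates interpolation variance from data locations and grades under conditional simulations, which is what makes a coevolutionary two-space search necessary.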
Predicting Individual Fuel Economy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Zhenhong; Greene, David L
2011-01-01
To make informed decisions about travel and vehicle purchase, consumers need unbiased and accurate information about the fuel economy they will actually obtain. In the past, the EPA fuel economy estimates based on its 1984 rules have been widely criticized for overestimating on-road fuel economy. In 2008, EPA adopted a new estimation rule. This study compares the usefulness of the EPA's 1984 and 2008 estimates based on their prediction bias and accuracy and attempts to improve the prediction of on-road fuel economy based on consumer and vehicle attributes. We examine the usefulness of the EPA fuel economy estimates using a large sample of self-reported on-road fuel economy data and develop an Individualized Model for more accurately predicting an individual driver's on-road fuel economy based on easily determined vehicle and driver attributes. Accuracy rather than bias appears to have limited the usefulness of the EPA 1984 estimates in predicting on-road MPG. The EPA 2008 estimates appear to be equally inaccurate and substantially more biased relative to the self-reported data. Furthermore, the 2008 estimates exhibit an underestimation bias that increases with increasing fuel economy, suggesting that the new numbers will tend to underestimate the real-world benefits of fuel economy and emissions standards. By including several simple driver and vehicle attributes, the Individualized Model reduces the unexplained variance by over 55% and the standard error by 33% based on an independent test sample. The additional explanatory variables can be easily provided by the individuals.
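The flavor of such an individualized model can be sketched with simulated data in which a single driver attribute (a made-up "highway share") explains part of the gap between label and on-road MPG. The coefficients and sample are purely illustrative, not the paper's fitted model:

```python
import random
from statistics import mean, pvariance

def simple_ols(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
          sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

random.seed(5)
n = 500
label_mpg = [random.uniform(20, 40) for _ in range(n)]
hwy_share = [random.random() for _ in range(n)]
onroad = [0.9 * l + 6.0 * h - 2.0 + random.gauss(0, 1.5)
          for l, h in zip(label_mpg, hwy_share)]

# Baseline: predict on-road MPG from the label value alone.
b0, b1 = simple_ols(label_mpg, onroad)
resid = [y - (b0 + b1 * x) for x, y in zip(label_mpg, onroad)]
# Individualized step: explain the residual with the driver attribute.
c0, c1 = simple_ols(hwy_share, resid)
resid2 = [r - (c0 + c1 * h) for h, r in zip(hwy_share, resid)]
reduction = 1 - pvariance(resid2) / pvariance(resid)
print(round(reduction, 2))  # share of previously unexplained variance removed
```

The sequential-regression trick works here because the two simulated predictors are independent; a real model would fit all attributes jointly.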
Inventory implications of using sampling variances in estimation of growth model coefficients
Albert R. Stage; William R. Wykoff
2000-01-01
Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...
A Proposal for Phase 4 of the Forest Inventory and Analysis Program
Ronald E. McRoberts
2005-01-01
Maps of forest cover were constructed using observations from forest inventory plots, Landsat Thematic Mapper satellite imagery, and a logistic regression model. Estimates of mean proportion forest area and the variance of the mean were calculated for circular study areas with radii ranging from 1 km to 15 km. The spatial correlation among pixel predictions was...
The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling
Wray, Naomi R.; Yang, Jian; Goddard, Michael E.; Visscher, Peter M.
2010-01-01
Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator. PMID:20195508
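The liability-threshold reasoning can be sketched by simulation: score each individual by their true genetic value, so the resulting AUC approximates the maximum achievable when a profile explains all genetic variance. This is a toy check, not the authors' closed-form expression, and all parameters are illustrative:

```python
import random
from statistics import NormalDist

def max_auc(h2, prevalence, n=20000, comparisons=5000, seed=11):
    """Simulate the liability-threshold model and estimate the AUC of the
    true genetic value as classifier, via the Mann-Whitney statistic."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - prevalence)
    genetic = [rng.gauss(0, h2 ** 0.5) for _ in range(n)]
    liability = [g + rng.gauss(0, (1 - h2) ** 0.5) for g in genetic]
    cases = [g for g, l in zip(genetic, liability) if l > threshold]
    controls = [g for g, l in zip(genetic, liability) if l <= threshold]
    # AUC = P(a randomly drawn case outscores a randomly drawn control).
    wins = sum(rng.choice(cases) > rng.choice(controls)
               for _ in range(comparisons))
    return wins / comparisons

# Higher heritability -> higher maximum achievable AUC at the same prevalence,
# and AUC < 1 even when all genetic variance is explained.
print(round(max_auc(0.8, 0.01), 2), round(max_auc(0.2, 0.01), 2))
```

The environmental component of liability is what caps AUC below 1, which is the central point of the paper's genetic interpretation.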
Measurement error in epidemiologic studies of air pollution based on land-use regression models.
Basagaña, Xavier; Aguilera, Inmaculada; Rivera, Marcela; Agis, David; Foraster, Maria; Marrugat, Jaume; Elosua, Roberto; Künzli, Nino
2013-10-15
Land-use regression (LUR) models are increasingly used to estimate air pollution exposure in epidemiologic studies. These models use air pollution measurements taken at a small set of locations and modeling based on geographical covariates for which data are available at all study participant locations. The process of LUR model development commonly includes a variable selection procedure. When LUR model predictions are used as explanatory variables in a model for a health outcome, measurement error can lead to bias of the regression coefficients and to inflation of their variance. In previous studies dealing with spatial predictions of air pollution, bias was shown to be small while most of the effect of measurement error was on the variance. In this study, we show that in realistic cases where LUR models are applied to health data, bias in health-effect estimates can be substantial. This bias depends on the number of air pollution measurement sites, the number of available predictors for model selection, and the amount of explainable variability in the true exposure. These results should be taken into account when interpreting health effects from studies that used LUR models.
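The basic mechanism by which exposure error distorts a health-effect coefficient can be sketched with classical measurement error. Note this is a simplification: LUR prediction error is not purely classical, which is part of the point of this paper, and all numbers below are invented:

```python
import random
from statistics import mean

def slope(x, y):
    """OLS slope of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

random.seed(13)
n = 2000
true_exposure = [random.gauss(10, 2) for _ in range(n)]
health = [0.5 * e + random.gauss(0, 1) for e in true_exposure]  # true beta = 0.5
# Error-prone proxy standing in for a modeled (e.g. LUR-predicted) exposure.
proxy = [e + random.gauss(0, 2) for e in true_exposure]

beta_true = slope(true_exposure, health)
beta_proxy = slope(proxy, health)
# Classical error attenuates the slope by var(X) / (var(X) + var(error)).
print(round(beta_true, 2), round(beta_proxy, 2))
```

In the LUR setting the error structure depends on the number of monitoring sites and on variable selection, so the bias need not be a simple attenuation; the sketch only shows why error in the exposure variable propagates into the health-effect estimate at all.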
Kaplowitz, Stan A; Perlstadt, Harry; D'Onofrio, Gail; Melnick, Edward R; Baum, Carl R; Kirrane, Barbara M; Post, Lori A
2012-01-01
We derived a clinical decision rule for determining which young children need testing for lead poisoning. We developed an equation that combines lead exposure self-report questions with the child's census-block housing and socioeconomic characteristics, personal demographic characteristics, and Medicaid status. This equation better predicts elevated blood lead level (EBLL) than one using ZIP code and Medicaid status. A survey regarding potential lead exposure was administered from October 2001 to January 2003 to Michigan parents at pediatric clinics (n=3,396). These self-report survey data were linked to a statewide clinical registry of blood lead level (BLL) tests. Sensitivity and specificity were calculated and then used to estimate the cost-effectiveness of the equation. The census-block group prediction equation explained 18.1% of the variance in BLLs. Replacing block group characteristics with the self-report questions and dichotomized ZIP code risk explained only 12.6% of the variance. Adding three self-report questions to the census-block group model increased the variance explained to 19.9% and increased specificity with no loss in sensitivity in detecting EBLLs of ≥ 10 micrograms per deciliter. Relying solely on self-reports of lead exposure predicted BLL less effectively than the block group model. However, adding three of 13 self-report questions to our clinical decision rule significantly improved prediction of which children require a BLL test. Using the equation as the clinical decision rule would annually eliminate more than 7,200 unnecessary tests in Michigan and save more than $220,000.
Development and Validation of a New Air Carrier Block Time Prediction Model and Methodology
NASA Astrophysics Data System (ADS)
Litvay, Robyn Olson
Commercial airline operations rely on predicted block times as the foundation for critical, successive decisions that include fuel purchasing, crew scheduling, and airport facility usage planning. Small inaccuracies in the predicted block times have the potential to result in huge financial losses, and, with profit margins for airline operations currently almost nonexistent, potentially negate any possible profit. Although optimization techniques have resulted in many models targeting airline operations, accurately predicting and quantifying variables months in advance remains a challenge. The objective of this work is the development of an airline block time prediction model and methodology that is practical, easily implemented, and easily updated. Actual U.S. domestic flight data from a major airline were used to develop a model that predicts airline block times with increased accuracy and smaller variance of the actual times from the predicted times. This reduction in variance represents tens of millions of dollars (U.S.) per year in operational cost savings for an individual airline. A new methodology for block time prediction is constructed using a regression model as the base, as it has both deterministic and probabilistic components, and historic block time distributions. The estimation of block times for commercial domestic airline operations requires a probabilistic, general model that can be easily customized for a specific airline's network. As individual block times vary by season, by day, and by time of day, the challenge is to make general, long-term estimations representing the average actual block times while minimizing the variation. Predictions of block times for the third quarter months of July and August of 2011 were calculated using this new model.
The resulting actual block times were obtained from the Research and Innovative Technology Administration, Bureau of Transportation Statistics (Airline On-time Performance Data, 2008-2011) for comparison and analysis. Future block times are shown to be predicted with greater accuracy, without exception and network-wide, for a major U.S. domestic airline.
Methods to Estimate the Variance of Some Indices of the Signal Detection Theory: A Simulation Study
ERIC Educational Resources Information Center
Suero, Manuel; Privado, Jesús; Botella, Juan
2017-01-01
A simulation study is presented to evaluate and compare three methods to estimate the variance of the estimates of the parameters d' and C of signal detection theory (SDT). Several methods have been proposed to calculate the variance of their estimators, d' and c. Those methods have been mostly assessed by…
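For context, the indices themselves and one common way to estimate their sampling variance (a parametric bootstrap) can be sketched as follows. The trial counts are hypothetical, and this is not necessarily one of the three methods compared in the paper:

```python
import random
from statistics import NormalDist, variance

z = NormalDist().inv_cdf

def sdt_indices(hits, misses, fas, crs):
    """Sensitivity d' and criterion c from a yes/no detection table."""
    h = hits / (hits + misses)   # hit rate
    f = fas / (fas + crs)        # false-alarm rate
    return z(h) - z(f), -0.5 * (z(h) + z(f))

def bootstrap_variances(hits, misses, fas, crs, reps=2000, seed=17):
    """Parametric bootstrap of the sampling variances of d' and c."""
    rng = random.Random(seed)
    n_sig, n_noise = hits + misses, fas + crs
    ds, cs = [], []
    for _ in range(reps):
        bh = sum(rng.random() < hits / n_sig for _ in range(n_sig))
        bf = sum(rng.random() < fas / n_noise for _ in range(n_noise))
        bh = min(max(bh, 1), n_sig - 1)    # keep rates away from 0 and 1
        bf = min(max(bf, 1), n_noise - 1)
        d, c = sdt_indices(bh, n_sig - bh, bf, n_noise - bf)
        ds.append(d)
        cs.append(c)
    return variance(ds), variance(cs)

d_prime, crit = sdt_indices(40, 10, 10, 40)
var_d, var_c = bootstrap_variances(40, 10, 10, 40)
print(round(d_prime, 2), round(crit, 2), round(var_d, 3))
```

Delta-method formulas based on the binomial variances of the hit and false-alarm rates are the usual analytic alternative to this resampling approach.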
Uncertainty Estimation using Bootstrapped Kriging Predictions for Precipitation Isoscapes
NASA Astrophysics Data System (ADS)
Ma, C.; Bowen, G. J.; Vander Zanden, H.; Wunder, M.
2017-12-01
Isoscapes are spatial models representing the distribution of stable isotope values across landscapes. Isoscapes of hydrogen and oxygen in precipitation are now widely used in a diversity of fields, including geology, biology, hydrology, and atmospheric science. To generate isoscapes, geostatistical methods are typically applied to extend predictions from limited data measurements. Kriging is a popular method in isoscape modeling, but quantifying the uncertainty associated with the resulting isoscapes is challenging. Applications that use precipitation isoscapes to determine sample origin require estimation of uncertainty. Here we present a simple bootstrap method (SBM) to estimate the mean and uncertainty of the kriged isoscape and compare these results with a generalized bootstrap method (GBM) applied in previous studies. We used hydrogen isotopic data from IsoMAP to explore these two approaches for estimating uncertainty. We conducted 10 simulations for each bootstrap method and found that SBM produced successful kriging predictions in more simulations (9/10) than GBM (4/10). The SBM prediction was closer to the original prediction generated without bootstrapping and had less variance than the GBM prediction. SBM was also tested on IsoMAP datasets with different numbers of observation sites. Predictions from datasets with fewer than 40 observation sites using SBM were more variable than the original prediction. The approaches we used for estimating uncertainty will be compiled in an R package that is under development. We expect that these robust estimates of precipitation isoscape uncertainty can be applied in diagnosing the origin of samples ranging from various types of water to migratory animals, food products, and humans.
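The simple bootstrap idea can be sketched with inverse-distance weighting standing in for kriging (kriging additionally models spatial covariance, so this is only a structural analogy). The sites and the isotope trend below are fabricated:

```python
import random
from statistics import mean, stdev

def idw_predict(sites, px, py, power=2):
    """Inverse-distance-weighted interpolation at (px, py)."""
    num = den = 0.0
    for x, y, v in sites:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0.0:
            return v
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den

rng = random.Random(19)
# Fabricated monitoring sites with a simple west-east isotope trend.
sites = []
for _ in range(40):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    sites.append((x, y, -120.0 + 0.4 * x + rng.gauss(0, 2)))

# Simple bootstrap (SBM-style): resample sites with replacement and
# re-interpolate at a target location to get a predictive distribution.
preds = [idw_predict([rng.choice(sites) for _ in sites], 50.0, 50.0)
         for _ in range(500)]
print(round(mean(preds), 1), round(stdev(preds), 2))
```

The mean and spread of `preds` play the role of the bootstrapped isoscape prediction and its uncertainty at that location; with kriging, each bootstrap replicate would also refit the variogram.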
Global Genetic Variations Predict Brain Response to Faces
Dickie, Erin W.; Tahmasebi, Amir; French, Leon; Kovacevic, Natasa; Banaschewski, Tobias; Barker, Gareth J.; Bokde, Arun; Büchel, Christian; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Gallinat, Juergen; Gowland, Penny; Heinz, Andreas; Ittermann, Bernd; Lawrence, Claire; Mann, Karl; Martinot, Jean-Luc; Nees, Frauke; Nichols, Thomas; Lathrop, Mark; Loth, Eva; Pausova, Zdenka; Rietschel, Marcela; Smolka, Michal N.; Ströhle, Andreas; Toro, Roberto; Schumann, Gunter; Paus, Tomáš
2014-01-01
Face expressions are a rich source of social signals. Here we estimated the proportion of phenotypic variance in the brain response to facial expressions explained by common genetic variance captured by ∼500,000 single nucleotide polymorphisms. Using genomic-relationship-matrix restricted maximum likelihood (GREML), we related this global genetic variance to that in the brain response to facial expressions, as assessed with functional magnetic resonance imaging (fMRI) in a community-based sample of adolescents (n = 1,620). Brain response to facial expressions was measured in 25 regions constituting a face network, as defined previously. In 9 out of these 25 regions, common genetic variance explained a significant proportion of phenotypic variance (40–50%) in their response to ambiguous facial expressions; this was not the case for angry facial expressions. Across the network, the strength of the genotype-phenotype relationship varied as a function of the inter-individual variability in the number of functional connections possessed by a given region (R2 = 0.38, p<0.001). Furthermore, this variability showed an inverted U relationship with both the number of observed connections (R2 = 0.48, p<0.001) and the magnitude of brain response (R2 = 0.32, p<0.001). Thus, a significant proportion of the brain response to facial expressions is predicted by common genetic variance in a subset of regions constituting the face network. These regions show the highest inter-individual variability in the number of connections with other network nodes, suggesting that the genetic model captures variations across the adolescent brains in co-opting these regions into the face network. PMID:25122193
McGarvey, Richard; Burch, Paul; Matthews, Janet M
2016-01-01
Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, together and in combination. The standard one-start aligned systematic survey design, a uniform 10 x 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (v) that corrected for inter-transect correlation (ν₈ and ν(W)) were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (ν₂ and ν₃) that are now more commonly applied in systematic surveys. 
Similar variance estimator performance rankings were found with a second, differently generated set of spatial point populations, ν₈ and ν(W) again being the best performers in the longer-range autocorrelated populations. However, no systematic variance estimator tested was free from bias. On balance, systematic designs give narrower confidence intervals in clustered populations, while random designs permit unbiased estimates of (often wider) confidence intervals. The search continues for better estimators of sampling variance for the systematic survey mean.
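The habitat-patch mechanism behind the systematic design's precision advantage can be sketched with a deliberately crude population model (organisms confined to the left half of the region). This is not the paper's Matérn simulation, and all sizes are invented:

```python
import random
from statistics import mean, pvariance

def population(rng, size=100):
    """Organisms confined to a habitat patch (left half of the region);
    within the patch, each cell holds 0-2 organisms at random."""
    half = size // 2
    return [[rng.randrange(3) if x < half else 0 for _ in range(size)]
            for x in range(size)]

def survey(pop, transects):
    return mean(pop[x][y] for x, y in transects)

rng = random.Random(23)
size = 100
# One-start aligned systematic design: a uniform 10 x 10 grid of transects.
systematic = [(x, y) for x in range(5, size, 10) for y in range(5, size, 10)]

sys_means, rand_means = [], []
for _ in range(300):
    pop = population(rng)
    sys_means.append(survey(pop, systematic))
    random_transects = [(rng.randrange(size), rng.randrange(size))
                        for _ in range(len(systematic))]
    rand_means.append(survey(pop, random_transects))

# The systematic grid fixes how many transects fall inside the patch,
# removing that source of between-survey variance.
print(pvariance(sys_means) < pvariance(rand_means))
```

Estimating the systematic design's variance from a single survey, which is where the ν₈ and ν(W) estimators come in, is the harder problem the paper addresses; the sketch only reproduces the precision gap itself.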
Corron, Louise; Marchal, François; Condemi, Silvana; Telmon, Norbert; Chaumoitre, Kathia; Adalian, Pascal
2018-05-31
Subadult age estimation should rely on sampling and statistical protocols capturing development variability for more accurate age estimates. In this perspective, measurements were taken on the fifth lumbar vertebrae and/or clavicles of 534 French males and females aged 0-19 years and the ilia of 244 males and females aged 0-12 years. These variables were fitted in nonparametric multivariate adaptive regression splines (MARS) models with 95% prediction intervals (PIs) of age. The models were tested on two independent samples from Marseille and the Luis Lopes reference collection from Lisbon. Models using ilium width and module, maximum clavicle length, and lateral vertebral body heights were more than 92% accurate. Precision was lower for postpubertal individuals. Integrating punctual nonlinearities of the relationship between age and the variables and dynamic prediction intervals incorporated the normal increase in interindividual growth variability (heteroscedasticity of variance) with age for more biologically accurate predictions. © 2018 American Academy of Forensic Sciences.
Ocean eddies and climate predictability
NASA Astrophysics Data System (ADS)
Kirtman, Ben P.; Perlin, Natalie; Siqueira, Leo
2017-12-01
A suite of coupled climate model simulations and experiments are used to examine how resolved mesoscale ocean features affect aspects of climate variability, air-sea interactions, and predictability. In combination with control simulations, experiments with the interactive ensemble coupling strategy are used to further amplify the role of the oceanic mesoscale field and the associated air-sea feedbacks and predictability. The basic intent of the interactive ensemble coupling strategy is to reduce the atmospheric noise at the air-sea interface, allowing an assessment of how noise affects the variability, and in this case, it is also used to diagnose predictability from the perspective of signal-to-noise ratios. The climate variability is assessed from the perspective of sea surface temperature (SST) variance ratios, and it is shown that, unsurprisingly, mesoscale variability significantly increases SST variance. Perhaps surprising is the fact that the presence of mesoscale ocean features even further enhances the SST variance in the interactive ensemble simulation beyond what would be expected from simple linear arguments. Changes in the air-sea coupling between simulations are assessed using pointwise convective rainfall-SST and convective rainfall-SST tendency correlations and again emphasize how the oceanic mesoscale alters the local association between convective rainfall and SST. Understanding the possible relationships between the SST-forced signal and the weather noise is critically important in climate predictability. We use the interactive ensemble simulations to diagnose this relationship, and we find that the presence of mesoscale ocean features significantly enhances this link particularly in ocean eddy rich regions. Finally, we use signal-to-noise ratios to show that the ocean mesoscale activity increases model estimated predictability in terms of convective precipitation and atmospheric upper tropospheric circulation.
Seasonal Predictability in a Model Atmosphere.
NASA Astrophysics Data System (ADS)
Lin, Hai
2001-07-01
The predictability of atmospheric mean-seasonal conditions in the absence of externally varying forcing is examined. A perfect-model approach is adopted, in which a global T21 three-level quasigeostrophic atmospheric model is integrated over 21 000 days to obtain a reference atmospheric orbit. The model is driven by a time-independent forcing, so that the only source of time variability is the internal dynamics. The forcing is set to perpetual winter conditions in the Northern Hemisphere (NH) and perpetual summer in the Southern Hemisphere. A significant temporal variability in the NH 90-day mean states is observed. The component of that variability associated with the higher-frequency motions, or climate noise, is estimated using a method developed by Madden. In the polar region, and to a lesser extent in the midlatitudes, the temporal variance of the winter means is significantly greater than the climate noise, suggesting some potential predictability in those regions. Forecast experiments are performed to see whether the presence of variance in the 90-day mean states that is in excess of the climate noise leads to some skill in the prediction of these states. Ensemble forecast experiments with nine members starting from slightly different initial conditions are performed for 200 different 90-day means along the reference atmospheric orbit. The serial correlation between the ensemble means and the reference orbit shows that there is skill in the 90-day mean predictions. The skill is concentrated in those regions of the NH that have the largest variance in excess of the climate noise. An EOF analysis shows that nearly all the predictive skill in the seasonal means is associated with one mode of variability with a strong axisymmetric component.
Rhodes, Kirsty M; Turner, Rebecca M; Higgins, Julian P T
2015-01-01
Estimation of between-study heterogeneity is problematic in small meta-analyses. Bayesian meta-analysis is beneficial because it allows incorporation of external evidence on heterogeneity. To facilitate this, we provide empirical evidence on the likely heterogeneity between studies in meta-analyses relating to specific research settings. Our analyses included 6,492 continuous-outcome meta-analyses within the Cochrane Database of Systematic Reviews. We investigated the influence of meta-analysis settings on heterogeneity by modeling study data from all meta-analyses on the standardized mean difference scale. Meta-analysis setting was described according to outcome type, intervention comparison type, and medical area. Predictive distributions for between-study variance expected in future meta-analyses were obtained, which can be used directly as informative priors. Among outcome types, heterogeneity was found to be lowest in meta-analyses of obstetric outcomes. Among intervention comparison types, heterogeneity was lowest in meta-analyses comparing two pharmacologic interventions. Predictive distributions are reported for different settings. In two example meta-analyses, incorporating external evidence led to a more precise heterogeneity estimate. Heterogeneity was influenced by meta-analysis characteristics. Informative priors for between-study variance were derived for each specific setting. Our analyses thus assist the incorporation of realistic prior information into meta-analyses including few studies. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Nazarian, Alireza; Gezan, Salvador A
2016-03-01
The study of genetic architecture of complex traits has been dramatically influenced by implementing genome-wide analytical approaches during recent years. Of particular interest are genomic prediction strategies which make use of genomic information for predicting phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of estimation of genetic variance components of complex traits, and of additive, dominance, and genetic effects through best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine if including a dominance term into the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of accuracy of correlations between true and estimated additive, dominance, and genetic effects. © The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
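As a sketch of one building block used above, here is a marker-based additive genomic relationship matrix in the style of VanRaden's first method, computed from simulated genotypes. The dimensions and allele frequencies are arbitrary assumptions for illustration.

```python
import numpy as np

# Additive genomic relationship matrix (GRM) from simulated markers,
# VanRaden method 1. All inputs are simulated.
rng = np.random.default_rng(1)
n, m = 50, 500                            # individuals, markers
p = rng.uniform(0.1, 0.9, m)              # allele frequencies (assumed known)
Z = rng.binomial(2, p, size=(n, m)).astype(float)  # genotypes coded 0/1/2

W = Z - 2 * p                             # center each marker by 2p
G = W @ W.T / (2 * np.sum(p * (1 - p)))   # additive GRM

# For non-inbred individuals the diagonal averages approximately 1.
```

A pedigree-based dominance matrix, as compared in the study, would be built from pedigree relationships rather than from markers.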
Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L
2016-12-01
Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. 
Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
Is Romantic Desire Predictable? Machine Learning Applied to Initial Romantic Attraction.
Joel, Samantha; Eastwick, Paul W; Finkel, Eli J
2017-10-01
Matchmaking companies and theoretical perspectives on close relationships suggest that initial attraction is, to some extent, a product of two people's self-reported traits and preferences. We used machine learning to test how well such measures predict people's overall tendencies to romantically desire other people (actor variance) and to be desired by other people (partner variance), as well as people's desire for specific partners above and beyond actor and partner variance (relationship variance). In two speed-dating studies, romantically unattached individuals completed more than 100 self-report measures about traits and preferences that past researchers have identified as being relevant to mate selection. Each participant met each opposite-sex participant attending a speed-dating event for a 4-min speed date. Random forests models predicted 4% to 18% of actor variance and 7% to 27% of partner variance; crucially, however, they were unable to predict relationship variance using any combination of traits and preferences reported before the dates. These results suggest that compatibility elements of human mating are challenging to predict before two people meet.
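A toy round-robin decomposition, analogous to the actor/partner/relationship variance partition described above; the real study used machine learning on speed-dating data, and all values below are simulated assumptions.

```python
import numpy as np

# Simulate a round-robin design: desire rating of j by i is the sum of an
# actor effect, a partner effect, and a dyad-specific relationship effect.
rng = np.random.default_rng(0)
n = 60
actor = rng.normal(0, 1.0, n)        # tendency to desire others
partner = rng.normal(0, 0.8, n)      # tendency to be desired
rel = rng.normal(0, 0.5, (n, n))     # dyad-specific component
Y = actor[:, None] + partner[None, :] + rel

# Method-of-moments decomposition from row/column means
row_var = np.var(Y.mean(axis=1), ddof=1)   # ~ actor variance (+ rel/n)
col_var = np.var(Y.mean(axis=0), ddof=1)   # ~ partner variance (+ rel/n)
rel_var = np.var(Y) - row_var - col_var    # ~ relationship variance
```

The paper's finding is that measured traits predict the first two components to some degree, but not the dyad-specific third component.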
Stratum variance estimation for sample allocation in crop surveys. [Great Plains Corridor]
NASA Technical Reports Server (NTRS)
Perry, C. R., Jr.; Chhikara, R. S. (Principal Investigator)
1980-01-01
The problem of determining stratum variances needed to achieve an optimum sample allocation for crop surveys by remote sensing is investigated by considering an approach based on the concept of stratum variance as a function of the sampling unit size. A methodology using existing and easily available historical crop statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to estimate stratum variances for wheat in the U.S. Great Plains and is evaluated based on the numerical results thus obtained. The proposed technique is shown to be viable and to perform satisfactorily, with the use of a conservative value for the field size and crop statistics from the small political subdivision level, when the estimated stratum variances are compared to those obtained using LANDSAT data.
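Once stratum variances are in hand, they feed directly into an optimum (Neyman) allocation of the sample. A small sketch with made-up stratum sizes and standard deviations:

```python
# Neyman optimum allocation of a fixed total sample across strata, given
# stratum sizes N_h and estimated stratum standard deviations S_h.
# The numbers are illustrative, not from the report.
N = [120, 300, 80]        # stratum sizes
S = [4.0, 2.5, 6.0]       # estimated stratum standard deviations
n_total = 60

weights = [Nh * Sh for Nh, Sh in zip(N, S)]
alloc = [round(n_total * w / sum(weights)) for w in weights]
print(alloc)  # → [17, 26, 17], proportional to N_h * S_h
```

Misjudged stratum variances distort these shares, which is why good initial estimates matter.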
Wientjes, Yvonne C J; Bijma, Piter; Vandenplas, Jérémie; Calus, Mario P L
2017-10-01
Different methods are available to calculate multi-population genomic relationship matrices. Since those matrices differ in base population, it is anticipated that the method used to calculate genomic relationships affects the estimate of genetic variances, covariances, and correlations. The aim of this article is to define the multi-population genomic relationship matrix to estimate current genetic variances within and genetic correlations between populations. The genomic relationship matrix containing two populations consists of four blocks, one block for population 1, one block for population 2, and two blocks for relationships between the populations. It is known, based on literature, that by using current allele frequencies to calculate genomic relationships within a population, current genetic variances are estimated. In this article, we theoretically derived the properties of the genomic relationship matrix to estimate genetic correlations between populations and validated it using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated unbiasedly even though estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix which directly estimates current genetic variances as well as genetic correlations between populations. Copyright © 2017 by the Genetics Society of America.
ERIC Educational Resources Information Center
Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.
2016-01-01
The aim of this study is to determine the variance difference between maximum likelihood and expected a posteriori estimation methods viewed from the number of test items in an aptitude test. The variance indicates the accuracy achieved by both the maximum likelihood and Bayesian estimation methods. The test consists of three subtests, each with 40 multiple-choice…
Updating the Standard Spatial Observer for Contrast Detection
NASA Technical Reports Server (NTRS)
Ahumada, Albert J.; Watson, Andrew B.
2011-01-01
Watson and Ahumada (2005) constructed a Standard Spatial Observer (SSO) model for foveal luminance contrast signal detection based on the ModelFest data (Watson, 1999). Here we propose two changes to the model: dropping the oblique effect from the CSF and using the cone density data of Curcio et al. (1990) to estimate the variation of sensitivity with eccentricity. Dropping the complex images, and using medians to exclude outlier data points, the SSO model now accounts for essentially all the predictable variance in the data, with an RMS prediction error of only 0.67 dB.
Ex Post Facto Monte Carlo Variance Reduction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Booth, Thomas E.
The variance in Monte Carlo particle transport calculations is often dominated by a few particles whose importance increases manyfold on a single transport step. This paper describes a novel variance reduction method that uses a large importance change as a trigger to resample the offending transport step. That is, the method is employed only after (ex post facto) a random walk attempts a transport step that would otherwise introduce a large variance in the calculation. Improvements in two Monte Carlo transport calculations are demonstrated empirically using an ex post facto method. First, the method is shown to reduce the variance in a penetration problem with a cross-section window. Second, the method empirically appears to modify a point detector estimator from an infinite variance estimator to a finite variance estimator.
Survival estimation and the effects of dependency among animals
Schmutz, Joel A.; Ward, David H.; Sedinger, James S.; Rexstad, Eric A.
1995-01-01
Survival models assume that fates of individuals are independent, yet the robustness of this assumption has been poorly quantified. We examine how empirically derived estimates of the variance of survival rates are affected by dependency in survival probability among individuals. We used Monte Carlo simulations to generate known amounts of dependency among pairs of individuals and analyzed these data with Kaplan-Meier and Cormack-Jolly-Seber models. Dependency significantly increased these empirical variances as compared to theoretically derived estimates of variance from the same populations. With resighting data from 168 pairs of black brant, we applied a resampling procedure and program RELEASE to estimate empirical and mean theoretical variances. We estimated that the relationship between paired individuals caused the empirical variance of the survival rate to be 155% larger than the empirical variance for unpaired individuals. Monte Carlo simulations and use of this resampling strategy can provide investigators with information on how robust their data are to this common assumption of independent survival probabilities.
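The core Monte Carlo idea can be sketched in a few lines: simulate pairs whose fates are correlated through a shared latent effect, and compare the empirical variance of the estimated survival rate with and without dependency. The sample sizes, survival probability, and latent correlation below are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist

# Paired individuals with correlated survival inflate the variance of the
# estimated survival rate relative to independent (binomial) theory.
rng = np.random.default_rng(42)
n_pairs, p, n_rep = 200, 0.8, 4000
c = NormalDist().inv_cdf(p)            # latent threshold giving survival prob p

def survival_rate_var(rho):
    # Gaussian-copula pairs: rho is the latent correlation within a pair
    z_pair = rng.normal(size=(n_rep, n_pairs, 1))
    z_ind = rng.normal(size=(n_rep, n_pairs, 2))
    latent = np.sqrt(rho) * z_pair + np.sqrt(1 - rho) * z_ind
    alive = latent < c                  # fate of each of the 2*n_pairs animals
    rates = alive.mean(axis=(1, 2))     # estimated survival rate per replicate
    return float(rates.var(ddof=1))

var_indep = survival_rate_var(0.0)
var_paired = survival_rate_var(0.9)
# theoretical binomial variance if all 2*n_pairs fates were independent
var_binom = p * (1 - p) / (2 * n_pairs)
```

With dependency, the empirical variance exceeds the binomial value, mirroring the inflation the paper reports for paired brant.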
Multi-objective Optimization of Solar Irradiance and Variance at Pertinent Inclination Angles
NASA Astrophysics Data System (ADS)
Jain, Dhanesh; Lalwani, Mahendra
2018-05-01
The performance of a photovoltaic panel is highly affected by changes in atmospheric conditions and by the angle of inclination. This article evaluates the optimum tilt angle and orientation (surface azimuth) angle for a solar photovoltaic array in order to maximize solar irradiance and to reduce the variance of radiation over different sets or subsets of time periods. Non-linear regression and adaptive neuro-fuzzy inference system (ANFIS) methods are used for predicting solar radiation; the ANFIS results are more accurate than those of non-linear regression. These results are further used to evaluate the correlation and to estimate the optimum combination of tilt and orientation angles with the help of the General Algebraic Modeling System and a multi-objective genetic algorithm. The hourly average solar irradiation is calculated at different combinations of tilt angle and orientation angle using horizontal-surface radiation data for Jodhpur (Rajasthan, India). The hourly average solar irradiance is calculated for three cases: zero variance, actual variance, and double variance, at different time scenarios. It is concluded that monthly collected solar radiation produces better results than bimonthly, seasonal, half-yearly, or yearly collection. The profit obtained with a monthly varying angle is 4.6% higher with zero variance, and 3.8% higher with actual variance, than with an annually fixed angle.
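The geometry underlying such tilt/azimuth optimization is the standard incidence-angle relation for a tilted, oriented surface (as found in solar-engineering texts such as Duffie and Beckman); the function below is a hedged sketch of that relation, not the paper's model.

```python
import math

# Cosine of the solar incidence angle on a tilted surface.
# All angles in degrees; azimuth is measured from due south (0 = south).
def cos_incidence(lat, decl, hour_ang, tilt, azimuth):
    r = math.radians
    phi, d, w, b, g = map(r, (lat, decl, hour_ang, tilt, azimuth))
    return (math.sin(d) * math.sin(phi) * math.cos(b)
            - math.sin(d) * math.cos(phi) * math.sin(b) * math.cos(g)
            + math.cos(d) * math.cos(phi) * math.cos(b) * math.cos(w)
            + math.cos(d) * math.sin(phi) * math.sin(b) * math.cos(g) * math.cos(w)
            + math.cos(d) * math.sin(b) * math.sin(g) * math.sin(w))

# Sanity check: at solar noon on an equinox, a south-facing panel tilted to
# the local latitude (here ~26.3 degrees, roughly Jodhpur) sees the sun
# at normal incidence, so the cosine is 1.
print(round(cos_incidence(26.3, 0.0, 0.0, 26.3, 0.0), 3))  # → 1.0
```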
Zhang, Yongsheng; Wei, Heng; Zheng, Kangning
2017-01-01
Considering that metro network expansion provides travelers with more alternative routes, it is attractive to integrate the impacts of the route set and the interdependency among alternative routes on route choice probability into route choice modeling. Therefore, the formulation, estimation, and application of a constrained multinomial probit (CMNP) route choice model in the metro network are carried out in this paper. The utility function is formulated with three components: the compensatory component is a function of influencing factors; the non-compensatory component measures the impacts of the route set on utility; and the error component, following a multivariate normal distribution, has a covariance structured into three parts, representing the correlation among routes, the transfer variance of a route, and the unobserved variance, respectively. Given the multidimensional integrals of the multivariate normal probability density function, the CMNP model is rewritten in a hierarchical Bayes formulation, and a Metropolis-Hastings sampling-based Markov chain Monte Carlo approach is constructed to estimate all parameters. Based on Guangzhou Metro data, reliable estimation results are obtained. Furthermore, the proposed CMNP model also shows good forecasting performance for route choice probability calculation and good application performance for transfer flow volume prediction. PMID:28591188
Reiners, William A.; Liu, S.; Gerow, K.G.; Keller, M.; Schimel, D.S.
2002-01-01
The humid tropical zone is a major source area for N2O and NO emissions to the atmosphere. Local emission rates vary widely with local conditions, particularly land use practices which swiftly change with expanding settlement and changing market conditions. The combination of wide variation in emission rates and rapidly changing land use make regional estimation and future prediction of biogenic trace gas emission particularly difficult. This study estimates contemporary, historical, and future N2O and NO emissions from 0.5 million ha of northeastern Costa Rica, a well-documented region in the wet tropics undergoing rapid agricultural development. Estimates were derived by linking spatially distributed environmental data with an ecosystem simulation model in an ensemble estimation approach that incorporates the variance and covariance of spatially distributed driving variables. Results include measures of variance for regional emissions. The formation and aging of pastures from forest provided most of the past temporal change in N2O and NO flux in this region; future changes will be controlled by the degree of nitrogen fertilizer application and extent of intensively managed croplands.
Soydan, Lydia C.; Kellihan, Heidi B.; Bates, Melissa L.; Stepien, Rebecca L.; Consigny, Daniel W.; Bellofiore, Alessandro; Francois, Christopher J.; Chesler, Naomi C.
2015-01-01
Objectives: To compare noninvasive estimates of pulmonary artery pressure (PAP) obtained via echocardiography (ECHO) to invasive measurements of PAP obtained during right heart catheterization (RHC) across a wide range of PAP, to examine the accuracy of estimating right atrial pressure via ECHO (RAPECHO) compared to RAP measured by catheterization (RAPRHC), and to determine if adding RAPECHO improves the accuracy of noninvasive PAP estimations. Animals: Fourteen healthy female beagle dogs. Methods: ECHO and RHC performed at various data collection points, both at normal PAP and increased PAP (generated by microbead embolization). Results: Noninvasive estimates of PAP were moderately but significantly correlated with invasive measurements of PAP. A high degree of variance was noted for all estimations, with increased variance at higher PAP. The addition of RAPECHO improved correlation and bias in all cases. RAPRHC was significantly correlated with RAPECHO and with subjectively assessed right atrial size (RA sizesubj). Conclusions: Spectral Doppler assessments of tricuspid and pulmonic regurgitation are imperfect methods for predicting PAP as measured by catheterization, despite an overall moderate correlation between invasive and noninvasive values. Noninvasive measurements may be better utilized as part of a comprehensive assessment of PAP in canine patients. RAPRHC appears best estimated based on subjective assessment of RA size. Including estimated RAPECHO in estimates of PAP improves the correlation and relatedness between noninvasive and invasive measures of PAP, but notable variability in accuracy of estimations persists. PMID:25601540
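The noninvasive estimate being evaluated is the standard one: systolic PAP from the tricuspid-regurgitation jet velocity via the simplified Bernoulli equation plus an estimated right atrial pressure. A minimal sketch (the function name and the default RAP value are assumptions for illustration):

```python
# Simplified Bernoulli estimate of systolic pulmonary artery pressure:
# pressure gradient across the tricuspid valve = 4 * v^2 (mmHg, with v in
# m/s), added to an estimate of right atrial pressure.
def spap_from_tr(v_tr_m_per_s, rap_mmhg=5.0):
    return 4.0 * v_tr_m_per_s ** 2 + rap_mmhg

print(spap_from_tr(3.0, rap_mmhg=5.0))   # → 41.0 mmHg (4*9 + 5)
```

The study's point is that adding the echocardiographic RAP term improves, but does not remove, the scatter of such estimates against catheterization.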
Variance and covariance estimates for weaning weight of Senepol cattle.
Wright, D W; Johnson, Z B; Brown, C J; Wildeus, S
1991-10-01
Variance and covariance components were estimated for weaning weight from Senepol field data for use in the reduced animal model for a maternally influenced trait. The 4,634 weaning records were used to evaluate 113 sires and 1,406 dams on the island of St. Croix. Estimates of direct additive genetic variance (σ²A), maternal additive genetic variance (σ²M), covariance between direct and maternal additive genetic effects (σAM), permanent maternal environmental variance (σ²PE), and residual variance (σ²ε) were calculated by equating variances estimated from a sire-dam model and a sire-maternal grandsire model, with and without the inverse of the numerator relationship matrix (A⁻¹), to their expectations. Estimates were σ²A, 139.05 and 138.14 kg²; σ²M, 307.04 and 288.90 kg²; σAM, -117.57 and -103.76 kg²; σ²PE, -258.35 and -243.40 kg²; and σ²ε, 588.18 and 577.72 kg² with and without A⁻¹, respectively. Heritability estimates for direct additive effects (h²A) were .211 and .210 with and without A⁻¹, respectively. Heritability estimates for maternal additive effects (h²M) were .47 and .44 with and without A⁻¹, respectively. Correlations between direct and maternal effects (rAM) were -.57 and -.52 with and without A⁻¹, respectively.
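The reported ratios can be reconstructed from the with-A⁻¹ component estimates: each heritability is the component divided by the total phenotypic variance (here taken as the sum of all five components, which reproduces the published values), and the direct-maternal correlation is the covariance scaled by the two genetic standard deviations.

```python
import math

# With-A-inverse estimates from the abstract (kg^2)
s2A, s2M, sAM, s2PE, s2e = 139.05, 307.04, -117.57, -258.35, 588.18
s2P = s2A + s2M + sAM + s2PE + s2e     # total phenotypic variance

h2A = s2A / s2P                        # direct heritability
h2M = s2M / s2P                        # maternal heritability
rAM = sAM / math.sqrt(s2A * s2M)       # direct-maternal correlation
print(round(h2A, 2), round(h2M, 2), round(rAM, 2))   # → 0.21 0.47 -0.57
```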
Cross-bispectrum computation and variance estimation
NASA Technical Reports Server (NTRS)
Lii, K. S.; Helland, K. N.
1981-01-01
A method for the estimation of cross-bispectra of discrete real time series is developed. The asymptotic variance properties of the bispectrum are reviewed, and a method for the direct estimation of bispectral variance is given. The symmetry properties are described which minimize the computations necessary to obtain a complete estimate of the cross-bispectrum in the right-half-plane. A procedure is given for computing the cross-bispectrum by subdividing the domain into rectangular averaging regions which help reduce the variance of the estimates and allow easy application of the symmetry relationships to minimize the computational effort. As an example of the procedure, the cross-bispectrum of a numerically generated, exponentially distributed time series is computed and compared with theory.
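A minimal direct-estimation sketch in the spirit of the procedure: average triple products of Fourier coefficients over segments. For brevity this computes an auto-bispectrum at a single frequency pair of a quadratically phase-coupled test signal; the segmenting, symmetry handling, and cross terms of the paper are omitted.

```python
import numpy as np

# Direct bispectrum estimate B(k1, k2) = E[X(k1) X(k2) conj(X(k1+k2))],
# averaged over non-overlapping segments.
rng = np.random.default_rng(0)

def bispectrum(x, k1, k2, nseg=64):
    seglen = len(x) // nseg
    acc = 0.0
    for s in range(nseg):
        X = np.fft.fft(x[s * seglen:(s + 1) * seglen])
        acc += X[k1] * X[k2] * np.conj(X[k1 + k2])
    return acc / nseg

# Test signal: the component at k1+k2 is phase-locked to those at k1 and k2,
# so the bispectrum at (k1, k2) is large; for Gaussian noise it is near zero.
n, seglen, k1, k2 = 64 * 64, 64, 5, 9
t = np.arange(n)
ph1, ph2 = 0.3, 1.1
x = (np.cos(2 * np.pi * k1 * t / seglen + ph1)
     + np.cos(2 * np.pi * k2 * t / seglen + ph2)
     + np.cos(2 * np.pi * (k1 + k2) * t / seglen + ph1 + ph2))
b_coupled = abs(bispectrum(x, k1, k2))
b_noise = abs(bispectrum(rng.normal(size=n), k1, k2))
```

Averaging over segments (or rectangular regions, as in the paper) is what drives down the variance of the estimate.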
Chen, Fang; He, Jing; Zhang, Jianqi; Chen, Gary K.; Thomas, Venetta; Ambrosone, Christine B.; Bandera, Elisa V.; Berndt, Sonja I.; Bernstein, Leslie; Blot, William J.; Cai, Qiuyin; Carpten, John; Casey, Graham; Chanock, Stephen J.; Cheng, Iona; Chu, Lisa; Deming, Sandra L.; Driver, W. Ryan; Goodman, Phyllis; Hayes, Richard B.; Hennis, Anselm J. M.; Hsing, Ann W.; Hu, Jennifer J.; Ingles, Sue A.; John, Esther M.; Kittles, Rick A.; Kolb, Suzanne; Leske, M. Cristina; Monroe, Kristine R.; Murphy, Adam; Nemesure, Barbara; Neslund-Dudas, Christine; Nyante, Sarah; Ostrander, Elaine A; Press, Michael F.; Rodriguez-Gil, Jorge L.; Rybicki, Ben A.; Schumacher, Fredrick; Stanford, Janet L.; Signorello, Lisa B.; Strom, Sara S.; Stevens, Victoria; Van Den Berg, David; Wang, Zhaoming; Witte, John S.; Wu, Suh-Yuh; Yamamura, Yuko; Zheng, Wei; Ziegler, Regina G.; Stram, Alexander H.; Kolonel, Laurence N.; Marchand, Loïc Le; Henderson, Brian E.; Haiman, Christopher A.; Stram, Daniel O.
2015-01-01
Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. However, only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In a large study of African-American men and women (n = 14,419), we genotyped and analyzed 966,578 autosomal SNPs across the entire genome using a linear mixed model variance components approach implemented in the program GCTA (Yang et al Nat Genet 2010), and estimated an additive heritability of 44.7% (se: 3.7%) for this phenotype in a sample of evidently unrelated individuals. While this estimated value is similar to that given by Yang et al in their analyses, we remain concerned about two related issues: (1) whether in the complete absence of hidden relatedness, variance components methods have adequate power to estimate heritability when a very large number of SNPs are used in the analysis; and (2) whether estimation of heritability may be biased, in real studies, by low levels of residual hidden relatedness. We addressed the first question in a semi-analytic fashion by directly simulating the distribution of the score statistic for a test of zero heritability with and without low levels of relatedness. The second question was addressed by a very careful comparison of the behavior of estimated heritability for both observed (self-reported) height and simulated phenotypes compared to imputation R2 as a function of the number of SNPs used in the analysis. 
These simulations help to address the important question about whether today's GWAS SNPs will remain useful for imputing causal variants that are discovered using very large sample sizes in future studies of height, or whether the causal variants themselves will need to be genotyped de novo in order to build a prediction model that ultimately captures a large fraction of the variability of height, and by implication other complex phenotypes. Our overall conclusions are that when study sizes are quite large (5,000 or so) the additive heritability estimate for height is not apparently biased upwards using the linear mixed model; however there is evidence in our simulation that a very large number of causal variants (many thousands) each with very small effect on phenotypic variance will need to be discovered to fill the gap between the heritability explained by known versus unknown causal variants. We conclude that today's GWAS data will remain useful in the future for causal variant prediction, but that finding the causal variants that need to be predicted may be extremely laborious. PMID:26125186
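A toy illustration of SNP-based heritability estimation on simulated data, using a Haseman-Elston-style regression rather than GCTA's REML (a deliberate simplification: the moment estimator regresses phenotype cross-products on genomic relationships). All data and parameters below are simulated assumptions.

```python
import numpy as np

# Haseman-Elston regression: E[y_i * y_j] ≈ h2 * G_ij for i != j,
# so the regression slope of phenotype products on off-diagonal GRM
# entries estimates the additive SNP heritability.
rng = np.random.default_rng(3)
n, m, h2_true = 400, 1000, 0.5

Z = rng.binomial(2, 0.5, size=(n, m)).astype(float)
Z = (Z - Z.mean(0)) / Z.std(0)             # standardized genotypes
G = Z @ Z.T / m                            # genomic relationship matrix
beta = rng.normal(0, np.sqrt(h2_true / m), m)
y = Z @ beta + rng.normal(0, np.sqrt(1 - h2_true), n)
y = (y - y.mean()) / y.std()

iu = np.triu_indices(n, k=1)               # off-diagonal pairs only
h2_hat = float(np.sum(G[iu] * (y[:, None] * y[None, :])[iu])
               / np.sum(G[iu] ** 2))
```

With only hundreds of individuals the estimate is noisy, which is the sample-size point the paper's score-statistic simulations address far more carefully.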
Gutreuter, S.; Boogaard, M.A.
2007-01-01
Predictors of the percentile lethal/effective concentration/dose are commonly used measures of efficacy and toxicity. Typically such quantal-response predictors (e.g., the exposure required to kill 50% of some population) are estimated from simple bioassays wherein organisms are exposed to a gradient of several concentrations of a single agent. The toxicity of an agent may be influenced by auxiliary covariates, however, and more complicated experimental designs may introduce multiple variance components. Prediction methods for such cases are less well developed. A conventional two-stage approach consists of multiple bivariate predictions of, say, median lethal concentration, followed by regression of those predictions on the auxiliary covariates. We propose a more effective and parsimonious class of generalized nonlinear mixed-effects models for prediction of lethal/effective dose/concentration from auxiliary covariates. We demonstrate examples using data from a study regarding the effects of pH and additions of variable quantities of 2′,5′-dichloro-4′-nitrosalicylanilide (niclosamide) on the toxicity of 3-trifluoromethyl-4-nitrophenol to larval sea lamprey (Petromyzon marinus). The new models yielded unbiased predictions, and root-mean-squared errors (RMSEs) of prediction for the exposures required to kill 50 and 99.9% of some population were 29 to 82% smaller, respectively, than those from the conventional two-stage procedure. The model class is flexible and easily implemented using commonly available software. © 2007 SETAC.
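The first stage of the conventional approach the paper improves on can be sketched as a single-agent LC50 fit: a two-parameter logistic model on log-concentration, maximized by Newton-Raphson. The bioassay counts below are invented for illustration.

```python
import math

# Logit-model LC50 estimate from a simple quantal bioassay.
conc = [1.0, 2.0, 4.0, 8.0, 16.0]       # exposure concentrations
n = [20, 20, 20, 20, 20]                # organisms per concentration
dead = [1, 4, 9, 16, 19]                # responders

x = [math.log(c) for c in conc]
a, b = 0.0, 1.0                          # intercept, slope on log-conc
for _ in range(50):                      # Newton-Raphson on the log-likelihood
    ga = gb = haa = hab = hbb = 0.0
    for xi, ni, di in zip(x, n, dead):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        w = ni * p * (1 - p)
        ga += di - ni * p                # gradient components
        gb += (di - ni * p) * xi
        haa += w; hab += w * xi; hbb += w * xi * xi
    det = haa * hbb - hab * hab
    a += (hbb * ga - hab * gb) / det     # Newton step (Hessian inverse)
    b += (-hab * ga + haa * gb) / det

lc50 = math.exp(-a / b)                  # concentration where p = 0.5
```

The paper's contribution is to replace many such separate fits, plus a second-stage regression, with one mixed-effects model over the covariates.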
Methods to Estimate the Between-Study Variance and Its Uncertainty in Meta-Analysis
ERIC Educational Resources Information Center
Veroniki, Areti Angeliki; Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian P. T.; Langan, Dean; Salanti, Georgia
2016-01-01
Meta-analyses are typically used to estimate the overall mean of an outcome of interest. However, inference about between-study variability, which is typically modelled using a between-study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between-study variance,…
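The default estimator the abstract mentions is a closed-form moment estimator, which can be sketched directly; the effect estimates and variances below are illustrative.

```python
import numpy as np

# DerSimonian-Laird moment estimator of the between-study variance:
# tau2 = max(0, (Q - (k-1)) / (sum(w) - sum(w^2)/sum(w))),
# with fixed-effect weights w_i = 1/v_i and Cochran's Q statistic.
def dersimonian_laird(y, v):
    y, w = np.asarray(y), 1.0 / np.asarray(v)
    mu_fe = np.sum(w * y) / np.sum(w)          # fixed-effect pooled mean
    Q = np.sum(w * (y - mu_fe) ** 2)           # Cochran's heterogeneity stat
    k = len(y)
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, float((Q - (k - 1)) / denom))

y = [0.10, 0.50, 0.60, 0.05]   # study effect estimates (illustrative)
v = [0.04, 0.05, 0.04, 0.06]   # within-study variances
tau2 = dersimonian_laird(y, v)
```

The truncation at zero is one reason its uncertainty is awkward to characterize, motivating the alternative estimators and confidence-interval methods the paper reviews.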
An Analysis of Variance Approach for the Estimation of Response Time Distributions in Tests
ERIC Educational Resources Information Center
Attali, Yigal
2010-01-01
Generalizability theory and analysis of variance methods are employed, together with the concept of objective time pressure, to estimate response time distributions and the degree of time pressure in timed tests. By estimating response time variance components due to person, item, and their interaction, and fixed effects due to item types and…
Relationships of Measurement Error and Prediction Error in Observed-Score Regression
ERIC Educational Resources Information Center
Moses, Tim
2012-01-01
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
Predicting research use in nursing organizations: a multilevel analysis.
Estabrooks, Carole A; Midodzi, William K; Cummings, Greta G; Wallin, Lars
2007-01-01
No empirical literature was found that explained how organizational context (operationalized as a composite of leadership, culture, and evaluation) influences research utilization. Similarly, no work was found on the interaction of individuals and contextual factors, or the relative importance or contribution of forces at different organizational levels to either such proposed interactions or, ultimately, to research utilization. To determine independent factors that predict research utilization among nurses, taking into account influences at individual nurse, specialty, and hospital levels. Cross-sectional survey data for 4,421 registered nurses in Alberta, Canada were used in a series of multilevel (three levels) modeling analyses to predict research utilization. A multilevel model was developed in MLwiN version 2.0 and used to: (a) estimate simultaneous effects of several predictors and (b) quantify the amount of explained variance in research utilization that could be apportioned to individual, specialty, and hospital levels. There was significant variation in research utilization (p <.05). Factors (remaining in the final model at statistically significant levels) found to predict more research utilization at the three levels of analysis were as follows. At the individual nurse level (Level 1): time spent on the Internet and lower levels of emotional exhaustion. At the specialty level (Level 2): facilitation, nurse-to-nurse collaboration, a higher context (i.e., of nursing culture, leadership, and evaluation), and perceived ability to control policy. At the hospital level (Level 3): only hospital size was significant in the final model. The total variance in research utilization was 1.04, and the intraclass correlations (the percent contribution by contextual factors) were 4% (variance = 0.04, p <.01) at the hospital level and 8% (variance = 0.09, p <.05) at the specialty level. 
The contribution attributable to individual factors alone was 87% (variance = 0.91, p <.01). Variation in research utilization was explained mainly by differences in individual characteristics, with specialty- and organizational-level factors contributing relatively little by comparison. Among hospital-level factors, hospital size was the only significant determinant of research utilization. Although organizational determinants explained less variance in the model, they were still statistically significant when analyzed alone. These findings suggest that investigations into mechanisms that influence research utilization must address influences at multiple levels of the organization. Such investigations will require careful attention to both methodological and interpretative challenges present when dealing with multiple units of analysis.
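The intraclass correlations reported above are simple ratios of a level's variance component to the total variance. A toy sketch using the numbers given in the abstract (total variance 1.04; hospital 0.04, specialty 0.09, individual 0.91):

```python
def icc(component_variance, total_variance):
    """Share of total variance attributable to one level of a multilevel model."""
    return component_variance / total_variance

total = 1.04
hospital = icc(0.04, total)    # ~4% of variance at the hospital level
specialty = icc(0.09, total)   # ~8-9% of variance at the specialty level
individual = icc(0.91, total)  # ~87% of variance at the individual level
```

The three components sum to the total, so the three shares sum to one, which is how the abstract apportions explained variance across levels.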
One-shot estimate of MRMC variance: AUC.
Gallas, Brandon D
2006-03-01
One popular study design for estimating the area under the receiver operating characteristic curve (AUC) is the one in which a set of readers reads a set of cases: a fully crossed design in which every reader reads every case. The variability of the subsequent reader-averaged AUC has two sources: the multiple readers and the multiple cases (MRMC). In this article, we present a nonparametric estimate for the variance of the reader-averaged AUC that is unbiased and does not use resampling tools. The one-shot estimate is based on the MRMC variance derived by the mechanistic approach of Barrett et al. (2005), as well as the nonparametric variance of a single-reader AUC derived in the literature on U statistics. We investigate the bias and variance properties of the one-shot estimate through a set of Monte Carlo simulations with simulated model observers and images. The different simulation configurations vary numbers of readers and cases, amounts of image noise and internal noise, as well as how the readers are constructed. We compare the one-shot estimate to a method that uses the jackknife resampling technique with an analysis of variance model at its foundation (Dorfman et al. 1992). The name one-shot highlights that resampling is not used. The one-shot and jackknife estimators behave similarly, with the one-shot being marginally more efficient when the number of cases is small. We have derived a one-shot estimate of the MRMC variance of AUC that is based on a probabilistic foundation with limited assumptions, is unbiased, and compares favorably to an established estimate.
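The one-shot estimator above builds on the nonparametric U-statistic variance of a single-reader AUC from the literature it cites. The following is our own simplified single-reader sketch in that spirit (placement-value, DeLong-style), not the authors' MRMC code:

```python
def auc_and_variance(pos, neg):
    """Mann-Whitney AUC for one reader and its U-statistic variance.
    pos/neg: scores for diseased and non-diseased cases (len >= 2 each)."""
    m, n = len(pos), len(neg)
    kernel = lambda x, y: 1.0 if x > y else (0.5 if x == y else 0.0)
    # placement values: how each case scores against the opposite class
    v10 = [sum(kernel(x, y) for y in neg) / n for x in pos]
    v01 = [sum(kernel(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)  # variance over diseased cases
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)  # variance over non-diseased cases
    return auc, s10 / m + s01 / n
```

For perfectly separated scores the placement values are constant, so the estimated variance is zero; the one-shot MRMC estimator extends this case-level decomposition with reader and reader-by-case components.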
Visscher, Peter M; Goddard, Michael E
2015-01-01
Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N², where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N. Copyright © 2015 by the Genetics Society of America.
Determination of the optimal level for combining area and yield estimates
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Hixson, M. M.; Jobusch, C. D.
1981-01-01
Several levels of obtaining both area and yield estimates of corn and soybeans in Iowa were considered: county, refined strata, refined/split strata, crop reporting district, and state. Using the CCEA model form and smoothed weather data, regression coefficients at each level were derived to compute yield and its variance. Variances were also computed at the stratum level. The variance of the yield estimates was largest at the state and smallest at the county level for both crops. The refined strata had somewhat larger variances than those associated with the refined/split strata and CRD. For production estimates, the difference in standard deviations among levels was not large for corn, but for soybeans the standard deviation at the state level was more than 50% greater than for the other levels. The refined strata had the smallest standard deviations. The county level was not considered in evaluation of production estimates due to lack of county area variances.
NASA Astrophysics Data System (ADS)
Reynders, Edwin; Maes, Kristof; Lombaert, Geert; De Roeck, Guido
2016-01-01
Identified modal characteristics are often used as a basis for the calibration and validation of dynamic structural models, for structural control, for structural health monitoring, etc. It is therefore important to know their accuracy. In this article, a method for estimating the (co)variance of modal characteristics that are identified with the stochastic subspace identification method is validated for two civil engineering structures. The first structure is a damaged prestressed concrete bridge for which acceleration and dynamic strain data were measured in 36 different setups. The second structure is a mid-rise building for which acceleration data were measured in 10 different setups. There is a good quantitative agreement between the predicted levels of uncertainty and the observed variability of the eigenfrequencies and damping ratios between the different setups. The method can therefore be used with confidence for quantifying the uncertainty of the identified modal characteristics, also when some or all of them are estimated from a single batch of vibration data. Furthermore, the method is seen to yield valuable insight into the variability of the estimation accuracy from mode to mode and from setup to setup: the more informative a setup is regarding an estimated modal characteristic, the smaller is the estimated variance.
Horner, Fleur; Bilzon, James L; Rayson, Mark; Blacker, Sam; Richmond, Victoria; Carter, James; Wright, Anthony; Nevill, Alan
2013-01-01
This study developed a multivariate model to predict free-living energy expenditure (EE) in independent military cohorts. Two hundred and eighty-eight individuals (20.6 ± 3.9 years, 67.9 ± 12.0 kg, 1.71 ± 0.10 m) from 10 cohorts wore accelerometers during observation periods of 7 or 10 days. Accelerometer counts (PAC) were recorded at 1-minute epochs. Total energy expenditure (TEE) and physical activity energy expenditure (PAEE) were derived using the doubly labelled water technique. Data were reduced to n = 155 based on wear-time. Associations between PAC and EE were assessed using allometric modelling. Models were derived using multiple log-linear regression analysis and gender differences assessed using analysis of covariance. In all models PAC, height and body mass were related to TEE (P < 0.01). For models predicting TEE (r² = 0.65, SE = 462 kcal·d⁻¹ (13.0%)), PAC explained 4% of the variance. For models predicting PAEE (r² = 0.41, SE = 490 kcal·d⁻¹ (32.0%)), PAC accounted for 6% of the variance. Accelerometry increases the accuracy of EE estimation in military populations. However, the unique nature of military life means accurate prediction of individual free-living EE is highly dependent on anthropometric measurements.
Impact of source collinearity in simulated PM2.5 data on the PMF receptor model solution
NASA Astrophysics Data System (ADS)
Habre, Rima; Coull, Brent; Koutrakis, Petros
2011-12-01
Positive Matrix Factorization (PMF) is a factor analytic model used to identify particle sources and to estimate their contributions to PM2.5 concentrations observed at receptor sites. Collinearity in source contributions due to meteorological conditions introduces uncertainty in the PMF solution. We simulated datasets of speciated PM2.5 concentrations associated with three ambient particle sources: "Motor Vehicle" (MV), "Sodium Chloride" (NaCl), and "Sulfur" (S), and we varied the correlation structure between their mass contributions to simulate collinearity. We analyzed the datasets in PMF using the ME-2 multilinear engine. The Pearson correlation coefficients between the simulated and PMF-predicted source contributions and profiles are denoted by "G correlation" and "F correlation", respectively. In sensitivity analyses, we examined how the means or variances of the source contributions affected the stability of the PMF solution with collinearity. The % errors in predicting the average source contributions were 23, 80 and 23% for MV, NaCl, and S, respectively. On average, the NaCl contribution was overestimated, while MV and S contributions were underestimated. The ability of PMF to predict the contributions and profiles of the three sources deteriorated significantly as collinearity in their contributions increased. When the mean of NaCl or variance of NaCl and MV source contributions was increased, the deterioration in G correlation with increasing collinearity became less significant, and the ability of PMF to predict the NaCl and MV loading profiles improved. When the three factor profiles were simulated to share more elements, the decrease in G and F correlations became non-significant. Our findings agree with previous simulation studies reporting that correlated sources are predicted with higher error and bias. Consequently, the power to detect significant concentration-response estimates in health effect analyses weakens.
Da, Yang; Wang, Chunkao; Wang, Shengwen; Hu, Guo
2014-01-01
We established a genomic model of quantitative trait with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. Simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance and GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, causal variants had the highest accuracy of GREML and GBLUP, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level. PMID:24498162
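The genomic additive relationship matrix underlying GBLUP is commonly computed from centered SNP genotypes (the VanRaden formulation); a minimal sketch under that assumption, not necessarily the authors' exact CE/QM implementation:

```python
import numpy as np

def additive_grm(genotypes):
    """VanRaden-style additive genomic relationship matrix.
    genotypes: (n_individuals, n_snps) array coded 0/1/2 copies of an allele."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequency of each SNP
    Z = M - 2.0 * p                          # center each SNP column at 2p
    denom = 2.0 * np.sum(p * (1.0 - p))      # scales relationships to heritability units
    return Z @ Z.T / denom
```

The matrix is symmetric by construction, and its off-diagonal elements play the role that pedigree-based additive relationships play in the traditional model; a dominance relationship matrix is built analogously from dominance-coded genotypes.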
On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
Mulder, Han A; Rönnegård, Lars; Fikse, W Freddy; Veerkamp, Roel F; Strandberg, Erling
2013-07-04
Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike's information criterion using h-likelihood to select the best fitting model. We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike's information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. 
Using Akaike's information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.
NASA Astrophysics Data System (ADS)
Wheeler, David C.; Waller, Lance A.
2009-03-01
In this paper, we compare and contrast a Bayesian spatially varying coefficient process (SVCP) model with a geographically weighted regression (GWR) model for the estimation of the potentially spatially varying regression effects of alcohol outlets and illegal drug activity on violent crime in Houston, Texas. In addition, we focus on the inherent coefficient shrinkage properties of the Bayesian SVCP model as a way to address increased coefficient variance that follows from collinearity in GWR models. We outline the advantages of the Bayesian model in terms of reducing inflated coefficient variance, enhanced model flexibility, and more formal measuring of model uncertainty for prediction. We find spatially varying effects for alcohol outlets and drug violations, but the amount of variation depends on the type of model used. For the Bayesian model, this variation is controllable through the amount of prior influence placed on the variance of the coefficients. For example, the spatial pattern of coefficients is similar for the GWR and Bayesian models when a relatively large prior variance is used in the Bayesian model.
Population-based absolute risk estimation with survey data
Kovalchik, Stephanie A.; Pfeiffer, Ruth M.
2013-01-01
Absolute risk is the probability that a cause-specific event occurs in a given time interval in the presence of competing events. We present methods to estimate population-based absolute risk from a complex survey cohort that can accommodate multiple exposure-specific competing risks. The hazard function for each event type consists of an individualized relative risk multiplied by a baseline hazard function, which is modeled nonparametrically or parametrically with a piecewise exponential model. An influence method is used to derive a Taylor-linearized variance estimate for the absolute risk estimates. We introduce novel measures of the cause-specific influences that can guide modeling choices for the competing event components of the model. To illustrate our methodology, we build and validate cause-specific absolute risk models for cardiovascular and cancer deaths using data from the National Health and Nutrition Examination Survey. Our applications demonstrate the usefulness of survey-based risk prediction models for predicting health outcomes and quantifying the potential impact of disease prevention programs at the population level. PMID:23686614
Rose, Kevin C.; Winslow, Luke A.; Read, Jordan S.; Read, Emily K.; Solomon, Christopher T.; Adrian, Rita; Hanson, Paul C.
2014-01-01
Diel changes in dissolved oxygen are often used to estimate gross primary production (GPP) and ecosystem respiration (ER) in aquatic ecosystems. Despite the widespread use of this approach to understand ecosystem metabolism, we are only beginning to understand the degree and underlying causes of uncertainty for metabolism model parameter estimates. Here, we present a novel approach to improve the precision and accuracy of ecosystem metabolism estimates by identifying physical metrics that indicate when metabolism estimates are highly uncertain. Using datasets from seventeen instrumented GLEON (Global Lake Ecological Observatory Network) lakes, we discovered that many physical characteristics correlated with uncertainty, including PAR (photosynthetically active radiation, 400-700 nm), daily variance in Schmidt stability, and wind speed. Low PAR was a consistent predictor of high variance in GPP model parameters, but also corresponded with low ER model parameter variance. We identified a threshold (30% of clear sky PAR) below which GPP parameter variance increased rapidly and was significantly greater in nearly all lakes compared with variance on days with PAR levels above this threshold. The relationship between daily variance in Schmidt stability and GPP model parameter variance depended on trophic status, whereas daily variance in Schmidt stability was consistently positively related to ER model parameter variance. Wind speeds in the range of ~0.8-3 m s⁻¹ were consistent predictors of high variance for both GPP and ER model parameters, with greater uncertainty in eutrophic lakes. Our findings can be used to reduce ecosystem metabolism model parameter uncertainty and identify potential sources of that uncertainty.
ERIC Educational Resources Information Center
Tanner-Smith, Emily E.; Tipton, Elizabeth
2014-01-01
Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding…
Correcting for Systematic Bias in Sample Estimates of Population Variances: Why Do We Divide by n-1?
ERIC Educational Resources Information Center
Mittag, Kathleen Cage
An important topic presented in introductory statistics courses is the estimation of population parameters using samples. Students learn that when estimating population variances using sample data, we always get an underestimate of the population variance if we divide by n rather than n-1. One implication of this correction is that the degree of…
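The bias described in this record can be verified exactly by enumerating every possible sample from a tiny population. A toy illustration (our own, in the spirit of an introductory statistics course):

```python
from itertools import product

population = [1, 2, 3]
mu = sum(population) / len(population)
# true population variance: mean squared deviation from mu (equals 2/3 here)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def var(sample, ddof):
    """Sample variance dividing by n - ddof (ddof=0: biased; ddof=1: Bessel-corrected)."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / (len(sample) - ddof)

# all size-2 samples drawn with replacement, so expectations are exact averages
samples = list(product(population, repeat=2))
biased   = sum(var(s, 0) for s in samples) / len(samples)   # divide by n
unbiased = sum(var(s, 1) for s in samples) / len(samples)   # divide by n - 1
```

Averaging over all samples, the divide-by-n estimator comes out to (n-1)/n times the true variance (here exactly half of 2/3), while dividing by n-1 recovers the true variance exactly, which is the correction the record discusses.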
Hu, Pingsha; Maiti, Tapabrata
2011-01-01
Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181
Online Estimation of Allan Variance Coefficients Based on a Neural-Extended Kalman Filter
Miao, Zhiyong; Shen, Feng; Xu, Dingjie; He, Kunpeng; Tian, Chunmiao
2015-01-01
As a noise analysis method for inertial sensors, the traditional Allan variance method requires the storage of a large amount of data and manual analysis for an Allan variance graph. Although the existing online estimation methods avoid the storage of data and the painful procedure of drawing slope lines for estimation, they require complex transformations and even cause errors during the modeling of dynamic Allan variance. To solve these problems, first, a new state-space model that directly models the stochastic errors to obtain a nonlinear state-space model was established for inertial sensors. Then, a neural-extended Kalman filter algorithm was used to estimate the Allan variance coefficients. The real noises of an ADIS16405 IMU and fiber optic gyro-sensors were analyzed by the proposed method and traditional methods. The experimental results show that the proposed method is more suitable to estimate the Allan variance coefficients than the traditional methods. Moreover, the proposed method effectively avoids the storage of data and can be easily implemented using an online processor. PMID:25625903
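For reference, the traditional offline Allan variance that the record's online method replaces can be computed in a few lines. This is our own sketch of the basic non-overlapped estimator, not the authors' filter-based algorithm:

```python
def allan_variance(y, m):
    """Allan variance of a rate series y at cluster size m (non-overlapped clusters)."""
    k = len(y) // m                                  # number of whole clusters
    means = [sum(y[i * m:(i + 1) * m]) / m for i in range(k)]  # cluster averages
    diffs = [means[i + 1] - means[i] for i in range(k - 1)]    # successive differences
    return sum(d * d for d in diffs) / (2 * (k - 1))
```

Plotting this quantity against m on log-log axes gives the Allan variance graph whose slopes identify the sensor noise coefficients; the storage and manual slope-fitting that this requires are exactly what the neural-extended Kalman filter approach avoids.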
Windhausen, Vanessa S; Atlin, Gary N; Hickey, John M; Crossa, Jose; Jannink, Jean-Luc; Sorrells, Mark E; Raman, Babu; Cairns, Jill E; Tarekegne, Amsal; Semagn, Kassa; Beyene, Yoseph; Grudloyma, Pichet; Technow, Frank; Riedelsheimer, Christian; Melchinger, Albrecht E
2012-11-01
Genomic prediction is expected to considerably increase genetic gains by increasing selection intensity and accelerating the breeding cycle. In this study, marker effects estimated in 255 diverse maize (Zea mays L.) hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and testcross progenies of 30 F(2)-derived lines from each of five populations. Although up to 25% of the genetic variance could be explained by cross validation within the diversity panel, the prediction of testcross performance of F(2)-derived lines using marker effects estimated in the diversity panel was on average zero. Hybrids in the diversity panel could be grouped into eight breeding populations differing in mean performance. When performance was predicted separately for each breeding population on the basis of marker effects estimated in the other populations, predictive ability was low (i.e., 0.12 for grain yield). These results suggest that prediction resulted mostly from differences in mean performance of the breeding populations and less from the relationship between the training and validation sets or linkage disequilibrium with causal variants underlying the predicted traits. Potential uses for genomic prediction in maize hybrid breeding are discussed emphasizing the need of (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of the population structure before performing cross validation, and (3) larger training sets with strong genetic relationship to the validation set.
Ramsey, Elijah W.; Nelson, G.
2005-01-01
To maximize the spectral distinctiveness (information) of the canopy reflectance, an atmospheric correction strategy was implemented to provide accurate estimates of the intrinsic reflectance from the Earth Observing 1 (EO1) satellite Hyperion sensor signal. In rendering the canopy reflectance, an estimate of optical depth derived from a measurement of downwelling irradiance was used to drive a radiative transfer simulation of atmospheric scattering and attenuation. During the atmospheric model simulation, the input whole-terrain background reflectance estimate was changed to minimize the differences between the model predicted and the observed canopy reflectance spectra at 34 sites. Lacking appropriate spectrally invariant scene targets, inclusion of the field and predicted comparison maximized the model accuracy and, thereby, the detail and precision in the canopy reflectance necessary to detect low percentage occurrences of invasive plants. After accounting for artifacts surrounding prominent absorption features from about 400 nm to 1000 nm, the atmospheric adjustment strategy correctly explained 99% of the observed canopy reflectance spectra variance. Separately, model simulation explained an average of 88% ± 9% of the observed variance in the visible and 98% ± 1% in the near-infrared wavelengths. In the 34 model simulations, maximum differences between the observed and predicted reflectances were typically less than ±1% in the visible; however, maximum reflectance differences higher than ±1.6% (−2.3%) at more than a few wavelengths were observed at three sites. In the near-infrared wavelengths, maximum reflectance differences remained less than ±3% for 68% of the comparisons (±1 standard deviation) and less than ±6% for 95% of the comparisons (±2 standard deviations). Higher reflectance differences in the visible and near-infrared wavelengths were most likely associated with problems in the comparison, not in the model generation. © 2005 US Government.
Revised techniques for estimating peak discharges from channel width in Montana
Parrett, Charles; Hull, J.A.; Omang, R.J.
1987-01-01
This study was conducted to develop new estimating equations based on channel width and the updated flood-frequency curves of previous investigations. Simple regression equations for estimating peak discharges with recurrence intervals of 2, 5, 10, 25, 50, and 100 years were developed for seven regions in Montana. The standard errors of estimate for the equations that use active channel width as the independent variable ranged from 30% to 87%. The standard errors of estimate for the equations that use bankfull width as the independent variable ranged from 34% to 92%. The smallest standard errors generally occurred in the prediction equations for the 2-yr, 5-yr, and 10-yr floods, and the largest standard errors occurred in the prediction equations for the 100-yr flood. The equations that use active channel width and the equations that use bankfull width were determined to be about equally reliable in five regions. In the West Region, the equations that use bankfull width were slightly more reliable than those based on active channel width, whereas in the East-Central Region the equations that use active channel width were slightly more reliable than those based on bankfull width. Compared with similar equations previously developed, the standard errors of estimate for the new equations are substantially smaller in three regions and substantially larger in two regions. Limitations on the use of the estimating equations include: (1) the equations are based on stable conditions of channel geometry and prevailing water and sediment discharge; (2) the measurement of channel width requires a site visit, preferably by a person with experience in the method, and involves appreciable measurement error; (3) the reliability of results from the equations for channel widths beyond the range of definition is unknown.
In spite of the limitations, the estimating equations derived in this study are considered to be as reliable as estimating equations based on basin and climatic variables. Because the two types of estimating equations are independent, results from each can be weighted inversely proportional to their variances and averaged. The weighted average estimate has a variance less than either individual estimate. (Author's abstract)
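The inverse-variance weighting described above is straightforward to compute. A minimal sketch (the function name and the example values are ours, for illustration only):

```python
def inverse_variance_average(x1, v1, x2, v2):
    """Combine two independent unbiased estimates x1 and x2, whose
    variances are v1 and v2, weighting each inversely to its variance."""
    w1, w2 = 1.0 / v1, 1.0 / v2
    estimate = (w1 * x1 + w2 * x2) / (w1 + w2)
    variance = 1.0 / (w1 + w2)  # always below min(v1, v2)
    return estimate, variance

# e.g., a channel-width estimate of 100 (variance 400) combined with an
# independent basin-characteristics estimate of 120 (variance 900)
est, var = inverse_variance_average(100.0, 400.0, 120.0, 900.0)
```

Because 1/(w1 + w2) is smaller than either 1/w1 or 1/w2 alone, the combined variance is necessarily below both individual variances, which is the point made in the abstract.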
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.
2013-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, particularly in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.
Dazard, Jean-Eudes; Rao, J Sunil
2012-07-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data: parameter estimation across multiple variables in a data set where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters; (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
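The idea of stabilizing per-variable variance estimates by shrinking them toward a pooled value can be sketched in a few lines. Note this is a deliberately simplified common-value shrinkage (the kind the authors compare against), not the MVR clustering procedure itself; the function name and fixed weight `lam` are our own illustrative choices:

```python
import numpy as np

def shrink_variances(data, lam=0.5):
    """Shrink each variable's sample variance toward the across-variable
    mean variance. data: variables x samples; lam in [0, 1] is the weight
    given to the pooled target."""
    s2 = data.var(axis=1, ddof=1)   # per-variable sample variances
    target = s2.mean()              # common shrinkage target
    return (1.0 - lam) * s2 + lam * target

# two variables, one low-variance and one high-variance:
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 20.0, 30.0, 40.0]])
shrunk = shrink_variances(data)
```

Shrinkage pulls extreme variance estimates toward the middle, which stabilizes downstream t-like statistics when the per-variable degrees of freedom are few.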
Lescroart, Mark D.; Stansbury, Dustin E.; Gallant, Jack L.
2015-01-01
Perception of natural visual scenes activates several functional areas in the human brain, including the Parahippocampal Place Area (PPA), Retrosplenial Complex (RSC), and the Occipital Place Area (OPA). It is currently unclear what specific scene-related features are represented in these areas. Previous studies have suggested that PPA, RSC, and/or OPA might represent at least three qualitatively different classes of features: (1) 2D features related to Fourier power; (2) 3D spatial features such as the distance to objects in a scene; or (3) abstract features such as the categories of objects in a scene. To determine which of these hypotheses best describes the visual representation in scene-selective areas, we applied voxel-wise modeling (VM) to BOLD fMRI responses elicited by a set of 1386 images of natural scenes. VM provides an efficient method for testing competing hypotheses by comparing predictions of brain activity based on encoding models that instantiate each hypothesis. Here we evaluated three different encoding models that instantiate each of the three hypotheses listed above. We used linear regression to fit each encoding model to the fMRI data recorded from each voxel, and we evaluated each fit model by estimating the amount of variance it predicted in a withheld portion of the data set. We found that voxel-wise models based on Fourier power or the subjective distance to objects in each scene predicted much of the variance predicted by a model based on object categories. Furthermore, the response variance explained by these three models is largely shared, and the individual models explain little unique variance in responses. Based on an evaluation of previous studies and the data we present here, we conclude that there is currently no good basis to favor any one of the three alternative hypotheses about visual representation in scene-selective areas. We offer suggestions for further studies that may help resolve this issue. PMID:26594164
Estimating means and variances: The comparative efficiency of composite and grab samples.
Brumelle, S; Nemetz, P; Casey, D
1984-03-01
This paper compares the efficiencies of two sampling techniques for estimating a population mean and variance. One procedure, called grab sampling, consists of collecting and analyzing one sample per period. The second procedure, called composite sampling, collects n samples per period, which are then pooled and analyzed as a single sample. We review the well-known fact that composite sampling provides a superior estimate of the mean. However, it is somewhat surprising that composite sampling does not always generate a more efficient estimate of the variance. For populations with platykurtic distributions, grab sampling gives a more efficient estimate of the variance, whereas composite sampling is better for leptokurtic distributions. These conditions on kurtosis can be related to peakedness and skewness. For example, a necessary condition for composite sampling to provide a more efficient estimate of the variance is that the population density function evaluated at the mean (i.e., f(μ)) be greater than [Formula: see text]. If [Formula: see text], then a grab sample is more efficient. In spite of this result, however, composite sampling does provide a smaller estimate of standard error than does grab sampling in the context of estimating population means.
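The mean-estimation part of this comparison is easy to verify by simulation. A minimal Monte Carlo sketch, assuming normally distributed measurements (it illustrates only the mean result above, not the kurtosis-dependent variance result):

```python
import numpy as np

rng = np.random.default_rng(42)

def grab_mean(periods):
    """One sample analyzed per period; estimate the mean over all periods."""
    return rng.normal(size=periods).mean()

def composite_mean(periods, n):
    """n samples pooled (averaged) into one analysis per period."""
    return rng.normal(size=(periods, n)).mean(axis=1).mean()

reps = 2000
grab = np.array([grab_mean(10) for _ in range(reps)])
comp = np.array([composite_mean(10, 5) for _ in range(reps)])
# the composite estimator of the mean has markedly smaller sampling variance
```

With 10 periods and 5 pooled samples per period, the composite estimator effectively averages 50 draws instead of 10, so its sampling variance is about one fifth that of the grab estimator.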
The Diesel Exhaust in Miners Study: V. Evaluation of the Exposure Assessment Methods
Stewart, Patricia A.; Vermeulen, Roel; Coble, Joseph B.; Blair, Aaron; Schleiff, Patricia; Lubin, Jay H.; Attfield, Mike; Silverman, Debra T.
2012-01-01
Exposure to respirable elemental carbon (REC), a component of diesel exhaust (DE), was assessed for an epidemiologic study investigating the association between DE and mortality, particularly from lung cancer, among miners at eight mining facilities from the date of dieselization (1947–1967) through 1997. To provide insight into the quality of the estimates for use in the epidemiologic analyses, several approaches were taken to evaluate the exposure assessment process and the quality of the estimates. An analysis of variance was conducted to evaluate the variability of 1998–2001 REC measurements within and between exposure groups of underground jobs. Estimates for the surface exposure groups were evaluated to determine if the arithmetic means (AMs) of the REC measurements increased with increased proximity to, or use of, diesel-powered equipment, which was the basis on which the surface groups were formed. Estimates of carbon monoxide (CO) (another component of DE) air concentrations in 1976–1977, derived from models developed to predict estimated historical exposures, were compared to 1976–1977 CO measurement data that had not been used in the model development. Alternative sets of estimates were developed to investigate the robustness of various model assumptions. These estimates were based on prediction models using: (i) REC medians rather than AMs, (ii) a different CO:REC proportionality than a 1:1 relation, and (iii) 5-year averages of historical CO measurements rather than modeled historical CO measurements and DE-related determinants. The analysis of variance found that in three of the facilities, most of the between-group variability in the underground measurements was explained by the use of job titles. There was relatively little between-group variability in the other facilities. The estimated REC AMs for the surface exposure groups rose overall from 1 to 5 μg m−3 as proximity to, and use of, diesel equipment increased.
The alternative estimates overall were highly correlated (∼0.9) with the primary set of estimates. The median of the relative differences between the 1976–1977 CO measurement means and the 1976–1977 estimates for six facilities was 29%. Comparison of estimated CO air concentrations from the facility-specific prediction models with historical CO measurement data found an overall agreement similar to that observed in other epidemiologic studies. Other evaluations of components of the exposure assessment process found moderate to excellent agreement. Thus, the overall evidence suggests that the estimates were likely accurate representations of historical personal exposure levels to DE and are useful for epidemiologic analyses. PMID:22383674
NASA Astrophysics Data System (ADS)
Zapata, D.; Salazar, M.; Chaves, B.; Keller, M.; Hoogenboom, G.
2015-12-01
Thermal time models have been used to predict the development of many different species, including grapevine (Vitis vinifera L.). These models normally assume that there is a linear relationship between temperature and plant development. The goal of this study was to estimate the base temperature and duration in terms of thermal time for predicting veraison for four grapevine cultivars. Historical phenological data for four cultivars that were collected in the Pacific Northwest were used to develop the thermal time model. Base temperatures (Tb) of 0 and 10 °C and the best estimated Tb using three different methods were evaluated for predicting veraison in grapevine. Thermal time requirements for each individual cultivar were evaluated through analysis of variance, and means were compared using Fisher's test. The methods that were applied to estimate Tb for the development of wine grapes included the least standard deviation in heat units, the regression coefficient, and the development rate method. The estimated Tb varied among methods and cultivars. The development rate method provided the lowest Tb values for all cultivars. For the three methods, Chardonnay had the lowest Tb, ranging from 8.7 to 10.7 °C, while the highest Tb values were obtained for Riesling and Cabernet Sauvignon with 11.8 and 12.8 °C, respectively. Thermal time also differed among cultivars when either the fixed or estimated Tb was used. Predictions of the beginning of ripening with the estimated temperature resulted in the lowest variation in real days when compared with predictions using Tb = 0 or 10 °C, regardless of the method that was used to estimate the Tb.
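The least-standard-deviation-in-heat-units method mentioned above can be sketched directly: for each candidate base temperature, accumulate growing degree-days per season and keep the candidate that makes the accumulated totals most consistent across seasons. A minimal sketch with synthetic data (the function name and the toy two-season example are ours):

```python
import numpy as np

def least_sd_base_temp(seasonal_temps, candidate_tbs):
    """Pick the base temperature minimizing the standard deviation of
    accumulated heat units (growing degree-days) across seasons.
    seasonal_temps: list of 1D arrays of mean daily temperatures (°C)."""
    best_tb, best_sd = None, np.inf
    for tb in candidate_tbs:
        gdd = [np.clip(t - tb, 0.0, None).sum() for t in seasonal_temps]
        sd = np.std(gdd, ddof=1)
        if sd < best_sd:
            best_tb, best_sd = tb, sd
    return best_tb, best_sd

# a cool long season and a warm short season that both accumulate
# 600 degree-days when the true base temperature is 10 °C
seasons = [np.full(60, 20.0), np.full(100, 16.0)]
tb, sd = least_sd_base_temp(seasons, [0.0, 5.0, 10.0, 12.0])
```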
The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement
Hall, Debbora; Jarrold, Christopher; Towse, John N.; Zarandi, Amy L.
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630
How many days of accelerometer monitoring predict weekly physical activity behaviour in obese youth?
Vanhelst, Jérémy; Fardy, Paul S; Duhamel, Alain; Béghin, Laurent
2014-09-01
The aim of this study was to determine the type and number of accelerometer monitoring days needed to predict weekly sedentary behaviour and physical activity in obese youth. Fifty-three obese youth wore a triaxial accelerometer for 7 days to measure physical activity in free-living conditions. Analyses of variance for repeated measures, intraclass correlation coefficients (ICC) and linear regression analyses were used. Obese youth spent significantly less time in physical activity on weekends or free days compared with school days. ICC analyses indicated that a minimum of 2 days is needed to estimate physical activity behaviour. ICCs were 0.80 between weekly physical activity and weekdays and 0.92 between weekly physical activity and weekend days. The model has to include a weekday and a weekend day. Using any combination of one weekday and one weekend day, the percentage of variance explained is >90%. Results indicate that 2 days of monitoring are needed to estimate the weekly physical activity behaviour in obese youth with an accelerometer. Our results also showed the importance of taking into consideration school day versus free day and weekday versus weekend day in assessing physical activity in obese youth. © 2013 Scandinavian Society of Clinical Physiology and Nuclear Medicine. Published by John Wiley & Sons Ltd.
Acoustic correlate of vocal effort in spasmodic dysphonia.
Eadie, Tanya L; Stepp, Cara E
2013-03-01
This study characterized the relationship between relative fundamental frequency (RFF) and listeners' perceptions of vocal effort and overall spasmodic dysphonia severity in the voices of 19 individuals with adductor spasmodic dysphonia. Twenty inexperienced listeners evaluated the vocal effort and overall severity of voices using visual analog scales. The squared correlation coefficients (R2) between average vocal effort and overall severity and RFF measures were calculated as a function of the number of acoustic instances used for the RFF estimate (from 1 to 9, of a total of 9 voiced-voiceless-voiced instances). Increases in the number of acoustic instances used for the RFF average led to increases in the variance predicted by the RFF at the first cycle of voicing onset (onset RFF) in the perceptual measures; the use of 6 or more instances resulted in a stable estimate. The variance predicted by the onset RFF for vocal effort (R2 range, 0.06 to 0.43) was higher than that for overall severity (R2 range, 0.06 to 0.35). The offset RFF was not related to the perceptual measures, irrespective of the sample size. This study indicates that onset RFF measures are related to perceived vocal effort in patients with adductor spasmodic dysphonia. These results have implications for measuring outcomes in this population.
Khan, I.; Hawlader, Mohammad Delwer Hossain; Arifeen, Shams El; Moore, Sophie; Hills, Andrew P.; Wells, Jonathan C.; Persson, Lars-Åke; Kabir, Iqbal
2012-01-01
The aim of this study was to investigate the validity of the Tanita TBF 300A leg-to-leg bioimpedance analyzer for estimating fat-free mass (FFM) in Bangladeshi children aged 4-10 years and to develop novel prediction equations for use in this population, using deuterium dilution as the reference method. Two hundred Bangladeshi children were enrolled. The isotope dilution technique with deuterium oxide was used for estimation of total body water (TBW). FFM estimated by the Tanita analyzer was compared with results of the deuterium oxide dilution technique. Novel prediction equations were created for estimating FFM, using linear regression models fitting the child's height and impedance as predictors. There was a significant difference in FFM and percentage of body fat (BF%) between methods (p<0.01), with the Tanita analyzer underestimating TBW in boys (p=0.001) and underestimating BF% in girls (p<0.001). A basic linear regression model with height and impedance explained 83% of the variance in FFM estimated by the deuterium oxide dilution technique. The best-fit equation to predict FFM from linear regression modelling was achieved by adding weight, sex, and age to the basic model, bringing the adjusted R2 to 89% (standard error=0.90, p<0.001). These data suggest the Tanita analyzer may be a valid field-assessment technique in Bangladeshi children when using population-specific prediction equations, such as the ones developed here. PMID:23082630
Development of GP and GEP models to estimate an environmental issue induced by blasting operation.
Faradonbeh, Roohollah Shirani; Hasanipanah, Mahdi; Amnieh, Hassan Bakhshandeh; Armaghani, Danial Jahed; Monjezi, Masoud
2018-05-21
Air overpressure (AOp) is one of the most adverse effects induced by blasting in surface mines and civil projects. Proper evaluation and estimation of AOp is therefore important for minimizing the environmental problems resulting from blasting. The main aim of this study is to estimate the AOp produced by blasting operations in the Miduk copper mine, Iran, by developing two artificial intelligence models, i.e., genetic programming (GP) and gene expression programming (GEP). The accuracy of the GP and GEP models was then compared with that of multiple linear regression (MLR) and three empirical models. For this purpose, 92 blasting events were investigated, and the AOp values were carefully measured. Moreover, in each operation, the values of maximum charge per delay and distance from the blast point, two parameters with strong effects on AOp, were measured. After the predictive models were built, their performance was evaluated in terms of variance account for (VAF), coefficient of determination (CoD), and root mean square error (RMSE). It was found that the GEP model, with a VAF of 94.12%, CoD of 0.941, and RMSE of 0.06, is more precise than the other predictive models for AOp prediction in the Miduk copper mine, and it can be introduced as a new powerful tool for estimating the AOp resulting from blasting.
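The three goodness-of-fit measures used above have standard definitions that can be sketched directly (the illustrative arrays are ours, not data from the study):

```python
import numpy as np

def vaf(y, y_pred):
    """Variance account for, in percent."""
    return (1.0 - np.var(y - y_pred) / np.var(y)) * 100.0

def cod(y, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y - y_pred) ** 2))

y = np.array([100.0, 110.0, 120.0, 130.0])      # e.g., measured AOp
y_hat = np.array([101.0, 108.0, 121.0, 131.0])  # e.g., model predictions
```

A perfect model gives VAF = 100%, CoD = 1, and RMSE = 0; the three measures penalize errors differently (VAF ignores a constant bias, while RMSE does not).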
Genetic parameters of Legendre polynomials for first parity lactation curves.
Pool, M H; Janss, L L; Meuwissen, T H
2000-11-01
Variance components of the covariance function coefficients in a random regression test-day model were estimated with Legendre polynomials up to fifth order for first-parity records of Dutch dairy cows using Gibbs sampling. Two Legendre polynomials of equal order were used to model the random part of the lactation curve, one for the genetic component and one for the permanent environment. Test-day records from cows registered between 1990 and 1996 and collected by regular milk recording were available. For the data set, 23,700 complete lactations were selected from 475 herds sired by 262 sires. Because the application of a random regression model is limited by computing capacity, we investigated the minimum order needed to fit the variance structure in the data sufficiently. Predictions of genetic and permanent environmental variance structures were compared with bivariate estimates on 30-d intervals. A third-order or higher polynomial modeled the shape of the variance curves over DIM with sufficient accuracy for the genetic and permanent environment parts. The genetic correlation structure was also fitted with sufficient accuracy by a third-order polynomial, but, for the permanent environmental component, a fourth order was needed. Because equal orders are suggested in the literature, a fourth-order Legendre polynomial is recommended in this study. However, a rank of three for the genetic covariance matrix and of four for permanent environment allows a simpler covariance function with a reduced number of parameters based on the eigenvalues and eigenvectors.
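Building the Legendre covariates for such a random regression model amounts to standardizing days in milk (DIM) to [-1, 1] and evaluating normalized Legendre polynomials. A minimal sketch (the function name and the 5-305 d lactation range are illustrative assumptions, not values taken from the study):

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(dim, dim_min=5.0, dim_max=305.0, order=3):
    """Normalized Legendre polynomial covariates for days in milk (DIM),
    standardized to [-1, 1], as used in random regression test-day models.
    Returns an array of shape (len(dim), order + 1)."""
    x = 2.0 * (np.asarray(dim, dtype=float) - dim_min) / (dim_max - dim_min) - 1.0
    cols = [np.sqrt((2 * k + 1) / 2.0) * legendre.legval(x, [0.0] * k + [1.0])
            for k in range(order + 1)]
    return np.column_stack(cols)

B = legendre_covariates(np.array([5.0, 155.0, 305.0]))
```

Each animal's random curve is then a linear combination of these columns, with the combination coefficients treated as correlated random effects.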
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Hydrologic regression models are often developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how best to address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
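The variance inflation factor used for screening above is computed by regressing each explanatory variable on the others: VIF_j = 1/(1 − R_j²). A minimal sketch (the function name is ours):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the design matrix X
    (n observations x p explanatory variables, no intercept column)."""
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        # regress column j on an intercept plus the remaining columns
        A = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# two orthogonal (uncorrelated) columns give VIFs of 1
X = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
```

A common rule of thumb flags variables with VIF above about 5 or 10 as candidates for removal, though the appropriate cutoff is a judgment call.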
Wager, Tor D.; Atlas, Lauren Y.; Leotti, Lauren A.; Rilling, James K.
2012-01-01
Recent studies have identified brain correlates of placebo analgesia, but none have assessed how accurately patterns of brain activity can predict individual differences in placebo responses. We reanalyzed data from two fMRI studies of placebo analgesia (N = 47), using patterns of fMRI activity during the anticipation and experience of pain to predict new subjects’ scores on placebo analgesia and placebo-induced changes in pain processing. We used a cross-validated regression procedure, LASSO-PCR, which provided both unbiased estimates of predictive accuracy and interpretable maps of which regions are most important for prediction. Increased anticipatory activity in a frontoparietal network and decreases in a posterior insular/temporal network predicted placebo analgesia. Patterns of anticipatory activity across the cortex predicted a moderate amount of variance in the placebo response (~12% overall, ~40% for study 2 alone), which is substantial considering the multiple likely contributing factors. The most predictive regions were those associated with emotional appraisal, rather than cognitive control or pain processing. During pain, decreases in limbic and paralimbic regions most strongly predicted placebo analgesia. Responses within canonical pain-processing regions explained significant variance in placebo analgesia, but the pattern of effects was inconsistent with widespread decreases in nociceptive processing. Together, the findings suggest that engagement of emotional appraisal circuits drives individual variation in placebo analgesia, rather than early suppression of nociceptive processing. This approach provides a framework that will allow prediction accuracy to increase as new studies provide more precise information for future predictive models. PMID:21228154
Unraveling additive from nonadditive effects using genomic relationship matrices.
Muñoz, Patricio R; Resende, Marcio F R; Gezan, Salvador A; Resende, Marcos Deon Vilela; de Los Campos, Gustavo; Kirst, Matias; Huber, Dudley; Peter, Gary F
2014-12-01
The application of quantitative genetics in plant and animal breeding has largely focused on additive models, which may also capture dominance and epistatic effects. Partitioning genetic variance into its additive and nonadditive components using pedigree-based models (pedigree best linear unbiased prediction, P-BLUP) is difficult with most commonly available family structures. However, the availability of dense panels of molecular markers makes possible the use of additive- and dominance-realized genomic relationships for the estimation of variance components and the prediction of genetic values (G-BLUP). We evaluated height data from a multifamily population of the tree species Pinus taeda with a systematic series of models accounting for additive, dominance, and first-order epistatic interactions (additive by additive, dominance by dominance, and additive by dominance), using either pedigree- or marker-based information. We show that, compared with the pedigree, use of realized genomic relationships in marker-based models yields a substantially more precise separation of additive and nonadditive components of genetic variance. We conclude that the marker-based relationship matrices in a model including additive and nonadditive effects performed better, improving breeding value prediction. Moreover, our results suggest that, for tree height in this population, the additive and nonadditive components of genetic variance are similar in magnitude. This novel result improves our current understanding of the genetic control and architecture of a quantitative trait and should be considered when developing breeding strategies. Copyright © 2014 by the Genetics Society of America.
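A realized additive genomic relationship matrix of the kind used in G-BLUP is commonly built from centered marker genotypes (this sketch follows VanRaden's first method; whether that exact variant was used in the study above is an assumption on our part, and the toy genotype matrix is ours):

```python
import numpy as np

def genomic_G(M):
    """Additive genomic relationship matrix from a genotype matrix
    M (individuals x markers, genotypes coded 0/1/2)."""
    p = M.mean(axis=0) / 2.0                       # allele frequencies
    Z = M - 2.0 * p                                # center each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

M = np.array([[0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
G = genomic_G(M)
```

In G-BLUP this matrix replaces the pedigree numerator relationship matrix in the mixed-model equations; an analogous dominance relationship matrix can be built from heterozygosity codes to separate the nonadditive component.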
Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis
ERIC Educational Resources Information Center
Williams, Ryan
2013-01-01
The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…
Smooth empirical Bayes estimation of observation error variances in linear systems
NASA Technical Reports Server (NTRS)
Martz, H. F., Jr.; Lian, M. W.
1972-01-01
A smooth empirical Bayes estimator was developed for estimating the unknown random scale component of each of a set of observation error variances. It is shown that the estimator possesses a smaller average squared error loss than other estimators for a discrete time linear system.
Comparison of structural and least-squares lines for estimating geologic relations
Williams, G.P.; Troutman, B.M.
1990-01-01
Two different goals in fitting straight lines to data are to estimate a "true" linear relation (physical law) and to predict values of the dependent variable with the smallest possible error. Regarding the first goal, a Monte Carlo study indicated that the structural-analysis (SA) method of fitting straight lines to data is superior to the ordinary least-squares (OLS) method for estimating "true" straight-line relations. Number of data points, slope and intercept of the true relation, and variances of the errors associated with the independent (X) and dependent (Y) variables influence the degree of agreement. For example, differences between the two line-fitting methods decrease as error in X becomes small relative to error in Y. Regarding the second goal, predicting the dependent variable, OLS is better than SA. Again, the difference diminishes as X takes on less error relative to Y. With respect to estimation of slope and intercept and prediction of Y, agreement between Monte Carlo results and large-sample theory was very good for sample sizes of 100, and fair to good for sample sizes of 20. The procedures and error measures are illustrated with two geologic examples. © 1990 International Association for Mathematical Geology.
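The OLS-versus-structural-line contrast can be reproduced in miniature. The sketch below (illustrative parameter values, not the paper's simulation design) shows the OLS slope attenuated by error in X, while a Deming/structural fit with a known error-variance ratio recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 200, 1.0, 2.0
x_true = rng.normal(0.0, 1.0, n)
x = x_true + rng.normal(0.0, 0.5, n)                   # error in X
y = beta0 + beta1 * x_true + rng.normal(0.0, 0.5, n)   # error in Y

sxx = np.var(x, ddof=1)
syy = np.var(y, ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]

# OLS slope: biased toward zero when X carries measurement error
b_ols = sxy / sxx

# Structural (Deming) slope with error-variance ratio lambda = 1
lam = 1.0
b_sa = (syy - lam * sxx
        + np.sqrt((syy - lam * sxx) ** 2 + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)
```

With equal error variances in X and Y, b_sa lands near the true slope of 2 while b_ols is attenuated toward roughly 1.6, matching the qualitative Monte Carlo finding in the abstract.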
DOE Office of Scientific and Technical Information (OSTI.GOV)
Narlesky, Joshua Edward; Kelly, Elizabeth J.
2015-09-10
This report documents the new PG calibration regression equation. These calibration equations incorporate new data that have become available since revision 1 of “A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis” was issued [3]. The calibration equations are based on a weighted least squares (WLS) approach for the regression. The WLS method gives each data point its proper amount of influence over the parameter estimates. This gives two big advantages: more precise parameter estimates and better, more defensible estimates of uncertainties. The WLS approach makes sense both statistically and experimentally because the variances increase with concentration, and there are physical reasons that the higher measurements are less reliable and should be less influential. The new magnesium calibration includes a correction for sodium and separate calibration equations for items with and without chlorine. These additional calibration equations allow for better predictions and smaller uncertainties for sodium in materials with and without chlorine. Chlorine and sodium have separate equations for RICH materials. Again, these equations give better predictions and smaller uncertainties for chlorine and sodium for RICH materials.
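The weighting logic described above (points with larger variance get proportionally less influence) can be sketched as follows. The calibration data here are simulated with noise proportional to concentration; they are not the report's plutonium measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
conc = np.linspace(1.0, 20.0, 40)
sigma = 0.05 * conc                       # noise grows with concentration
y = 0.5 + 2.0 * conc + rng.normal(0.0, sigma)

# Weighted least squares: weight each point by its inverse variance
X = np.column_stack([np.ones_like(conc), conc])
w = 1.0 / sigma**2
XtW = X.T * w                             # scales each column of X.T by w
beta = np.linalg.solve(XtW @ X, XtW @ y)  # solves (X'WX) beta = X'Wy
# beta[0] ~ intercept near 0.5, beta[1] ~ slope near 2.0
```

Because the low-concentration points are measured most precisely, WLS lets them dominate the fit, which is exactly the "proper amount of influence" argument in the abstract.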
NASA Technical Reports Server (NTRS)
Parrish, R. S.; Carter, M. C.
1974-01-01
This analysis utilizes computer simulation and statistical estimation. Realizations of stationary Gaussian stochastic processes with selected autocorrelation functions were computer simulated. Analysis of the simulated data revealed that the mean and the variance of a process were functionally dependent upon the autocorrelation parameter and crossing level. Using predicted values for the mean and standard deviation, the distribution parameters were estimated by the method of moments. Thus, given the autocorrelation parameter, crossing level, mean, and standard deviation of a process, the probability of exceeding the crossing level for a particular length of time was calculated.
Designing and operating infrastructure for nonstationary flood risk management
NASA Astrophysics Data System (ADS)
Doss-Gollin, J.; Farnham, D. J.; Lall, U.
2017-12-01
Climate exhibits organized low-frequency and regime-like variability at multiple time scales, causing the risk associated with climate extremes such as floods and droughts to vary in time. Despite broad recognition of this nonstationarity, there has been little theoretical development of ideas for the design and operation of infrastructure considering the regime structure of such changes and their potential predictability. We use paleo streamflow reconstructions to illustrate an approach to the design and operation of infrastructure to address nonstationary flood and drought risk. Specifically, we consider the tradeoff between flood control and conservation storage, and develop design and operation principles for allocating these storage volumes considering both an m-year project planning period and an n-year historical sampling record. As n increases, the potential uncertainty in probabilistic estimates of the return periods associated with the T-year extreme event decreases. As the duration m of the future operation period decreases, the uncertainty associated with the occurrence of the T-year event increases. Finally, given the quasi-periodic nature of the system it may be possible to offer probabilistic predictions of the conditions in the m-year future period, especially if m is small. In the context of such predictions, one can consider that an m-year prediction may have lower bias, but higher variance, than would be associated with using a stationary estimate from the preceding n years. This bias-variance trade-off, and the potential for considering risk management for multiple values of m, provides an interesting system design challenge. We use wavelet-based simulation models in a Bayesian framework to estimate these biases and uncertainty distributions and devise a risk-optimized decision rule for the allocation of flood and conservation storage.
The associated theoretical development also provides a methodology for the sizing of storage for new infrastructure under nonstationarity, and an examination of risk adaptation measures which consider both short term and long term options simultaneously.
Modification of inertial oscillations by the mesoscale eddy field
NASA Astrophysics Data System (ADS)
Elipot, Shane; Lumpkin, Rick; Prieto, Germán
2010-09-01
The modification of near-surface near-inertial oscillations (NIOs) by the geostrophic vorticity is studied globally from an observational standpoint. Surface drifters are used to estimate NIO characteristics. Despite its spatial resolution limits, altimetry is used to estimate the geostrophic vorticity. Three characteristics of NIOs are considered: the relative frequency shift with respect to the local inertial frequency; the near-inertial variance; and the inverse excess bandwidth, which is interpreted as a decay time scale. The geostrophic mesoscale flow shifts the frequency of NIOs by approximately half its vorticity. Equatorward of 30°N and S, this effect is added to a global pattern of blue shift of NIOs. While the global pattern of near-inertial variance is interpretable in terms of wind forcing, it is also observed that the geostrophic vorticity organizes the near-inertial variance; it is maximum for near zero values of the Laplacian of the vorticity and decreases for nonzero values, albeit not as much for positive as for negative values. Because the Laplacian of vorticity and vorticity are anticorrelated in the altimeter data set, overall, more near-inertial variance is found in anticyclonic vorticity regions than in cyclonic regions. While this is compatible with anticyclones trapping NIOs, the organization of near-inertial variance by the Laplacian of vorticity is also in very good agreement with previous theoretical and numerical predictions. The inverse bandwidth is a decreasing function of the gradient of vorticity, which acts like the gradient of planetary vorticity to increase the decay of NIOs from the ocean surface. Because the altimetry data set captures the largest vorticity gradients in energetic mesoscale regions, it is also observed that NIOs decay faster in large geostrophic eddy kinetic energy regions.
Patalay, Praveetha; Fitzsimons, Emla
2016-09-01
To investigate a framework of correlates of both mental illness and wellbeing in a large, current, and nationally representative sample of children in the United Kingdom. An ecologic framework of correlates including individual (sociodemographic and human capital), family, social, and wider environmental factors was examined in 12,347 children aged 11 years from the UK Millennium Cohort Study. Mental illness and wellbeing scores were standardized to allow comparisons, and the variance explained by the different predictors was estimated. Mental illness and wellbeing were weakly correlated in children (r = 0.2), and their correlates were similar in some instances (e.g., family structure, sibling bullying, peer problems) but differed in others (e.g., family income, perceived socioeconomic status, cognitive ability, health status, neighborhood safety). The predictors included in the study explained 47% of the variance in symptoms of mental illness, with social relationships, home environment, parent health, cognitive ability, socioeconomic status, and health factors predicting large amounts of variance. A comparatively lower 26% of the variance in wellbeing was explained by the study variables, with wider environment, social relationships, perceived socioeconomic status, and home environment predicting the most variance. Correlates of children's mental illness and wellbeing are largely distinct, stressing the importance of considering these concepts separately and avoiding their conflation. This study highlights the relevance of these findings for understanding social gradients in mental health through the life course and the conceptualization and development of mental illness and wellbeing in childhood as precursors to lifelong development in these domains. Copyright © 2016 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
Westbury, Chris F.; Shaoul, Cyrus; Hollis, Geoff; Smithson, Lisa; Briesemeister, Benny B.; Hofmann, Markus J.; Jacobs, Arthur M.
2013-01-01
Many studies have shown that behavioral measures are affected by manipulating the imageability of words. Though imageability is usually measured by human judgment, little is known about what factors underlie those judgments. We demonstrate that imageability judgments can be largely or entirely accounted for by two computable measures that have previously been associated with imageability, the size and density of a word's context and the emotional associations of the word. We outline an algorithmic method for predicting imageability judgments using co-occurrence distances in a large corpus. Our computed judgments account for 58% of the variance in a set of nearly two thousand imageability judgments, for words that span the entire range of imageability. The two factors account for 43% of the variance in lexical decision reaction times (LDRTs) that is attributable to imageability in a large database of 3697 LDRTs spanning the range of imageability. We document variances in the distribution of our measures across the range of imageability that suggest that they will account for more variance at the extremes, from which most imageability-manipulating stimulus sets are drawn. The two predictors account for 100% of the variance that is attributable to imageability in newly-collected LDRTs using a previously-published stimulus set of 100 items. We argue that our model of imageability is neurobiologically plausible by showing it is consistent with brain imaging data. The evidence we present suggests that behavioral effects in the lexical decision task that are usually attributed to the abstract/concrete distinction between words can be wholly explained by objective characteristics of the word that are not directly related to the semantic distinction. We provide computed imageability estimates for over 29,000 words. PMID:24421777
Influence function based variance estimation and missing data issues in case-cohort studies.
Mark, S D; Katki, H
2001-12-01
Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.
Comment on Hoffman and Rovine (2007): SPSS MIXED can estimate models with heterogeneous variances.
Weaver, Bruce; Black, Ryan A
2015-06-01
Hoffman and Rovine (Behavior Research Methods, 39:101-117, 2007) have provided a very nice overview of how multilevel models can be useful to experimental psychologists. They included two illustrative examples and provided both SAS and SPSS commands for estimating the models they reported. However, upon examining the SPSS syntax for the models reported in their Table 3, we found no syntax for models 2B and 3B, both of which have heterogeneous error variances. Instead, there is syntax that estimates similar models with homogeneous error variances and a comment stating that SPSS does not allow heterogeneous errors. But that is not correct. We provide SPSS MIXED commands to estimate models 2B and 3B with heterogeneous error variances and obtain results nearly identical to those reported by Hoffman and Rovine in their Table 3. Therefore, contrary to the comment in Hoffman and Rovine's syntax file, SPSS MIXED can estimate models with heterogeneous error variances.
Kalman filter for statistical monitoring of forest cover across sub-continental regions [Symposium
Raymond L. Czaplewski
1991-01-01
The Kalman filter is a generalization of the composite estimator. The univariate composite estimate combines two prior estimates of a population parameter with a weighted average in which the scalar weight is inversely proportional to the variances. The composite estimator is a minimum variance estimator that requires no distributional assumptions other than estimates of the...
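The univariate composite estimate described above is small enough to write out in full. This sketch follows directly from the abstract's description (inverse-variance weighting of two prior estimates); the numbers are illustrative:

```python
def composite_estimate(x1, var1, x2, var2):
    """Univariate composite estimator: inverse-variance weighted average
    of two independent prior estimates of the same parameter."""
    w = var2 / (var1 + var2)           # weight on x1, inversely prop. to var1
    est = w * x1 + (1.0 - w) * x2
    var = var1 * var2 / (var1 + var2)  # variance of the combined estimate
    return est, var

est, var = composite_estimate(10.0, 4.0, 14.0, 1.0)
# est = 13.2, var = 0.8: the lower-variance estimate dominates,
# and the combined variance is below either input variance
```

The Kalman filter generalizes this to vectors, with the scalar weight replaced by a gain matrix computed from the two covariance matrices.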
A de-noising method using the improved wavelet threshold function based on noise variance estimation
NASA Astrophysics Data System (ADS)
Liu, Hui; Wang, Weida; Xiang, Changle; Han, Lijin; Nie, Haizhao
2018-01-01
The precise and efficient noise variance estimation is very important for the processing of all kinds of signals while using the wavelet transform to analyze signals and extract signal features. In view of the problem that the accuracy of traditional noise variance estimation is greatly affected by the fluctuation of noise values, this study puts forward the strategy of using the two-state Gaussian mixture model to classify the high-frequency wavelet coefficients in the minimum scale, which takes both the efficiency and accuracy into account. According to the noise variance estimation, a novel improved wavelet threshold function is proposed by combining the advantages of hard and soft threshold functions, and on the basis of the noise variance estimation algorithm and the improved wavelet threshold function, the research puts forth a novel wavelet threshold de-noising method. The method is tested and validated using random signals and bench test data of an electro-mechanical transmission system. The test results indicate that the wavelet threshold de-noising method based on the noise variance estimation shows preferable performance in processing the testing signals of the electro-mechanical transmission system: it can effectively eliminate the interference of transient signals including voltage, current, and oil pressure and maintain the dynamic characteristics of the signals favorably.
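The two building blocks of the abstract above, a robust noise variance estimate from the finest-scale wavelet coefficients and a soft/hard compromise threshold function, can be sketched with numpy alone. The MAD-based estimator is the standard Donoho-Johnstone form, and the threshold function below is one common compromise shape, not necessarily the exact function proposed in the paper:

```python
import numpy as np

def haar_detail(x):
    """Finest-scale Haar wavelet detail coefficients of a 1-D signal."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def noise_sigma(x):
    """Robust noise-level estimate: median absolute deviation of the
    finest-scale detail coefficients divided by 0.6745."""
    d = haar_detail(x)
    return np.median(np.abs(d)) / 0.6745

def improved_threshold(d, lam, a=1.0):
    """Soft/hard compromise: shrinks like the soft threshold near lam,
    approaches the hard threshold for large coefficients."""
    out = np.zeros_like(d)
    keep = np.abs(d) > lam
    out[keep] = np.sign(d[keep]) * (
        np.abs(d[keep]) - lam * np.exp(-a * (np.abs(d[keep]) - lam)))
    return out

rng = np.random.default_rng(3)
clean = np.sin(np.linspace(0, 4 * np.pi, 1024))
noisy = clean + rng.normal(0.0, 0.1, 1024)
sigma_hat = noise_sigma(noisy)   # close to the true noise level of 0.1
```

Because the smooth signal contributes little to the finest scale, the median-based estimate is barely affected by the signal itself, which is why it is preferred over a raw standard deviation of the coefficients.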
Husby, Arild; Schielzeth, Holger; Forstmeier, Wolfgang; Gustafsson, Lars; Qvarnström, Anna
2013-03-01
Theory predicts that sex chromosome linkage should reduce intersexual genetic correlations thereby allowing the evolution of sexual dimorphism. Empirical evidence for sex linkage has come largely from crosses and few studies have examined how sexual dimorphism and sex linkage are related within outbred populations. Here, we use data on an array of different traits measured on over 10,000 individuals from two pedigreed populations of birds (collared flycatcher and zebra finch) to estimate the amount of sex-linked genetic variance (h(2)Z). Of 17 traits examined, eight showed a nonzero h(2)Z estimate but only four were significantly different from zero (wing patch size and tarsus length in collared flycatchers, wing length and beak color in zebra finches). We further tested how sexual dimorphism and the mode of selection operating on the trait relate to the proportion of sex-linked genetic variance. Sexually selected traits did not show higher h(2)Z than morphological traits and there was only a weak positive relationship between h(2)Z and sexual dimorphism. However, given the relative scarcity of empirical studies, it is premature to make conclusions about the role of sex chromosome linkage in the evolution of sexual dimorphism. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.
NASA Astrophysics Data System (ADS)
Codis, Sandrine; Bernardeau, Francis; Pichon, Christophe
2016-08-01
In order to quantify the error budget in the measured probability distribution functions of cell densities, the two-point statistics of cosmic densities in concentric spheres is investigated. Bias functions are introduced as the ratio of their two-point correlation function to the two-point correlation of the underlying dark matter distribution. They describe how cell densities are spatially correlated. They are computed here via the so-called large deviation principle in the quasi-linear regime. Their large-separation limit is presented and successfully compared to simulations for density and density slopes: this regime is shown to be reached rapidly, allowing sub-percent precision for a wide range of densities and variances. The corresponding asymptotic limit provides an estimate of the cosmic variance of standard concentric cell statistics applied to finite surveys. More generally, no assumption on the separation is required for some specific moments of the two-point statistics, for instance when predicting the generating function of cumulants containing any powers of concentric densities in one location and one power of density at some arbitrary distance from the rest. This exact `one external leg' cumulant generating function is used in particular to probe the rate of convergence of the large-separation approximation.
Poissant, Jocelyn; Wilson, Alastair J; Coltman, David W
2010-01-01
The independent evolution of the sexes may often be constrained if male and female homologous traits share a similar genetic architecture. Thus, cross-sex genetic covariance is assumed to play a key role in the evolution of sexual dimorphism (SD) with consequent impacts on sexual selection, population dynamics, and speciation processes. We compiled cross-sex genetic correlations (r(MF)) estimates from 114 sources to assess the extent to which the evolution of SD is typically constrained and test several specific hypotheses. First, we tested if r(MF) differed among trait types and especially between fitness components and other traits. We also tested the theoretical prediction of a negative relationship between r(MF) and SD based on the expectation that increases in SD should be facilitated by sex-specific genetic variance. We show that r(MF) is usually large and positive but that it is typically smaller for fitness components. This demonstrates that the evolution of SD is typically genetically constrained and that sex-specific selection coefficients may often be opposite in sign due to sub-optimal levels of SD. Most importantly, we confirm that sex-specific genetic variance is an important contributor to the evolution of SD by validating the prediction of a negative correlation between r(MF) and SD.
Baird, Rachel; Maxwell, Scott E
2016-06-01
Time-varying predictors in multilevel models are a useful tool for longitudinal research, whether they are the research variable of interest or they are controlling for variance to allow greater power for other variables. However, standard recommendations to fix the effect of time-varying predictors may make an assumption that is unlikely to hold in reality and may influence results. A simulation study illustrates that treating the time-varying predictor as fixed may allow analyses to converge, but the analyses have poor coverage of the true fixed effect when the time-varying predictor has a random effect in reality. A second simulation study shows that treating the time-varying predictor as random may have poor convergence, except when allowing negative variance estimates. Although negative variance estimates are uninterpretable, results of the simulation show that estimates of the fixed effect of the time-varying predictor are as accurate for these cases as for cases with positive variance estimates, and that treating the time-varying predictor as random and allowing negative variance estimates performs well whether the time-varying predictor is fixed or random in reality. Because of the difficulty of interpreting negative variance estimates, 2 procedures are suggested for selection between fixed-effect and random-effect models: comparing between fixed-effect and constrained random-effect models with a likelihood ratio test or fitting a fixed-effect model when an unconstrained random-effect model produces negative variance estimates. The performance of these 2 procedures is compared. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Sampling design optimisation for rainfall prediction using a non-stationary geostatistical model
NASA Astrophysics Data System (ADS)
Wadoux, Alexandre M. J.-C.; Brus, Dick J.; Rico-Ramirez, Miguel A.; Heuvelink, Gerard B. M.
2017-09-01
The accuracy of spatial predictions of rainfall by merging rain-gauge and radar data is partly determined by the sampling design of the rain-gauge network. Optimising the locations of the rain-gauges may increase the accuracy of the predictions. Existing spatial sampling design optimisation methods are based on minimisation of the spatially averaged prediction error variance under the assumption of intrinsic stationarity. Over the past years, substantial progress has been made to deal with non-stationary spatial processes in kriging. Various well-documented geostatistical models relax the assumption of stationarity in the mean, while recent studies show the importance of considering non-stationarity in the variance for environmental processes occurring in complex landscapes. We optimised the sampling locations of rain-gauges using an extension of the Kriging with External Drift (KED) model for prediction of rainfall fields. The model incorporates both non-stationarity in the mean and in the variance, which are modelled as functions of external covariates such as radar imagery, distance to radar station and radar beam blockage. Spatial predictions are made repeatedly over time, each time recalibrating the model. The space-time averaged KED variance was minimised by Spatial Simulated Annealing (SSA). The methodology was tested using a case study predicting daily rainfall in the north of England for a one-year period. Results show that (i) the proposed non-stationary variance model outperforms the stationary variance model, and (ii) a small but significant decrease of the rainfall prediction error variance is obtained with the optimised rain-gauge network. In particular, it pays off to place rain-gauges at locations where the radar imagery is inaccurate, while keeping the distribution over the study area sufficiently uniform.
Predictive control of hollow-fiber bioreactors for the production of monoclonal antibodies.
Dowd, J E; Weber, I; Rodriguez, B; Piret, J M; Kwok, K E
1999-05-20
The selection of medium feed rates for perfusion bioreactors represents a challenge for process optimization, particularly in bioreactors that are sampled infrequently. When the present and immediate future of a bioprocess can be adequately described, predictive control can minimize deviations from set points in a manner that can maximize process consistency. Predictive control of perfusion hollow-fiber bioreactors was investigated in a series of hybridoma cell cultures that compared operator control to computer estimation of feed rates. Adaptive software routines were developed to estimate the current and predict the future glucose uptake and lactate production of the bioprocess at each sampling interval. The current and future glucose uptake rates were used to select the perfusion feed rate in a designed response to deviations from the set point values. The routines presented a graphical user interface through which the operator was able to view the up-to-date culture performance and assess the model description of the immediate future culture performance. In addition, fewer samples were taken in the computer-estimated cultures, reducing labor and analytical expense. The use of these predictive controller routines and the graphical user interface decreased the glucose and lactate concentration variances up to sevenfold, and antibody yields increased by 10% to 43%. Copyright 1999 John Wiley & Sons, Inc.
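The core of the predictive scheme above is a mass balance: estimate the culture's current uptake rate from successive samples, then pick the feed (dilution) rate that steers the concentration toward the set point over the prediction horizon. The sketch below is a heavily simplified version of that idea with hypothetical numbers; it omits the adaptive modeling and graphical interface described in the abstract.

```python
def glucose_uptake(s_prev, s_now, dt, d, s_feed):
    """Estimate the culture's glucose uptake rate q (g/L/h) from two samples
    of a perfusion reactor, via the mass balance dS/dt = D*(S_feed - S) - q."""
    ds_dt = (s_now - s_prev) / dt
    return d * (s_feed - s_now) - ds_dt

def feed_rate_for_setpoint(q_pred, s_now, s_set, s_feed, horizon):
    """Choose the dilution rate D that moves glucose to the set point over
    the prediction horizon, given the predicted uptake rate q_pred."""
    target_ds_dt = (s_set - s_now) / horizon
    return (target_ds_dt + q_pred) / (s_feed - s_now)

# Glucose fell from 2.2 to 2.0 g/L over 12 h at dilution rate 0.05 /h
q = glucose_uptake(s_prev=2.2, s_now=2.0, dt=12.0, d=0.05, s_feed=5.0)
# Dilution rate needed to recover the 2.5 g/L set point within 24 h
d_new = feed_rate_for_setpoint(q, s_now=2.0, s_set=2.5, s_feed=5.0, horizon=24.0)
# q ≈ 0.167 g/L/h, d_new = 0.0625 /h
```

In the actual controller the uptake rate would be re-estimated and the rate re-selected at every sampling interval, which is what damps concentration variance between infrequent samples.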
Bootstrap Estimation and Testing for Variance Equality.
ERIC Educational Resources Information Center
Olejnik, Stephen; Algina, James
The purpose of this study was to develop a single procedure for comparing population variances which could be used for distribution forms. Bootstrap methodology was used to estimate the variability of the sample variance statistic when the population distribution was normal, platykurtic and leptokurtic. The data for the study were generated and…
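The bootstrap idea in the abstract above, estimating the sampling variability of the sample variance without distributional assumptions, can be sketched in a few lines. The data and resample count here are illustrative:

```python
import numpy as np

def bootstrap_se_of_variance(x, n_boot=2000, seed=0):
    """Bootstrap standard error of the sample variance: resample with
    replacement and take the spread of the resampled variance statistic."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    boot_vars = np.var(x[idx], axis=1, ddof=1)
    return np.std(boot_vars, ddof=1)

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 100)
se = bootstrap_se_of_variance(x)
```

For normal data the theoretical value is sqrt(2*sigma^4/(n-1)), about 0.14 here, but the bootstrap needs no such normality assumption, which is exactly why it is attractive for platykurtic and leptokurtic populations.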
Nonexercise Equations to Estimate Fitness in White European and South Asian Men.
O'Donovan, Gary; Bakrania, Kishan; Ghouri, Nazim; Yates, Thomas; Gray, Laura J; Hamer, Mark; Stamatakis, Emmanuel; Khunti, Kamlesh; Davies, Melanie; Sattar, Naveed; Gill, Jason M R
2016-05-01
Cardiorespiratory fitness is a strong, independent predictor of health, whether it is measured in an exercise test or estimated in an equation. The purpose of this study was to develop and validate equations to estimate fitness in middle-aged white European and South Asian men. Multiple linear regression models (n = 168, including 83 white European and 85 South Asian men) were created using variables that are thought to be important in predicting fitness (V˙O2max, mL·kg⁻¹·min⁻¹): age (yr), body mass index (kg·m⁻²), resting HR (bpm); smoking status (0, never smoked; 1, ex or current smoker), physical activity expressed as quintiles (0, quintile 1; 1, quintile 2; 2, quintile 3; 3, quintile 4; 4, quintile 5), categories of moderate-to-vigorous intensity physical activity (MVPA) (0, <75 min·wk⁻¹; 1, 75-150 min·wk⁻¹; 2, >150-225 min·wk⁻¹; 3, >225-300 min·wk⁻¹; 4, >300 min·wk⁻¹), or minutes of MVPA (min·wk⁻¹); and ethnicity (0, South Asian; 1, white). The leave-one-out cross-validation procedure was used to assess the generalizability, and the bootstrap and jackknife resampling techniques were used to estimate the variance and bias of the models. Around 70% of the variance in fitness was explained in models with an ethnicity variable, such as: V˙O2max = 77.409 - (age × 0.374) - (body mass index × 0.906) - (ex or current smoker × 1.976) + (physical activity quintile coefficient) - (resting HR × 0.066) + (white ethnicity × 8.032), where physical activity quintile 1 is 0, 2 is 1.127, 3 is 1.869, 4 is 3.793, and 5 is 3.029. Only around 50% of the variance was explained in models without an ethnicity variable. All models with an ethnicity variable were generalizable and had low variance and bias. These data demonstrate the importance of incorporating ethnicity in nonexercise equations to estimate cardiorespiratory fitness in multiethnic populations.
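The reported equation is fully specified in the abstract and can be implemented directly; the example subject below is hypothetical.

```python
# Quintile coefficients as reported (note quintile 4 exceeds quintile 5)
PA_QUINTILE = {1: 0.0, 2: 1.127, 3: 1.869, 4: 3.793, 5: 3.029}

def estimated_vo2max(age, bmi, ever_smoker, pa_quintile, resting_hr, white):
    """Nonexercise VO2max estimate (mL·kg⁻¹·min⁻¹) from the equation
    reported in the abstract; argument codings follow the abstract."""
    return (77.409
            - 0.374 * age
            - 0.906 * bmi
            - 1.976 * (1 if ever_smoker else 0)
            + PA_QUINTILE[pa_quintile]
            - 0.066 * resting_hr
            + (8.032 if white else 0.0))

# Hypothetical subject: 45-year-old white never-smoker, BMI 25,
# resting HR 60 bpm, physical activity quintile 3
v = estimated_vo2max(45, 25.0, False, 3, 60, True)
# v = 43.87 mL·kg⁻¹·min⁻¹
```

The large ethnicity coefficient (8.032) is what drives the jump from ~50% to ~70% variance explained when ethnicity is included.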
2013-01-01
Background Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. 
Using Akaike’s information criterion, the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion The algorithm and model selection criterion presented here can contribute to a better understanding of the genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring. PMID:23827014
Empirical Bayes estimation of undercount in the decennial census.
Cressie, N
1989-12-01
Empirical Bayes methods are used to estimate the extent of the undercount at the local level in the 1980 U.S. census. Grouping of like subareas from areas such as states, counties, and so on into strata is a useful way of reducing the variance of undercount estimators. By modeling the subareas within a stratum to have a common mean and variances inversely proportional to their census counts, and by taking into account sampling of the areas (e.g., by dual-system estimation), empirical Bayes estimators that compromise between the (weighted) stratum average and the sample value can be constructed. The amount of compromise is shown to depend on the relative importance of stratum variance to sampling variance. These estimators are evaluated at the state level (51 states, including Washington, D.C.) and stratified on race/ethnicity (3 strata) using data from the 1980 postenumeration survey (PEP 3-8, for the noninstitutional population).
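The compromise estimator described in the abstract above can be written down directly. This is a generic empirical Bayes shrinkage sketch; the variable names are ours, not Cressie's.

```python
def eb_shrinkage(sample_value, stratum_mean, sampling_var, stratum_var):
    """Empirical Bayes compromise between a subarea's own (sample) undercount
    estimate and its stratum average. The weight on the sample value grows as
    the stratum (between-subarea) variance dominates the sampling variance."""
    w = stratum_var / (stratum_var + sampling_var)
    return w * sample_value + (1.0 - w) * stratum_mean
```

A noisily measured subarea (large sampling variance) is pulled almost entirely toward the stratum average, while a precisely measured one keeps nearly its own value.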
Eaglen, Sophie A E; Coffey, Mike P; Woolliams, John A; Wall, Eileen
2012-07-28
The focus in dairy cattle breeding is gradually shifting from production to functional traits and genetic parameters of calving traits are estimated more frequently. However, across countries, various statistical models are used to estimate these parameters. This study evaluates different models for calving ease and stillbirth in United Kingdom Holstein-Friesian cattle. Data from first and later parity records were used. Genetic parameters for calving ease, stillbirth and gestation length were estimated using the restricted maximum likelihood method, considering different models i.e. sire (-maternal grandsire), animal, univariate and bivariate models. Gestation length was fitted as a correlated indicator trait and, for all three traits, genetic correlations between first and later parities were estimated. Potential bias in estimates was avoided by acknowledging a possible environmental direct-maternal covariance. The total heritable variance was estimated for each trait to discuss its theoretical importance and practical value. Prediction error variances and accuracies were calculated to compare the models. On average, direct and maternal heritabilities for calving traits were low, except for direct gestation length. Calving ease in first parity had a significant and negative direct-maternal genetic correlation. Gestation length was maternally correlated to stillbirth in first parity and directly correlated to calving ease in later parities. Multi-trait models had a slightly greater predictive ability than univariate models, especially for the lowly heritable traits. The computation time needed for sire (-maternal grandsire) models was much smaller than for animal models with only small differences in accuracy. The sire (-maternal grandsire) model was robust when additional genetic components were estimated, while the equivalent animal model had difficulties reaching convergence. 
For the evaluation of calving traits, multi-trait models show a slight advantage over univariate models. Extended sire models (-maternal grandsire) are more practical and robust than animal models. Estimated genetic parameters for calving traits of UK Holstein cattle are consistent with literature. Calculating an aggregate estimated breeding value including direct and maternal values should encourage breeders to consider both direct and maternal effects in selection decisions.
PMID:22839757
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, A.; Künsch, H. R.; Schwierz, C.; Stahel, W. A.
2012-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, particularly in environmental data sets. Outlying observations may result from errors (e.g. in data transcription) or from local perturbations in the processes that are responsible for a given pattern of spatial variation. As an example, the spatial distribution of some trace metal in the soils of a region may be distorted by emissions from local anthropogenic sources. Outliers affect the modelling of the large-scale spatial variation (the so-called external drift or trend), the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise, because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Earlier studies on robust geostatistics focused on robust estimation of the sample variogram and on ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) [2] proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) [1] for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of the estimating equations for Gaussian REML estimation. 
Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and unsampled locations and kriging variances. The method has been implemented in an R package. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of the Tarrawarra soil moisture data set [3].
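The core robustification idea, bounding the influence of large residuals inside the estimating equations, can be illustrated with a Huber-type location estimate. This is a toy sketch only; the paper applies the same bounding idea inside the REML equations for the variogram, and the function names here are hypothetical.

```python
def huber_psi(r, c=1.345):
    """Huber psi-function: identity for small residuals, clipped at +/- c."""
    return max(-c, min(c, r))

def robust_location(x, scale, c=1.345, iters=50):
    """Solve sum(psi((x_i - mu) / scale)) = 0 by iteratively reweighted
    averaging; outliers get small weights instead of being rejected."""
    mu = sorted(x)[len(x) // 2]  # start at the median
    for _ in range(iters):
        weights = []
        for xi in x:
            r = (xi - mu) / scale
            weights.append(huber_psi(r, c) / r if r != 0 else 1.0)
        mu = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
    return mu
```

For the sample [0.1, -0.2, 0.05, 0.0, 100.0] with unit scale, the robust location stays near zero, whereas the plain mean is dragged to about 20 by the single outlier.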
Gray, Brian R.; Gitzen, Robert A.; Millspaugh, Joshua J.; Cooper, Andrew B.; Licht, Daniel S.
2012-01-01
Variance components may play multiple roles (cf. Cox and Solomon 2003). First, magnitudes and relative magnitudes of the variances of random factors may have important scientific and management value in their own right. For example, variation in levels of invasive vegetation among and within lakes may suggest causal agents that operate at both spatial scales – a finding that may be important for scientific and management reasons. Second, variance components may also be of interest when they affect precision of means and covariate coefficients. For example, the effect of water depth on the probability of aquatic plant presence in a study of multiple lakes may vary by lake. This variation will affect the precision of the average depth-presence association. Third, variance component estimates may be used when designing studies, including monitoring programs. For example, to estimate the numbers of years and of samples per year required to meet long-term monitoring goals, investigators need estimates of within- and among-year variances. Other chapters in this volume (Chapters 7, 8, and 10) as well as extensive external literature outline a framework for applying estimates of variance components to the design of monitoring efforts. For example, a series of papers with an ecological monitoring theme examined the relative importance of multiple sources of variation, including variation in means among sites, years, and site-years, for the purposes of temporal trend detection and estimation (Larsen et al. 2004, and references therein).
Estimation of population size using open capture-recapture models
McDonald, T.L.; Amstrup, Steven C.
2001-01-01
One of the most important needs for wildlife managers is an accurate estimate of population size. Yet, for many species, including most marine species and large mammals, accurate and precise estimation of numbers is one of the most difficult of all research challenges. Open-population capture-recapture models have proven useful in many situations to estimate survival probabilities but typically have not been used to estimate population size. We show that open-population models can be used to estimate population size by developing a Horvitz-Thompson-type estimate of population size and an estimator of its variance. Our population size estimate keys on the probability of capture at each trap occasion and therefore is quite general and can be made a function of external covariates measured during the study. Here we define the estimator and investigate its bias, variance, and variance estimator via computer simulation. Computer simulations make extensive use of real data taken from a study of polar bears (Ursus maritimus) in the Beaufort Sea. The population size estimator is shown to be useful because it was negligibly biased in all situations studied. The variance estimator is shown to be useful in all situations, but caution is warranted in cases of extreme capture heterogeneity.
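The Horvitz-Thompson-type abundance estimate described above keys on each animal's probability of capture. Here is a minimal sketch under an independence assumption; the study's estimator is built on open-model capture probabilities and covariates, and these function names are ours.

```python
import math

def prob_seen_at_least_once(occasion_probs):
    """p* = 1 - prod(1 - p_j): probability an animal is captured on at
    least one trap occasion, given per-occasion capture probabilities."""
    return 1.0 - math.prod(1.0 - p for p in occasion_probs)

def horvitz_thompson_abundance(p_stars):
    """N-hat = sum over captured animals of 1 / p*_i."""
    return sum(1.0 / p for p in p_stars)

def horvitz_thompson_variance(p_stars):
    """Variance estimator assuming independent captures: sum (1 - p*) / p*^2."""
    return sum((1.0 - p) / p ** 2 for p in p_stars)
```

If 50 animals are caught, each with p* = 0.5, the abundance estimate is 100. Because p* can be modeled as a function of covariates measured during the study, the estimator stays quite general.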
Hevesi, Joseph A.; Flint, Alan L.; Istok, Jonathan D.
1992-01-01
Values of average annual precipitation (AAP) may be important for hydrologic characterization of a potential high-level nuclear-waste repository site at Yucca Mountain, Nevada. Reliable measurements of AAP are sparse in the vicinity of Yucca Mountain, and estimates of AAP were needed for an isohyetal mapping over a 2600-square-mile watershed containing Yucca Mountain. Estimates were obtained with a multivariate geostatistical model developed using AAP and elevation data from a network of 42 precipitation stations in southern Nevada and southeastern California. An additional 1531 elevations were obtained to improve estimation accuracy. Isohyets representing estimates obtained using univariate geostatistics (kriging) defined a smooth and continuous surface. Isohyets representing estimates obtained using multivariate geostatistics (cokriging) defined an irregular surface that more accurately represented expected local orographic influences on AAP. Cokriging results included a maximum estimate within the study area of 335 mm at an elevation of 7400 ft, an average estimate of 157 mm for the study area, and an average estimate of 172 mm at eight locations in the vicinity of the potential repository site. Kriging estimates tended to be lower in comparison because the increased AAP expected for remote mountainous topography was not adequately represented by the available sample. Regression results between cokriging estimates and elevation were similar to regression results between measured AAP and elevation. The position of the cokriging 250-mm isohyet relative to the boundaries of pinyon pine and juniper woodlands provided indirect evidence of improved estimation accuracy because the cokriging result agreed well with investigations by others concerning the relationship between elevation, vegetation, and climate in the Great Basin. Calculated estimation variances were also mapped and compared to evaluate improvements in estimation accuracy. 
Cokriging estimation variances were reduced by an average of 54% relative to kriging variances within the study area. Cokriging reduced estimation variances at the potential repository site by 55% relative to kriging. The usefulness of an existing network of stations for measuring AAP within the study area was evaluated using cokriging variances, and twenty additional stations were located for the purpose of improving the accuracy of future isohyetal mappings. Using the expanded network of stations, the maximum cokriging estimation variance within the study area was reduced by 78% relative to the existing network, and the average estimation variance was reduced by 52%.
Inference of reactive transport model parameters using a Bayesian multivariate approach
NASA Astrophysics Data System (ADS)
Carniato, Luca; Schoups, Gerrit; van de Giesen, Nick
2014-08-01
Parameter estimation of subsurface transport models from multispecies data requires the definition of an objective function that includes different types of measurements. Common approaches are weighted least squares (WLS), where weights are specified a priori for each measurement, and weighted least squares with weight estimation (WLS(we)) where weights are estimated from the data together with the parameters. In this study, we formulate the parameter estimation task as a multivariate Bayesian inference problem. The WLS and WLS(we) methods are special cases in this framework, corresponding to specific prior assumptions about the residual covariance matrix. The Bayesian perspective allows for generalizations to cases where residual correlation is important and for efficient inference by analytically integrating out the variances (weights) and selected covariances from the joint posterior. Specifically, the WLS and WLS(we) methods are compared to a multivariate (MV) approach that accounts for specific residual correlations without the need for explicit estimation of the error parameters. When applied to inference of reactive transport model parameters from column-scale data on dissolved species concentrations, the following results were obtained: (1) accounting for residual correlation between species provides more accurate parameter estimation for high residual correlation levels whereas its influence for predictive uncertainty is negligible, (2) integrating out the (co)variances leads to an efficient estimation of the full joint posterior with a reduced computational effort compared to the WLS(we) method, and (3) in the presence of model structural errors, none of the methods is able to identify the correct parameter values.
Perceptual attraction in tool use: evidence for a reliability-based weighting mechanism.
Debats, Nienke B; Ernst, Marc O; Heuer, Herbert
2017-04-01
Humans are well able to operate tools whereby their hand movement is linked, via a kinematic transformation, to a spatially distant object moving in a separate plane of motion. An everyday example is controlling a cursor on a computer monitor. Despite these separate reference frames, the perceived positions of the hand and the object were found to be biased toward each other. We propose that this perceptual attraction is based on the principles by which the brain integrates redundant sensory information of single objects or events, known as optimal multisensory integration. That is, (1) sensory information about the hand and the tool are weighted according to their relative reliability (i.e., inverse variances), and (2) the unisensory reliabilities sum up in the integrated estimate. We assessed whether perceptual attraction is consistent with optimal multisensory integration model predictions. We used a cursor-control tool-use task in which we manipulated the relative reliability of the unisensory hand and cursor position estimates. The perceptual biases shifted according to these relative reliabilities, with an additional bias due to contextual factors that were present in experiment 1 but not in experiment 2. The biased position judgments' variances were, however, systematically larger than the predicted optimal variances. Our findings suggest that the perceptual attraction in tool use results from a reliability-based weighting mechanism similar to optimal multisensory integration, but that certain boundary conditions for optimality might not be satisfied. NEW & NOTEWORTHY Kinematic tool use is associated with a perceptual attraction between the spatially separated hand and the effective part of the tool. We provide a formal account for this phenomenon, thereby showing that the process behind it is similar to optimal integration of sensory information relating to single objects. Copyright © 2017 the American Physiological Society.
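The optimal-integration predictions tested above follow from inverse-variance weighting, which is short enough to state in full (a generic sketch; the names are ours).

```python
def fuse_estimates(est1, var1, est2, var2):
    """Reliability-weighted fusion of two redundant estimates, where
    reliability = 1/variance; the fused variance never exceeds either input."""
    r1, r2 = 1.0 / var1, 1.0 / var2
    fused = (r1 * est1 + r2 * est2) / (r1 + r2)
    fused_var = 1.0 / (r1 + r2)
    return fused, fused_var
```

With equal reliabilities the fused position lies midway between the hand and cursor estimates; making one signal noisier shifts the fused estimate toward the more reliable one, which is the pattern the experiments manipulate.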
2015-01-01
The recent availability of high frequency data has permitted more efficient ways of computing volatility. However, estimation of volatility from asset price observations is challenging because observed high frequency data are generally affected by microstructure noise. We address this issue by using the Fourier estimator of instantaneous volatility introduced in Malliavin and Mancino (2002). We prove a central limit theorem for this estimator with optimal rate and asymptotic variance. An extensive simulation study shows the accuracy of the spot volatility estimates obtained using the Fourier estimator and its robustness even in the presence of different microstructure noise specifications. An empirical analysis on high frequency data (U.S. S&P500 and FIB 30 indices) illustrates how the Fourier spot volatility estimates can be successfully used to study intraday variations of volatility and to predict intraday Value at Risk. PMID:26421617
Kim, Minjung; Lamont, Andrea E.; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M. Lee
2015-01-01
Regression mixture models are a novel approach for modeling heterogeneous effects of predictors on an outcome. In the model-building process, residual variances are often disregarded and simplifying assumptions are made without thorough examination of the consequences. This simulation study investigated the impact of an equality constraint on the residual variances across latent classes. We examine the consequence of constraining the residual variances on class enumeration (finding the true number of latent classes) and parameter estimates under a number of different simulation conditions meant to reflect the type of heterogeneity likely to exist in applied analyses. Results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted estimated class sizes and showed the potential to greatly impact parameter estimates in each class. Results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions were made. PMID:26139512
Effect of pregnancy on the genetic evaluation of dairy cattle.
Pereira, R J; Santana, M L; Bignardi, A B; Verneque, R S; El Faro, L; Albuquerque, L G
2011-09-26
We investigated the effect of stage of pregnancy on estimates of breeding values for milk yield and milk persistency in Gyr and Holstein dairy cattle in Brazil. Test-day milk yield records were analyzed using random regression models with or without the effect of pregnancy. Models were compared using residual variances, heritabilities, rank correlations of estimated breeding values of bulls and cows, and number of nonpregnant cows in the top 200 for milk yield and milk persistency. The estimates of residual variance and heritabilities obtained with the models with or without the effect of pregnancy were similar for the two breeds. Inclusion of the effect of pregnancy in genetic evaluation models for these populations did not affect the ranking of cows and sires based on their predicted breeding values for 305-day cumulative milk yield. In contrast, when we examined persistency of milk yield, lack of adjustment for the effect of pregnancy overestimated breeding values of nonpregnant cows and cows with a long days open period and underestimated breeding values of cows with a short days open period. We recommend that models include the effect of days of pregnancy for estimation of adjustment factors for the effect of pregnancy in genetic evaluations of Dairy Gyr and Holstein cattle.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harkness, A. L.
1977-09-01
Nine elements from each batch of fuel elements manufactured for the EBR-II reactor have been analyzed for ²³⁵U content by nondestructive assay (NDA) methods. These values, together with those of the manufacturer, are used to estimate the product variance and the variances of the two measuring methods. These variances are compared with the variances computed from the stipulations of the contract. A method is derived for resolving the several variances into their within-batch and between-batch components. Some of these variance components have also been estimated by independent and more familiar conventional methods for comparison.
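Resolving paired measurements into a product variance and two method variances can be done with a Grubbs-type moment estimator. This is an assumption on our part, since the report does not spell out its formulas; it only illustrates how covariance separates item variation from measurement error.

```python
def grubbs_variances(x, y):
    """Two methods measure the same items: x = item + e1, y = item + e2.
    With independent errors, Cov(x, y) estimates the product (item)
    variance, and each method's error variance is its Var minus that Cov."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    return cov, vx - cov, vy - cov
```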
NASA Astrophysics Data System (ADS)
Lee, K. C.
2013-02-01
Multifractional Brownian motions have become popular as flexible models in describing real-life signals of high-frequency features in geoscience, microeconomics, and turbulence, to name a few. The time-changing Hurst exponent, which describes regularity levels depending on time measurements, and variance, which relates to an energy level, are two parameters that characterize multifractional Brownian motions. This research suggests a combined method of estimating the time-changing Hurst exponent and variance using the local variation of sampled paths of signals. The method consists of two phases: initially estimating global variance and then accurately estimating the time-changing Hurst exponent. A simulation study shows its performance in estimation of the parameters. The proposed method is applied to characterization of atmospheric stability in which descriptive statistics from the estimated time-changing Hurst exponent and variance classify stable atmosphere flows from unstable ones.
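The local-variation idea above can be sketched at its simplest: comparing mean squared increments at two lags recovers both the Hurst exponent and the variance level. This is a two-scale toy version, not the paper's two-phase method, and the names are hypothetical.

```python
import math

def estimate_hurst_and_variance(x):
    """For an fBm-like path sampled at unit steps, E[(X_{t+d} - X_t)^2]
    is proportional to d^(2H); comparing lags 1 and 2 isolates H, and
    the lag-1 variation gives the variance (energy) level."""
    d1 = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    d2 = [x[i + 2] - x[i] for i in range(len(x) - 2)]
    v1 = sum(v * v for v in d1) / len(d1)
    v2 = sum(v * v for v in d2) / len(d2)
    hurst = 0.5 * math.log(v2 / v1, 2)
    return hurst, v1
```

On ordinary Brownian motion this returns H near 0.5; applying it in a sliding window turns it into a time-changing estimate of the kind the abstract describes.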
Development of a technique for estimating noise covariances using multiple observers
NASA Technical Reports Server (NTRS)
Bundick, W. Thomas
1988-01-01
Friedland's technique for estimating the unknown noise variances of a linear system using multiple observers has been extended by developing a general solution for the estimates of the variances, developing the statistics (mean and standard deviation) of these estimates, and demonstrating the solution on two examples.
Consistent Small-Sample Variances for Six Gamma-Family Measures of Ordinal Association
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Gamma-family measures are bivariate ordinal correlation measures that form a family because they all reduce to Goodman and Kruskal's (1954) gamma in the absence of ties. For several gamma-family indices, more than one variance estimator has been introduced. In previous research, the "consistent" variance estimator described by Cliff and…
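For reference, gamma itself is a simple function of concordant and discordant pairs; a direct O(n²) sketch:

```python
def goodman_kruskal_gamma(x, y):
    """gamma = (C - D) / (C + D): C counts concordant pairs, D discordant;
    pairs tied on either variable are dropped from both counts."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```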
ERIC Educational Resources Information Center
Lucas, Richard E.; Donnellan, M. Brent
2012-01-01
Life satisfaction is often assessed using single-item measures. However, estimating the reliability of these measures can be difficult because internal consistency coefficients cannot be calculated. Existing approaches use longitudinal data to isolate occasion-specific variance from variance that is either completely stable or variance that…
Estimation of Variance in the Case of Complex Samples.
ERIC Educational Resources Information Center
Groenewald, A. C.; Stoker, D. J.
In a complex sampling scheme it is desirable to select the primary sampling units (PSUs) without replacement to prevent duplications in the sample. Since the estimation of the sampling variances is more complicated when the PSUs are selected without replacement, L. Kish (1965) recommends that the variance be calculated using the formulas…
Windhausen, Vanessa S.; Atlin, Gary N.; Hickey, John M.; Crossa, Jose; Jannink, Jean-Luc; Sorrells, Mark E.; Raman, Babu; Cairns, Jill E.; Tarekegne, Amsal; Semagn, Kassa; Beyene, Yoseph; Grudloyma, Pichet; Technow, Frank; Riedelsheimer, Christian; Melchinger, Albrecht E.
2012-01-01
Genomic prediction is expected to considerably increase genetic gains by increasing selection intensity and accelerating the breeding cycle. In this study, marker effects estimated in 255 diverse maize (Zea mays L.) hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and testcross progenies of 30 F2-derived lines from each of five populations. Although up to 25% of the genetic variance could be explained by cross validation within the diversity panel, the prediction of testcross performance of F2-derived lines using marker effects estimated in the diversity panel was on average zero. Hybrids in the diversity panel could be grouped into eight breeding populations differing in mean performance. When performance was predicted separately for each breeding population on the basis of marker effects estimated in the other populations, predictive ability was low (i.e., 0.12 for grain yield). These results suggest that prediction resulted mostly from differences in mean performance of the breeding populations and less from the relationship between the training and validation sets or linkage disequilibrium with causal variants underlying the predicted traits. Potential uses for genomic prediction in maize hybrid breeding are discussed emphasizing the need of (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of the population structure before performing cross validation, and (3) larger training sets with strong genetic relationship to the validation set. PMID:23173094
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
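The DerSimonian and Laird comparator mentioned above is a one-line moment estimator of the between-study variance; a standard sketch (not the paper's data-augmentation method):

```python
def dersimonian_laird_tau2(effects, variances):
    """tau^2 = max(0, (Q - (k - 1)) / c), where Q is Cochran's
    heterogeneity statistic and c rescales it to the variance scale."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    return max(0.0, (q - (len(effects) - 1)) / c)
```

With few studies, Q has few degrees of freedom and tau² is imprecise (and truncated at zero), which is exactly the small-meta-analysis problem that motivates the informative priors above.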
Nonparametric estimation of plant density by the distance method
Patil, S.A.; Burnham, K.P.; Kovner, J.L.
1979-01-01
A relation between the plant density and the probability density function of the nearest neighbor distance (squared) from a random point is established under fairly broad conditions. Based upon this relationship, a nonparametric estimator for the plant density is developed and presented in terms of order statistics. Consistency and asymptotic normality of the estimator are discussed. An interval estimator for the density is obtained. The modifications of this estimator and its variance are given when the distribution is truncated. Simulation results are presented for regular, random and aggregated populations to illustrate the nonparametric estimator and its variance. A numerical example from field data is given. Merits and deficiencies of the estimator are discussed with regard to its robustness and variance.
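For comparison with the nonparametric estimator above, the classical Poisson-based distance estimator is short enough to state in full. This is the textbook baseline under complete spatial randomness, not the order-statistics estimator the paper proposes.

```python
import math

def density_from_distances(dists):
    """Point-to-nearest-plant distances r_i under complete spatial
    randomness give the unbiased estimate (n - 1) / (pi * sum(r_i^2))."""
    n = len(dists)
    return (n - 1) / (math.pi * sum(r * r for r in dists))
```

Under aggregated or regular (non-Poisson) populations this estimator is biased, which is precisely the robustness gap a distribution-free estimator aims to close.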
Strategies for Selecting Crosses Using Genomic Prediction in Two Wheat Breeding Programs.
Lado, Bettina; Battenfield, Sarah; Guzmán, Carlos; Quincke, Martín; Singh, Ravi P; Dreisigacker, Susanne; Peña, R Javier; Fritz, Allan; Silva, Paula; Poland, Jesse; Gutiérrez, Lucía
2017-07-01
The single most important decision in plant breeding programs is the selection of appropriate crosses. The ideal cross would provide superior predicted progeny performance and enough diversity to maintain genetic gain. The aim of this study was to compare the best crosses predicted using combinations of mid-parent value and variance prediction accounting for linkage disequilibrium (VLD) or assuming linkage equilibrium (VLE). After predicting the mean and the variance of each cross, we selected crosses based on mid-parent value, the top 10% of the progeny, and weighted mean and variance within progenies for grain yield, grain protein content, mixing time, and loaf volume in two applied wheat (Triticum aestivum L.) breeding programs: Instituto Nacional de Investigación Agropecuaria (INIA) Uruguay and CIMMYT Mexico. Although the variance of the progeny is important to increase the chances of finding superior individuals from transgressive segregation, we observed that the mid-parent values of the crosses drove the genetic gain but the variance of the progeny had a small impact on genetic gain for grain yield. However, the relative importance of the variance of the progeny was larger for quality traits. Overall, the genomic resources and the statistical models are now available to plant breeders to predict both the performance of breeding lines per se as well as the value of progeny from any potential crosses. Copyright © 2017 Crop Science Society of America.
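The cross-selection criteria compared above combine the predicted progeny mean with the predicted progeny variance; a minimal scoring sketch (the weighting here is illustrative, not the paper's exact criterion):

```python
def rank_crosses(crosses, sd_weight=1.0):
    """Score each (name, progeny_mean, progeny_variance) cross by
    mean + weight * SD: extra progeny variance raises the chance of
    superior transgressive segregants, at a weight the breeder chooses."""
    scored = [(name, mu + sd_weight * var ** 0.5) for name, mu, var in crosses]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With sd_weight = 0 this reduces to ranking on mid-parent value alone, the component that, for grain yield, drove most of the genetic gain in the study.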
Age-specific survival of male golden-cheeked warblers on the Fort Hood Military Reservation, Texas
Duarte, Adam; Hines, James E.; Nichols, James D.; Hatfield, Jeffrey S.; Weckerly, Floyd W.
2014-01-01
Population models are essential components of large-scale conservation and management plans for the federally endangered Golden-cheeked Warbler (Setophaga chrysoparia; hereafter GCWA). However, existing models are based on vital rate estimates calculated using relatively small data sets that are now more than a decade old. We estimated more current, precise adult and juvenile apparent survival (Φ) probabilities and their associated variances for male GCWAs. In addition to providing estimates for use in population modeling, we tested hypotheses about spatial and temporal variation in Φ. We assessed whether a linear trend in Φ or a change in the overall mean Φ corresponded to an observed increase in GCWA abundance during 1992-2000 and if Φ varied among study plots. To accomplish these objectives, we analyzed long-term GCWA capture-resight data from 1992 through 2011, collected across seven study plots on the Fort Hood Military Reservation using a Cormack-Jolly-Seber model structure within program MARK. We also estimated Φ process and sampling variances using a variance-components approach. Our results did not provide evidence of site-specific variation in adult Φ on the installation. Because of a lack of data, we could not assess whether juvenile Φ varied spatially. We did not detect a strong temporal association between GCWA abundance and Φ. Mean estimates of Φ for adult and juvenile male GCWAs for all years analyzed were 0.47 with a process variance of 0.0120 and a sampling variance of 0.0113 and 0.28 with a process variance of 0.0076 and a sampling variance of 0.0149, respectively. Although juvenile Φ did not differ greatly from previous estimates, our adult Φ estimate suggests previous GCWA population models were overly optimistic with respect to adult survival. These updated Φ probabilities and their associated variances will be incorporated into new population models to assist with GCWA conservation decision making.
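The variance-components step can be illustrated with a simple method-of-moments sketch: the spread of yearly survival estimates reflects both true temporal (process) variation and sampling noise, so subtracting the average sampling variance leaves a process-variance estimate. The numbers below are hypothetical; the paper itself uses a more formal variance-components approach.

```python
from statistics import mean, variance

def process_variance(estimates, sampling_vars):
    """Method-of-moments sketch of a variance-components decomposition:
    the variance of yearly estimates is roughly process variance plus
    average sampling variance, so subtracting the latter leaves an
    estimate of true temporal (process) variation, floored at zero."""
    total = variance(estimates)            # sample variance across years
    return max(0.0, total - mean(sampling_vars))

phis = [0.42, 0.55, 0.47, 0.39, 0.52]      # hypothetical yearly estimates
ses2 = [0.0010, 0.0012, 0.0011, 0.0013, 0.0010]
proc = process_variance(phis, ses2)
```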
Empirical single sample quantification of bias and variance in Q-ball imaging.
Hainline, Allison E; Nath, Vishwesh; Parvathaneni, Prasanna; Blaber, Justin A; Schilling, Kurt G; Anderson, Adam W; Kang, Hakmook; Landman, Bennett A
2018-02-06
The bias and variance of high angular resolution diffusion imaging (HARDI) methods have not been thoroughly explored in the literature. The simulation extrapolation (SIMEX) and bootstrap techniques offer a way to estimate the bias and variance of HARDI metrics. The SIMEX approach is well established in the statistics literature and uses simulation of increasingly noisy data to extrapolate back to a hypothetical case with no noise. The bias of a calculated metric can then be computed by subtracting the SIMEX estimate from the original pointwise measurement. The SIMEX technique has previously been studied in the context of diffusion imaging to accurately capture the bias in fractional anisotropy measurements in DTI. Herein, we extend the application of SIMEX and bootstrap approaches to characterize bias and variance in metrics obtained from a Q-ball imaging reconstruction of HARDI data. The results demonstrate that the SIMEX and bootstrap approaches provide consistent estimates of the bias and variance of generalized fractional anisotropy, respectively. The RMSE for the generalized fractional anisotropy estimates shows a 7% decrease in white matter and an 8% decrease in gray matter when compared with the observed generalized fractional anisotropy estimates. On average, the bootstrap technique yields SD estimates that are approximately 97% of the true variation in white matter and 86% in gray matter. Both SIMEX and bootstrap methods are flexible, estimate population characteristics based on single scans, and may be extended for bias and variance estimation on a variety of HARDI metrics. © 2018 International Society for Magnetic Resonance in Medicine.
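A minimal SIMEX sketch for a simpler setting, a variance estimate contaminated by additive measurement error of known standard deviation; it illustrates the add-noise-and-extrapolate idea, not the paper's Q-ball pipeline:

```python
import random
import statistics

def simex_variance(w, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, seed=0):
    """SIMEX sketch: for observations w = x + u with noise s.d. sigma_u,
    Var(w_lambda) = var(x) + (1 + lambda) * sigma_u^2 is linear in the
    added-noise level lambda, so a straight-line fit over lambda
    extrapolated back to lambda = -1 recovers var(x)."""
    rng = random.Random(seed)
    xs = [0.0] + list(lambdas)              # lambda = 0 is the observed data
    ys = []
    for lam in xs:
        if lam == 0.0:
            ys.append(statistics.pvariance(w))
            continue
        sims = []
        for _ in range(B):                  # average over added-noise draws
            wl = [wi + rng.gauss(0.0, (lam ** 0.5) * sigma_u) for wi in w]
            sims.append(statistics.pvariance(wl))
        ys.append(statistics.mean(sims))
    # ordinary least-squares line through the (lambda, variance) pairs
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    return intercept + slope * (-1.0)       # extrapolate to lambda = -1
```

With simulated data the naive variance of the noisy observations overshoots the truth by about sigma_u squared, while the extrapolated SIMEX estimate lands near the noise-free variance.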
Combining Study Outcome Measures Using Dominance Adjusted Weights
ERIC Educational Resources Information Center
Makambi, Kepher H.; Lu, Wenxin
2013-01-01
Weighting of studies in meta-analysis is usually implemented by using the estimated inverse variances of treatment effect estimates. However, there is a possibility of one study dominating other studies in the estimation process by taking on a weight that is above some upper limit. We implement an estimator of the heterogeneity variance that takes…
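A sketch of one simple form of dominance adjustment: normalized inverse-variance weights with any single study capped at an upper limit and the excess redistributed. The cap value and redistribution rule here are illustrative assumptions, not the estimator proposed in the paper:

```python
def capped_iv_weights(variances, cap=0.5):
    """Inverse-variance weights, normalised to sum to 1, with any study
    whose weight exceeds `cap` truncated and the excess re-spread over
    the remaining studies in proportion to their precisions (a simple
    sketch of dominance adjustment)."""
    assert cap * len(variances) >= 1.0, "cap too small to be feasible"
    raw = [1.0 / v for v in variances]
    total = sum(raw)
    w = [r / total for r in raw]
    fixed = set()
    while True:
        over = [i for i, wi in enumerate(w)
                if wi > cap + 1e-12 and i not in fixed]
        if not over:
            return w
        fixed.update(over)
        free = [i for i in range(len(w)) if i not in fixed]
        spare = 1.0 - cap * len(fixed)
        free_total = sum(raw[i] for i in free) or 1.0
        for i in fixed:
            w[i] = cap
        for i in free:
            w[i] = spare * raw[i] / free_total

# One very precise study would take ~97% of the weight; the cap holds it at 50%
weights = capped_iv_weights([0.01, 1.0, 1.0, 1.0], cap=0.5)
```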
Estimation of genetic parameters for milk yield in Murrah buffaloes by Bayesian inference.
Breda, F C; Albuquerque, L G; Euclydes, R F; Bignardi, A B; Baldi, F; Torres, R A; Barbosa, L; Tonhati, H
2010-02-01
Random regression models were used to estimate genetic parameters for test-day milk yield in Murrah buffaloes using Bayesian inference. Data comprised 17,935 test-day milk records from 1,433 buffaloes. Twelve models were tested using different combinations of third-, fourth-, fifth-, sixth-, and seventh-order orthogonal polynomials of weeks of lactation for additive genetic and permanent environmental effects. All models included the fixed effects of contemporary group, number of daily milkings and age of cow at calving as covariate (linear and quadratic effect). In addition, residual variances were considered to be heterogeneous with 6 classes of variance. Models were selected based on the residual mean square error, weighted average of residual variance estimates, and estimates of variance components, heritabilities, correlations, eigenvalues, and eigenfunctions. Results indicated that changes in the order of fit for additive genetic and permanent environmental random effects influenced the estimation of genetic parameters. Heritability estimates ranged from 0.19 to 0.31. Genetic correlation estimates were close to unity between adjacent test-day records, but decreased gradually as the interval between test-days increased. Results from mean squared error and weighted averages of residual variance estimates suggested that a model considering sixth- and seventh-order Legendre polynomials for additive and permanent environmental effects, respectively, and 6 classes for residual variances, provided the best fit. Nevertheless, this model presented the largest degree of complexity. A more parsimonious model, with fourth- and sixth-order polynomials, respectively, for these same effects, yielded very similar genetic parameter estimates. Therefore, this last model is recommended for routine applications. Copyright 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
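The Legendre polynomial covariates used in such random regression test-day models can be generated with the standard Bonnet recurrence; the 44-week lactation length below is an illustrative assumption:

```python
def legendre_basis(week, max_week=44, order=4):
    """Legendre polynomial covariates for a test-day model: the week of
    lactation is standardised to t in [-1, 1], then P_0..P_order are
    built with the Bonnet recurrence
        (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)."""
    t = -1.0 + 2.0 * (week - 1) / (max_week - 1)
    p = [1.0, t]
    for n in range(1, order):
        p.append(((2 * n + 1) * t * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]
```

Many implementations additionally scale P_n by sqrt((2n + 1) / 2) to normalize the basis; conventions differ between software packages.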
Efficient Reduction and Analysis of Model Predictive Error
NASA Astrophysics Data System (ADS)
Doherty, J.
2006-12-01
Most groundwater models are calibrated against historical measurements of head and other system states before being used to make predictions in a real-world context. Through the calibration process, parameter values are estimated or refined such that the model is able to reproduce historical behaviour of the system at pertinent observation points reasonably well. Predictions made by the model are deemed to have greater integrity because of this. Unfortunately, predictive integrity is not as easy to achieve as many groundwater practitioners would like to think. The level of parameterisation detail estimable through the calibration process (especially where estimation takes place on the basis of heads alone) is strictly limited, even where full use is made of modern mathematical regularisation techniques such as those encapsulated in the PEST calibration package. (Use of these mechanisms allows more information to be extracted from a calibration dataset than is possible using simpler regularisation devices such as zones of piecewise constancy.) Where a prediction depends on aspects of parameterisation detail that are simply not inferable through the calibration process (which is often the case for predictions related to contaminant movement, and/or many aspects of groundwater/surface water interaction), then that prediction may be just as much in error as it would have been if the model had not been calibrated at all. Model predictive error arises from two sources. These are (a) the presence of measurement noise within the calibration dataset through which linear combinations of parameters spanning the "calibration solution space" are inferred, and (b) the sensitivity of the prediction to members of the "calibration null space" spanned by linear combinations of parameters which are not inferable through the calibration process. The magnitude of the former contribution depends on the level of measurement noise. 
The magnitude of the latter contribution (which often dominates the former) depends on the "innate variability" of hydraulic properties within the model domain. Knowledge of both of these is a prerequisite for characterisation of the magnitude of possible model predictive error. Unfortunately, in most cases, such knowledge is incomplete and subjective. Nevertheless, useful analysis of model predictive error can still take place. The present paper briefly discusses the means by which mathematical regularisation can be employed in the model calibration process in order to extract as much information as possible on hydraulic property heterogeneity prevailing within the model domain, thereby reducing predictive error to the lowest that can be achieved on the basis of that dataset. It then demonstrates the means by which predictive error variance can be quantified based on information supplied by the regularised inversion process. Both linear and nonlinear predictive error variance analysis is demonstrated using a number of real-world and synthetic examples.
Ménard, Richard; Deshaies-Jacques, Martin; Gasset, Nicolas
2016-09-01
An objective analysis is one of the main components of data assimilation. By combining observations with the output of a predictive model, we combine the best features of each source of information: the complete spatial and temporal coverage provided by models with the close representation of the truth provided by observations. The process of combining observations with a model output is called an analysis. Producing an analysis requires knowledge of the observation and model error variances, as well as of the spatial correlation of the model error. This paper is devoted to the development of methods for estimating these error variances and the characteristic length-scale of the model error correlation for operational use in the Canadian objective analysis system. We first argue in favor of using compact-support correlation functions, and then introduce three estimation methods: the Hollingsworth-Lönnberg (HL) method in local and global form, the maximum likelihood method (ML), and the χ² diagnostic method. We perform one-dimensional (1D) simulation studies where the error variance and true correlation length are known, and estimate both error variances and correlation length where both are non-uniform. We show that a local version of the HL method can accurately capture the error variances and correlation length at each observation site, provided that spatial variability is not too strong. However, the operational objective analysis requires a single, globally valid correlation length. We examine whether any statistic of the local HL correlation lengths could be a useful estimate, or whether other global estimation methods, such as the global HL, ML, or χ² methods, should be used. We found, in both 1D simulation and using real data, that the ML method is able to capture physically significant aspects of the correlation length, while most other estimates give unphysical and larger length-scale values. 
This paper describes a proposed improvement of the objective analysis of surface pollutants at Environment and Climate Change Canada (formerly known as Environment Canada). Objective analyses are essentially surface maps of air pollutants that are obtained by combining observations with an air quality model output, and are thought to provide a complete and more accurate representation of the air quality. The highlight of this study is an analysis of methods to estimate the model (or background) error correlation length-scale. The error statistics are an important and critical component to the analysis scheme.
Supernovae as probes of cosmic parameters: estimating the bias from under-dense lines of sight
DOE Office of Scientific and Technical Information (OSTI.GOV)
Busti, V.C.; Clarkson, C.; Holanda, R.F.L., E-mail: vinicius.busti@uct.ac.za, E-mail: holanda@uepb.edu.br, E-mail: chris.clarkson@uct.ac.za
2013-11-01
Correctly interpreting observations of sources such as type Ia supernovae (SNe Ia) requires knowledge of the power spectrum of matter on AU scales, which is very hard to model accurately. Because under-dense regions account for much of the volume of the universe, light from a typical source probes a mean density significantly below the cosmic mean. The relative sparsity of sources implies that there could be a significant bias when inferring distances of SNe Ia, and consequently a bias in cosmological parameter estimation. While the weak lensing approximation should in principle give the correct prediction for this, linear perturbation theory predicts an effectively infinite variance in the convergence for ultra-narrow beams. We attempt to quantify the effect typically under-dense lines of sight might have in parameter estimation by considering three alternative methods for estimating distances, in addition to the usual weak lensing approximation. We find in each case this not only increases the errors in the inferred density parameters, but also introduces a bias in the posterior value.
Statistical analysis of the calibration procedure for personnel radiation measurement instruments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bush, W.J.; Bengston, S.J.; Kalbeitzer, F.L.
1980-11-01
Thermoluminescent analyzer (TLA) calibration procedures were used to estimate personnel radiation exposure levels at the Idaho National Engineering Laboratory (INEL). A statistical analysis is presented herein based on data collected over a six month period in 1979 on four TLA's located in the Department of Energy (DOE) Radiological and Environmental Sciences Laboratory at the INEL. The data were collected according to the day-to-day procedure in effect at that time. Both gamma and beta radiation models are developed. Observed TLA readings of thermoluminescent dosimeters are correlated with known radiation levels. This correlation is then used to predict unknown radiation doses from future analyzer readings of personnel thermoluminescent dosimeters. The statistical techniques applied in this analysis include weighted linear regression, estimation of systematic and random error variances, prediction interval estimation using Scheffe's theory of calibration, the estimation of the ratio of the means of two normal bivariate distributed random variables and their corresponding confidence limits according to Kendall and Stuart, tests of normality, experimental design, a comparison between instruments, and quality control.
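The weighted linear regression step of such a calibration can be written in closed form; in practice the weights would typically be inverse variances of the analyzer readings. This is a generic sketch, not the INEL procedure itself:

```python
def weighted_least_squares(x, y, w):
    """Closed-form weighted linear regression y ~ a + b*x, the kind of
    fit used to relate dosimeter readings to known delivered doses.
    w[i] is the weight of observation i (often an inverse variance)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Data lying exactly on y = 2 + 3x is recovered for any positive weights
a, b = weighted_least_squares([0, 1, 2, 3], [2, 5, 8, 11], [1, 2, 1, 2])
```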
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials for the genetic evaluation of Alpine goats and to estimate parameters for test-day milk yield. We analyzed 20,710 test-day milk yield records of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models combined distinct fitting orders for the fixed curve (2-5) and for the random additive genetic (1-7) and permanent environmental (1-7) curves, with 2, 4, 5, or 6 classes of residual variance. WOMBAT software was used for all genetic analyses. The best random regression model with Legendre orthogonal polynomials for genetic evaluation of test-day milk yield in Alpine goats used a fixed curve of order 4, a curve of additive genetic effects of order 2, a curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance, because it was the most economical model among those equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has a larger genetic component than production peak and persistence. It is very important that the evaluation use the best combination of fixed, additive genetic, and permanent environmental regressions and number of classes of heterogeneous residual variance in random regression models, thereby enhancing the precision and accuracy of the parameter estimates and the prediction of genetic values.
Estimating gene function with least squares nonnegative matrix factorization.
Wang, Guoli; Ochs, Michael F
2007-01-01
Nonnegative matrix factorization is a machine learning algorithm that has extracted information from data in a number of fields, including imaging and spectral analysis, text mining, and microarray data analysis. One limitation of the method for linking genes through microarray data in order to estimate gene function is the high variance observed in transcription levels between different genes. Least squares nonnegative matrix factorization uses estimates of the uncertainties on the mRNA levels for each gene in each condition to guide the algorithm to a local minimum in normalized χ², rather than in a Euclidean distance or divergence between the reconstructed data and the data itself. Herein, application of this method to microarray data is demonstrated in order to predict gene function.
Alegría, Margarita; Kessler, Ronald C.; McLaughlin, Katie A.; Gruber, Michael J.; Sampson, Nancy A.; Zaslavsky, Alan M.
2014-01-01
We evaluate the precision of a model estimating the school-level prevalence of serious emotional disturbance (SED) using a small area estimation method based on readily available predictors from area-level Census block-group data and school principal questionnaires. Adolescents at 314 schools participated in the National Comorbidity Survey Adolescent Supplement, a national survey of DSM-IV disorders among adolescents. A multilevel model indicated that the predictors accounted for under half of the variance in school-level SED, and even less when considering block-group predictors or principal report alone. While Census measures and principal questionnaires are significant predictors of individual-level SED, the associations are too weak to generate precise school-level predictions of SED prevalence. PMID:24740174
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
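For reference, the ordinary (independent matched-pair) kappa that the proposed clustered variance estimator builds on can be computed from a 2x2 agreement table:

```python
def cohen_kappa(table):
    """Cohen's kappa for a 2x2 matched-pair table [[a, b], [c, d]]:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from the
    row and column margins."""
    (a, b), (c, d) = table
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pe) / (1 - pe)
```

Perfect agreement gives kappa = 1; agreement no better than chance gives kappa = 0.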
On the error in crop acreage estimation using satellite (LANDSAT) data
NASA Technical Reports Server (NTRS)
Chhikara, R. (Principal Investigator)
1983-01-01
The problem of crop acreage estimation using satellite data is discussed. Bias and variance of a crop proportion estimate in an area segment obtained from the classification of its multispectral sensor data are derived as functions of the means, variances, and covariance of error rates. The linear discriminant analysis and the class proportion estimation for the two class case are extended to include a third class of measurement units, where these units are mixed on ground. Special attention is given to the investigation of mislabeling in training samples and its effect on crop proportion estimation. It is shown that the bias and variance of the estimate of a specific crop acreage proportion increase as the disparity in mislabeling rates between two classes increases. Some interaction is shown to take place, causing the bias and the variance to decrease at first and then to increase, as the mixed unit class varies in size from 0 to 50 percent of the total area segment.
Procedures for estimating confidence intervals for selected method performance parameters.
McClure, F D; Lee, J K
2001-01-01
Procedures for estimating confidence intervals (CIs) for the repeatability variance (sigmar2), reproducibility variance (sigmaR2 = sigmaL2 + sigmar2), laboratory component (sigmaL2), and their corresponding standard deviations sigmar, sigmaR, and sigmaL, respectively, are presented. In addition, CIs for the ratio of the repeatability component to the reproducibility variance (sigmar2/sigmaR2) and the ratio of the laboratory component to the reproducibility variance (sigmaL2/sigmaR2) are also presented.
Wesseldijk, Laura W; Bartels, Meike; Vink, Jacqueline M; van Beijsterveldt, Catharina E M; Ligthart, Lannie; Boomsma, Dorret I; Middeldorp, Christel M
2017-06-21
Conduct problems in children and adolescents can predict antisocial personality disorder and related problems, such as crime and conviction. We sought an explanation for such predictions by performing a genetic longitudinal analysis. We estimated the effects of genetic, shared environmental, and unique environmental factors on variation in conduct problems measured at childhood and adolescence and antisocial personality problems measured at adulthood and on the covariation across ages. We also tested whether these estimates differed by sex. Longitudinal data were collected in the Netherlands Twin Register over a period of 27 years. Age appropriate and comparable measures of conduct and antisocial personality problems, assessed with the Achenbach System of Empirically Based Assessment, were available for 9783 9-10-year-old, 6839 13-18-year-old, and 7909 19-65-year-old twin pairs, respectively; 5114 twins have two or more assessments. At all ages, men scored higher than women. There were no sex differences in the estimates of the genetic and environmental influences. During childhood, genetic and environmental factors shared by children in families explained 43 and 44% of the variance of conduct problems, with the remaining variance due to unique environment. During adolescence and adulthood, genetic and unique environmental factors equally explained the variation. Longitudinal correlations across age varied between 0.20 and 0.38 and were mainly due to stable genetic factors. We conclude that shared environment is mainly of importance during childhood, while genetic factors contribute to variation in conduct and antisocial personality problems at all ages, and also underlie its stability over age.
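A back-of-envelope version of such a variance decomposition uses Falconer's formulas on MZ and DZ twin correlations. The study itself fits a full longitudinal genetic model; the correlations below are chosen to reproduce the childhood estimates of 43% genetic and 44% shared environment:

```python
def falconer_ace(r_mz, r_dz):
    """Falconer's approximation of an ACE decomposition from twin
    correlations: a2 = 2*(rMZ - rDZ) (additive genetic),
    c2 = 2*rDZ - rMZ (shared environment), e2 = 1 - rMZ (unique
    environment).  Only a classical approximation, not the model
    fitted in the paper."""
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2

a2, c2, e2 = falconer_ace(0.87, 0.655)
```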
Turner, Rebecca M; Jackson, Dan; Wei, Yinghui; Thompson, Simon G; Higgins, Julian P T
2015-01-01
Numerous meta-analyses in healthcare research combine results from only a small number of studies, for which the variance representing between-study heterogeneity is estimated imprecisely. A Bayesian approach to estimation allows external evidence on the expected magnitude of heterogeneity to be incorporated. The aim of this paper is to provide tools that improve the accessibility of Bayesian meta-analysis. We present two methods for implementing Bayesian meta-analysis, using numerical integration and importance sampling techniques. Based on 14 886 binary outcome meta-analyses in the Cochrane Database of Systematic Reviews, we derive a novel set of predictive distributions for the degree of heterogeneity expected in 80 settings depending on the outcomes assessed and comparisons made. These can be used as prior distributions for heterogeneity in future meta-analyses. The two methods are implemented in R, for which code is provided. Both methods produce equivalent results to standard but more complex Markov chain Monte Carlo approaches. The priors are derived as log-normal distributions for the between-study variance, applicable to meta-analyses of binary outcomes on the log odds-ratio scale. The methods are applied to two example meta-analyses, incorporating the relevant predictive distributions as prior distributions for between-study heterogeneity. We have provided resources to facilitate Bayesian meta-analysis, in a form accessible to applied researchers, which allow relevant prior information on the degree of heterogeneity to be incorporated. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:25475839
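A grid-based sketch of the approach: with a normal-on-log-scale (log-normal) prior on the between-study variance tau^2, and the overall effect integrated out analytically under a flat prior, the posterior over tau^2 can be evaluated on a geometric grid. The prior hyperparameters below are illustrative, not the paper's fitted predictive distributions:

```python
import math

def posterior_tau2_mean(y, v, mu_log=-2.0, sd_log=1.5):
    """Grid sketch of Bayesian random-effects meta-analysis: y[i] is a
    study's log odds ratio with within-study variance v[i], and the
    prior on log(tau^2) is Normal(mu_log, sd_log^2) (illustrative
    values).  The geometric grid is uniform in log(tau^2), so equal
    grid weights give a valid discretization.  Returns the posterior
    mean of tau^2."""
    grid = [0.0005 * 1.1 ** k for k in range(110)]
    logpost = []
    for t2 in grid:
        w = [1.0 / (vi + t2) for vi in v]
        sw = sum(w)
        mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sw
        # log marginal likelihood of y given tau^2 (mu integrated out)
        loglik = (0.5 * sum(math.log(wi) for wi in w)
                  - 0.5 * math.log(sw)
                  - 0.5 * sum(wi * (yi - mu_hat) ** 2
                              for wi, yi in zip(w, y)))
        logprior = -((math.log(t2) - mu_log) ** 2) / (2.0 * sd_log ** 2)
        logpost.append(loglik + logprior)
    m = max(logpost)
    p = [math.exp(lp - m) for lp in logpost]
    return sum(t2 * pi for t2, pi in zip(grid, p)) / sum(p)

tau2 = posterior_tau2_mean([0.2, 0.5, -0.1, 0.4], [0.04] * 4)
```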
WHAT PREDICTS A SUCCESSFUL LIFE? A LIFE-COURSE MODEL OF WELL-BEING*
Layard, Richard; Clark, Andrew E.; Cornaglia, Francesca; Powdthavee, Nattavudh; Vernoit, James
2014-01-01
Policy-makers who care about well-being need a recursive model of how adult life-satisfaction is predicted by childhood influences, acting both directly and (indirectly) through adult circumstances. We estimate such a model using the British Cohort Study (1970). We show that the most powerful childhood predictor of adult life-satisfaction is the child’s emotional health, followed by the child’s conduct. The least powerful predictor is the child’s intellectual development. This may have implications for educational policy. Among adult circumstances, family income accounts for only 0.5% of the variance of life-satisfaction. Mental and physical health are much more important. PMID:25422527
Saviane, Chiara; Silver, R Angus
2006-06-15
Synapses play a crucial role in information processing in the brain. Amplitude fluctuations of synaptic responses can be used to extract information about the mechanisms underlying synaptic transmission and its modulation. In particular, multiple-probability fluctuation analysis can be used to estimate the number of functional release sites, the mean probability of release and the amplitude of the mean quantal response from fits of the relationship between the variance and mean amplitude of postsynaptic responses, recorded at different probabilities. To determine these quantal parameters, calculate their uncertainties and the goodness-of-fit of the model, it is important to weight the contribution of each data point in the fitting procedure. We therefore investigated the errors associated with measuring the variance by determining the best estimators of the variance of the variance and have used simulations of synaptic transmission to test their accuracy and reliability under different experimental conditions. For central synapses, which generally have a low number of release sites, the amplitude distribution of synaptic responses is not normal, thus the use of a theoretical variance of the variance based on the normal assumption is not a good approximation. However, appropriate estimators can be derived for the population and for limited sample sizes using a more general expression that involves higher moments and introducing unbiased estimators based on the h-statistics. Our results are likely to be relevant for various applications of fluctuation analysis when few channels or release sites are present.
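The population-level "variance of the variance" that motivates the weighting can be evaluated directly from the central moments; the paper goes further and derives unbiased sample estimators of this quantity via h-statistics:

```python
def var_of_sample_variance(mu2, mu4, n):
    """Exact variance of the unbiased sample variance s^2 for n i.i.d.
    observations with population central moments mu2 and mu4:
        Var(s^2) = mu4/n - (n - 3) * mu2^2 / (n * (n - 1))."""
    return mu4 / n - (n - 3) * mu2 ** 2 / (n * (n - 1))

def var_of_sample_variance_normal(sigma2, n):
    """Normal special case: mu4 = 3*sigma^4 reduces the general formula
    to Var(s^2) = 2 * sigma^4 / (n - 1)."""
    return 2.0 * sigma2 ** 2 / (n - 1)
```

As the abstract notes, synaptic amplitude distributions are generally non-normal, so the general-moment form (or its h-statistic estimators) is the appropriate one.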
Kim, Minjung; Lamont, Andrea E; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M Lee
2016-06-01
Regression mixture models are a novel approach to modeling the heterogeneous effects of predictors on an outcome. In the model-building process, often residual variances are disregarded and simplifying assumptions are made without thorough examination of the consequences. In this simulation study, we investigated the impact of an equality constraint on the residual variances across latent classes. We examined the consequences of constraining the residual variances on class enumeration (finding the true number of latent classes) and on the parameter estimates, under a number of different simulation conditions meant to reflect the types of heterogeneity likely to exist in applied analyses. The results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted on the estimated class sizes and showed the potential to greatly affect the parameter estimates in each class. These results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions are made.
An anthropometric model to estimate neonatal fat mass using air displacement plethysmography
2012-01-01
Background Current validated neonatal body composition methods are limited or impractical for use outside of a clinical setting because they are labor intensive, time consuming, and require expensive equipment. The purpose of this study was to develop an anthropometric model to estimate neonatal fat mass (kg) using air displacement plethysmography (PEA POD® Infant Body Composition System) as the criterion. Methods A total of 128 healthy term infants, 60 females and 68 males, from a multiethnic cohort were included in the analyses. Gender, race/ethnicity, gestational age, age (in days), anthropometric measurements of weight, length, abdominal circumference, skin-fold thicknesses (triceps, biceps, subscapular, and thigh), and body composition by PEA POD® were collected within 1-3 days of birth. Backward stepwise linear regression was used to determine the model that best predicted neonatal fat mass. Results The statistical model that best predicted neonatal fat mass (kg) was: -0.012 - 0.064×gender + 0.024×day of measurement post-delivery - 0.150×weight (kg) + 0.055×weight (kg)² + 0.046×ethnicity + 0.020×sum of three skin-fold thicknesses (triceps, subscapular, and thigh); R² = 0.81, MSE = 0.08 kg. Conclusions Our anthropometric model explained 81% of the variance in neonatal fat mass. Future studies with a greater variety of neonatal anthropometric measurements may provide equations that explain more of the variance. PMID:22436534
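The reported regression equation can be evaluated directly; note that the 0/1 codings of gender and ethnicity used below are assumptions, since the abstract does not state them:

```python
def neonatal_fat_mass(gender, day, weight_kg, ethnicity, skinfold_sum_mm):
    """Evaluates the regression equation reported in the abstract for
    neonatal fat mass (kg).  `gender` and `ethnicity` are treated as 0/1
    indicators here, which is an assumption; `skinfold_sum_mm` is the sum
    of the triceps, subscapular, and thigh skin-fold thicknesses."""
    return (-0.012
            - 0.064 * gender
            + 0.024 * day
            - 0.150 * weight_kg
            + 0.055 * weight_kg ** 2
            + 0.046 * ethnicity
            + 0.020 * skinfold_sum_mm)

# Hypothetical infant: gender code 1, measured on day 2, 3.2 kg,
# ethnicity code 0, skin-fold sum 12 mm
fm = neonatal_fat_mass(1, 2, 3.2, 0, 12.0)
```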
Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger
2018-01-01
The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted. PMID:29692200
Researches of fruit quality prediction model based on near infrared spectrum
NASA Astrophysics Data System (ADS)
Shen, Yulin; Li, Lian
2018-04-01
With rising standards for food quality and safety, consumers pay increasing attention to the internal quality of fruit, so measuring fruit internal quality is increasingly imperative. Nondestructive analysis of soluble solid content (SSC) and total acid content (TAC) is vital and effective for quality measurement in global fresh-produce markets, so this paper aims to establish a novel fruit internal-quality prediction model for near-infrared spectra based on SSC and TAC. First, prediction models based on PCA + BP neural network, PCA + GRNN network, PCA + BP AdaBoost strong classifier, PCA + ELM, and PCA + LS-SVM classifier are designed and implemented. Second, in the NSCT domain, median and Savitzky-Golay filters are used to preprocess the spectral signal, and the Kennard-Stone algorithm is used to automatically select training and test samples. Third, the optimal models are obtained by comparing 15 prediction models under a multi-classifier competition mechanism; nonparametric estimation is introduced to measure the effectiveness of each model, with the reliability and variance of the nonparametric evaluation used to assess prediction results and the estimated value and confidence interval serving as references. Experimental results demonstrate that this approach achieves an effective evaluation of the internal quality of fruit. Finally, cat swarm optimization is employed to optimize the two best models identified by nonparametric estimation; empirical testing indicates that the proposed method provides more accurate and effective results than other forecasting methods.
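Among the pipeline steps above, Kennard-Stone sample selection is compact enough to sketch. This is the standard greedy max-min rule under an assumed Euclidean metric, not the authors' exact implementation:

```python
import numpy as np

def kennard_stone(X, n_train):
    """Greedily select n_train row indices of X that span the feature space.

    Standard Kennard-Stone rule (assumption: Euclidean distance): start from
    the two mutually most distant samples, then repeatedly add the sample
    whose distance to its nearest already-selected sample is largest.
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)  # most distant pair
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        k = remaining[int(np.argmax(nearest))]
        selected.append(k)
        remaining.remove(k)
    return selected
```

The selected indices form the calibration set; the rest serve as the test set, which is why the procedure needs no random seed.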
ERIC Educational Resources Information Center
Jackson, Dan; Bowden, Jack; Baker, Rose
2015-01-01
Moment-based estimators of the between-study variance are very popular when performing random effects meta-analyses. This type of estimation has many advantages including computational and conceptual simplicity. Furthermore, by using these estimators in large samples, valid meta-analyses can be performed without the assumption that the treatment…
Trends in Elevated Triglyceride in Adults: United States, 2001-2012
... All variance estimates accounted for the complex survey design using Taylor series linearization ( 10 ). Percentage estimates for the total adult ... al. National Health and Nutrition Examination Survey: Sample design, 2007–2010. ... KM. Taylor series methods. In: Introduction to variance estimation. 2nd ed. ...
Effects of social contact and zygosity on 21-y weight change in male twins.
McCaffery, Jeanne M; Franz, Carol E; Jacobson, Kristen; Leahey, Tricia M; Xian, Hong; Wing, Rena R; Lyons, Michael J; Kremen, William S
2011-08-01
Recent evidence indicates that social contact is related to similarities in weight gain over time. However, no studies have examined this effect in a twin design, in which genetic and other environmental effects can also be estimated. We determined whether the frequency of social contact is associated with similarity in weight change from young adulthood (mean age: 20 y) to middle age (mean age: 41 y) in twins and quantified the percentage of variance in weight change attributable to social contact, genetic factors, and other environmental influences. Participants were 1966 monozygotic and 1529 dizygotic male twin pairs from the Vietnam-Era Twin Registry. Regression models tested whether frequency of social contact and zygosity predicted twin pair similarity in body mass index (BMI) change and weight change. Twin modeling was used to partition the percentage variance attributable to social contact, genetic, and other environmental effects. Twins gained an average of 3.99 BMI units, or 13.23 kg (29.11 lb), over 21 y. In regression models, both zygosity (P < 0.001) and degree of social contact (P < 0.02) significantly predicted twin pair similarity in BMI change. In twin modeling, social contact between twins contributed 16% of the variance in BMI change (P < 0.001), whereas genetic factors contributed 42%, with no effect of additional shared environmental factors (1%). Similar results were obtained for weight change. Frequency of social contact significantly predicted twin pair similarity in BMI and weight change over 21 y, independent of zygosity and other shared environmental influences.
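As background for the variance partitioning above, Falconer's textbook approximations recover additive-genetic (A), shared-environment (C), and unique-environment (E) proportions from MZ and DZ twin-pair correlations. This is a simplification of the structural-equation twin modeling actually used in the study:

```python
def falconer_ace(r_mz, r_dz):
    """Classical Falconer decomposition from twin-pair correlations.

    A = 2(rMZ - rDZ), C = 2 rDZ - rMZ, E = 1 - rMZ.
    Assumes equal environments and no dominance; components are
    proportions of phenotypic variance.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2
```

The formulas follow from expected correlations rMZ = A + C and rDZ = A/2 + C, which is why higher MZ than DZ similarity is evidence of genetic influence.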
Evaluation of three lidar scanning strategies for turbulence measurements
NASA Astrophysics Data System (ADS)
Newman, J. F.; Klein, P. M.; Wharton, S.; Sathe, A.; Bonin, T. A.; Chilson, P. B.; Muschinski, A.
2015-11-01
Several errors occur when a traditional Doppler-beam swinging (DBS) or velocity-azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
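The six-beam retrieval described above amounts to solving a small linear system: each beam's radial-velocity variance is a quadratic form of the velocity covariance matrix. The beam geometry below (five beams at 45° elevation plus one vertical beam) is an illustrative assumption, not necessarily the deployed configuration:

```python
import numpy as np

def six_beam_variances(beam_dirs, radial_variances):
    """Solve n_i' S n_i = var(v_r,i) for the six unique components of the
    velocity covariance matrix S, ordered as
    [var_u, var_v, var_w, cov_uv, cov_uw, cov_vw].

    beam_dirs: six unit vectors (nx, ny, nz); any six directions giving a
    well-conditioned system work.
    """
    rows = []
    for nx, ny, nz in beam_dirs:
        rows.append([nx * nx, ny * ny, nz * nz,
                     2 * nx * ny, 2 * nx * nz, 2 * ny * nz])
    return np.linalg.solve(np.array(rows), np.array(radial_variances))
```

Because the variances are estimated independently per beam, no cross-beam homogeneity of the instantaneous wind is assumed, which is the motivation for the six-beam strategy over DBS/VAD processing.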
Evaluation of three lidar scanning strategies for turbulence measurements
NASA Astrophysics Data System (ADS)
Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia; Sathe, Ameya; Bonin, Timothy A.; Chilson, Phillip B.; Muschinski, Andreas
2016-05-01
Several errors occur when a traditional Doppler beam swinging (DBS) or velocity-azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
Estimating acreage by double sampling using LANDSAT data
NASA Technical Reports Server (NTRS)
Pont, F.; Horwitz, H.; Kauth, R. (Principal Investigator)
1982-01-01
Double sampling techniques employing LANDSAT data for estimating the acreage of corn and soybeans were investigated and evaluated. The evaluation was based on estimated costs and correlations between two existing procedures having differing cost/variance characteristics, and included consideration of their individual merits when coupled with a fictional 'perfect' procedure of zero bias and variance. Two features of the analysis are: (1) the simultaneous estimation of two or more crops; and (2) the imposition of linear cost constraints among two or more types of resource. A reasonably realistic operational scenario was postulated. The costs were estimated from current experience with the measurement procedures involved, and the correlations were estimated from a set of 39 LACIE-type sample segments located in the U.S. Corn Belt. For a fixed variance of the estimate, double sampling with the two existing LANDSAT measurement procedures can result in a 25% or 50% cost reduction. Double sampling which included the fictional perfect procedure results in a more cost-effective combination when it is used with the lower cost/higher variance representative of the existing procedures.
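The cost/variance trade-off described above follows the classical double-sampling (regression) estimator. A sketch under textbook assumptions (single crop, two cost classes, no finite-population correction):

```python
import math

def double_sampling_variance(s2, rho, n_small, n_large):
    """Variance of the double-sampling regression estimator:
    s2*(1 - rho^2)/n_small + s2*rho^2/n_large, where n_small is the
    expensive (accurate) sample and n_large the cheap auxiliary sample."""
    return s2 * (1 - rho ** 2) / n_small + s2 * rho ** 2 / n_large

def optimal_allocation(s2, rho, c_small, c_large, budget):
    """Square-root allocation minimizing variance subject to
    c_small*n_small + c_large*n_large = budget (Lagrange solution)."""
    ratio = math.sqrt((rho ** 2 * c_small) / ((1 - rho ** 2) * c_large))
    n_small = budget / (c_small + c_large * ratio)
    return n_small, ratio * n_small
```

The higher the correlation rho between the two procedures, the more work shifts to the cheap procedure, which is the mechanism behind the 25-50% cost reductions reported above.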
How predictable is the winter extremely cold days over temperate East Asia?
NASA Astrophysics Data System (ADS)
Luo, Xiao; Wang, Bin
2017-04-01
Skillful seasonal prediction of the number of extremely cold days (NECD) has considerable benefits for climate risk management and economic planning. Yet, the predictability of NECD associated with the East Asian winter monsoon remains largely unexplored. The present work estimates the NECD predictability in temperate East Asia (TEA, 30°-50°N, 110°-140°E), where current dynamical models exhibit limited prediction skill. We show that about 50 % of the total variance of the NECD in the TEA region is likely predictable, as estimated by a physics-based empirical (P-E) model with three autumn predictors: developing El Niño/La Niña, Eurasian Arctic Ocean temperature anomalies, and geopotential height anomalies over northern and eastern Asia. We find that the barotropic geopotential height anomaly over Asia can persist from autumn to winter, thereby serving as a predictor for winter NECD. Further analysis reveals that the sources of the NECD predictability and the physical basis for prediction of NECD are essentially the same as those for prediction of winter mean temperature over the same region. This finding implies that forecasting seasonal mean temperature can provide useful information for the prediction of extreme cold events. Interpretation of the lead-lag linkages between the three predictors and the predictand is provided to stimulate further studies.
Unbiased Estimates of Variance Components with Bootstrap Procedures
ERIC Educational Resources Information Center
Brennan, Robert L.
2007-01-01
This article provides general procedures for obtaining unbiased estimates of variance components for any random-model balanced design under any bootstrap sampling plan, with the focus on designs of the type typically used in generalizability theory. The results reported here are particularly helpful when the bootstrap is used to estimate standard…
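A one-facet illustration of the kind of adjustment involved: the mean bootstrap plug-in variance underestimates the unbiased sample variance by a factor of ((n-1)/n)^2, so rescaling restores unbiasedness. This is a deliberate simplification of the balanced G-theory designs the article treats:

```python
import random

def bootstrap_variance_unbiased(data, n_boot=4000, seed=0):
    """Rescaled bootstrap estimate of the variance component.

    E[plug-in variance of a size-n resample] = ((n-1)/n) * plug-in
    variance of the data, which itself is ((n-1)/n) * s^2, so the
    mean bootstrap value is multiplied by (n/(n-1))^2 to be unbiased
    for the population variance (one-facet analogue only).
    """
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(n_boot):
        s = [rng.choice(data) for _ in range(n)]
        m = sum(s) / n
        total += sum((x - m) ** 2 for x in s) / n  # plug-in variance
    return (n / (n - 1)) ** 2 * (total / n_boot)
```

The correction factor depends on the bootstrap sampling plan; for multi-facet designs each plan yields a different factor, which is the subject of the article.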
Control Variate Estimators of Survivor Growth from Point Samples
Francis A. Roesch; Paul C. van Deusen
1993-01-01
Two estimators of the control variate type for survivor growth from remeasured point samples are proposed and compared with more familiar estimators. The large reductions in variance observed in many cases for estimators constructed with control variates are also realized in this application. A simulation study yielded consistent reductions in variance which were often...
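The control-variate idea underlying these estimators, shown here in its generic form (not the authors' specific survivor-growth estimators): adjust the mean of the target variable using a correlated variable whose mean is known.

```python
def control_variate_estimate(y, x, mu_x):
    """Control-variate estimator of E[y]: mean(y) - c*(mean(x) - mu_x),
    with c = Cov(y, x)/Var(x) estimated from the sample. The variance
    reduction grows with the squared correlation between y and x."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    cov = sum((a - ybar) * (b - xbar) for a, b in zip(y, x)) / (n - 1)
    var = sum((b - xbar) ** 2 for b in x) / (n - 1)
    c = cov / var
    return ybar - c * (xbar - mu_x)
```

When y and x are perfectly linearly related the estimator has zero variance, which is why highly correlated auxiliary measurements (here, prior point-sample data) are so effective.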
Bruning, Andrea; Gaitán-Espitia, Juan Diego; González, Avia; Bartheld, José Luis; Nespolo, Roberto F
2013-01-01
Life-history evolution-the way organisms allocate time and energy to reproduction, survival, and growth-is a central question in evolutionary biology. One of its main tenets, the allocation principle, predicts that selection will reduce energy costs of maintenance in order to divert energy to survival and reproduction. The empirical support for this principle is the existence of a negative relationship between fitness and metabolic rate, which has been observed in some ectotherms. In juvenile animals, a key function affecting fitness is growth rate, since fast growers will reproduce sooner and maximize survival. In principle, design constraints dictate that growth rate cannot be reduced without affecting maintenance costs. Hence, it is predicted that juveniles will show a positive relationship between fitness (growth rate) and metabolic rate, contrary to what has been observed in adults. Here we explored this problem using land snails (Cornu aspersum). We estimated the additive genetic variance-covariance matrix for growth and standard metabolic rate (SMR; rate of CO2 production) using 34 half-sibling families. We measured eggs, hatchlings, and juveniles in 208 offspring that were isolated right after egg laying (i.e., minimizing maternal and common environmental variance). Surprisingly, our results showed that additive genetic effects (narrow-sense heritabilities, h(2)) and additive genetic correlations (rG) were small and nonsignificant. However, the nonadditive proportion of phenotypic variances and correlations (rC) were unexpectedly large and significant. In fact, nonadditive genetic effects were positive for growth rate and SMR ([Formula: see text]; [Formula: see text]), supporting the idea that fitness (growth rate) cannot be maximized without incurring maintenance costs. 
Large nonadditive genetic variances could result as a consequence of selection eroding the additive genetic component, which suggests that past selection could have produced nonadditive genetic correlation. It is predicted that this correlation is reduced when adulthood is attained and selection starts to promote the reduction in metabolic rate.
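As context for the half-sibling analysis above, a balanced paternal half-sib sire model estimates narrow-sense heritability as four times the sire intraclass correlation. The sketch below is a textbook one-way ANOVA version, not the authors' mixed-model machinery, and toy data can yield estimates above 1:

```python
def halfsib_heritability(families):
    """h^2 = 4 * sigma2_sire / (sigma2_sire + sigma2_within) from a
    balanced one-way sire-model ANOVA (equal family sizes assumed).

    families: list of equal-length lists, one per sire family.
    """
    k = len(families)
    n = len(families[0])
    grand = sum(sum(f) for f in families) / (k * n)
    means = [sum(f) / n for f in families]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for f, m in zip(families, means) for x in f) / (k * (n - 1))
    s2_sire = max((ms_between - ms_within) / n, 0.0)  # method-of-moments
    return 4 * s2_sire / (s2_sire + ms_within)
```

The factor 4 arises because paternal half-sibs share one quarter of their additive genetic variance, so the sire component estimates sigma2_A / 4.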
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-04-26
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance.
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-01-01
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance. PMID:29701668
Meta-heuristic CRPS minimization for the calibration of short-range probabilistic forecasts
NASA Astrophysics Data System (ADS)
Mohammadi, Seyedeh Atefeh; Rahmani, Morteza; Azadi, Majid
2016-08-01
This paper deals with probabilistic short-range temperature forecasts over synoptic meteorological stations across Iran using non-homogeneous Gaussian regression (NGR). NGR creates a Gaussian forecast probability density function (PDF) from the ensemble output. The mean of the normal predictive PDF is a bias-corrected weighted average of the ensemble members, and its variance is a linear function of the raw ensemble variance. The coefficients for the mean and variance are estimated by minimizing the continuous ranked probability score (CRPS), a scoring rule for distributional forecasts, during a training period. Gneiting et al. (Mon Weather Rev 133:1098-1118, 2005) used the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method to minimize the CRPS. Since BFGS is a conventional optimization method with its own limitations, we suggest using particle swarm optimization (PSO), a robust meta-heuristic method, to minimize the CRPS. The ensemble prediction system used in this study consists of nine different configurations of the weather research and forecasting model for 48-h forecasts of temperature during autumn and winter 2011 and 2012. The probabilistic forecasts were evaluated using several common verification scores, including the Brier score, attributes diagram, and rank histogram. Results show that both BFGS and PSO find the optimal solution and yield the same evaluation scores, but PSO does so from a feasible random first guess and with much less computational complexity.
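The CRPS being minimized has a well-known closed form for a Gaussian predictive distribution (Gneiting et al. 2005), which is what makes NGR training tractable for any optimizer, BFGS or PSO alike:

```python
import math

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast against observation y:
    sigma * [ z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ], z = (y - mu)/sigma.
    Lower is better; the score rewards both calibration and sharpness."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))
```

Training amounts to averaging this score over a training set and minimizing it with respect to the mean and variance coefficients; only the choice of minimizer (BFGS vs PSO) differs between the approaches compared above.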
Predicting vertical jump height from bar velocity.
García-Ramos, Amador; Štirn, Igor; Padial, Paulino; Argüelles-Cienfuegos, Javier; De la Fuente, Blanca; Strojnik, Vojko; Feriche, Belén
2015-06-01
The objective of the study was to assess the use of maximum (Vmax) and final propulsive phase (FPV) bar velocity to predict jump height in the weighted jump squat. FPV was defined as the velocity reached just before bar acceleration was lower than gravity (-9.81 m·s(-2)). Vertical jump height was calculated from the take-off velocity (Vtake-off) provided by a force platform. Thirty swimmers belonging to the National Slovenian swimming team performed a jump squat incremental loading test, lifting 25%, 50%, 75% and 100% of body weight in a Smith machine. Jump performance was simultaneously monitored using an AMTI portable force platform and a linear velocity transducer attached to the barbell. Simple linear regression was used to estimate jump height from the Vmax and FPV recorded by the linear velocity transducer. Vmax (y = 16.577x - 16.384) was able to explain 93% of jump height variance with a standard error of the estimate of 1.47 cm. FPV (y = 12.828x - 6.504) was able to explain 91% of jump height variance with a standard error of the estimate of 1.66 cm. Although both variables proved to be good predictors, heteroscedasticity in the differences between FPV and Vtake-off was observed (r(2) = 0.307), while the differences between Vmax and Vtake-off were homogeneously distributed (r(2) = 0.071). These results suggest that Vmax is a valid tool for estimating vertical jump height in a loaded jump squat test performed in a Smith machine. Key points: Vertical jump height in the loaded jump squat can be estimated with acceptable precision from the maximum bar velocity recorded by a linear velocity transducer. The relationship between the point at which bar acceleration is less than -9.81 m·s(-2) and the real take-off is affected by the velocity of movement. Mean propulsive velocity recorded by a linear velocity transducer does not appear to be optimal to monitor ballistic exercise performance.
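Both prediction routes above are easy to reproduce: ballistic flight gives height from take-off velocity, and the reported Vmax regression gives height directly (units assumed from context: velocity in m·s-1, regression output in cm):

```python
def jump_height_from_takeoff(v_takeoff, g=9.81):
    """Ballistic flight: h = v^2 / (2 g), height in metres."""
    return v_takeoff ** 2 / (2 * g)

def jump_height_from_vmax_cm(v_max):
    """Vmax regression reported in the study (R^2 = 0.93, SEE = 1.47 cm)."""
    return 16.577 * v_max - 16.384
```

The regression is the practical tool here: it needs only a linear velocity transducer on the bar, whereas the ballistic formula needs a force platform to obtain Vtake-off.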
Predicting Vertical Jump Height from Bar Velocity
García-Ramos, Amador; Štirn, Igor; Padial, Paulino; Argüelles-Cienfuegos, Javier; De la Fuente, Blanca; Strojnik, Vojko; Feriche, Belén
2015-01-01
The objective of the study was to assess the use of maximum (Vmax) and final propulsive phase (FPV) bar velocity to predict jump height in the weighted jump squat. FPV was defined as the velocity reached just before bar acceleration was lower than gravity (-9.81 m·s-2). Vertical jump height was calculated from the take-off velocity (Vtake-off) provided by a force platform. Thirty swimmers belonging to the National Slovenian swimming team performed a jump squat incremental loading test, lifting 25%, 50%, 75% and 100% of body weight in a Smith machine. Jump performance was simultaneously monitored using an AMTI portable force platform and a linear velocity transducer attached to the barbell. Simple linear regression was used to estimate jump height from the Vmax and FPV recorded by the linear velocity transducer. Vmax (y = 16.577x - 16.384) was able to explain 93% of jump height variance with a standard error of the estimate of 1.47 cm. FPV (y = 12.828x - 6.504) was able to explain 91% of jump height variance with a standard error of the estimate of 1.66 cm. Although both variables proved to be good predictors, heteroscedasticity in the differences between FPV and Vtake-off was observed (r2 = 0.307), while the differences between Vmax and Vtake-off were homogeneously distributed (r2 = 0.071). These results suggest that Vmax is a valid tool for estimating vertical jump height in a loaded jump squat test performed in a Smith machine. Key points: Vertical jump height in the loaded jump squat can be estimated with acceptable precision from the maximum bar velocity recorded by a linear velocity transducer. The relationship between the point at which bar acceleration is less than -9.81 m·s-2 and the real take-off is affected by the velocity of movement. Mean propulsive velocity recorded by a linear velocity transducer does not appear to be optimal to monitor ballistic exercise performance. PMID:25983572
NASA Astrophysics Data System (ADS)
O'Connor, J. E.; Wise, D. R.; Mangano, J.; Jones, K.
2015-12-01
Empirical analyses of suspended sediment and bedload transport give estimates of sediment flux for western Oregon and northwestern California. The estimates of both bedload and suspended load are from regression models relating measured annual sediment yield to geologic, physiographic, and climatic properties of contributing basins. The best models include generalized geology and either slope or precipitation. The best-fit suspended-sediment model is based on basin geology, precipitation, and area of recent wildfire. It explains 65% of the variance for 68 suspended sediment measurement sites within the model area. Predicted suspended sediment yields range from no yield from the High Cascades geologic province to 200 tonnes/km2-yr in the northern Oregon Coast Range and 1000 tonnes/km2-yr in recently burned areas of the northern Klamath terrane. Bed-material yield is similarly estimated from a regression model based on 22 sites of measured bed-material transport, mostly from reservoir accumulation analyses but also from several bedload measurement programs. The resulting best-fit regression is based on basin slope and the presence/absence of the Klamath geologic terrane. For the Klamath terrane, bed-material yield is twice that of the other geologic provinces. This model explains more than 80% of the variance of the better-quality measurements. Predicted bed-material yields range up to 350 tonnes/km2-yr in steep areas of the Klamath terrane. Applying these regressions to small individual watersheds (mean size: 66 km2 for bed-material; 3 km2 for suspended sediment) and accumulating totals down the hydrologic network (while decreasing the bed-material flux by experimentally determined attrition rates) gives spatially explicit estimates of both bed-material and suspended sediment flux. This enables assessment of several management issues, including the effects of dams on bedload transport, instream gravel mining, habitat formation processes, and water quality. 
The combined fluxes can also be compared to long-term rock uplift and cosmogenically determined landscape erosion rates.
Mapping from disease-specific measures to health-state utility values in individuals with migraine.
Gillard, Patrick J; Devine, Beth; Varon, Sepideh F; Liu, Lei; Sullivan, Sean D
2012-05-01
The objective of this study was to develop empirical algorithms that estimate health-state utility values from disease-specific quality-of-life scores in individuals with migraine. Data from a cross-sectional, multicountry study were used. Individuals with episodic and chronic migraine were randomly assigned to training or validation samples. Spearman's correlation coefficients between paired EuroQol five-dimensional (EQ-5D) questionnaire utility values and both Headache Impact Test (HIT-6) scores and Migraine-Specific Quality-of-Life Questionnaire version 2.1 (MSQ) domain scores (role restrictive, role preventive, and emotional function) were examined. Regression models were constructed to estimate EQ-5D questionnaire utility values from the HIT-6 score or the MSQ domain scores. Preferred algorithms were confirmed in the validation samples. In episodic migraine, the preferred HIT-6 and MSQ algorithms explained 22% and 25% of the variance (R(2)) in the training samples, respectively, and had similar prediction errors (root mean square error of 0.30). In chronic migraine, the preferred HIT-6 and MSQ algorithms explained 36% and 45% of the variance in the training samples, respectively, and had similar prediction errors (root mean square errors of 0.31 and 0.29). In episodic and chronic migraine, no statistically significant differences were observed between the mean observed and the mean estimated EQ-5D questionnaire utility values for the preferred HIT-6 and MSQ algorithms in the validation samples. The relationship between the EQ-5D questionnaire and the HIT-6 or the MSQ is adequate to use regression equations to estimate EQ-5D questionnaire utility values. The preferred HIT-6 and MSQ algorithms will be useful in estimating health-state utilities in migraine trials in which no preference-based measure is present. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
van Vugt, Floris T.; Tillmann, Barbara
2014-01-01
The human brain is able to predict the sensory effects of its actions. But how precise are these predictions? The present research proposes a tool to measure thresholds between a simple action (keystroke) and a resulting sound. On each trial, participants were required to press a key. Upon each keystroke, a woodblock sound was presented. In some trials, the sound came immediately with the downward keystroke; at other times, it was delayed by a varying amount of time. Participants were asked to verbally report whether the sound came immediately or was delayed. Participants' delay detection thresholds (in msec) were measured with a staircase-like procedure. We hypothesised that musicians would have a lower threshold than non-musicians. Comparing pianists and brass players, we furthermore hypothesised that, as a result of a sharper attack of the timbre of their instrument, pianists might have lower thresholds than brass players. Our results show that non-musicians exhibited higher thresholds for delay detection (180±104 ms) than the two groups of musicians (102±65 ms), but there were no differences between pianists and brass players. The variance in delay detection thresholds could be explained by variance in sensorimotor synchronisation capacities as well as variance in a purely auditory temporal irregularity detection measure. This suggests that the brain's capacity to generate temporal predictions of sensory consequences can be decomposed into general temporal prediction capacities together with auditory-motor coupling. These findings indicate that the brain has a relatively large window of integration within which an action and its resulting effect are judged as simultaneous. Furthermore, musical expertise may narrow this window down, potentially due to a more refined temporal prediction. 
This novel paradigm provides a simple test to estimate the temporal precision of auditory-motor action-effect coupling, and the paradigm can readily be incorporated in studies investigating both healthy and patient populations. PMID:24498299
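The staircase-like threshold procedure can be illustrated with a minimal 1-up/1-down rule. The step size, trial count, and reversal averaging below are illustrative assumptions, not the authors' exact protocol:

```python
def run_staircase(subject, start_ms=300.0, step_ms=20.0, n_trials=60):
    """Simple 1-up/1-down staircase for a delay-detection threshold.

    subject(delay_ms) -> True if the sound is reported as 'delayed'.
    The delay decreases after a detection and increases after a miss;
    the threshold estimate is the mean of the last six reversal points.
    """
    delay = start_ms
    reversals = []
    last_dir = None
    for _ in range(n_trials):
        detected = subject(delay)
        direction = -1 if detected else 1
        if last_dir is not None and direction != last_dir:
            reversals.append(delay)  # staircase changed direction here
        last_dir = direction
        delay = max(delay + direction * step_ms, 0.0)
    tail = reversals[-6:]
    return sum(tail) / len(tail)
```

A 1-up/1-down rule converges near the 50% point of the psychometric function; real experiments typically randomize step sizes and interleave catch trials, which this sketch omits.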
Real-time yield estimation based on deep learning
NASA Astrophysics Data System (ADS)
Rahnemoonfar, Maryam; Sheppard, Clay
2017-05-01
Crop yield estimation is an important task in product management and marketing. Accurate yield prediction helps farmers make better decisions on cultivation practices, plant disease prevention, and the size of the harvest labor force. The current practice of yield estimation, based on manual counting of fruits, is a very time-consuming and expensive process, and it is not practical for large fields. Robotic systems, including Unmanned Aerial Vehicles (UAV) and Unmanned Ground Vehicles (UGV), provide an efficient, cost-effective, flexible, and scalable solution for product management and yield prediction. Recently, huge amounts of data have been gathered from agricultural fields; however, efficient analysis of those data is still a challenging task. Computer vision approaches currently face different challenges in the automatic counting of fruits or flowers, including occlusion caused by leaves, branches or other fruits, variance in natural illumination, and scale. In this paper a novel deep convolutional network algorithm was developed to facilitate accurate yield prediction and automatic counting of fruits and vegetables in images. Our method is robust to occlusion, shadow, uneven illumination and scale. Experimental results in comparison to the state-of-the-art show the effectiveness of our algorithm.
Lanier, Hayley C; Knowles, L Lacey
2015-02-01
Coalescent-based methods for species-tree estimation are becoming a dominant approach for reconstructing species histories from multi-locus data, with most of the studies examining these methodologies focused on recently diverged species. However, deeper phylogenies, such as the datasets that comprise many Tree of Life (ToL) studies, also exhibit gene-tree discordance. This discord may also arise from the stochastic sorting of gene lineages during the speciation process (i.e., reflecting the random coalescence of gene lineages in ancestral populations). It remains unknown whether guidelines regarding methodologies and numbers of loci established by simulation studies at shallow tree depths translate into accurate species relationships for deeper phylogenetic histories. We address this knowledge gap and specifically identify the challenges and limitations of species-tree methods that account for coalescent variance for deeper phylogenies. Using simulated data with characteristics informed by empirical studies, we evaluate both the accuracy of estimated species trees and the characteristics associated with recalcitrant nodes, with a specific focus on whether coalescent variance is generally responsible for the lack of resolution. By determining the proportion of coalescent genealogies that support a particular node, we demonstrate that (1) species-tree methods account for coalescent variance at deep nodes and (2) mutational variance, not gene-tree discord arising from the coalescent, posed the primary challenge for accurate reconstruction across the tree. For example, many nodes were accurately resolved despite predicted discord from the random coalescence of gene lineages, and nodes with poor support were distributed across a range of depths (i.e., they were not restricted to recent divergences). 
Given their broad taxonomic scope and large sampling of taxa, deep level phylogenies pose several potential methodological complications including difficulties with MCMC convergence and estimation of requisite population genetic parameters for coalescent-based approaches. Despite these difficulties, the findings generally support the utility of species-tree analyses for the estimation of species relationships throughout the ToL. We discuss strategies for successful application of species-tree approaches to deep phylogenies. Copyright © 2014 Elsevier Inc. All rights reserved.
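The "proportion of coalescent genealogies that support a particular node" used above reduces to simple counting over sampled gene trees. In this sketch each gene tree is represented, purely for illustration, as the set of clades it contains:

```python
def node_support(gene_trees, clade):
    """Fraction of gene trees whose topology contains a given clade, a
    direct way to quantify how much gene-tree discord is expected at a node.
    Each gene tree is represented here simply as its set of clades."""
    return sum(clade in tree for tree in gene_trees) / len(gene_trees)

# Three of four simulated genealogies recover the (A, B) clade
trees = [{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")}]
support = node_support(trees, frozenset("AB"))
```

A node whose support falls well below 1.0 under simulated coalescent genealogies is one where discord from lineage sorting, rather than mutational variance, is predicted.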
NASA Technical Reports Server (NTRS)
Nelson, Ross; Margolis, Hank; Montesano, Paul; Sun, Guoqing; Cook, Bruce; Corp, Larry; Andersen, Hans-Erik; DeJong, Ben; Pellat, Fernando Paz; Fickel, Thaddeus;
2016-01-01
Existing national forest inventory plots, an airborne lidar scanning (ALS) system, and a space profiling lidar system (ICESat-GLAS) are used to generate circa 2005 estimates of total aboveground dry biomass (AGB) in forest strata, by state, in the continental United States (CONUS) and Mexico. The airborne lidar is used to link ground observations of AGB to space lidar measurements. Two sets of models are generated, the first relating ground estimates of AGB to airborne laser scanning (ALS) measurements and the second set relating ALS estimates of AGB (generated using the first model set) to GLAS measurements. GLAS, then, is used as a sampling tool within a hybrid estimation framework to generate stratum-, state-, and national-level AGB estimates. A two-phase variance estimator is employed to quantify GLAS sampling variability and, additively, ALS-GLAS model variability in this current, three-phase (ground-ALS-space lidar) study. The model variance component characterizes the variability of the regression coefficients used to predict ALS-based estimates of biomass as a function of GLAS measurements. Three different types of predictive models are considered in CONUS to determine which produced biomass totals closest to ground-based national forest inventory estimates: (1) linear (LIN), (2) linear-no-intercept (LNI), and (3) log-linear. For CONUS at the national level, the GLAS LNI model estimate (23.95 ± 0.45 Gt AGB) agreed most closely with the US national forest inventory ground estimate, 24.17 ± 0.06 Gt, i.e., within 1%. The national biomass total based on linear ground-ALS and ALS-GLAS models (25.87 ± 0.49 Gt) overestimated the national ground-based estimate by 7.5%. The comparable log-linear model result (63.29 ± 1.36 Gt) overestimated ground results by 261%. All three national biomass GLAS estimates, LIN, LNI, and log-linear, are based on 241,718 pulses collected on 230 orbits. 
The US national forest inventory (ground) estimates are based on 119,414 ground plots. At the US state level, the average absolute value of the deviation of LNI GLAS estimates from the comparable ground estimate of total biomass was 18.8% (range: Oregon, -40.8% to North Dakota, 128.6%). Log-linear models produced gross overestimates in the continental US, i.e., ~2.6x, and the use of this model to predict regional biomass using GLAS data in temperate, western hemisphere forests is not appropriate. The best model form, LNI, is used to produce biomass estimates in Mexico. The average biomass density in Mexican forests is 53.10 ± 0.88 t/ha, and the total biomass for the country, given a total forest area of 688,096 sq km, is 3.65 ± 0.06 Gt. In Mexico, our GLAS biomass total underestimated a 2005 FAO estimate (4.152 Gt) by 12% and overestimated a 2007/8 radar study's figure (3.06 Gt) by 19%.
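The Mexico total follows directly from the reported mean density and forest area; a quick unit-conversion check (1 sq km = 100 ha, 1 Gt = 10^9 t) reproduces the stated figure:

```python
density_t_per_ha = 53.10      # mean AGB density reported for Mexican forests
forest_area_km2 = 688_096     # total forest area used in the study

# 1 km^2 = 100 ha, so t/ha * km^2 * 100 gives tonnes; divide by 1e9 for Gt
total_gt = density_t_per_ha * forest_area_km2 * 100 / 1e9
```

The product comes out at about 3.65 Gt, matching the abstract's national total.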
Weigel, K A; Pralle, R S; Adams, H; Cho, K; Do, C; White, H M
2017-06-01
Hyperketonemia (HYK), a common early postpartum health disorder characterized by elevated blood concentrations of β-hydroxybutyrate (BHB), affects millions of dairy cows worldwide and leads to significant economic losses and animal welfare concerns. In this study, blood concentrations of BHB were assessed for 1,453 Holstein cows using electronic handheld meters at four time points between 5 and 18 days postpartum. Incidence rates of subclinical (1.2 ≤ maximum BHB ≤ 2.9 mmol/L) and clinical ketosis (maximum BHB ≥ 3.0 mmol/L) were 24.0 and 2.4%, respectively. Variance components, estimated breeding values, and predicted HYK phenotypes were computed on the original, square-root, and binary scales. Heritability estimates for HYK ranged from 0.058 to 0.072 in pedigree-based analyses, as compared to estimates that ranged from 0.071 to 0.093 when pedigrees were augmented with 60,671 single nucleotide polymorphism genotypes of 959 cows and 801 male ancestors. On average, predicted HYK phenotypes from the genome-enhanced analysis ranged from 0.55 mmol/L for first-parity cows in the best contemporary group to 1.40 mmol/L for fourth-parity cows in the worst contemporary group. Genome-enhanced predictions of HYK phenotypes were more closely associated with actual phenotypes than pedigree-based predictions in five-fold cross-validation, and transforming phenotypes to reduce skewness and kurtosis also improved predictive ability. This study demonstrates the feasibility of using repeated cowside measurement of blood BHB concentration in early lactation to construct a reference population that can be used to estimate HYK breeding values for genomic selection programmes and predict HYK phenotypes for genome-guided management decisions. © 2017 Blackwell Verlag GmbH.
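The heritabilities quoted above are ratios of variance components. A minimal sketch of the narrow-sense calculation (the study estimates the components themselves with pedigree-based and genome-enhanced mixed models; the values below are illustrative, chosen to fall in the reported range):

```python
def heritability(var_additive, var_residual):
    """Narrow-sense heritability: additive genetic variance as a fraction
    of total phenotypic variance."""
    return var_additive / (var_additive + var_residual)

# Illustrative components giving an estimate within the reported 0.058-0.093
h2 = heritability(0.08, 0.92)
```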
The Linear Predictability of Sea Level: A Benchmark
NASA Astrophysics Data System (ADS)
Sonnewald, M.; Wunsch, C.; Heimbach, P.
2016-12-01
A benchmark of linear predictive skill of global sea level is presented, complementing more complicated model studies of future predictive skill. Sea level is of great socioeconomic interest, as most of the world's population lives by the sea. Currently, the spread in model projections suggests poor predictive skill outside the seasonal cycle. We use 20 years of data from the ECCOv4 state estimate (1992-2012), assessing the variance attributable to the seasons and the linear predictability potential of the deseasoned component of sea level. The Northern Hemisphere has large regions where the seasons make up >90% of the variance, particularly in the western boundary current regions and zonal bands along the equator. The deseasoned sea level is more dominant in the Southern Hemisphere, particularly in the Southern Ocean. We treat the deseasoned sea level as a weakly stationary random process, whose predictability is given by the covariance structure. Fitting an ARMA(n,m) model, we choose the order using the Akaike and Bayesian Information Criteria (AIC and BIC). The AIC is more appropriate, with generally higher orders chosen and offering slightly more predictive accuracy. Monthly detrended data show skill generally of the order of a few months, with isolated regions of twelve months or more. With the trend, the predictive skill increases, particularly in the South Pacific. We assess the annually averaged data, although our time series is too short to assess the variability. There is some predictive skill, which is enhanced if the trend is not removed. A major caveat of our approach is that we test and train our model on the same dataset due to the short duration of available data.
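Choosing a time-series model order by AIC can be illustrated in miniature with a pure-AR choice between white noise and AR(1). This is a sketch of the selection principle only, not the paper's full ARMA(n,m) machinery; the coefficient 0.8 and series length are illustrative:

```python
import math
import random

def aic_ar(x, p):
    """AIC of a least-squares AR(p) fit; this sketch supports p in {0, 1}."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    if p == 0:
        rss = sum(v * v for v in xc)
    else:  # AR(1) coefficient: lag-1 autocovariance over variance
        phi = sum(a * b for a, b in zip(xc[1:], xc[:-1])) / sum(v * v for v in xc)
        rss = sum((xc[t] - phi * xc[t - 1]) ** 2 for t in range(1, n))
    return n * math.log(rss / n) + 2 * (p + 1)

random.seed(1)
x, prev = [], 0.0
for _ in range(500):
    prev = 0.8 * prev + random.gauss(0.0, 1.0)  # strongly autocorrelated series
    x.append(prev)

best_order = min((0, 1), key=lambda p: aic_ar(x, p))
```

On an autocorrelated series the AR(1) residual variance is much smaller, so the AIC penalty for the extra parameter is easily overcome and order 1 is selected.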
Brysbaert, Marc; Keuleers, Emmanuel; New, Boris
2011-01-01
In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles. Further analyses indicate that word frequencies derived from recent books (published after 2000) are better predictors of word processing times than frequencies based on the full corpus, and that word frequencies based on fiction books predict word processing times better than word frequencies based on the full corpus. The most predictive word frequencies from Google still do not explain more of the variance in word recognition times of undergraduate students and old adults than the subtitle-based word frequencies. PMID:21713191
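Comparing corpora here amounts to comparing the proportion of variance in response times that each frequency measure explains. For a single predictor that quantity is the squared correlation; a generic sketch (not the authors' analysis code, and with hypothetical numbers):

```python
def r_squared(x, y):
    """Proportion of variance in y explained by simple linear regression
    on x (the squared Pearson correlation)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

log_freq = [1.0, 2.0, 3.0, 4.0]      # hypothetical log word frequencies
rt = [650.0, 610.0, 580.0, 560.0]    # hypothetical lexical decision times (ms)
variance_explained = r_squared(log_freq, rt)
```

Running the two candidate frequency measures through the same regression and comparing the resulting values is exactly the kind of head-to-head comparison reported above.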
Huang, Xiaobi; Elliott, Michael R.; Harlow, Siobán D.
2013-01-01
As women approach menopause, the patterns of their menstrual cycle lengths change. To study these changes, we need to jointly model both the mean and variability of cycle length. Our proposed model incorporates separate mean and variance change points for each woman and a hierarchical model to link them together, along with regression components to include predictors of menopausal onset such as age at menarche and parity. Additional complexity arises from the fact that the calendar data have substantial missingness due to hormone use, surgery, and failure to report. We integrate multiple imputation and time-to-event modeling in a Bayesian estimation framework to deal with different forms of the missingness. Posterior predictive model checks are applied to evaluate the model fit. Our method successfully models patterns of women's menstrual cycle trajectories throughout their late reproductive life and identifies change points for mean and variability of segment length, providing insight into the menopausal process. More generally, our model points the way toward increasing use of joint mean-variance models to predict health outcomes and better understand disease processes. PMID:24729638
Human preference for individual colors
NASA Astrophysics Data System (ADS)
Palmer, Stephen E.; Schloss, Karen B.
2010-02-01
Color preference is an important aspect of human behavior, but little is known about why people like some colors more than others. Recent results from the Berkeley Color Project (BCP) provide detailed measurements of preferences among 32 chromatic colors as well as other relevant aspects of color perception. We describe the fit of several color preference models, including ones based on cone outputs, color-emotion associations, and Palmer and Schloss's ecological valence theory. The ecological valence theory postulates that color serves an adaptive "steering" function, analogous to taste preferences, biasing organisms to approach advantageous objects and avoid disadvantageous ones. It predicts that people will tend to like colors to the extent that they like the objects that are characteristically that color, averaged over all such objects. The ecological valence theory predicts 80% of the variance in average color preference ratings from the Weighted Affective Valence Estimates (WAVEs) of correspondingly colored objects, much more variance than any of the other models. We also describe how hue preferences for single colors differ as a function of gender, expertise, culture, social institutions, and perceptual experience.
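The WAVE predictor is, at heart, an association-weighted mean of object valences. A minimal sketch with hypothetical ratings (the actual WAVEs come from participants' object naming, valence ratings, and color-object match ratings):

```python
def wave(object_valences, association_weights):
    """Weighted Affective Valence Estimate for one color: the mean affective
    valence of objects characteristically that color, weighted by how
    strongly each object is associated with the color."""
    total = sum(association_weights)
    return sum(v * w for v, w in zip(object_valences, association_weights)) / total

# Hypothetical entries for blue: sky and clear water are liked, mold is not
estimate = wave([0.9, 0.8, -0.7], [0.5, 0.4, 0.1])
```

Under the ecological valence theory, colors with higher WAVEs should receive higher average preference ratings.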
Inertial measurements of free-living activities: assessing mobility to predict falls.
Wang, Kejia; Lovell, Nigel H; Del Rosario, Michael B; Liu, Ying; Wang, Jingjing; Narayanan, Michael R; Brodie, Matthew A D; Delbaere, Kim; Menant, Jasmine; Lord, Stephen R; Redmond, Stephen J
2014-01-01
An exploratory analysis was conducted into how simple features, from acceleration at the lower back and ankle during simulated free-living walking, stair ascent and descent, correlate with age, the overall fall risk from a clinically validated Physiological Profile Assessment (PPA), and its sub-components. Inertial data were captured from 92 older adults aged 78-95 (42 female, mean age 84.1, standard deviation 3.9 years). The dominant frequency, peak width from Welch's power spectral density estimate, and signal variance along each axis, from each sensor location and for each activity, were calculated. Several correlations were found between these features and the physiological risk factors. The strongest correlations were from the dominant frequency at the ankle along the mediolateral direction during stair ascent (Spearman's ρ = -0.45) with anteroposterior sway, and signal variance of the anteroposterior acceleration at the lower back during stair descent (ρ = -0.45) with age. These findings should aid future attempts to classify activities and predict falls in older adults, based on true free-living data from a range of activities.
Deletion Diagnostics for the Generalised Linear Mixed Model with independent random effects
Ganguli, B.; Roy, S. Sen; Naskar, M.; Malloy, E. J.; Eisen, E. A.
2015-01-01
The Generalised Linear Mixed Model (GLMM) is widely used for modelling environmental data. However, such data are prone to influential observations which can distort the estimated exposure-response curve particularly in regions of high exposure. Deletion diagnostics for iterative estimation schemes commonly derive the deleted estimates based on a single iteration of the full system holding certain pivotal quantities such as the information matrix to be constant. In this paper, we present an approximate formula for the deleted estimates and Cook’s distance for the GLMM which does not assume that the estimates of variance parameters are unaffected by deletion. The procedure allows the user to calculate standardised DFBETAs for mean as well as variance parameters. In certain cases, such as when using the GLMM as a device for smoothing, such residuals for the variance parameters are interesting in their own right. In general, the procedure leads to deleted estimates of mean parameters which are corrected for the effect of deletion on variance components as estimation of the two sets of parameters is interdependent. The probabilistic behaviour of these residuals is investigated and a simulation based procedure suggested for their standardisation. The method is used to identify influential individuals in an occupational cohort exposed to silica. The results show that failure to conduct post model fitting diagnostics for variance components can lead to erroneous conclusions about the fitted curve and unstable confidence intervals. PMID:26626135
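For intuition, Cook's distance in the ordinary fixed-effects linear model, the quantity the paper generalises to the GLMM (where variance parameters must also be corrected for deletion), can be computed directly. A sketch for simple linear regression with made-up data containing one influential point:

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression (OLS).
    D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2), with leverage h_ii and
    p = 2 mean parameters (intercept and slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance estimate
    p = 2
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage of observation i
        out.append(e * e * h / (p * s2 * (1 - h) ** 2))
    return out

# A high-leverage outlier at the end dominates the diagnostic
d = cooks_distance([1, 2, 3, 4, 5, 10], [1.1, 1.9, 3.2, 3.9, 5.1, 2.0])
```

The paper's contribution is an approximate deleted estimate that additionally propagates the deletion's effect through the variance components, which this classical formula holds fixed.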
Spatial Prediction and Optimized Sampling Design for Sodium Concentration in Groundwater
Shabbir, Javid; M. AbdEl-Salam, Nasser; Hussain, Tajammal
2016-01-01
Sodium is an integral part of water, and its excessive amount in drinking water causes high blood pressure and hypertension. In the present paper, the spatial distribution of sodium concentration in drinking water is modeled, and optimized sampling designs for selecting sampling locations are calculated for three divisions in Punjab, Pakistan. Universal kriging and Bayesian universal kriging are used to predict the sodium concentrations. Spatial simulated annealing is used to generate optimized sampling designs. Different estimation methods (i.e., maximum likelihood, restricted maximum likelihood, ordinary least squares, and weighted least squares) are used to estimate the parameters of the variogram model (i.e., exponential, Gaussian, spherical and cubic). It is concluded that Bayesian universal kriging fits better than universal kriging. It is also observed that the universal kriging predictor provides minimum mean universal kriging variance for both adding and deleting locations during sampling design. PMID:27683016
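One of the variogram models being fitted, the exponential, has a simple closed form; the nugget, sill, and range values below are illustrative assumptions, not the fitted Punjab parameters:

```python
import math

def exponential_variogram(h, nugget=0.1, sill=1.0, rng=300.0):
    """Exponential variogram gamma(h) = nugget + (sill - nugget) *
    (1 - exp(-h / range)): semivariance rises from the nugget at h = 0
    and levels off toward the sill at large separation distances."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))

gamma_near = exponential_variogram(0.0)   # equals the nugget
gamma_far = exponential_variogram(1e6)    # approaches the sill
```

Kriging then uses the fitted variogram to weight nearby observations and to quantify the prediction (kriging) variance at unsampled locations.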
The development and evaluation of accident predictive models
NASA Astrophysics Data System (ADS)
Maleck, T. L.
1980-12-01
A mathematical model that will predict the incremental change in the dependent variables (accident types) resulting from changes in the independent variables is developed. The end product is a tool for estimating the expected number and type of accidents for a given highway segment. The data segments (accidents) are separated into exclusive groups via a branching process, and variance is further reduced using stepwise multiple regression. The standard error of the estimate is calculated for each model. The dependent variables are the frequency, density, and rate of 18 types of accidents; the independent variables include district, county, highway geometry, land use, type of zone, speed limit, signal code, type of intersection, number of intersection legs, number of turn lanes, left-turn control, all-red interval, average daily traffic, and outlier code. Models for nonintersectional accidents did not fit or validate as well as models for intersectional accidents.
Using Latent Class Analysis to Model Temperament Types.
Loken, Eric
2004-10-01
Mixture models are appropriate for data that arise from a set of qualitatively different subpopulations. In this study, latent class analysis was applied to observational data from a laboratory assessment of infant temperament at four months of age. The EM algorithm was used to fit the models, and the Bayesian method of posterior predictive checks was used for model selection. Results show at least three types of infant temperament, with patterns consistent with those identified by previous researchers who classified the infants using a theoretically based system. Multiple imputation of group memberships is proposed as an alternative to assigning subjects to the latent class with maximum posterior probability in order to reflect variance due to uncertainty in the parameter estimation. Latent class membership at four months of age predicted longitudinal outcomes at four years of age. The example illustrates issues relevant to all mixture models, including estimation, multi-modality, model selection, and comparisons based on the latent group indicators.
Physical activity measurement in older adults: relationships with mental health.
Parker, Sarah J; Strath, Scott J; Swartz, Ann M
2008-10-01
This study examined the relationship between physical activity (PA) and mental health among older adults as measured by objective and subjective PA-assessment instruments. Pedometers (PED), accelerometers (ACC), and the Physical Activity Scale for the Elderly (PASE) were administered to measure 1 week of PA among 84 adults age 55-87 (mean = 71) years. General mental health was measured using the Positive and Negative Affect Scale (PANAS) and the Satisfaction With Life Scale (SWL). Linear regressions revealed that PA estimated by PED significantly predicted 18.1%, 8.3%, and 12.3% of variance in SWL and positive and negative affect, respectively, whereas PA estimated by the PASE did not predict any mental health variables. Results from ACC data were mixed. Hotelling-Williams tests between correlation coefficients revealed that the relationship between PED and SWL was significantly stronger than the relationship between PASE and SWL. Relationships between PA and mental health might depend on the PA measure used.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Newman, Jennifer F.; Clifton, Andrew
Currently, cup anemometers on meteorological towers are used to measure wind speeds and turbulence intensity to make decisions about wind turbine class and site suitability; however, as modern turbine hub heights increase and wind energy expands to complex and remote sites, it becomes more difficult and costly to install meteorological towers at potential sites. As a result, remote-sensing devices (e.g., lidars) are now commonly used by wind farm managers and researchers to estimate the flow field at heights spanned by a turbine. Although lidars can accurately estimate mean wind speeds and wind directions, there is still a large amount of uncertainty surrounding the measurement of turbulence using these devices. Errors in lidar turbulence estimates are caused by a variety of factors, including instrument noise, volume averaging, and variance contamination, in which the magnitude of these factors is highly dependent on measurement height and atmospheric stability. As turbulence has a large impact on wind power production, errors in turbulence measurements will translate into errors in wind power prediction. The impact of using lidars rather than cup anemometers for wind power prediction must be understood if lidars are to be considered a viable alternative to cup anemometers. In this poster, the sensitivity of power prediction error to typical lidar turbulence measurement errors is assessed. Turbulence estimates from a vertically profiling WINDCUBE v2 lidar are compared to high-resolution sonic anemometer measurements at field sites in Oklahoma and Colorado to determine the degree of lidar turbulence error that can be expected under different atmospheric conditions. These errors are then incorporated into a power prediction model to estimate the sensitivity of power prediction error to turbulence measurement error. 
Power prediction models, including the standard binning method and a random forest method, were developed using data from the aeroelastic simulator FAST for a 1.5 MW turbine. The impact of lidar turbulence error on the predicted power from these different models is examined to determine the degree of turbulence measurement accuracy needed for accurate power prediction.
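The standard binning method mentioned above is simple to sketch: group wind-speed observations into fixed-width bins and average the measured power in each. The bin width and sample values below are illustrative, not taken from the study:

```python
def binned_power_curve(wind_speed, power, bin_width=0.5):
    """Method-of-bins power curve: average the measured power within each
    fixed-width wind-speed bin, keyed by the bin's center speed.
    Assumes non-negative wind speeds (int() truncation toward zero)."""
    bins = {}
    for v, p in zip(wind_speed, power):
        key = int(v / bin_width)
        bins.setdefault(key, []).append(p)
    return {(k + 0.5) * bin_width: sum(ps) / len(ps) for k, ps in sorted(bins.items())}

curve = binned_power_curve([5.1, 5.3, 6.2, 6.4, 6.9], [120, 130, 200, 210, 250])
```

Because the curve is built from measured turbulence-affected power values, systematic errors in the turbulence inputs propagate directly into the binned predictions, which is the sensitivity being assessed.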
Branscum, Paul; Sharma, Manoj
2014-01-01
The purpose of this study was to use the theory of planned behavior to explain two types of snack food consumption among boys and girls (girls n = 98; boys n = 69), which may have implications for future theory-based health promotion interventions. Between genders, there was a significant difference for calorie-dense/nutrient-poor snacks (p = .002), but no difference for fruit and vegetable snacks. Using stepwise multiple regression, attitudes, perceived behavioral control, and subjective norms accounted for a large amount of the variance of intentions (girls = 43.3%; boys = 55.9%); however, for girls, subjective norms accounted for the most variance, whereas for boys, attitudes accounted for the most variance. Calories from calorie-dense/nutrient-poor snacks and fruit and vegetable snacks were also predicted by intentions. For boys, intentions predicted 6.4% of the variance for fruit and vegetable snacks (p = .03) but was not significant for calorie-dense/nutrient-poor snacks, whereas for girls, intentions predicted 6.0% of the variance for fruit and vegetable snacks (p = .007), and 7.2% of the variance for calorie-dense/nutrient-poor snacks (p = .004). Results suggest that the theory of planned behavior is a useful framework for predicting snack foods among children; however, there are important differences between genders that should be considered in future health promotion interventions.
On the estimation variance for the specific Euler-Poincaré characteristic of random networks.
Tscheschel, A; Stoyan, D
2003-07-01
The specific Euler number is an important topological characteristic in many applications. It is considered here for the case of random networks, which may appear in microscopy either as primary objects of investigation or as secondary objects describing in an approximate way other structures such as, for example, porous media. For random networks there is a simple and natural estimator of the specific Euler number. For its estimation variance, a simple Poisson approximation is given. It is based on the general exact formula for the estimation variance. In two examples of quite different nature and topology application of the formulas is demonstrated.
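For a network observed inside a window, the simple and natural estimator referred to above follows from the fact that a network's Euler characteristic is vertices minus edges (there are no higher-dimensional cells), referred to the window volume:

```python
def specific_euler_number(n_vertices, n_edges, window_volume):
    """Natural estimator of the specific Euler number of a network:
    (V - E) per unit volume of the observation window."""
    return (n_vertices - n_edges) / window_volume

# A network with more edges than vertices has a negative Euler characteristic
chi_v = specific_euler_number(100, 130, 10.0)
```

The paper's contribution is the variance of this estimator, for which a Poisson approximation is given.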
An empirical analysis of the distribution of overshoots in a stationary Gaussian stochastic process
NASA Technical Reports Server (NTRS)
Carter, M. C.; Madison, M. W.
1973-01-01
The frequency distribution of overshoots in a stationary Gaussian stochastic process is analyzed. The primary processes involved in this analysis are computer simulation and statistical estimation. Computer simulation is used to simulate stationary Gaussian stochastic processes that have selected autocorrelation functions. An analysis of the simulation results reveals a frequency distribution for overshoots with a functional dependence on the mean and variance of the process. Statistical estimation is then used to estimate the mean and variance of a process. It is shown that, given an autocorrelation function together with the mean and variance of the process, a frequency distribution for the number of overshoots can be estimated.
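Counting overshoots in a simulated realisation reduces to counting upcrossings of a level; a sketch of the kind of statistic tallied from the simulations (the series and level below are illustrative):

```python
def count_upcrossings(x, level):
    """Number of overshoots of `level` in a sampled series: each move from
    at-or-below the level to strictly above it starts one excursion."""
    return sum(1 for a, b in zip(x, x[1:]) if a <= level < b)

n_over = count_upcrossings([0.0, 2.0, 1.0, 3.0, 0.0, 5.0], 1.5)
```

Tallying such counts over many simulated realisations yields the empirical frequency distribution of overshoots studied above.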
NASA Astrophysics Data System (ADS)
Kitterød, Nils-Otto
2017-08-01
Unconsolidated sediment cover thickness (D) above bedrock was estimated by using a publicly available well database from Norway, GRANADA. General challenges associated with such databases typically involve clustering and bias. However, if information about the horizontal distance to the nearest bedrock outcrop (L) is included, does the spatial estimation of D improve? This idea was tested by comparing two cross-validation results: ordinary kriging (OK), where L was disregarded; and co-kriging (CK), where cross-covariance between D and L was included. The analysis showed only minor differences between OK and CK with respect to differences between estimated and true values. However, the CK results in general gave lower estimation variance than the OK results. All observations were declustered and transformed to standard normal probability density functions before estimation and back-transformed for the cross-validation analysis. The semivariogram analysis gave correlation lengths for D and L of approx. 10 and 6 km. These correlations reduce the estimation variance in the cross-validation analysis because more than 50 % of the data material had two or more observations within a radius of 5 km. The small-scale variance of D, however, was about 50 % of the total variance, which gave an accuracy of less than 60 % for most of the cross-validation cases. Despite the noisy character of the observations, the analysis demonstrated that L can be used as secondary information to reduce the estimation variance of D.
Zemski, Adam J; Broad, Elizabeth M; Slater, Gary J
2018-01-01
Body composition in elite rugby union athletes is routinely assessed using surface anthropometry, which can be utilized to provide estimates of absolute body composition using regression equations. This study aims to assess the ability of available skinfold equations to estimate body composition in elite rugby union athletes who have unique physique traits and divergent ethnicity. The development of sport-specific and ethnicity-sensitive equations was also pursued. Forty-three male international Australian rugby union athletes of Caucasian and Polynesian descent underwent surface anthropometry and dual-energy X-ray absorptiometry (DXA) assessment. Body fat percent (BF%) was estimated using five previously developed equations and compared to DXA measures. Novel sport and ethnicity-sensitive prediction equations were developed using forward selection multiple regression analysis. Existing skinfold equations provided unsatisfactory estimates of BF% in elite rugby union athletes, with all equations demonstrating a 95% prediction interval in excess of 5%. The equations tended to underestimate BF% at low levels of adiposity, whilst overestimating BF% at higher levels of adiposity, regardless of ethnicity. The novel equations created explained a similar amount of variance to those previously developed (Caucasians 75%, Polynesians 90%). The use of skinfold equations, including the created equations, cannot be supported to estimate absolute body composition. Until a population-specific equation is established that can be validated to precisely estimate body composition, it is advocated to use a proven method, such as DXA, when absolute measures of lean and fat mass are desired, and raw anthropometry data routinely to derive an estimate of body composition change.
Yeh, Hsiang J.; Guindani, Michele; Vannucci, Marina; Haneef, Zulfi; Stern, John M.
2018-01-01
Estimation of functional connectivity (FC) has become an increasingly powerful tool for investigating healthy and abnormal brain function. Static connectivity, in particular, has played a large part in guiding conclusions from the majority of resting-state functional MRI studies. However, accumulating evidence points to the presence of temporal fluctuations in FC, leading to increasing interest in estimating FC as a dynamic quantity. One central issue that has arisen in this new view of connectivity is the dramatic increase in complexity caused by dynamic functional connectivity (dFC) estimation. To computationally handle this increased complexity, a limited set of dFC properties, primarily the mean and variance, have generally been considered. Additionally, it remains unclear how to integrate the increased information from dFC into pattern recognition techniques for subject-level prediction. In this study, we propose an approach to address these two issues based on a large number of previously unexplored temporal and spectral features of dynamic functional connectivity. A Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model is used to estimate time-varying patterns of functional connectivity between resting-state networks. Time-frequency analysis is then performed on dFC estimates, and a large number of previously unexplored temporal and spectral features drawn from signal processing literature are extracted for dFC estimates. We apply the investigated features to two neurologic populations of interest, healthy controls and patients with temporal lobe epilepsy, and show that the proposed approach leads to substantial increases in predictive performance compared to both traditional estimates of static connectivity as well as current approaches to dFC. Variable importance is assessed and shows that there are several quantities that can be extracted from dFC signal which are more informative than the traditional mean or variance of dFC. 
This work illuminates many previously unexplored facets of the dynamic properties of functional connectivity between resting-state networks, and provides a platform for dynamic functional connectivity analysis that facilitates its usage as an investigative measure for healthy as well as abnormal brain function. PMID:29320526
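The heart of the approach above is a conditional-variance recursion. A minimal pure-Python sketch of a GARCH(1,1) filter is shown below; the parameter values (omega, alpha, beta) are illustrative placeholders rather than fitted values, and the actual pipeline estimates dFC between network pairs rather than a single series.

```python
# Minimal GARCH(1,1)-style conditional-variance recursion, illustrating the
# kind of time-varying second-moment estimate used for dynamic FC.
# Parameters are illustrative, not fitted to any data.

def garch_variance(returns, omega=0.1, alpha=0.2, beta=0.7):
    """Return the conditional variance series h_t for an input series.

    h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1}
    """
    if not returns:
        return []
    # Initialize with the unconditional sample variance.
    mean = sum(returns) / len(returns)
    h = sum((r - mean) ** 2 for r in returns) / len(returns)
    out = [h]
    for r in returns[:-1]:
        h = omega + alpha * r ** 2 + beta * h
        out.append(h)
    return out

series = [0.1, -0.2, 1.5, -1.4, 0.2, 0.1, -0.1, 0.05]
h = garch_variance(series)
# The conditional variance rises after the large shocks at t = 2, 3
# and decays geometrically afterwards (persistence alpha + beta = 0.9).
```

Temporal and spectral features (e.g., spectral power, entropy) would then be computed on series like `h`, which is where the paper's feature set enters.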
NASA Astrophysics Data System (ADS)
Alexander, R. B.; Boyer, E. W.; Schwarz, G. E.; Smith, R. A.
2013-12-01
Estimating water and material stores and fluxes in watershed studies is frequently complicated by uncertainties in quantifying hydrological and biogeochemical effects of factors such as land use, soils, and climate. Although these process-related effects are commonly measured and modeled in separate catchments, researchers are especially challenged by their complexity across catchments and diverse environmental settings, leading to a poor understanding of how model parameters and prediction uncertainties vary spatially. To address these concerns, we illustrate the use of Bayesian hierarchical modeling techniques with a dynamic version of the spatially referenced watershed model SPARROW (SPAtially Referenced Regression On Watershed attributes). The dynamic SPARROW model is designed to predict streamflow and other water cycle components (e.g., evapotranspiration, soil and groundwater storage) for monthly varying hydrological regimes, using mechanistic functions, mass conservation constraints, and statistically estimated parameters. In this application, the model domain includes nearly 30,000 NHD (National Hydrography Dataset) stream reaches and their associated catchments in the Susquehanna River Basin. We report the results of our comparisons of alternative models of varying complexity, including models with different explanatory variables as well as hierarchical models that account for spatial and temporal variability in model parameters and variance (error) components. The model errors are evaluated for changes with season and catchment size and correlations in time and space. The hierarchical models consist of a two-tiered structure in which climate forcing parameters are modeled as random variables, conditioned on watershed properties. Quantification of spatial and temporal variations in the hydrological parameters and model uncertainties in this approach leads to more efficient (lower variance) and less biased model predictions throughout the river network. 
Moreover, predictions of water-balance components are reported according to probabilistic metrics (e.g., percentiles, prediction intervals) that include both parameter and model uncertainties. These improvements in predictions of streamflow dynamics can inform the development of more accurate predictions of spatial and temporal variations in biogeochemical stores and fluxes (e.g., nutrients and carbon) in watersheds.
Genomic selection for crossbred performance accounting for breed-specific effects.
Lopes, Marcos S; Bovenhuis, Henk; Hidalgo, André M; van Arendonk, Johan A M; Knol, Egbert F; Bastiaansen, John W M
2017-06-26
Breed-specific effects are observed when the same allele of a given genetic marker has a different effect depending on its breed origin, which results in different allele substitution effects across breeds. In such a case, single-breed breeding values may not be the most accurate predictors of crossbred performance. Our aim was to estimate the contribution of alleles from each parental breed to the genetic variance of traits that are measured in crossbred offspring, and to compare the prediction accuracies of estimated direct genomic values (DGV) from a traditional genomic selection model (GS) that are trained on purebred or crossbred data, with accuracies of DGV from a model that accounts for breed-specific effects (BS), trained on purebred or crossbred data. The final dataset was composed of 924 Large White, 924 Landrace and 924 two-way cross (F1) genotyped and phenotyped animals. The traits evaluated were litter size (LS) and gestation length (GL) in pigs. The genetic correlation between purebred and crossbred performance was higher than 0.88 for both LS and GL. For both traits, the additive genetic variance was larger for alleles inherited from the Large White breed compared to alleles inherited from the Landrace breed (0.74 and 0.56 for LS, and 0.42 and 0.40 for GL, respectively). The highest prediction accuracies of crossbred performance were obtained when training was done on crossbred data. For LS, prediction accuracies were the same for GS and BS DGV (0.23), while for GL, prediction accuracy for BS DGV was similar to the accuracy of GS DGV (0.53 and 0.52, respectively). In this study, training on crossbred data resulted in higher prediction accuracy than training on purebred data and evidence of breed-specific effects for LS and GL was demonstrated. However, when training was done on crossbred data, both GS and BS models resulted in similar prediction accuracies. 
In future studies, traits with a lower genetic correlation between purebred and crossbred performance should be included to further assess the value of the BS model in genomic predictions.
Edwards, Rufus D; Smith, Kirk R; Zhang, Junfeng; Ma, Yuqing
2003-01-01
Residential energy use in developing countries has traditionally been associated with combustion devices of poor energy efficiency, which have been shown to produce both greenhouse gas (GHG) emissions and substantial health-damaging pollution that contributes significantly to the global burden of disease. Precision of these estimates in China has been hampered by limited data on stove use and fuel consumption in residences. In addition, limited information is available on variability of emissions of pollutants from different stove/fuel combinations in typical use, as measurement of emission factors requires measurement of multiple chemical species in complex burn cycle tests. Such measurements are too costly and time consuming for application in conjunction with national surveys. Emissions of most of the major health-damaging pollutants (HDP) and many of the gases that contribute to GHG emissions from cooking stoves are the result of the significant portion of fuel carbon that is diverted to products of incomplete combustion (PIC) as a result of poor combustion efficiencies. The approximately linear increase in emissions of PIC with decreasing combustion efficiencies allows development of linear models to predict emissions of GHG and HDP intrinsically linked to CO2 and PIC production, and ultimately allows the prediction of global warming contributions from residential stove emissions. A comprehensive emissions database of three burn cycles of 23 typical fuel/stove combinations tested in a simulated village house in China has been used to develop models to predict emissions of HDP and global warming commitment (GWC) from cooking stoves in China that rely on simple survey information on stove and fuel use, which may be incorporated into national surveys. Stepwise regression models predicted 66% of the variance in global warming commitment (CO2, CO, CH4, NOx, TNMHC) per 1 MJ delivered energy due to emissions from these stoves if survey information on fuel type was available. 
If stove type is also known, stepwise regression models predicted 73% of the variance. Integrated assessment of policies to change stove or fuel type requires that implications for environmental impacts, energy efficiency, global warming and human exposures to HDP emissions can be evaluated. Frequently, this involves measurement of TSP or CO as the major HDPs. Incorporation of this information into models to predict GWC predicted 79% and 78% of the variance, respectively. Clearly, however, the complexity of making multiple measurements in conjunction with a national survey would be both expensive and time consuming. Thus, models have been derived to predict HDP using simple survey information together with measurement of either the CO/CO2 or the TSP/CO2 ratio to predict emission factors for the other HDP. Stepwise regression models predicted 65% of the variance in emissions of total suspended particulate as grams of carbon (TSPC) per 1 MJ delivered if survey information on fuel and stove type was available and 74% if the CO/CO2 ratio was measured. Similarly, stepwise regression models predicted 76% of the variance in COC emissions per MJ delivered with survey information on stove and fuel type and 85% if the TSPC/CO2 ratio was measured. Ultimately, with international agreements on emissions trading frameworks, similar models based on extensive databases of the fate of fuel carbon during combustion from representative household stoves would provide a mechanism for computing greenhouse credits in the residential sector as part of clean development mechanism frameworks and monitoring compliance to control regimes.
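The stepwise-regression machinery used throughout the abstract above can be sketched with a small forward-selection routine: at each step, add the predictor that most increases R², and stop when the gain falls below a threshold. The data and the minimum-gain stopping rule below are illustrative assumptions, not the stove emissions database or the authors' software.

```python
# Forward stepwise OLS in pure Python, mirroring "stepwise regression models
# predicted X% of the variance". Synthetic data, illustrative stopping rule.

def solve(a, b):
    """Gauss-Jordan elimination for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[col][col]:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(X, y, cols):
    """R^2 of OLS of y on an intercept plus the selected columns of X."""
    n = len(y)
    Z = [[1.0] + [X[i][j] for j in cols] for i in range(n)]
    p = len(Z[0])
    a = [[sum(Z[i][r] * Z[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(Z[i][r] * y[i] for i in range(n)) for r in range(p)]
    beta = solve(a, b)
    yhat = [sum(bk * zk for bk, zk in zip(beta, Z[i])) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(X, y, min_gain=0.01):
    selected, current = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        gains = {j: r_squared(X, y, selected + [j]) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] - current < min_gain:
            break
        selected.append(best)
        current = gains[best]
        remaining.discard(best)
    return selected, current

# Toy data: the response depends on columns 0 and 2 only.
X = [[1, 5, 0], [2, 3, 1], [3, 8, 0], [4, 1, 1], [5, 7, 0], [6, 2, 1]]
y = [2 * row[0] + 3 * row[2] for row in X]
selected, r2 = forward_stepwise(X, y)
```

The routine recovers the informative predictors and reports the explained variance, the quantity the abstract cites (66%, 73%, etc.) for each survey-information scenario.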
Predicting Bradycardia in Preterm Infants Using Point Process Analysis of Heart Rate.
Gee, Alan H; Barbieri, Riccardo; Paydarfar, David; Indic, Premananda
2017-09-01
Episodes of bradycardia are common and recur sporadically in preterm infants, posing a threat to the developing brain and other vital organs. We hypothesize that bradycardias are a result of transient temporal destabilization of the cardiac autonomic control system and that fluctuations in the heart rate signal might contain information that precedes bradycardia. We investigate infant heart rate fluctuations with a novel application of point process theory. In ten preterm infants, we estimate instantaneous linear measures of the heart rate signal, use these measures to extract statistical features of bradycardia, and propose a simplistic framework for prediction of bradycardia. We present the performance of a prediction algorithm using instantaneous linear measures (mean area under the curve = 0.79 ± 0.018) for over 440 bradycardia events. The algorithm achieves an average forecast time of 116 s prior to bradycardia onset (FPR = 0.15). Our analysis reveals that increased variance in the heart rate signal is a precursor of severe bradycardia. This increase in variance is associated with an increase in power from low-frequency dynamics in the LF band (0.04-0.2 Hz) and with lower multiscale entropy values prior to bradycardia. Point process analysis of the heartbeat time series reveals instantaneous measures that can be used to predict infant bradycardia prior to onset. Our findings are relevant to risk stratification, predictive monitoring, and implementation of preventative strategies for reducing morbidity and mortality associated with bradycardia in neonatal intensive care units.
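The reported precursor (rising heart-rate variance before bradycardia) suggests the following toy detector. It is a deliberate simplification: the paper estimates instantaneous moments with a point-process model of beat timings, whereas this sketch merely thresholds a rolling variance of a synthetic RR-interval series; the window length and threshold are invented.

```python
# Toy precursor detector: flag when a rolling variance of the RR-interval
# series crosses a threshold, ahead of a long pause (bradycardia surrogate).
from statistics import pvariance

def rolling_variance(x, window):
    return [pvariance(x[i - window:i]) for i in range(window, len(x) + 1)]

def alarm_times(x, window, threshold):
    """Indices (end of each window) where the rolling variance crosses threshold."""
    rv = rolling_variance(x, window)
    return [i + window - 1 for i, v in enumerate(rv) if v > threshold]

# Synthetic RR intervals (s): stable beats, then growing fluctuations
# preceding a long pause (the bradycardia surrogate at index 12).
rr = [0.40, 0.41, 0.40, 0.39, 0.41, 0.40, 0.44, 0.35,
      0.47, 0.33, 0.50, 0.30, 1.20]
alarms = alarm_times(rr, window=4, threshold=0.001)
```

With these numbers the first alarm fires at beat 7, several beats before the pause, which is the qualitative behavior the paper exploits for forecasting.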
May, Philip A; Tabachnick, Barbara G; Gossage, J Phillip; Kalberg, Wendy O; Marais, Anna-Susan; Robinson, Luther K; Manning, Melanie; Buckley, David; Hoyme, H Eugene
2011-12-01
Previous research in South Africa revealed very high rates of fetal alcohol syndrome (FAS), of 46-89 per 1000 among young children. Maternal and child data from studies in this community summarize the multiple predictors of FAS and partial fetal alcohol syndrome (PFAS). Sequential regression was employed to examine influences on child physical characteristics and dysmorphology from four categories of maternal traits: physical, demographic, childbearing, and drinking. Then, a structural equation model (SEM) was constructed to predict influences on child physical characteristics. Individual sequential regressions revealed that maternal drinking measures were the most powerful predictors of a child's physical anomalies (R² = .30, p < .001), followed by maternal demographics (R² = .24, p < .001), maternal physical characteristics (R² = .15, p < .001), and childbearing variables (R² = .06, p < .001). The SEM utilized both individual variables and the four composite categories of maternal traits to predict a set of child physical characteristics, including a total dysmorphology score. As predicted, drinking behavior is a relatively strong predictor of child physical characteristics (β = 0.61, p < .001), even when all other maternal risk variables are included; higher levels of drinking predict child physical anomalies. Overall, the SEM model explains 62% of the variance in child physical anomalies. As expected, drinking variables explain the most variance. But this highly controlled estimation of multiple effects also reveals a significant contribution played by maternal demographics and, to a lesser degree, maternal physical and childbearing variables. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Optimal distribution of integration time for intensity measurements in Stokes polarimetry.
Li, Xiaobo; Liu, Tiegen; Huang, Bingjing; Song, Zhanjie; Hu, Haofeng
2015-10-19
We consider the typical Stokes polarimetry system, which performs four intensity measurements to estimate a Stokes vector. We show that if the total integration time of intensity measurements is fixed, the variance of the Stokes vector estimator depends on the distribution of the integration time at four intensity measurements. Therefore, by optimizing the distribution of integration time, the variance of the Stokes vector estimator can be decreased. In this paper, we obtain the closed-form solution of the optimal distribution of integration time by employing the Lagrange multiplier method. According to the theoretical analysis and real-world experiment, it is shown that the total variance of the Stokes vector estimator can be decreased by about 40% in the case discussed in this paper. The method proposed in this paper can effectively decrease the measurement variance and thus statistically improves the measurement accuracy of the polarimetric system.
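The Lagrange-multiplier result has a simple closed form: if the i-th measurement contributes variance c_i / t_i to the estimator, minimizing the sum subject to a fixed total time T allocates t_i proportional to sqrt(c_i). The coefficients below are illustrative stand-ins, not the paper's intensity-dependent terms, so the ~19% gain in this toy case need not match the ~40% reported.

```python
# Closed-form optimal split of a fixed total integration time T across
# measurements whose variance contributions scale as c_i / t_i:
# Lagrange multipliers give t_i = T * sqrt(c_i) / sum_j sqrt(c_j),
# and the minimized total variance is (sum_j sqrt(c_j))^2 / T.
from math import sqrt

def optimal_times(c, T):
    s = sum(sqrt(ci) for ci in c)
    return [T * sqrt(ci) / s for ci in c]

def total_variance(c, t):
    return sum(ci / ti for ci, ti in zip(c, t))

c = [4.0, 1.0, 1.0, 0.25]        # illustrative per-measurement coefficients
T = 4.0                          # fixed total integration time
equal = [T / len(c)] * len(c)    # naive equal split
opt = optimal_times(c, T)
# total_variance(c, opt) = (sum of sqrt(c))^2 / T = 5.0625 < 6.25 (equal split)
```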
Li, Xiaobo; Hu, Haofeng; Liu, Tiegen; Huang, Bingjing; Song, Zhanjie
2016-04-04
We consider the degree of linear polarization (DOLP) polarimetry system, which performs two intensity measurements at orthogonal polarization states to estimate DOLP. We show that if the total integration time of intensity measurements is fixed, the variance of the DOLP estimator depends on the distribution of integration time for two intensity measurements. Therefore, by optimizing the distribution of integration time, the variance of the DOLP estimator can be decreased. In this paper, we obtain the closed-form solution of the optimal distribution of integration time in an approximate way by employing the Delta method and the Lagrange multiplier method. According to the theoretical analyses and real-world experiments, it is shown that the variance of the DOLP estimator can be decreased for any value of DOLP. The method proposed in this paper can effectively decrease the measurement variance and thus statistically improve the measurement accuracy of the polarimetry system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beer, M.
1980-12-01
The maximum likelihood method for the multivariate normal distribution is applied to the case of several individual eigenvalues. Correlated Monte Carlo estimates of the eigenvalue are assumed to follow this prescription and aspects of the assumption are examined. Monte Carlo cell calculations using the SAM-CE and VIM codes for the TRX-1 and TRX-2 benchmark reactors, and SAM-CE full core results are analyzed with this method. Variance reductions of a few percent to a factor of 2 are obtained from maximum likelihood estimation as compared with the simple average and the minimum variance individual eigenvalue. The numerical results verify that the use of sample variances and correlation coefficients in place of the corresponding population statistics still leads to nearly minimum variance estimation for a sufficient number of histories and aggregates.
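For two correlated, unbiased estimates of the same eigenvalue, the maximum-likelihood (minimum-variance) combination weights them by the row sums of the inverse covariance matrix, and the combined variance can fall below the better individual variance. The numbers below are illustrative, not from the SAM-CE/VIM calculations.

```python
# ML combination of two correlated estimates x1, x2 of the same quantity:
# with covariance matrix S, weights are w = S^{-1} 1 / (1' S^{-1} 1) and
# the combined variance is 1 / (1' S^{-1} 1). Illustrative numbers only.

def combine_two(x1, x2, v1, v2, cov):
    det = v1 * v2 - cov * cov
    # Row sums of S^{-1} (up to the common factor 1/det, which cancels
    # in the weights but matters for the variance).
    a = (v2 - cov) / det
    b = (v1 - cov) / det
    w1, w2 = a / (a + b), b / (a + b)
    est = w1 * x1 + w2 * x2
    var = 1.0 / (a + b)
    return est, var, (w1, w2)

est, var, w = combine_two(1.002, 0.998, v1=4e-6, v2=1e-6, cov=0.5e-6)
# var = 9.375e-7 < 1e-6, i.e., below the minimum individual variance,
# which is the kind of reduction the abstract reports.
```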
Estimation variance bounds of importance sampling simulations in digital communication systems
NASA Technical Reports Server (NTRS)
Lu, D.; Yao, K.
1991-01-01
In practical applications of importance sampling (IS) simulation, two basic problems are encountered, that of determining the estimation variance and that of evaluating the proper IS parameters needed in the simulations. The authors derive new upper and lower bounds on the estimation variance which are applicable to IS techniques. The upper bound is simple to evaluate and may be minimized by the proper selection of the IS parameter. Thus, lower and upper bounds on the improvement ratio of various IS techniques relative to the direct Monte Carlo simulation are also available. These bounds are shown to be useful and computationally simple to obtain. Based on the proposed technique, one can readily find practical suboptimum IS parameters. Numerical results indicate that these bounding techniques are useful for IS simulations of linear and nonlinear communication systems with intersymbol interference in which bit error rate and IS estimation variances cannot be obtained readily using prior techniques.
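The two quantities the bounds concern, the IS estimate and its estimation variance, can be illustrated on a small Gaussian tail probability (a stand-in for a bit-error rate). The mean-shift below plays the role of the "IS parameter" whose choice the bounds are meant to guide; the specific numbers are illustrative, not from the paper.

```python
# Importance sampling of P(Z > threshold) for Z ~ N(0,1) using a shifted
# density g = N(shift, 1), with the usual sample-variance estimate of the
# IS estimator. Weight: f(x)/g(x) = exp(-shift*x + shift^2/2).
import math
import random

def is_tail_estimate(threshold, shift, n, seed=0):
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)                       # draw from g
        w = math.exp(-shift * x + shift * shift / 2.0)  # likelihood ratio f/g
        val = w if x > threshold else 0.0
        total += val
        total_sq += val * val
    mean = total / n
    var_of_mean = (total_sq / n - mean * mean) / n      # variance of the estimator
    return mean, var_of_mean

p_hat, v_hat = is_tail_estimate(threshold=4.0, shift=4.0, n=20000)
# True value P(Z > 4) is about 3.17e-5; direct Monte Carlo with 20000
# samples would typically see zero or one hit.
```

Choosing the shift equal to the threshold is a common heuristic; the paper's bounds give a principled way to pick such suboptimum parameters.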
Calibrating SALT: a sampling scheme to improve estimates of suspended sediment yield
Robert B. Thomas
1986-01-01
SALT (Selection At List Time) is a variable probability sampling scheme that provides unbiased estimates of suspended sediment yield and its variance. SALT performs better than standard schemes, which cannot estimate variance. Sampling probabilities are based on a sediment rating function which promotes greater sampling intensity during periods of high...
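The unbiasedness that SALT inherits from variable-probability sampling is easy to verify on a toy population: weighting each sampled value by the inverse of its selection probability makes the expected estimate equal the true total, no matter how rough the rating function is. The numbers below are invented for illustration.

```python
# Variable-probability (PPS-style) sampling sketch: selection probabilities
# proportional to a rating-function prediction, estimator y_i / p_i.
# Enumerating the single-draw estimator exactly checks unbiasedness.

population = [12.0, 3.0, 45.0, 8.0, 20.0]   # true per-period sediment yields
rating = [10.0, 4.0, 40.0, 10.0, 16.0]      # rating-function predictions

total_rating = sum(rating)
p = [r / total_rating for r in rating]      # selection probabilities

def estimator(i):
    """Unbiased estimate of the total from a single draw of unit i."""
    return population[i] / p[i]

expected = sum(p[i] * estimator(i) for i in range(len(population)))
# expected == sum(population): the rating only affects sampling intensity
# (and hence the estimator's variance), never its bias.
```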
Evaluation of three lidar scanning strategies for turbulence measurements
Newman, Jennifer F.; Klein, Petra M.; Wharton, Sonia; ...
2016-05-03
Several errors occur when a traditional Doppler beam swinging (DBS) or velocity–azimuth display (VAD) strategy is used to measure turbulence with a lidar. To mitigate some of these errors, a scanning strategy was recently developed which employs six beam positions to independently estimate the u, v, and w velocity variances and covariances. In order to assess the ability of these different scanning techniques to measure turbulence, a Halo scanning lidar, WindCube v2 pulsed lidar, and ZephIR continuous wave lidar were deployed at field sites in Oklahoma and Colorado with collocated sonic anemometers. Results indicate that the six-beam strategy mitigates some of the errors caused by VAD and DBS scans, but the strategy is strongly affected by errors in the variance measured at the different beam positions. The ZephIR and WindCube lidars overestimated horizontal variance values by over 60 % under unstable conditions as a result of variance contamination, where additional variance components contaminate the true value of the variance. A correction method was developed for the WindCube lidar that uses variance calculated from the vertical beam position to reduce variance contamination in the u and v variance components. The correction method reduced WindCube variance estimates by over 20 % at both the Oklahoma and Colorado sites under unstable conditions, when variance contamination is largest. This correction method can be easily applied to other lidars that contain a vertical beam position and is a promising method for accurately estimating turbulence with commercially available lidars.
Monthly hydroclimatology of the continental United States
NASA Astrophysics Data System (ADS)
Petersen, Thomas; Devineni, Naresh; Sankarasubramanian, A.
2018-04-01
Physical/semi-empirical models that do not require any calibration are critically needed for estimating hydrological fluxes for ungauged sites. We develop semi-empirical models for estimating the mean and variance of the monthly streamflow based on Taylor series approximation of a lumped physically based water balance model. The proposed models require mean and variance of monthly precipitation and potential evapotranspiration, co-variability of precipitation and potential evapotranspiration, and regionally calibrated catchment retention sensitivity, atmospheric moisture uptake sensitivity, groundwater-partitioning factor, and maximum soil moisture holding capacity parameters. Estimates of mean and variance of monthly streamflow using the semi-empirical equations are compared with the observed estimates for 1373 catchments in the continental United States. Analyses show that the proposed models explain the spatial variability in monthly moments for basins at lower elevations. A regionalization of parameters for each water resources region shows good agreement between observed and model-estimated moments during January, February, March, and April for the mean, and during all months except May and June for the variance. Thus, the proposed relationships could be employed for understanding and estimating the monthly hydroclimatology of ungauged basins using regional parameters.
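The Taylor series device behind the moment equations is the standard delta method: linearize the water-balance function around mean climate, then propagate the precipitation and PET moments through the partial derivatives. The runoff function and inputs below are simple stand-ins, not the authors' calibrated water-balance model.

```python
# First-order Taylor (delta-method) approximation of the mean and variance
# of monthly streamflow Q = f(P, PET). Toy runoff function and inputs.
import math

def runoff(P, E):
    """Toy monthly runoff: precipitation minus evaporative demand met by supply."""
    return P - E * (1.0 - math.exp(-P / E))

def taylor_moments(f, mu_p, mu_e, var_p, var_e, cov_pe, h=1e-5):
    # Numerical partial derivatives at the mean climate.
    fp = (f(mu_p + h, mu_e) - f(mu_p - h, mu_e)) / (2 * h)
    fe = (f(mu_p, mu_e + h) - f(mu_p, mu_e - h)) / (2 * h)
    mean_q = f(mu_p, mu_e)
    # Var[Q] ~ fp^2 Var[P] + fe^2 Var[E] + 2 fp fe Cov[P, E]
    var_q = fp ** 2 * var_p + fe ** 2 * var_e + 2 * fp * fe * cov_pe
    return mean_q, var_q

mean_q, var_q = taylor_moments(runoff, mu_p=100.0, mu_e=60.0,
                               var_p=400.0, var_e=100.0, cov_pe=-50.0)
```

The covariance term is why the paper needs the co-variability of precipitation and PET, not just their separate variances.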
Impact of Damping Uncertainty on SEA Model Response Variance
NASA Technical Reports Server (NTRS)
Schiller, Noah; Cabell, Randolph; Grosveld, Ferdinand
2010-01-01
Statistical Energy Analysis (SEA) is commonly used to predict high-frequency vibroacoustic levels. This statistical approach provides the mean response over an ensemble of random subsystems that share the same gross system properties such as density, size, and damping. Recently, techniques have been developed to predict the ensemble variance as well as the mean response. However, these techniques do not account for uncertainties in the system properties. In the present paper uncertainty in the damping loss factor is propagated through SEA to obtain more realistic prediction bounds that account for both ensemble and damping variance. The analysis is performed on a floor-equipped cylindrical test article that resembles an aircraft fuselage. Realistic bounds on the damping loss factor are determined from measurements acquired on the sidewall of the test article. The analysis demonstrates that uncertainties in damping have the potential to significantly impact the mean and variance of the predicted response.
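The propagation step can be sketched on the simplest possible SEA power balance, a single subsystem with energy E = P_in / (omega * eta): sample the damping loss factor from its measured bounds and collect the spread of the predicted level. The input power, frequency, and eta bounds below are invented; the article's model is a multi-subsystem cylinder, not this one-subsystem toy.

```python
# Monte Carlo propagation of damping-loss-factor uncertainty through a
# one-subsystem SEA power balance E = P_in / (omega * eta).
import math
import random

def response_db(eta, p_in=1.0, omega=2 * math.pi * 1000.0, e_ref=1e-12):
    energy = p_in / (omega * eta)           # steady-state subsystem energy
    return 10.0 * math.log10(energy / e_ref)

def propagate(eta_lo, eta_hi, n=10000, seed=1):
    rng = random.Random(seed)
    levels = [response_db(rng.uniform(eta_lo, eta_hi)) for _ in range(n)]
    mean = sum(levels) / n
    var = sum((x - mean) ** 2 for x in levels) / n
    return mean, var

mean_db, var_db = propagate(0.01, 0.05)
# The damping-induced variance of the level (a few dB^2 here) would be
# added to the SEA ensemble variance to widen the prediction bounds.
```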
Utility functions predict variance and skewness risk preferences in monkeys
Genest, Wilfried; Stauffer, William R.; Schultz, Wolfram
2016-01-01
Utility is the fundamental variable thought to underlie economic choices. In particular, utility functions are believed to reflect preferences toward risk, a key decision variable in many real-life situations. To assess the validity of utility representations, it is therefore important to examine risk preferences. In turn, this approach requires formal definitions of risk. A standard approach is to focus on the variance of reward distributions (variance-risk). In this study, we also examined a form of risk related to the skewness of reward distributions (skewness-risk). Thus, we tested the extent to which empirically derived utility functions predicted preferences for variance-risk and skewness-risk in macaques. The expected utilities calculated for various symmetrical and skewed gambles served to define formally the direction of stochastic dominance between gambles. In direct choices, the animals’ preferences followed both second-order (variance) and third-order (skewness) stochastic dominance. Specifically, for gambles with different variance but identical expected values (EVs), the monkeys preferred high-variance gambles at low EVs and low-variance gambles at high EVs; in gambles with different skewness but identical EVs and variances, the animals preferred positively over symmetrical and negatively skewed gambles in a strongly transitive fashion. Thus, the utility functions predicted the animals’ preferences for variance-risk and skewness-risk. Using these well-defined forms of risk, this study shows that monkeys’ choices conform to the internal reward valuations suggested by their utility functions. This result implies a representation of utility in monkeys that accounts for both variance-risk and skewness-risk preferences. PMID:27402743
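The paper's logic, that expected utilities computed from an empirically derived utility function predict gamble preferences, can be reproduced with an S-shaped (convex-then-concave) utility: it prefers the high-variance gamble at low EVs and the low-variance gamble at high EVs, the pattern the monkeys showed. The logistic form and its parameters below are illustrative, not the fitted monkey utilities.

```python
# Expected-utility comparison of equal-EV gambles under an S-shaped utility.
import math

def utility(x, x0=0.5, k=10.0):
    """Logistic (convex below x0, concave above) utility over [0, 1]."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def expected_utility(gamble):
    """gamble: list of (probability, outcome) pairs."""
    return sum(p * utility(x) for p, x in gamble)

# Same EV, different variance, in the convex (low-EV) region...
low_safe = [(1.0, 0.2)]
low_risky = [(0.5, 0.0), (0.5, 0.4)]
# ...and in the concave (high-EV) region.
high_safe = [(1.0, 0.8)]
high_risky = [(0.5, 0.6), (0.5, 1.0)]

prefers_risky_at_low_ev = expected_utility(low_risky) > expected_utility(low_safe)
prefers_safe_at_high_ev = expected_utility(high_safe) > expected_utility(high_risky)
```

Extending the gambles to three outcomes with matched EV and variance but different third moments would capture the skewness-risk comparisons in the same way.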
Detection of gene-environment interaction in pedigree data using genome-wide genotypes.
Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V
2016-12-01
Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al. and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits.
Tiezzi, F; de Los Campos, G; Parker Gaddis, K L; Maltecca, C
2017-03-01
Genotype by environment interaction (G × E) in dairy cattle productive traits has been shown to exist, but current genetic evaluation methods do not take this component into account. As several environmental descriptors (e.g., climate, farming system) are known to vary within the United States, not accounting for the G × E could lead to reranking of bulls and loss in genetic gain. Using test-day records on milk yield, somatic cell score, fat, and protein percentage from all over the United States, we computed within herd-year-season daughter yield deviations for 1,087 Holstein bulls and regressed them on genetic and environmental information to estimate variance components and to assess prediction accuracy. Genomic information was obtained from a 50k SNP marker panel. Environmental effect inputs included herd (160 levels), geographical region (7 levels), geographical location (2 variables), climate information (7 variables), and management conditions of the herds (16 total variables divided into 4 subgroups). For each set of environmental descriptors, environmental, genomic, and G × E components were sequentially fitted. Variance component estimates confirmed the presence of G × E for milk yield, with its effect being larger than the main genetic effect and the environmental effect for some models. Conversely, G × E was moderate for somatic cell score and small for milk composition. Genotype by environment interaction, when included, partially eroded the genomic effect (as compared with the models where G × E was not included), suggesting that the genomic variance could at least in part be attributed to G × E not appropriately accounted for. Model predictive ability was assessed using 3 cross-validation schemes (new bulls, incomplete progeny test, and new environmental conditions), and performance was compared with a reference model including only the main genomic effect. 
In each scenario, at least 1 of the models including G × E was able to perform better than the reference model, although no single set of environmental descriptors produced the overall best-performing model across scenarios. In general, the methodology used is promising in accounting for G × E in genomic predictions, but challenges exist in identifying a unique set of covariates capable of describing the entire variety of environments. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Hakkenberg, C R; Zhu, K; Peet, R K; Song, C
2018-02-01
The central role of floristic diversity in maintaining habitat integrity and ecosystem function has propelled efforts to map and monitor its distribution across forest landscapes. While biodiversity studies have traditionally relied largely on ground-based observations, the immensity of the task of generating accurate, repeatable, and spatially-continuous data on biodiversity patterns at large scales has stimulated the development of remote-sensing methods for scaling up from field plot measurements. One such approach is through integrated LiDAR and hyperspectral remote-sensing. However, despite their efficiencies in cost and effort, LiDAR-hyperspectral sensors are still highly constrained in structurally- and taxonomically-heterogeneous forests - especially when species' cover is smaller than the image resolution, intertwined with neighboring taxa, or otherwise obscured by overlapping canopy strata. In light of these challenges, this study goes beyond the remote characterization of upper canopy diversity to instead model total vascular plant species richness in a continuous-cover North Carolina Piedmont forest landscape. We focus on two related, but parallel, tasks. First, we demonstrate an application of predictive biodiversity mapping, using nonparametric models trained with spatially-nested field plots and aerial LiDAR-hyperspectral data, to predict spatially-explicit landscape patterns in floristic diversity across seven spatial scales between 0.01-900 m². Second, we employ bivariate parametric models to test the significance of individual, remotely-sensed predictors of plant richness to determine how parameter estimates vary with scale. Cross-validated results indicate that predictive models were able to account for 15-70% of variance in plant richness, with LiDAR-derived estimates of topography and forest structural complexity, as well as spectral variance in hyperspectral imagery explaining the largest portion of variance in diversity levels. 
Importantly, bivariate tests provide evidence of scale-dependence among predictors, such that remotely-sensed variables significantly predict plant richness only at spatial scales that sufficiently subsume geolocational imprecision between remotely-sensed and field data, and best align with stand components including plant size and density, as well as canopy gaps and understory growth patterns. Beyond their insights into the scale-dependent patterns and drivers of plant diversity in Piedmont forests, these results highlight the potential of remotely-sensible essential biodiversity variables for mapping and monitoring landscape floristic diversity from air- and space-borne platforms. © 2017 by the Ecological Society of America.
NASA Astrophysics Data System (ADS)
Asanuma, Jun
Variances of the velocity components and scalars are important as indicators of turbulence intensity. They can also be used to estimate surface fluxes in several types of "variance methods", and the estimated fluxes can be regional values if the variances from which they are calculated are regionally representative measurements. With these motivations, variances measured by an aircraft in the unstable ABL over a flat pine forest during HAPEX-Mobilhy were analyzed within the context of similarity scaling arguments. The variances of temperature and vertical velocity within the atmospheric surface layer were found to follow Monin-Obukhov similarity (MOS) theory closely, and to yield reasonable estimates of the surface sensible heat fluxes when used in variance methods. This validates the variance methods with aircraft measurements. On the other hand, the specific humidity variances were influenced by the surface heterogeneity and clearly failed to obey MOS. A simple analysis based on the similarity law for free convection produced a comprehensible and quantitative picture of the effect of surface flux heterogeneity on the statistical moments, and revealed that variances of the active and passive scalars become dissimilar because of their different roles in turbulence. The analysis also indicated that the mean quantities are affected by the heterogeneity as well, but to a lesser extent than the variances. The temperature variances in the mixed layer (ML) were examined by using a generalized top-down bottom-up diffusion model with several combinations of velocity scales and inversion flux models. The results showed that the surface shear stress exerts considerable influence on the lower ML. ML variance methods were also tested with the temperature and vertical velocity variances, and their feasibility was investigated.
Finally, the variances in the ML were analyzed in terms of the local similarity concept; the results confirmed the original hypothesis by Panofsky and McCormick that the local scaling in terms of the local buoyancy flux defines the lower bound of the moments.
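The surface-layer flux estimate described above can be sketched with the free-convection form of the flux-variance method, in which the friction velocity cancels and the sensible heat flux follows from the temperature standard deviation alone. This is a minimal illustration, not the study's code; the similarity constant `C1` and the measurement values are assumptions.

```python
import numpy as np

# Free-convection flux-variance method: estimate the surface sensible heat
# flux from sigma_T measured at height z. C1 is an empirical similarity
# constant (values near 0.95-1.0 appear in the literature; assumed here).
K_VON_KARMAN = 0.4
G = 9.81  # gravitational acceleration, m s^-2

def sensible_heat_flux(sigma_T, z, T_mean, rho=1.2, cp=1005.0, C1=0.95):
    """Sensible heat flux H (W m^-2) via w'T' = (sigma_T/C1)^1.5 * sqrt(k g z / T)."""
    wT = (sigma_T / C1) ** 1.5 * np.sqrt(K_VON_KARMAN * G * z / T_mean)
    return rho * cp * wT

# Example: sigma_T = 0.5 K measured at z = 50 m over a surface near 300 K
H = sensible_heat_flux(0.5, 50.0, 300.0)
```

Because the friction velocity drops out under free convection, only the temperature variance and measurement height are needed, which is what makes the method attractive for aircraft data.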
Liu, Xian; Engel, Charles C
2012-12-20
Researchers often encounter longitudinal health data characterized by three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for the potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed to predict outcome probabilities correctly. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated variances of the predicted probabilities by using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit scale, which are not substantively meaningful, into conditional effects on the predicted probabilities. The empirical illustration uses the longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of predicted probabilities of three health states at six time points, obtained from, respectively, the retransformation method, best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that neglecting to retransform random errors in the random-effects multinomial logit model yields severely biased longitudinal trajectories of health probabilities as well as overestimated effects of covariates on the probabilities. Copyright © 2012 John Wiley & Sons, Ltd.
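The two key operations in this abstract, delta-method variances for predicted probabilities and retransformation of a random effect, can be sketched as follows. This is a hedged illustration, not the authors' code: the linear predictors, their covariance, and the placement of the random effect on one category's logit are all assumed values.

```python
import numpy as np

# Multinomial logit: predicted probabilities, delta-method variances, and a
# simulation-based retransformation of a random effect (all inputs invented).
def softmax(eta):
    e = np.exp(eta - eta.max())
    return e / e.sum()

def delta_method_var(eta, cov_eta):
    """Var(p) ~= J Cov(eta) J', where J = diag(p) - p p' is the softmax Jacobian."""
    p = softmax(eta)
    J = np.diag(p) - np.outer(p, p)
    return p, J @ cov_eta @ J.T

eta = np.array([0.0, 0.8, -0.5])       # reference category's logit fixed at 0
cov_eta = np.diag([0.0, 0.04, 0.09])   # assumed sampling covariance of eta
p_plugin, var_p = delta_method_var(eta, cov_eta)

# Retransformation: marginal probabilities average the softmax over the
# random-effect distribution instead of plugging in its zero mean, which
# is exactly the bias the paper warns about.
rng = np.random.default_rng(0)
u = rng.normal(0.0, 1.0, size=20000)   # random effect on category 1's logit
p_marginal = np.mean([softmax(eta + np.array([0.0, ui, 0.0])) for ui in u],
                     axis=0)
```

Note that the delta-method covariance of the probabilities sums to zero, as it must, since the probabilities always sum to one.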
Rice, Mabel L; Zubrick, Stephen R; Taylor, Catherine L; Gayán, Javier; Bontempo, Daniel E
2014-06-01
This study investigated the etiology of late language emergence (LLE) in 24-month-old twins, considering possible twinning, zygosity, gender, and heritability effects for vocabulary and grammar phenotypes. A population-based sample of 473 twin pairs participated. Multilevel modeling estimated means and variances of vocabulary and grammar phenotypes, controlling for familiality. Heritability was estimated with DeFries-Fulker regression and variance components models to determine effects of heritability, shared environment, and nonshared environment. Twins had lower average language scores than norms for single-born children, with lower average performance for monozygotic than dizygotic twins and for boys than girls, although gender and zygosity did not interact. Gender did not predict LLE. Significant heritability was detected for vocabulary (0.26) and grammar phenotypes (0.52 and 0.43 for boys and girls, respectively) in the full sample and in the sample selected for LLE (0.42 and 0.44). LLE and the appearance of Word Combinations were also significantly heritable (0.22-0.23). The findings revealed an increased likelihood of LLE in twin toddlers compared with single-born children that is modulated by zygosity and gender differences. Heritability estimates are consistent with previous research for vocabulary and add further suggestion of heritable differences in early grammar acquisition.
Tang, Yongqiang
2017-12-01
Control-based pattern mixture models (PMM) and delta-adjusted PMMs are commonly used as sensitivity analyses in clinical trials with non-ignorable dropout. These PMMs assume that the statistical behavior of outcomes varies by pattern in the experimental arm in the imputation procedure, but the imputed data are typically analyzed by a standard method such as the primary analysis model. In the multiple imputation (MI) inference, Rubin's variance estimator is generally biased when the imputation and analysis models are uncongenial. One objective of the article is to quantify the bias of Rubin's variance estimator in the control-based and delta-adjusted PMMs for longitudinal continuous outcomes. These PMMs assume the same observed data distribution as the mixed effects model for repeated measures (MMRM). We derive analytic expressions for the MI treatment effect estimator and the associated Rubin's variance in these PMMs and MMRM as functions of the maximum likelihood estimator from the MMRM analysis and the observed proportion of subjects in each dropout pattern when the number of imputations is infinite. The asymptotic bias is generally small or negligible in the delta-adjusted PMM, but can be sizable in the control-based PMM. This indicates that the inference based on Rubin's rule is approximately valid in the delta-adjusted PMM. A simple variance estimator is proposed to ensure asymptotically valid MI inferences in these PMMs, and compared with the bootstrap variance. The proposed method is illustrated by the analysis of an antidepressant trial, and its performance is further evaluated via a simulation study. © 2017, The International Biometric Society.
Elklit, Ask; Gudmundsdottir, Drifa
2014-01-01
This is a follow-up study of rescue workers who participated in the primary rescue during and immediately after the explosion of a firework factory. We aimed to estimate the possible PTSD prevalence at five and 18 months post disaster, and to determine whether the level of PTSD symptoms at 18 months could be predicted from factors measured at five months. We included measures of posttraumatic symptoms, social support, locus of control, and demographic questions. The possible PTSD prevalence rose from 1.6% (n = 465) at five months post disaster to 3.1% (n = 130) at 18 months. A hierarchical linear regression predicted 59% of the variance in PTSD symptoms at 18 months post disaster. In the final regression, somatization explained the greatest part of the symptom variance (42%), followed by locus of control (29%) and major life events prior to and right after the disaster (23%). Rescue workers seemed to be relatively robust to traumatic exposure: the prevalence of possible PTSD in our study was even lower than in previous studies, probably because of the less severe consequences of the disaster studied. Furthermore, we found that the PTSD symptom level at 18 months post disaster was strongly predicted by psychological factors, particularly by somatization. However, further investigation of traumatic responding is required in this population.
Ozay, Guner; Seyhan, Ferda; Yilmaz, Aysun; Whitaker, Thomas B; Slate, Andrew B; Giesbrecht, Francis
2006-01-01
The variability associated with the aflatoxin test procedure used to estimate aflatoxin levels in bulk shipments of hazelnuts was investigated. Sixteen 10 kg samples of shelled hazelnuts were taken from each of 20 lots that were suspected of aflatoxin contamination. The total variance associated with testing shelled hazelnuts was estimated and partitioned into sampling, sample preparation, and analytical variance components. Each variance component increased as aflatoxin concentration (either B1 or total) increased. With the use of regression analysis, mathematical expressions were developed to model the relationship between aflatoxin concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific aflatoxin concentration. The sampling, sample preparation, and analytical variances associated with estimating aflatoxin in a hazelnut lot at a total aflatoxin level of 10 ng/g and using a 10 kg sample, a 50 g subsample, dry comminution with a Robot Coupe mill, and a high-performance liquid chromatographic analytical method are 174.40, 0.74, and 0.27, respectively. The sampling, sample preparation, and analytical steps of the aflatoxin test procedure accounted for 99.4, 0.4, and 0.2% of the total variability, respectively.
NASA Technical Reports Server (NTRS)
Wunsch, Carl; Stammer, Detlef
1995-01-01
Two years of altimetric data from the TOPEX/POSEIDON spacecraft have been used to produce preliminary estimates of the space and time spectra of global variability for both sea surface height and slope. The results are expressed both as degree variances from spherical harmonic expansions and in along-track wavenumbers. Simple analytic approximations, both as piecewise power laws and as Padé fractions, are provided for comparison with independent measurements and for easy use of the results. A number of uses of such spectra exist, including the possibility of combining the altimetric data with other observations, predictions of spatial coherences, and the estimation of the accuracy of apparent secular trends in sea level.
Uncertainty importance analysis using parametric moment ratio functions.
Wei, Pengfei; Lu, Zhenzhou; Song, Jingwen
2014-02-01
This article presents a new importance analysis framework, called the parametric moment ratio function, for measuring the reduction in model output uncertainty when the distribution parameters of the inputs are changed; the emphasis is put on the mean and variance ratio functions with respect to the variances of the model inputs. The proposed concepts efficiently guide the analyst toward a targeted reduction of the model output mean and variance by operating on the variances of the model inputs. Unbiased and progressively unbiased Monte Carlo estimators are also derived for the parametric mean and variance ratio functions, respectively. Only a single set of samples is needed to implement the proposed importance analysis with these estimators, so the computational cost is independent of the input dimensionality. An analytical test example with highly nonlinear behavior is introduced to illustrate the engineering significance of the proposed importance analysis technique and to verify the efficiency and convergence of the derived Monte Carlo estimators. Finally, the moment ratio function is applied to a planar 10-bar structure to achieve a targeted 50% reduction of the model output variance. © 2013 Society for Risk Analysis.
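The idea of a variance ratio function can be illustrated by brute force: shrink one input's variance and compare the resulting output variance to the baseline. The paper's contribution is a single-sample unbiased estimator; the double simulation below is only a conceptual sketch with an invented toy model.

```python
import numpy as np

# Brute-force variance ratio function for a toy model Y = X1^2 + 0.5*X2:
# how much does Var(Y) drop when the standard deviation of X1 is halved?
rng = np.random.default_rng(1)

def g(x1, x2):
    return x1 ** 2 + 0.5 * x2      # toy nonlinear model (assumed)

def output_variance(scale1, n=200_000):
    x1 = rng.normal(0.0, 1.0 * scale1, n)   # input 1, std scaled by scale1
    x2 = rng.normal(0.0, 1.0, n)            # input 2, fixed std
    return g(x1, x2).var()

v_full = output_variance(1.0)      # analytic value: 2*1 + 0.25 = 2.25
v_reduced = output_variance(0.5)   # analytic value: 2*0.0625 + 0.25 = 0.375
ratio = v_reduced / v_full         # analytic ratio: 1/6
```

The ratio quantifies exactly the kind of targeted variance reduction the framework is built around; the paper's estimators recover it from one sample set instead of re-simulating per candidate reduction.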
Jongerling, Joran; Laurenceau, Jean-Philippe; Hamaker, Ellen L
2015-01-01
In this article we consider a multilevel first-order autoregressive [AR(1)] model with random intercepts, random autoregression, and random innovation variance (i.e., the level 1 residual variance). Including random innovation variance is an important extension of the multilevel AR(1) model for two reasons. First, between-person differences in innovation variance are important from a substantive point of view, in that they capture differences in sensitivity and/or exposure to unmeasured internal and external factors that influence the process. Second, using simulation methods we show that modeling the innovation variance as fixed across individuals, when it should be modeled as a random effect, leads to biased parameter estimates. Additionally, we use simulation methods to compare maximum likelihood estimation to Bayesian estimation of the multilevel AR(1) model and investigate the trade-off between the number of individuals and the number of time points. We provide an empirical illustration by applying the extended multilevel AR(1) model to daily positive affect ratings from 89 married women over the course of 42 consecutive days.
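The extended model, with a random intercept, random autoregression, and a random innovation variance per person, is easy to simulate, and simulation is how the paper demonstrates the bias of ignoring the random innovation variance. The distributional choices below (normal for the intercept and AR parameter, log-normal for the innovation SD) and all hyperparameters are assumptions for illustration.

```python
import numpy as np

# Simulate a multilevel AR(1) with person-specific intercept mu_i,
# autoregression phi_i, and innovation variance sigma2_i. Dimensions match
# the empirical example: 89 individuals, 42 days.
rng = np.random.default_rng(2)

def simulate(n_people=89, n_days=42):
    data = np.empty((n_people, n_days))
    for i in range(n_people):
        mu = rng.normal(5.0, 1.0)                        # random intercept
        phi = np.clip(rng.normal(0.3, 0.1), -0.9, 0.9)   # random AR(1), kept stationary
        sigma = np.exp(rng.normal(-0.5, 0.3))            # random innovation SD
        y = mu
        for t in range(n_days):
            y = mu + phi * (y - mu) + rng.normal(0.0, sigma)
            data[i, t] = y
    return data

panel = simulate()
```

Fitting a model with a common innovation variance to data like these is what produces the biased estimates described above, since the between-person spread in within-person variability is real signal, not noise.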
Hayashi, Hideaki; Nakamura, Go; Chin, Takaaki; Tsuji, Toshio
2017-01-01
This paper proposes an artificial electromyogram (EMG) signal generation model based on signal-dependent noise, which has been ignored in existing methods, by introducing the stochastic construction of the EMG signals. In the proposed model, an EMG signal variance value is first generated from a probability distribution whose shape is determined by a commanded muscle force and signal-dependent noise. Artificial EMG signals are then generated from the associated Gaussian distribution with a zero mean and the generated variance. This facilitates representation of artificial EMG signals with signal-dependent noise superimposed according to the muscle activation levels. The frequency characteristics of the EMG signals are also simulated via a shaping filter with parameters determined by an autoregressive model. An estimation method that determines the EMG variance distribution from rectified and smoothed EMG signals, thereby allowing model parameter estimation with a small number of samples, is also incorporated in the proposed model. Moreover, the prediction of the variance distribution under strong muscle contraction from EMG signals with low muscle contraction, and the related artificial EMG generation, are also described. The results of experiments in which the reproduction capability of the proposed model was evaluated through comparison with measured EMG signals, in terms of amplitude, frequency content, and EMG distribution, demonstrate that the proposed model can reproduce the features of measured EMG signals. Further, utilizing the generated EMG signals as training data for a neural network resulted in the classification of upper limb motion with higher precision than learning from only measured EMG signals. This indicates that the proposed model is also applicable to motion classification. PMID:28640883
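The generation pipeline, variance from a force-dependent distribution with signal-dependent noise, Gaussian sampling, then autoregressive spectral shaping, can be sketched in a few lines. The force profile, the signal-dependent-noise coefficient, and the AR(2) weights below are assumed values, not the paper's identified parameters.

```python
import numpy as np

# Toy version of the proposed EMG generator: variance trace whose spread
# grows with commanded force (signal-dependent noise), zero-mean Gaussian
# sampling, and an AR(2) shaping filter for the spectrum.
rng = np.random.default_rng(3)

def artificial_emg(force, k_sdn=0.3, ar=(1.6, -0.7)):
    # per-sample SD proportional to force, perturbed multiplicatively (SDN)
    sigma = force * (1.0 + k_sdn * rng.standard_normal(force.size))
    var = np.clip(sigma, 1e-6, None) ** 2
    raw = rng.normal(0.0, np.sqrt(var))      # zero mean, generated variance
    emg = np.zeros_like(raw)                 # AR(2) shaping: stable weights
    for t in range(raw.size):
        emg[t] = raw[t]
        if t >= 1:
            emg[t] += ar[0] * emg[t - 1]
        if t >= 2:
            emg[t] += ar[1] * emg[t - 2]
    return emg

# weak contraction followed by strong contraction
force = np.concatenate([np.full(500, 0.1), np.full(500, 0.8)])
emg = artificial_emg(force)
```

The amplitude of the generated signal tracks the commanded force while its variability also grows with force, which is the signal-dependent-noise behavior the model is built to capture.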
Olivoto, T; Nardino, M; Carvalho, I R; Follmann, D N; Ferrari, M; Szareski, V J; de Pelegrin, A J; de Souza, V Q
2017-03-22
Methodologies using restricted maximum likelihood/best linear unbiased prediction (REML/BLUP) in combination with sequential path analysis in maize are still limited in the literature. Therefore, the aims of this study were: i) to use REML/BLUP-based procedures to estimate variance components, genetic parameters, and genotypic values of simple maize hybrids, and ii) to fit stepwise regressions on genotypic values to form a path diagram with multi-order predictors and minimum multicollinearity that explains the cause-and-effect relationships among grain yield-related traits. Fifteen commercial simple maize hybrids were evaluated in multi-environment trials in a randomized complete block design with four replications. The environmental variance (78.80%) and the genotype-by-environment variance (20.83%) accounted for more than 99% of the phenotypic variance of grain yield, which hampers direct selection for this trait. The sequential path analysis model allowed the selection of traits with high explanatory power and minimum multicollinearity, resulting in models with excellent fit (R² > 0.9 and ε < 0.3). The number of kernels per ear (NKE) and thousand-kernel weight (TKW) are the traits with the largest direct effects on grain yield (r = 0.66 and 0.73, respectively). The high accuracy of selection (0.86 and 0.89) associated with the high heritability of the mean (0.732 and 0.794) for NKE and TKW, respectively, indicated good reliability and prospects of success in the indirect selection of hybrids with high yield potential through these traits. The negative direct effect of NKE on TKW (r = -0.856), however, must be considered. The joint use of mixed models and sequential path analysis is effective in the evaluation of maize-breeding trials.
The Cohesive Population Genetics of Molecular Drive
Ohta, Tomoko; Dover, Gabriel A.
1984-01-01
The long-term population genetics of multigene families is influenced by several biased and unbiased mechanisms of nonreciprocal exchanges (gene conversion, unequal exchanges, transposition) between member genes, often distributed on several chromosomes. These mechanisms cause fluctuations in the copy number of variant genes in an individual and lead to a gradual replacement of an original family of n genes (A) in N number of individuals by a variant gene (a). The process for spreading a variant gene through a family and through a population is called molecular drive. Consideration of the known slow rates of nonreciprocal exchanges predicts that the population variance in the copy number of gene a per individual is small at any given generation during molecular drive. Genotypes at a given generation are expected only to range over a small section of all possible genotypes from one extreme (n number of A) to the other (n number of a). A theory is developed for estimating the size of the population variance by using the concept of identity coefficients. In particular, the variance in the course of spreading of a single mutant gene of a multigene family was investigated in detail, and the theory of identity coefficients at the state of steady decay of genetic variability proved to be useful. Monte Carlo simulations and numerical analysis based on realistic rates of exchange in families of known size reveal the correctness of the theoretical prediction and also assess the effect of bias in turnover. The population dynamics of molecular drive in gradually increasing the mean copy number of a variant gene without the generation of a large variance (population cohesion) is of significance regarding potential interactions between natural selection and molecular drive. PMID:6500260
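The qualitative claim above, that a variant spreads through a gene family and a population while the between-individual variance in copy number stays small, can be explored with a toy Monte Carlo along the lines of the paper's simulations. The population size, family size, and conversion rate below are illustrative, and the conversion scheme is a deliberately crude unbiased resampling, not the paper's full model.

```python
import numpy as np

# Toy molecular-drive simulation: N haploid individuals, each with a family
# of n gene copies. Each generation every individual (i) inherits the copy
# mix of a random parent and (ii) may undergo one unbiased gene-conversion
# event that overwrites one of its copies with another of its own copies.
rng = np.random.default_rng(4)

def molecular_drive(N=200, n=20, conv_rate=0.05, generations=400):
    counts = np.zeros(N, dtype=int)   # copies of variant 'a' per individual
    counts[0] = 1                     # one mutant copy in one individual
    var_trace = []
    for _ in range(generations):
        counts = counts[rng.integers(0, N, N)]      # reproduction (drift)
        for i in np.flatnonzero(rng.random(N) < conv_rate):
            donor_is_a = rng.random() < counts[i] / n    # donor copy type
            target_is_a = rng.random() < counts[i] / n   # overwritten copy
            counts[i] += int(donor_is_a) - int(target_is_a)
        var_trace.append(counts.var())
    return counts, np.array(var_trace)

counts, var_trace = molecular_drive()
```

Tracking `var_trace` over many replicate runs is how one would check the theoretical prediction that the population variance in copy number stays small during the spread.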
Generalized Polynomial Chaos Based Uncertainty Quantification for Planning MRgLITT Procedures
Fahrenholtz, S.; Stafford, R. J.; Maier, F.; Hazle, J. D.; Fuentes, D.
2014-01-01
Purpose: A generalized polynomial chaos (gPC) method is used to incorporate constitutive parameter uncertainties within the Pennes representation of bioheat transfer phenomena. The stochastic temperature predictions of the mathematical model are critically evaluated against MR thermometry data for planning MR-guided Laser Induced Thermal Therapies (MRgLITT). Methods: The Pennes bioheat transfer model coupled with a diffusion theory approximation of laser-tissue interaction was implemented as the underlying deterministic kernel. A probabilistic sensitivity study was used to identify the parameters that contribute the most variance to the temperature output. Confidence intervals of the temperature predictions are compared to MR temperature imaging (MRTI) obtained during phantom and in vivo canine (n=4) MRgLITT experiments. The gPC predictions were quantitatively compared to MRTI data using probabilistic linear and temporal profiles as well as 2-D 60 °C isotherms. Results: Within the range of physically meaningful constitutive values relevant to the ablative temperature regime of MRgLITT, the sensitivity study indicated that the optical parameters, particularly the anisotropy factor, created the most variance in the stochastic model's output temperature prediction. Further, within the statistical sense considered, a nonlinear model of the temperature- and damage-dependent perfusion, absorption, and scattering is captured within the confidence intervals of the linear gPC method. Multivariate stochastic model predictions using the parameters with the dominant sensitivities show good agreement with experimental MRTI data. Conclusions: Given the parameter uncertainties and mathematical modeling approximations of the Pennes bioheat model, the statistical framework provides conservative estimates of the therapeutic heating and has potential for use as a computational prediction tool for thermal therapy planning. PMID:23692295
Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver
2015-01-01
The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all age-related shared variance with domain specific aging mechanisms evident. PMID:26321998
Asymptotic Effect of Misspecification in the Random Part of the Multilevel Model
ERIC Educational Resources Information Center
Berkhof, Johannes; Kampen, Jarl Kennard
2004-01-01
The authors examine the asymptotic effect of omitting a random coefficient in the multilevel model and derive expressions for the change in (a) the variance components estimator and (b) the estimated variance of the fixed effects estimator. They apply the method of moments, which yields a closed form expression for the omission effect. In…
Sampling in freshwater environments: suspended particle traps and variability in the final data.
Barbizzi, Sabrina; Pati, Alessandra
2008-11-01
This paper reports a practical method for estimating measurement uncertainty, including sampling, derived from the approach implemented by Ramsey for soil investigations. The methodology has been applied to estimate the measurement uncertainty (sampling and analysis) of ¹³⁷Cs activity concentration (Bq kg⁻¹) and total carbon content (%) in suspended particle sampling in a freshwater ecosystem. Uncertainty estimates for the between-location, sampling, and analysis components have been evaluated. For the considered measurands, the relative expanded measurement uncertainties are 12.3% for ¹³⁷Cs and 4.5% for total carbon. For ¹³⁷Cs, the measurement (sampling + analysis) variance gives the major contribution to the total variance, while for total carbon the spatial variance is the dominant contributor to the total variance. The limitations and advantages of this basic method are discussed.
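The duplicate-design partition behind this kind of study can be sketched with a balanced nested layout: two samples per location, two analyses per sample, then subtraction of nested mean squares. The synthetic data and true component values below are invented for illustration; the real method also uses robust statistics to limit outlier influence, which is omitted here.

```python
import numpy as np

# Ramsey-style duplicate method on synthetic data: partition total variance
# into between-location, sampling, and analytical components.
rng = np.random.default_rng(5)

# 8 locations x 2 sample duplicates x 2 analysis duplicates (assumed design)
L, sd_loc, sd_samp, sd_anal = 8, 3.0, 1.0, 0.5
x = (rng.normal(20.0, sd_loc, (L, 1, 1))        # location means
     + rng.normal(0.0, sd_samp, (L, 2, 1))      # sampling deviations
     + rng.normal(0.0, sd_anal, (L, 2, 2)))     # analytical deviations

# analytical variance: spread between the two analyses of each sample
var_anal = x.var(axis=2, ddof=1).mean()
# sampling variance: spread between sample means, minus analytical share
sample_means = x.mean(axis=2)
var_samp = sample_means.var(axis=1, ddof=1).mean() - var_anal / 2.0
# between-location variance: spread of location means, minus nested shares
loc_means = sample_means.mean(axis=1)
var_loc = loc_means.var(ddof=1) - var_samp / 2.0 - var_anal / 4.0
```

With only eight locations the component estimates are noisy (the sampling component can even come out negative in an unlucky draw), which is why these designs typically use robust ANOVA and many duplicate pairs.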
Statistical modelling of thermal annealing of fission tracks in apatite
NASA Astrophysics Data System (ADS)
Laslett, G. M.; Galbraith, R. F.
1996-12-01
We develop an improved methodology for modelling the relationship between mean track length, temperature, and time in fission track annealing experiments. We consider "fanning Arrhenius" models, in which contours of constant mean length on an Arrhenius plot are straight lines meeting at a common point. Features of our approach are explicit use of subject matter knowledge, treating mean length as the response variable, modelling of the mean-variance relationship with two components of variance, improved modelling of the control sample, and using information from experiments in which no tracks are seen. This approach overcomes several weaknesses in previous models and provides a robust six parameter model that is widely applicable. Estimation is via direct maximum likelihood which can be implemented using a standard numerical optimisation package. Because the model is highly nonlinear, some reparameterisations are needed to achieve stable estimation and calculation of precisions. Experience suggests that precisions are more convincingly estimated from profile log-likelihood functions than from the information matrix. We apply our method to the B-5 and Sr fluorapatite data of Crowley et al. (1991) and obtain well-fitting models in both cases. For the B-5 fluorapatite, our model exhibits less fanning than that of Crowley et al. (1991), although fitted mean values above 12 μm are fairly similar. However, predictions can be different, particularly for heavy annealing at geological time scales, where our model is less retentive. In addition, the refined error structure of our model results in tighter prediction errors, and has components of error that are easier to verify or modify. For the Sr fluorapatite, our fitted model for mean lengths does not differ greatly from that of Crowley et al. (1991), but our error structure is quite different.
Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data
Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark
2010-01-01
Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent-cumulative-variance approach has been used to determine the rank of the training-set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision in training sets. Here we propose that Malinowski's F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol, including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski's F-test for rank estimation of in vivo training sets allowed noise to be removed more accurately. Malinowski's F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets, while dopamine prediction was more sensitive to noise. PMID:20527815
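The baseline the paper criticizes, rank selection by percent cumulative variance, is easy to state concretely via the singular values of the training matrix; Malinowski's F-test replaces the arbitrary threshold with a significance test on the reduced eigenvalues. The sketch below implements only the cumulative-variance baseline on synthetic two-factor data; the 99.5% threshold is exactly the kind of arbitrary choice the abstract objects to.

```python
import numpy as np

# Percent-cumulative-variance rank estimation via SVD on a synthetic
# data matrix with two true factors plus small Gaussian noise.
rng = np.random.default_rng(6)

def rank_by_cumulative_variance(X, threshold=0.995):
    """Smallest number of components whose variance fraction reaches threshold."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(frac, threshold) + 1)

scores = rng.normal(size=(100, 2))       # 100 "voltammograms"
loadings = rng.normal(size=(2, 30))      # 30 "potentials"
X = scores @ loadings + 0.05 * rng.normal(size=(100, 30))
rank = rank_by_cumulative_variance(X)
```

On clean data like this both methods agree; the disagreements the paper documents arise in real training sets, where noise is structured and the variance gap between signal and noise components is far less clear-cut.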
Statistical modelling of growth using a mixed model with orthogonal polynomials.
Suchocki, T; Szyda, J
2011-02-01
In statistical modelling, the effects of single-nucleotide polymorphisms (SNPs) are often regarded as time-independent. However, for traits recorded repeatedly, it is very interesting to investigate the behaviour of gene effects over time. The analysis used simulated data from the 13th QTL-MAS Workshop (Wageningen, The Netherlands, April 2009), and the major goal was to model genetic effects as time-dependent. For this purpose, a mixed model is fitted which describes each effect using third-order Legendre orthogonal polynomials, in order to account for the correlation between consecutive measurements. In this model, SNPs are modelled as fixed effects, while the environment is modelled as a random effect. The maximum likelihood estimates of the model parameters are obtained by the expectation-maximisation (EM) algorithm, and the significance of the additive SNP effects is based on the likelihood ratio test, with p-values corrected for multiple testing. For each significant SNP, the percentage of the total variance contributed by that SNP is calculated. Moreover, using a model which simultaneously incorporates the effects of all SNPs, the prediction of future yields is conducted. As a result, 179 of the total of 453 SNPs, covering 16 out of 18 true quantitative trait loci (QTL), were selected. The correlation between predicted and true breeding values was 0.73 for the data set with all SNPs and 0.84 for the data set with selected SNPs. In conclusion, we showed that a longitudinal approach allows for estimating changes in the variance contributed by each SNP over time and demonstrated that, for prediction, the pre-selection of SNPs plays an important role.
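The Legendre basis used for the time-dependent effects can be built directly with numpy: rescale the measurement times to [-1, 1] and evaluate the polynomials of order 0 to 3. The recording days and the example coefficient vector below are illustrative assumptions, not the workshop data.

```python
import numpy as np
from numpy.polynomial import legendre

# Third-order Legendre basis for a random-regression / longitudinal model.
days = np.linspace(0.0, 100.0, 11)                     # assumed recording times
t = 2.0 * (days - days.min()) / (days.max() - days.min()) - 1.0  # map to [-1, 1]

# design matrix: columns are Legendre polynomials P0..P3 evaluated at t
Phi = legendre.legvander(t, 3)

# a single SNP's time-dependent additive effect as a curve in this basis
coeffs = np.array([10.0, 2.0, -0.5, 0.1])              # illustrative values
effect_over_time = Phi @ coeffs
```

A SNP effect is then a smooth curve over time rather than a constant, and the variance it contributes can be evaluated at each time point from the basis and the estimated coefficients.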
NASA Astrophysics Data System (ADS)
Oikawa, P. Y.; Jenerette, G. D.; Knox, S. H.; Sturtevant, C.; Verfaillie, J.; Dronova, I.; Poindexter, C. M.; Eichelmann, E.; Baldocchi, D. D.
2017-01-01
Wetlands and flooded peatlands can sequester large amounts of carbon (C) and have high greenhouse gas mitigation potential. There is growing interest in financing wetland restoration using C markets; however, this requires careful accounting of both CO2 and CH4 exchange at the ecosystem scale. Here we present a new model, the PEPRMT model (Peatland Ecosystem Photosynthesis Respiration and Methane Transport), which consists of a hierarchy of biogeochemical models designed to estimate CO2 and CH4 exchange in restored managed wetlands. Empirical models using temperature and/or photosynthesis to predict respiration and CH4 production were contrasted with a more process-based model that simulated substrate-limited respiration and CH4 production using multiple carbon pools. Models were parameterized by using a model-data fusion approach with multiple years of eddy covariance data collected in a recently restored wetland and a mature restored wetland. A third recently restored wetland site was used for model validation. During model validation, the process-based model explained 70% of the variance in net ecosystem exchange of CO2 (NEE) and 50% of the variance in CH4 exchange. Not accounting for high respiration following restoration led to empirical models overestimating annual NEE by 33-51%. By employing a model-data fusion approach we provide rigorous estimates of uncertainty in model predictions, accounting for uncertainty in data, model parameters, and model structure. The PEPRMT model is a valuable tool for understanding carbon cycling in restored wetlands and for application in carbon market-funded wetland restoration, thereby advancing opportunity to counteract the vast degradation of wetlands and flooded peatlands.
NASA Astrophysics Data System (ADS)
Jankovic, Igor; Maghrebi, Mahdi; Fiori, Aldo; Zarlenga, Antonio; Dagan, Gedeon
2017-04-01
We examine the impact of permeability structures on the Breakthrough Curve (BTC) of solute, at a distance x from the injection plane, under mean uniform flow of mean velocity U. The study is carried out through accurate 3D numerical simulations, rather than the 2D models adopted in most previous works. All structures share the same univariate distribution of the logconductivity Y = lnK and autocorrelation function ρY, but differ in higher order statistics. The main finding is that the BTC of ergodic plumes for the different examined structures is quite robust, displaying a seemingly "universal" behavior. This result is at variance with similar analyses carried out in the past for 2D permeability structures. The basic parameters (i.e. the geometric mean, the logconductivity variance σY² and the horizontal integral scale I) have to be identified from field data (e.g. core analysis, pumping tests or other methods). However, prediction requires the knowledge of U, and the results suggest that improvement of the BTC prediction in applications can be achieved by independent estimates of the mean velocity U, e.g. by pumping tests, rather than by attempting to characterize the permeability structure beyond its second-order characterization. The BTC prediction made by the Inverse Gaussian (IG) distribution, adopting the macrodispersion coefficient estimated by the First Order approximation αL = σY²I, is also quite robust, providing a simple and effective solution to be employed in applications. The consequences of the latter result are further explored by modeling the mass distribution observed in the MADE-1 natural gradient experiment, for which we show that most of the plume features are adequately captured by the simple First Order approach.
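The IG breakthrough-curve model mentioned above can be sketched directly; the parameterization below (mean arrival time x/U, shape x²/(2αLU), with αL = σY²I) is the standard inverse-Gaussian travel-time form, offered as an illustration rather than the authors' code:

```python
import numpy as np

def ig_btc(t, x, U, sigma2_Y, I):
    """Inverse-Gaussian travel-time pdf for the breakthrough curve at
    distance x, using the First Order macrodispersivity
    alpha_L = sigma_Y^2 * I (illustrative sketch)."""
    alpha_L = sigma2_Y * I           # first-order longitudinal dispersivity
    mu = x / U                       # mean arrival time
    lam = x**2 / (2 * alpha_L * U)   # IG shape parameter
    t = np.asarray(t, dtype=float)
    return (np.sqrt(lam / (2 * np.pi * t**3))
            * np.exp(-lam * (t - mu) ** 2 / (2 * mu**2 * t)))
```

The pdf integrates to one, and its mean equals the advective arrival time x/U, so the curve is fully fixed by U, σY² and I.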
Systems Engineering Programmatic Estimation Using Technology Variance
NASA Technical Reports Server (NTRS)
Mog, Robert A.
2000-01-01
Unique and innovative system programmatic estimation is conducted using the variance of the packaged technologies. Covariance analysis is performed on the subsystems and components comprising the system of interest. Technological "return" and "variation" parameters are estimated. These parameters are combined with the model error to arrive at a measure of system development stability. The resulting estimates provide valuable information concerning the potential cost growth of the system under development.
Heat and solute tracers: how do they compare in heterogeneous aquifers?
Irvine, Dylan J; Simmons, Craig T; Werner, Adrian D; Graf, Thomas
2015-04-01
A comparison of groundwater velocity in heterogeneous aquifers estimated from hydraulic methods, heat and solute tracers was made using numerical simulations. Aquifer heterogeneity was described by geostatistical properties of the Borden, Cape Cod, North Bay, and MADE aquifers. Both heat and solute tracers displayed little systematic under- or over-estimation in velocity relative to a hydraulic control. The worst cases were under-estimates of 6.63% for solute and 2.13% for the heat tracer. Both under- and over-estimation of velocity from the heat tracer relative to the solute tracer occurred. Differences between the estimates from the tracer methods increased as the mean velocity decreased, owing to differences in rates of molecular diffusion and thermal conduction. The variance in estimated velocity using all methods increased as the variance in log-hydraulic conductivity (K) and correlation length scales increased. The variance in velocity for each scenario was remarkably small when compared to σ²ln(K) for all methods tested. The largest variability identified was for the solute tracer where 95% of velocity estimates ranged by a factor of 19 in simulations where 95% of the K values varied by almost four orders of magnitude. For the same K-fields, this range was a factor of 11 for the heat tracer. The variance in estimated velocity was always lowest when using heat as a tracer. The study results suggest that a solute tracer will provide more understanding about the variance in velocity caused by aquifer heterogeneity and a heat tracer provides a better approximation of the mean velocity. © 2013, National Ground Water Association.
Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D
2016-06-15
The receiver operating characteristic (ROC) curve is a popular technique with applications such as investigating the accuracy of a biomarker in delineating between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) considers only the area over a range of specificities (i.e., true negative rates), and it can often be clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both a single biomarker test and the comparison of two correlated biomarkers, because it simply adapts the existing variance estimator of U-statistics. We show the accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on whether inference is based on the AUC or the pAUC, we can reach different decisions on the prognostic ability of the same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
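A minimal sketch of the pAUC point estimator the abstract builds on (a U-statistic with a plug-in sample quantile of the non-diseased scores) might look like this; names are illustrative, and the proposed variance estimator itself is not reproduced:

```python
import numpy as np

def pauc(diseased, nondiseased, max_fpr=0.2):
    """Nonparametric partial AUC over false-positive rates in
    [0, max_fpr], via the U-statistic with a plug-in sample quantile
    of the non-diseased scores (sketch of the estimator class)."""
    y = np.asarray(diseased, dtype=float)
    x = np.asarray(nondiseased, dtype=float)
    q = np.quantile(x, 1 - max_fpr)   # plug-in specificity threshold
    # kernel: diseased score beats the non-diseased score AND the
    # non-diseased score lies in the high-specificity region
    wins = (y[:, None] > x[None, :]) & (x[None, :] > q)
    return wins.mean()
```

For a perfect classifier the estimate approaches `max_fpr`, the largest attainable pAUC over that false-positive range.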
Estimation of Additive, Dominance, and Imprinting Genetic Variance Using Genomic Data
Lopes, Marcos S.; Bastiaansen, John W. M.; Janss, Luc; Knol, Egbert F.; Bovenhuis, Henk
2015-01-01
Traditionally, exploration of genetic variance in humans, plants, and livestock species has been limited mostly to the use of additive effects estimated using pedigree data. However, with the development of dense panels of single-nucleotide polymorphisms (SNPs), the exploration of genetic variation of complex traits is moving from quantifying the resemblance between family members to the dissection of genetic variation at individual loci. With SNPs, we were able to quantify the contribution of additive, dominance, and imprinting variance to the total genetic variance by using a SNP regression method. The method was validated in simulated data and applied to three traits (number of teats, backfat, and lifetime daily gain) in three purebred pig populations. In simulated data, the estimates of additive, dominance, and imprinting variance were very close to the simulated values. In real data, dominance effects accounted for a substantial proportion of the total genetic variance (up to 44%) for these traits in these populations. The contribution of imprinting to the total phenotypic variance of the evaluated traits was relatively small (1–3%). Our results indicate a strong relationship between additive variance explained per chromosome and chromosome length, which has been described previously for other traits in other species. We also show that a similar linear relationship exists for dominance and imprinting variance. These novel results improve our understanding of the genetic architecture of the evaluated traits and show promise for applying the SNP regression method to other traits and species, including human diseases. PMID:26438289
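One common coding of additive, dominance and imprinting covariates from phased genotypes, offered as an illustrative assumption of how such a SNP regression can be set up (not necessarily the authors' exact parameterization):

```python
import numpy as np

def snp_design(geno_pat, geno_mat):
    """Additive, dominance and imprinting covariates from phased
    genotypes, where geno_pat/geno_mat are paternal/maternal allele
    counts (0 or 1 each). A standard textbook coding, used here as an
    illustration."""
    geno_pat = np.asarray(geno_pat)
    geno_mat = np.asarray(geno_mat)
    a = geno_pat + geno_mat - 1        # additive: -1, 0, 1
    het = (geno_pat != geno_mat).astype(int)
    d = het                            # dominance: 1 for heterozygotes
    i = het * (geno_mat - geno_pat)    # imprinting: +1 maternal, -1 paternal
    return np.column_stack([a, d, i])
```

Regressing phenotypes on these three columns per SNP partitions the genetic variance into additive, dominance and imprinting components.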
Concerns about a variance approach to X-ray diffractometric estimation of microfibril angle in wood
Steve P. Verrill; David E. Kretschmann; Victoria L. Herian; Michael C. Wiemann; Harry A. Alden
2011-01-01
In this article, we raise three technical concerns about Evans' 1999 Appita Journal "variance approach" to estimating microfibril angle (MFA). The first concern is associated with the approximation of the variance of an X-ray intensity half-profile by a function of the MFA and the natural variability of the MFA. The second concern is associated with the approximation...
Steve P. Verrill; David E. Kretschmann; Victoria L. Herian; Michael Wiemann; Harry A. Alden
2010-01-01
In this paper we raise three technical concerns about Evans's 1999 Appita Journal "variance approach" to estimating microfibril angle. The first concern is associated with the approximation of the variance of an X-ray intensity half-profile by a function of the microfibril angle and the natural variability of the microfibril angle, S2...
Stress in junior enlisted air force women with and without children.
Hopkins-Chadwick, Denise L; Ryan-Wenger, Nancy
2009-04-01
The objective was to determine if there are differences between young enlisted military women with and without preschool children on role strain, stress, health, and military career aspiration and to identify the best predictors of these variables. The study used a cross-sectional descriptive design of 50 junior Air Force women with preschool children and 50 women without children. There were no differences between women with and without children in role strain, stress, health, and military career aspiration. In all women, higher stress was moderately predictive of higher role strain (39.9% of variance explained) but a poor predictor of career aspiration (3.8% of variance explained). Lower mental health scores were predicted by high stress symptoms (27.9% of variance explained), low military career aspiration (4.1% of variance explained), high role strain (4.0% of variance explained), and being non-White (3.9% of variance explained). Aspiration for a military career was predicted by high perceived availability of military resources (16.8% of variance explained), low family of origin socioeconomic status (4.5% of variance explained), and better mental health status (3.3% of variance explained). Contrary to theoretical expectations, in this sample, motherhood was not a significant variable. Increased role strain, stress, and decreased health as well as decreased military career aspiration were evident in both groups and may have more to do with individual coping skills and other unmeasured resources. More research is needed to determine what nursing interventions are needed to best support both groups of women.
Optimal two-phase sampling design for comparing accuracies of two binary classification rules.
Xu, Huiping; Hui, Siu L; Grannis, Shaun
2014-02-10
In this paper, we consider the design for comparing the performance of two binary classification rules, for example, two record linkage algorithms or two screening tests. Statistical methods are well developed for comparing these accuracy measures when the gold standard is available for every unit in the sample, or in a two-phase study when the gold standard is ascertained only in the second phase in a subsample using a fixed sampling scheme. However, these methods do not attempt to optimize the sampling scheme to minimize the variance of the estimators of interest. In comparing the performance of two classification rules, the parameters of primary interest are the difference in sensitivities, specificities, and positive predictive values. We derived the analytic variance formulas for these parameter estimates and used them to obtain the optimal sampling design. The efficiency of the optimal sampling design is evaluated through an empirical investigation that compares the optimal sampling with simple random sampling and with proportional allocation. Results of the empirical study show that the optimal sampling design is similar for estimating the difference in sensitivities and in specificities, and both achieve a substantial amount of variance reduction with an over-sample of subjects with discordant results and under-sample of subjects with concordant results. A heuristic rule is recommended when there is no prior knowledge of individual sensitivities and specificities, or the prevalence of the true positive findings in the study population. The optimal sampling is applied to a real-world example in record linkage to evaluate the difference in classification accuracy of two matching algorithms. Copyright © 2013 John Wiley & Sons, Ltd.
General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models.
de Villemereuil, Pierre; Schielzeth, Holger; Nakagawa, Shinichi; Morrissey, Michael
2016-11-01
Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of these quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function, over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population. Copyright © 2016 de Villemereuil et al.
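The integration recipe can be illustrated for a Poisson GLMM with log link, where the observed-scale mean and phenotypic variance follow from integrating the inverse link over the latent normal distribution; this sketch uses Gauss-Hermite quadrature and illustrates the general approach rather than the QGglmm implementation:

```python
import numpy as np

def obs_scale_poisson_log(mu, var, n_nodes=61):
    """Observed-scale population mean and phenotypic variance for a
    Poisson GLMM with log link, by integrating the inverse link over
    the latent normal N(mu, var) with Gauss-Hermite quadrature."""
    # probabilists' Hermite nodes/weights give expectations under N(0, 1)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    l = mu + np.sqrt(var) * nodes
    w = weights / weights.sum()
    rate = np.exp(l)                            # inverse link
    mean_obs = np.sum(w * rate)                 # E[lambda]
    var_exp = np.sum(w * rate**2) - mean_obs**2 # Var(lambda) across latents
    var_dist = mean_obs                         # Poisson's own variance
    return mean_obs, var_exp + var_dist
```

For the log link the integrals also have closed forms (lognormal moments), which is how the sketch can be checked.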
Cowley, Patrick M; Fitzgerald, Sharon; Sottung, Kyle; Swensen, Thomas
2009-05-01
First we tested the reliability of two new field tests of core stability (plank to fatigue test [PFT] and front abdominal power test [FAPT]), as well as established measures of core stability (isokinetic trunk extension and flexion strength [TES and TFS] and work [TEW and TFW]) over 3 days in 8 young men and women (24.0 ± 3.1 years). The TES, TFS, TFW, and FAPT were highly reliable, TEW was moderately reliable, and the PFT was unreliable for use during a single testing session. Next, we determined if age, weight, and the data from the reliable field test (FAPT) were predictive of TES, TEW, TFS, and TFW in 50 young men and women (19.0 ± 1.2 years). The FAPT was the only significant predictor of TES and TEW in young women, explaining 16 and 15% of the variance in trunk performance, respectively. Weight was the only significant predictor of TFS and TFW in young women, explaining 28 and 14% of the variance in trunk performance, respectively. In young men, weight was the only significant predictor of TES, TEW, TFS, and TFW, and explained 27, 35, 42, and 33%, respectively, of the variance in trunk performance. In conclusion, the ability of weight and the FAPT to predict TES, TEW, TFS, and TFW was more frequent in young men than women. Additionally, because the FAPT requires few pieces of equipment, is fast to administer, and predicts isokinetic TES and TEW in young women, it can be used to provide a field-based estimate of isokinetic TES and TEW in women without history of back or lower-extremity injury.
McManus, I C; Dewberry, Chris; Nicholson, Sandra; Dowell, Jonathan S; Woolf, Katherine; Potts, Henry W W
2013-11-14
Measures used for medical student selection should predict future performance during training. A problem for any selection study is that predictor-outcome correlations are known only in those who have been selected, whereas selectors need to know how measures would predict in the entire pool of applicants. That problem of interpretation can be solved by calculating construct-level predictive validity, an estimate of true predictor-outcome correlation across the range of applicant abilities. Construct-level predictive validities were calculated in six cohort studies of medical student selection and training (student entry, 1972 to 2009) for a range of predictors, including A-levels, General Certificates of Secondary Education (GCSEs)/O-levels, and aptitude tests (AH5 and UK Clinical Aptitude Test (UKCAT)). Outcomes included undergraduate basic medical science and finals assessments, as well as postgraduate measures of Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)) performance and entry in the Specialist Register. Construct-level predictive validity was calculated with the method of Hunter, Schmidt and Le (2006), adapted to correct for right-censorship of examination results due to grade inflation. Meta-regression analyzed 57 separate predictor-outcome correlations (POCs) and construct-level predictive validities (CLPVs). Mean CLPVs are substantially higher (.450) than mean POCs (.171). Mean CLPVs for first-year examinations, were high for A-levels (.809; CI: .501 to .935), and lower for GCSEs/O-levels (.332; CI: .024 to .583) and UKCAT (mean = .245; CI: .207 to .276). A-levels had higher CLPVs for all undergraduate and postgraduate assessments than did GCSEs/O-levels and intellectual aptitude tests. CLPVs of educational attainment measures decline somewhat during training, but continue to predict postgraduate performance. Intellectual aptitude tests have lower CLPVs than A-levels or GCSEs/O-levels. 
Educational attainment has strong CLPVs for undergraduate and postgraduate performance, accounting for perhaps 65% of true variance in first year performance. Such CLPVs justify the use of educational attainment measures in selection, but also raise a key theoretical question concerning the remaining 35% of variance (even after measurement error, range restriction and right-censorship have been taken into account). Just as in astrophysics, 'dark matter' and 'dark energy' are posited to balance various theoretical equations, so medical student selection must also have its 'dark variance', whose nature is not yet properly characterized, but explains a third of the variation in performance during training. Some variance probably relates to factors which are unpredictable at selection, such as illness or other life events, but some is probably also associated with factors such as personality, motivation or study skills.
Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders
2006-03-13
Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore different methods for small sample performance estimation such as a recently proposed procedure called Repeated Random Sampling (RSS) is also expected to result in heavily biased estimates, which in turn translates into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed indicating that the method in its present form cannot be directly applied to small data sets.
Smoothed Spectra, Ogives, and Error Estimates for Atmospheric Turbulence Data
NASA Astrophysics Data System (ADS)
Dias, Nelson Luís
2018-01-01
A systematic evaluation is conducted of the smoothed spectrum, which is a spectral estimate obtained by averaging over a window of contiguous frequencies. The technique is extended to the ogive, as well as to the cross-spectrum. It is shown that, combined with existing variance estimates for the periodogram, the variance—and therefore the random error—associated with these estimates can be calculated in a straightforward way. The smoothed spectra and ogives are biased estimates; with simple power-law analytical models, correction procedures are devised, as well as a global constraint that enforces Parseval's identity. Several new results are thus obtained: (1) The analytical variance estimates compare well with the sample variance calculated for the Bartlett spectrum and the variance of the inertial subrange of the cospectrum is shown to be relatively much larger than that of the spectrum. (2) Ogives and spectra estimates with reduced bias are calculated. (3) The bias of the smoothed spectrum and ogive is shown to be negligible at the higher frequencies. (4) The ogives and spectra thus calculated have better frequency resolution than the Bartlett spectrum, with (5) gradually increasing variance and relative error towards the low frequencies. (6) Power-law identification and extraction of the rate of dissipation of turbulence kinetic energy are possible directly from the ogive. (7) The smoothed cross-spectrum is a valid inner product and therefore an acceptable candidate for coherence and spectral correlation coefficient estimation by means of the Cauchy-Schwarz inequality. The quadrature, phase function, coherence function and spectral correlation function obtained from the smoothed spectral estimates compare well with the classical ones derived from the Bartlett spectrum.
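The core estimate, averaging the periodogram over a window of contiguous frequencies, can be sketched as a Daniell-type smoother; the scaling conventions below are one common choice, not necessarily the author's:

```python
import numpy as np

def smoothed_spectrum(x, fs=1.0, window=8):
    """Periodogram smoothed by averaging over blocks of `window`
    contiguous frequencies. Returns block-centre frequencies and
    smoothed one-sided power spectral density."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    # one-sided periodogram (interior bins doubled)
    pxx = (np.abs(np.fft.rfft(x)) ** 2) / (fs * n)
    pxx[1:-1] *= 2
    # drop the zero frequency, then average contiguous blocks
    f, p = freqs[1:], pxx[1:]
    nb = p.size // window
    f_s = f[:nb * window].reshape(nb, window).mean(axis=1)
    p_s = p[:nb * window].reshape(nb, window).mean(axis=1)
    return f_s, p_s
```

Averaging over a block of k frequencies reduces the relative variance of the spectral estimate roughly k-fold, at the cost of frequency resolution, which is the bias-variance trade-off the abstract quantifies.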
Asymptotic Analysis Of The Total Least Squares ESPRIT Algorithm
NASA Astrophysics Data System (ADS)
Ottersten, B. E.; Viberg, M.; Kailath, T.
1989-11-01
This paper considers the problem of estimating the parameters of multiple narrowband signals arriving at an array of sensors. Modern approaches to this problem often involve costly procedures for calculating the estimates. The ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) algorithm was recently proposed as a means for obtaining accurate estimates without requiring a costly search of the parameter space. This method utilizes an array invariance to arrive at a computationally efficient multidimensional estimation procedure. Herein, the asymptotic distribution of the estimation error is derived for the Total Least Squares (TLS) version of ESPRIT. The Cramer-Rao Bound (CRB) for the ESPRIT problem formulation is also derived and found to coincide with the variance of the asymptotic distribution through numerical examples. The method is also compared to least squares ESPRIT and MUSIC as well as to the CRB for a calibrated array. Simulations indicate that the theoretic expressions can be used to accurately predict the performance of the algorithm.
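A compact sketch of TLS ESPRIT for a uniform linear array with half-wavelength spacing, illustrating the rotational-invariance idea (array geometry and variable names are assumptions, not the paper's notation):

```python
import numpy as np

def tls_esprit(X, d):
    """TLS-ESPRIT direction-of-arrival estimates (degrees, sorted) for
    d narrowband sources from snapshots X (sensors x snapshots) of a
    half-wavelength uniform linear array. Illustrative sketch."""
    m, N = X.shape
    R = X @ X.conj().T / N                     # sample covariance
    w, V = np.linalg.eigh(R)
    Es = V[:, np.argsort(w)[::-1][:d]]         # signal subspace
    E1, E2 = Es[:-1], Es[1:]                   # two shifted subarrays
    # total least squares solution of E1 @ Psi ~= E2
    _, _, Vh = np.linalg.svd(np.hstack([E1, E2]))
    Vmat = Vh.conj().T
    V12, V22 = Vmat[:d, d:], Vmat[d:, d:]
    Psi = -V12 @ np.linalg.inv(V22)
    # eigenvalues of Psi lie near exp(j*pi*sin(theta))
    phases = np.angle(np.linalg.eigvals(Psi))
    return np.sort(np.degrees(np.arcsin(phases / np.pi)))
```

No search over the parameter space is needed: the rotational invariance between the two subarrays turns direction finding into a small eigenvalue problem.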
Minimum variance geographic sampling
NASA Technical Reports Server (NTRS)
Terrell, G. R. (Principal Investigator)
1980-01-01
Resource inventories require samples with geographical scatter, sometimes not as widely spaced as would be hoped. A simple model of correlation over distances is used to create a minimum variance unbiased estimate of population means. The fitting procedure is illustrated with data used to estimate Missouri corn acreage.
Multiple Damage Progression Paths in Model-Based Prognostics
NASA Technical Reports Server (NTRS)
Daigle, Matthew; Goebel, Kai Frank
2011-01-01
Model-based prognostics approaches employ domain knowledge about a system, its components, and how they fail through the use of physics-based models. Component wear is driven by several different degradation phenomena, each resulting in its own damage progression path, overlapping to contribute to the overall degradation of the component. We develop a model-based prognostics methodology using particle filters, in which the problem of characterizing multiple damage progression paths is cast as a joint state-parameter estimation problem. The estimate is represented as a probability distribution, allowing the prediction of end of life and remaining useful life within a probabilistic framework that supports uncertainty management. We also develop a novel variance control mechanism that maintains an uncertainty bound around the hidden parameters to limit the amount of estimation uncertainty and, consequently, reduce prediction uncertainty. We construct a detailed physics-based model of a centrifugal pump, to which we apply our model-based prognostics algorithms. We illustrate the operation of the prognostic solution with a number of simulation-based experiments and demonstrate the performance of the chosen approach when multiple damage mechanisms are active.
Use of vegetation health data for estimation of aus rice yield in bangladesh.
Rahman, Atiqur; Roytman, Leonid; Krakauer, Nir Y; Nizamuddin, Mohammad; Goldberg, Mitch
2009-01-01
Rice is a vital staple crop for Bangladesh and surrounding countries, with interannual variation in yields depending on climatic conditions. We compared Bangladesh yield of aus rice, one of the main varieties grown, from official agricultural statistics with Vegetation Health (VH) Indices [Vegetation Condition Index (VCI), Temperature Condition Index (TCI) and Vegetation Health Index (VHI)] computed from Advanced Very High Resolution Radiometer (AVHRR) data covering a period of 15 years (1991-2005). A strong correlation was found between aus rice yield and VCI and VHI during the critical period of aus rice development that occurs during March-April (weeks 8-13 of the year), several months in advance of the rice harvest. Stepwise principal component regression (PCR) was used to construct a model to predict yield as a function of critical-period VHI. The model reduced the yield prediction error variance by 62% compared with a prediction of average yield for each year. Remote sensing is a valuable tool for estimating rice yields well in advance of harvest and at a low cost.
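A generic principal component regression fit of the kind described (standardize predictors, project onto leading components, regress yield on the scores) can be sketched as follows, without the stepwise component selection the authors use:

```python
import numpy as np

def pcr_fit_predict(X, y, X_new, n_pc=2):
    """Principal component regression: project standardized predictors
    onto the leading principal components and regress the response on
    the scores. Generic sketch; n_pc is an illustrative choice."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    # principal axes from the SVD of the standardized predictors
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_pc].T                   # loadings of the leading PCs
    T = Z @ W                         # component scores
    A = np.column_stack([np.ones(len(T)), T])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    T_new = ((X_new - mu) / sd) @ W
    return np.column_stack([np.ones(len(T_new)), T_new]) @ beta
```

Using only the leading components regularizes the regression, which is why PCR is popular for correlated predictors such as weekly vegetation-health indices.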
Peña, Javier; Segarra, Rafael; Ojeda, Natalia; García, Jon; Eguiluz, José I; Gutiérrez, Miguel
2012-06-01
The aim of this two-year longitudinal study was to identify the best baseline predictors of functional outcome in first-episode psychosis (FEP). We tested whether the same factors predict functional outcomes in two different subsamples of FEP patients: schizophrenia and non-schizophrenia syndrome groups. Ninety-five patients with FEP underwent a full clinical evaluation (i.e., PANSS, Mania, Depression and Insight). Functional outcome measurements included the WHO Disability Assessment Schedule (DAS-WHO), Global Assessment of Functioning (GAF) and Clinical Global Impression (CGI). Estimation of cognition was obtained by a neuropsychological battery which included attention, processing speed, language, memory and executive functioning. Poorer visuospatial functioning at baseline predicted poorer functional outcome as measured by the three functional scales (GAF, CGI and DAS-WHO) in the pooled FEP sample (explaining up to 12%, 9% and 10% of the variance, respectively). Negative symptoms also contributed effectively to predicting GAF scores (8%). However, we obtained different predictive values after differentiating sample diagnoses. Processing speed significantly predicted most functional outcome measures in patients with schizophrenia, whereas visuospatial functioning was the only significant predictor of functional outcomes in the non-schizophrenia subgroup. Our results suggest that processing speed, visuospatial functioning and negative symptoms significantly (but differentially) predict outcomes in patients with FEP, depending on their clinical progression. For patients without a schizophrenia diagnosis, visuospatial functioning was the best predictor of functional outcome. The performance on processing speed seemed to be a key factor in more severe syndromes. However, only a small proportion of the variance could be explained by the model, so there must be many other factors that have to be considered. Copyright © 2012 Elsevier Ltd. All rights reserved.
Harmsen, Wouter J; Ribbers, Gerard M; Slaman, Jorrit; Heijenbrok-Kal, Majanka H; Khajeh, Ladbon; van Kooten, Fop; Neggers, Sebastiaan J C M M; van den Berg-Emons, Rita J
2017-05-01
Peak oxygen uptake (VO2peak) established during progressive cardiopulmonary exercise testing (CPET) is the "gold standard" for cardiorespiratory fitness. However, CPET measurements may be limited in patients with aneurysmal subarachnoid hemorrhage (a-SAH) by disease-related complaints, such as cardiovascular health risks or anxiety. Furthermore, CPET with gas-exchange analyses requires specialized knowledge and infrastructure with limited availability in most rehabilitation facilities. To determine whether an easy-to-administer six-minute walk test (6MWT) is a valid clinical alternative to progressive CPET for predicting VO2peak in individuals with a-SAH. Twenty-seven patients performed the 6MWT and CPET with gas-exchange analyses on a cycle ergometer. Univariate and multivariate regression models were fitted to investigate the predictability of VO2peak from the six-minute walk distance (6MWD). Univariate regression showed that the 6MWD was strongly related to VO2peak (r = 0.75, p < 0.001), with an explained variance of 56% and a prediction error of 4.12 ml/kg/min, representing 18% of mean VO2peak. Adding age and sex to an extended multivariate regression model improved this relationship (r = 0.82, p < 0.001), with an explained variance of 67% and a prediction error of 3.67 ml/kg/min, corresponding to 16% of mean VO2peak. The 6MWT is an easy-to-administer submaximal exercise test that can be selected to estimate cardiorespiratory fitness at an aggregated level, in groups of patients with a-SAH, which may help to evaluate interventions in a clinical or research setting. However, the relatively large prediction error does not allow for an accurate prediction in individual patients.
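The univariate step described above (predicting VO2peak from 6MWD, then reading off the explained variance r²) can be sketched as follows. The data here are synthetic and purely illustrative — the study's actual measurements are not reproduced, and the function name is ours:

```python
import random
import statistics

def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r = sxy / (sxx * syy) ** 0.5  # Pearson r; r**2 is the explained variance
    return a, b, r

# Illustrative synthetic data: 27 patients, 6MWD in metres vs
# VO2peak in ml/kg/min with a true slope of 0.03 plus noise
random.seed(1)
walk = [random.uniform(200, 600) for _ in range(27)]
vo2peak = [5.0 + 0.03 * w + random.gauss(0, 2.5) for w in walk]
intercept, slope, r = fit_simple_regression(walk, vo2peak)
```

With noise of this size the fitted slope lands near the true 0.03 and r² plays the role of the "explained variance" quoted in the abstract.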
Yoo, Jinho; Kim, Bo-Hyung; Kim, Soo-Hwan; Kim, Yangseok; Yim, Sung-Vin
2016-05-01
The study aimed to identify single nucleotide polymorphisms (SNPs) that significantly influenced the level of improvement of two kinds of training responses, maximal O2 uptake (V'O2max) and knee peak torque, in healthy adults participating in a high intensity training (HIT) program. The study also aimed to use these SNPs to develop prediction models for individual training responses. Seventy-nine healthy volunteers participated in the HIT program. A genome-wide association study, based on 2,391,739 SNPs, was performed to identify SNPs that were significantly associated with gains in V'O2max and knee peak torque following 9 weeks of the HIT program. To predict the two training responses, two independent SNP sets were determined using linear regression and iterative binary logistic regression analysis. False discovery rate analysis and permutation tests were performed to avoid false-positive findings. To predict gains in V'O2max, 7 SNPs were identified. These SNPs accounted for 26.0% of the variance in the increment of V'O2max and discriminated the subjects into three subgroups (non-responders, medium responders and high responders) with a prediction accuracy of 86.1%. For knee peak torque, 6 SNPs were identified, accounting for 27.5% of the variance in the increment of knee peak torque. The prediction accuracy discriminating the subjects into the three subgroups was estimated as 77.2%. The novel SNPs found in this study could explain and predict inter-individual variability in gains of V'O2max and knee peak torque. Furthermore, with these genetic markers, the methodology suggested in this study provides a sound approach for personalized training programs.
Detection of gene–environment interaction in pedigree data using genome-wide genotypes
Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V
2016-01-01
Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits. PMID:27436263
Moran, John L; Solomon, Patricia J
2012-05-16
For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with log dependent variable. Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008-2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R2, concordance correlation coefficient [CCC], mean absolute error [MAE]) were established for each estimator. The data set consisted of 111,663 patients from 131 ICUs; mean(SD) age was 60.6(18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length of stay was 3.4(5.1) days (median 1.8, range 0.17-60) and demonstrated marked kurtosis and right skew (29.4 and 4.4, respectively). BIC showed considerable spread, from a maximum of 509801 (OLS, raw scale) to a minimum of 210286 (LMM). R2 ranged from 0.22 (LMM) to 0.17, and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152 (0.019, 0.285), consistent with a fractional-root function. For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.
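The core reason log-scale estimators win on right-skewed LOS data can be shown in a few lines: residuals on the raw scale inherit the skew, while residuals on the log scale are roughly symmetric. A minimal sketch with synthetic, roughly log-normal stays (not the ANZICS data):

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Synthetic right-skewed length-of-stay in days, roughly log-normal
random.seed(7)
los = [math.exp(0.6 + random.gauss(0, 0.9)) for _ in range(5000)]

# Mean-only residuals on the raw scale vs the log scale
raw_mean = statistics.fmean(los)
raw_resid = [v - raw_mean for v in los]
log_los = [math.log(v) for v in los]
log_mean = statistics.fmean(log_los)
log_resid = [v - log_mean for v in log_los]
```

The raw-scale residuals remain strongly right-skewed; the log-scale residuals are close to symmetric, which is the "superior residual behaviour" the abstract reports for log-scale estimators.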
Bureau, Alexandre; Duchesne, Thierry
2015-12-01
Splitting extended families into their component nuclear families to apply a genetic association method designed for nuclear families is a widespread practice in familial genetic studies. Dependence among genotypes and phenotypes of nuclear families from the same extended family arises because of genetic linkage of the tested marker with a risk variant or because of familial specificity of genetic effects due to gene-environment interaction. This raises concerns about the validity of inference conducted under the assumption of independence of the nuclear families. We indeed prove theoretically that, in a conditional logistic regression analysis applicable to disease cases and their genotyped parents, the naive model-based estimator of the variance of the coefficient estimates underestimates the true variance. However, simulations with realistic effect sizes of risk variants and variation of this effect from family to family reveal that the underestimation is negligible. The simulations also show the greater efficiency of the model-based variance estimator compared to a robust empirical estimator. Our recommendation is therefore to use the model-based estimator of variance for inference on effects of genetic variants.
An improved method for bivariate meta-analysis when within-study correlations are unknown.
Hong, Chuan; D Riley, Richard; Chen, Yong
2018-03-01
Multivariate meta-analysis, which jointly analyzes multiple and possibly correlated outcomes in a single analysis, is becoming increasingly popular in recent years. An attractive feature of the multivariate meta-analysis is its ability to account for the dependence between multiple estimates from the same study. However, standard inference procedures for multivariate meta-analysis require the knowledge of within-study correlations, which are usually unavailable. This limits standard inference approaches in practice. Riley et al proposed a working model and an overall synthesis correlation parameter to account for the marginal correlation between outcomes, where the only data needed are those required for a separate univariate random-effects meta-analysis. As within-study correlations are not required, the Riley method is applicable to a wide variety of evidence synthesis situations. However, the standard variance estimator of the Riley method is not entirely correct under many important settings. As a consequence, the coverage of a function of pooled estimates may not reach the nominal level even when the number of studies in the multivariate meta-analysis is large. In this paper, we improve the Riley method by proposing a robust variance estimator, which is asymptotically correct even when the model is misspecified (ie, when the likelihood function is incorrect). Simulation studies of a bivariate meta-analysis, in a variety of settings, show that a function of pooled estimates has improved performance when using the proposed robust variance estimator. In terms of individual pooled estimates themselves, the standard variance estimator and robust variance estimator give similar results to the original method, with appropriate coverage. The proposed robust variance estimator performs well when the number of studies is relatively large. Therefore, we recommend the use of the robust method for meta-analyses with a relatively large number of studies (eg, m≥50). When the sample size is relatively small, we recommend the use of the robust method under the working independence assumption. We illustrate the proposed method through 2 meta-analyses. Copyright © 2017 John Wiley & Sons, Ltd.
Robert B. Thomas; Jack Lewis
1993-01-01
Time-stratified sampling of sediment for estimating suspended load is introduced and compared to selection at list time (SALT) sampling. Both methods provide unbiased estimates of load and variance. The magnitude of the variance of the two methods is compared using five storm populations of suspended sediment flux derived from turbidity data. Under like conditions,...
Estimation of the biserial correlation and its sampling variance for use in meta-analysis.
Jacobs, Perke; Viechtbauer, Wolfgang
2017-06-01
Meta-analyses are often used to synthesize the findings of studies examining the correlational relationship between two continuous variables. When only dichotomous measurements are available for one of the two variables, the biserial correlation coefficient can be used to estimate the product-moment correlation between the two underlying continuous variables. Unlike the point-biserial correlation coefficient, biserial correlation coefficients can therefore be integrated with product-moment correlation coefficients in the same meta-analysis. The present article describes the estimation of the biserial correlation coefficient for meta-analytic purposes and reports simulation results comparing different methods for estimating the coefficient's sampling variance. The findings indicate that commonly employed methods yield inconsistent estimates of the sampling variance across a broad range of research situations. In contrast, consistent estimates can be obtained using two methods that appear to be unknown in the meta-analytic literature. A variance-stabilizing transformation for the biserial correlation coefficient is described that allows for the construction of confidence intervals for individual coefficients with close to nominal coverage probabilities in most of the examined conditions. Copyright © 2016 John Wiley & Sons, Ltd.
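The classical conversion from a point-biserial to a biserial correlation, which underlies the estimator discussed above, assumes the dichotomy was produced by thresholding a latent normal variable. A minimal sketch (the function name is ours; the paper's sampling-variance estimators are not reproduced here):

```python
from statistics import NormalDist

def biserial_from_point_biserial(r_pb, p):
    """Convert a point-biserial correlation into a biserial estimate,
    assuming the dichotomy arose by thresholding a latent normal
    variable; p is the proportion of cases in the 'upper' group."""
    nd = NormalDist()
    h = nd.pdf(nd.inv_cdf(1 - p))  # standard-normal ordinate at the cut point
    return r_pb * (p * (1 - p)) ** 0.5 / h

# A median split (p = 0.5) inflates r_pb by sqrt(pi/2), about 1.253
r_b = biserial_from_point_biserial(0.40, 0.5)
```

The biserial estimate is always at least as large in magnitude as the point-biserial value, which is why the two coefficient types cannot be pooled interchangeably in a meta-analysis.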
NASA Astrophysics Data System (ADS)
Liu, Jie; Wang, Wilson; Ma, Fai
2011-07-01
System current state estimation (or condition monitoring) and future state prediction (or failure prognostics) constitute the core elements of condition-based maintenance programs. For complex systems whose internal state variables are either inaccessible to sensors or hard to measure under normal operational conditions, inference has to be made from indirect measurements using approaches such as Bayesian learning. In recent years, the auxiliary particle filter (APF) has gained popularity in Bayesian state estimation; the APF technique, however, has some potential limitations in real-world applications. For example, the diversity of the particles may deteriorate when the process noise is small, and the variance of the importance weights could become extremely large when the likelihood varies dramatically over the prior. To tackle these problems, a regularized auxiliary particle filter (RAPF) is developed in this paper for system state estimation and forecasting. This RAPF aims to improve the performance of the APF through two innovative steps: (1) regularize the approximating empirical density and redraw samples from a continuous distribution so as to diversify the particles; and (2) smooth out the rather diffused proposals by a rejection/resampling approach so as to improve the robustness of particle filtering. The effectiveness of the proposed RAPF technique is evaluated through simulations of a nonlinear/non-Gaussian benchmark model for state estimation. It is also implemented for a real application in the remaining useful life (RUL) prediction of lithium-ion batteries.
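The weight-degeneracy and diversity problems motivating the RAPF are easiest to see in a plain bootstrap particle filter. The sketch below is a generic filter for a toy linear-Gaussian model, not the paper's regularized auxiliary particle filter, and all names and parameters are illustrative:

```python
import math
import random

def particle_filter(observations, n_particles=500, q=0.5, r=1.0, phi=0.9):
    """Minimal bootstrap particle filter for the toy linear-Gaussian model
    x_t = phi*x_{t-1} + N(0, q^2),  y_t = x_t + N(0, r^2).
    Returns posterior-mean state estimates."""
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # propagate each particle through the state equation
        particles = [phi * p + random.gauss(0.0, q) for p in particles]
        # weight by the Gaussian likelihood of the new observation
        weights = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # multinomial resampling to fight weight degeneracy
        particles = random.choices(particles, weights=weights, k=n_particles)
    return estimates

# Simulate the model and filter the noisy observations
random.seed(3)
truth, obs = [], []
x = 0.0
for _ in range(100):
    x = 0.9 * x + random.gauss(0.0, 0.5)
    truth.append(x)
    obs.append(x + random.gauss(0.0, 1.0))
estimates = particle_filter(obs)
```

Resampling from the discrete particle set is exactly where diversity can collapse when process noise is small; the paper's regularization step replaces this discrete redraw with sampling from a smoothed continuous density.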
Christopher, Micaela E.; Keenan, Janice M.; Hulslander, Jacqueline; DeFries, John C.; Miyake, Akira; Wadsworth, Sally J.; Willcutt, Erik; Pennington, Bruce; Olson, Richard K.
2016-01-01
While previous research has shown cognitive skills to be important predictors of reading ability in children, the respective roles of genetic and environmental influences on these relations are an open question. The present study explored the genetic and environmental etiologies underlying the relations between selected executive functions and cognitive abilities (working memory, inhibition, processing speed, and naming speed) with three components of reading ability (word reading, reading comprehension, and listening comprehension). Twin pairs drawn from the Colorado Front Range (n = 676; 224 monozygotic pairs; 452 dizygotic pairs) between the ages of eight and 16 (M = 11.11) were assessed on multiple measures of each cognitive and reading-related skill. Each cognitive and reading-related skill was modeled as a latent variable, and behavioral genetic analyses estimated the portions of phenotypic variance on each latent variable due to genetic, shared environmental, and nonshared environmental influences. The covariance between the cognitive skills and reading-related skills was driven primarily by genetic influences. The cognitive skills also shared large amounts of genetic variance, as did the reading-related skills. The common cognitive genetic variance was highly correlated with the common reading genetic variance, suggesting that genetic influences involved in general cognitive processing are also important for reading ability. Skill-specific genetic variance in working memory and processing speed also predicted components of reading ability. Taken together, the present study supports a genetic association between children’s cognitive ability and reading ability. PMID:26974208
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number.
Fragkos, Konstantinos C; Tsagris, Michail; Frangos, Christos C
2014-01-01
The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
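Rosenthal's fail-safe number itself is a one-line computation: it asks how many unpublished null studies would be needed to pull the Stouffer combined z below the one-tailed significance threshold. A minimal sketch (the confidence-interval machinery developed in the paper is not reproduced; the function name and example z-values are ours):

```python
from statistics import NormalDist

def fail_safe_n(z_values, alpha=0.05):
    """Rosenthal's fail-safe number: the number of unpublished
    zero-effect studies needed to make the Stouffer combined z
    non-significant at the given one-tailed alpha."""
    k = len(z_values)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha=0.05
    return (sum(z_values) ** 2) / z_alpha ** 2 - k

# e.g. five studies, each reporting z = 2.0
n_fs = fail_safe_n([2.0] * 5)
```

Here roughly 32 null studies would be required, so by Rosenthal's informal "5k + 10" benchmark (35 for k = 5) this result would be considered only borderline robust to publication bias.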
A log-sinh transformation for data normalization and variance stabilization
NASA Astrophysics Data System (ADS)
Wang, Q. J.; Shrestha, D. L.; Robertson, D. E.; Pokhrel, P.
2012-05-01
When quantifying model prediction uncertainty, it is statistically convenient to represent model errors that are normally distributed with a constant variance. The Box-Cox transformation is the most widely used technique to normalize data and stabilize variance, but it is not without limitations. In this paper, a log-sinh transformation is derived based on a pattern of errors commonly seen in hydrological model predictions. It is suited to applications where prediction variables are positively skewed and the spread of errors is seen to first increase rapidly, then slowly, and eventually approach a constant as the prediction variable becomes greater. The log-sinh transformation is applied in two case studies, and the results are compared with one- and two-parameter Box-Cox transformations.
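The transformation described above is z = (1/b)·ln(sinh(a + b·y)): for small y it behaves like a log transform (shrinking the rapidly growing errors), while for large y it becomes nearly linear, matching a spread of errors that approaches a constant. A minimal sketch with illustrative parameter values (a = 0.1, b = 0.05 are our choices, not the paper's):

```python
import math

def log_sinh(y, a, b):
    """Log-sinh transform z = (1/b) * ln(sinh(a + b*y));
    requires a + b*y > 0."""
    return math.log(math.sinh(a + b * y)) / b

def log_sinh_inverse(z, a, b):
    """Back-transform from z to the original variable y."""
    return (math.asinh(math.exp(b * z)) - a) / b

y = 12.5
z = log_sinh(y, a=0.1, b=0.05)
```

For large arguments sinh(u) ≈ exp(u)/2, so z ≈ y + a/b − ln(2)/b and the transform leaves large predictions essentially untouched, which is the variance-stabilizing behaviour the abstract describes.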
Pitchers, W. R.; Brooks, R.; Jennions, M. D.; Tregenza, T.; Dworkin, I.; Hunt, J.
2013-01-01
Phenotypic integration and plasticity are central to our understanding of how complex phenotypic traits evolve. Evolutionary change in complex quantitative traits can be predicted using the multivariate breeders’ equation, but such predictions are only accurate if the matrices involved are stable over evolutionary time. Recent work, however, suggests that these matrices are temporally plastic, spatially variable and themselves evolvable. The data available on phenotypic variance-covariance matrix (P) stability is sparse, and largely focused on morphological traits. Here we compared P for the structure of the complex sexual advertisement call of six divergent allopatric populations of the Australian black field cricket, Teleogryllus commodus. We measured a subset of calls from wild-caught crickets from each of the populations and then a second subset after rearing crickets under common-garden conditions for three generations. In a second experiment, crickets from each population were reared in the laboratory on high- and low-nutrient diets and their calls recorded. In both experiments, we estimated P for call traits and used multiple methods to compare them statistically (Flury hierarchy, geometric subspace comparisons and random skewers). Despite considerable variation in means and variances of individual call traits, the structure of P was largely conserved among populations, across generations and between our rearing diets. Our finding that P remains largely stable, among populations and between environmental conditions, suggests that selection has preserved the structure of call traits in order that they can function as an integrated unit. PMID:23530814
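One of the matrix-comparison methods named above, random skewers, has a compact implementation: apply the same random "selection" vectors β to both covariance matrices and average the vector correlation of the predicted responses Δz = Pβ. A minimal sketch with small illustrative matrices (not the cricket-call data):

```python
import math
import random

def random_skewers(p1, p2, n_skewers=1000, seed=11):
    """Random-skewers similarity of two variance-covariance matrices:
    mean vector correlation of the responses P1*beta and P2*beta
    over many random selection vectors beta."""
    rng = random.Random(seed)
    k = len(p1)

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]

    def vcorr(u, v):
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return sum(a * b for a, b in zip(u, v)) / (nu * nv)

    total = 0.0
    for _ in range(n_skewers):
        beta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        total += vcorr(matvec(p1, beta), matvec(p2, beta))
    return total / n_skewers

P1 = [[1.0, 0.5], [0.5, 1.0]]
P2 = [[1.0, -0.5], [-0.5, 1.0]]
similarity_same = random_skewers(P1, P1)
similarity_diff = random_skewers(P1, P2)
```

Identical matrices score 1; matrices with differently oriented covariance structure score lower, which is how stability of P across populations or rearing environments is quantified.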
Marshall, Andrew J; Evanovich, Emma K; David, Sarah Jo; Mumma, Gregory H
2018-01-17
High comorbidity rates among emotional disorders have led researchers to examine transdiagnostic factors that may contribute to shared psychopathology. Bifactor models provide a unique method for examining transdiagnostic variables by modelling the common and unique factors within measures. Previous findings suggest that the bifactor model of the Depression Anxiety and Stress Scale (DASS) may provide a method for examining transdiagnostic factors within emotional disorders. This study aimed to replicate the bifactor model of the DASS, a multidimensional measure of psychological distress, within a US adult sample and provide initial estimates of the reliability of the general and domain-specific factors. Furthermore, this study hypothesized that Worry, a theorized transdiagnostic variable, would show stronger relations to general emotional distress than domain-specific subscales. Confirmatory factor analysis was used to evaluate the bifactor model structure of the DASS in 456 US adult participants (279 females and 177 males, mean age 35.9 years) recruited online. The DASS bifactor model fitted well (CFI = 0.98; RMSEA = 0.05). The General Emotional Distress factor accounted for most of the reliable variance in item scores. Domain-specific subscales accounted for modest portions of reliable variance in items after accounting for the general scale. Finally, structural equation modelling indicated that Worry was strongly predicted by the General Emotional Distress factor. The DASS bifactor model is generalizable to a US community sample and General Emotional Distress, but not domain-specific factors, strongly predict the transdiagnostic variable Worry.
Overlap between treatment and control distributions as an effect size measure in experiments.
Hedges, Larry V; Olkin, Ingram
2016-03-01
The proportion π of treatment group observations that exceed the control group mean has been proposed as an effect size measure for experiments that randomly assign independent units into 2 groups. We give the exact distribution of a simple estimator of π based on the standardized mean difference and use it to study the small sample bias of this estimator. We also give the minimum variance unbiased estimator of π under 2 models, one in which the variance of the mean difference is known and one in which the variance is unknown. We show how to use the relation between the standardized mean difference and the overlap measure to compute confidence intervals for π and show that these results can be used to obtain unbiased estimators, large sample variances, and confidence intervals for 3 related effect size measures based on the overlap. Finally, we show how the effect size π can be used in a meta-analysis. (c) 2016 APA, all rights reserved.
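The simple estimator referred to above is the normal plug-in π̂ = Φ(d), where d is the standardized mean difference on the pooled standard deviation. A minimal sketch (this is the naive plug-in, not the paper's minimum variance unbiased estimator, and the toy data are ours):

```python
from statistics import NormalDist, fmean

def overlap_pi(treatment, control):
    """Plug-in estimate of pi = P(a treatment observation exceeds the
    control mean): pi_hat = Phi(d), with d the standardized mean
    difference using the pooled standard deviation."""
    mt, mc = fmean(treatment), fmean(control)
    ss = (sum((x - mt) ** 2 for x in treatment)
          + sum((x - mc) ** 2 for x in control))
    s_pooled = (ss / (len(treatment) + len(control) - 2)) ** 0.5
    d = (mt - mc) / s_pooled
    return NormalDist().cdf(d)

# Toy example: d = 2, so pi_hat = Phi(2), about 0.977
pi_hat = overlap_pi([2.0, 3.0, 4.0], [0.0, 1.0, 2.0])
```

Because π̂ is a monotone function of d, confidence limits for d translate directly into confidence limits for π, which is the relation the abstract exploits.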
Knopman, Debra S.; Voss, Clifford I.
1989-01-01
Sampling design for site characterization studies of solute transport in porous media is formulated as a multiobjective problem. Optimal design of a sampling network is a sequential process in which the next phase of sampling is designed on the basis of all available physical knowledge of the system. Three objectives are considered: model discrimination, parameter estimation, and cost minimization. For the first two objectives, physically based measures of the value of information obtained from a set of observations are specified. In model discrimination, value of information of an observation point is measured in terms of the difference in solute concentration predicted by hypothesized models of transport. Points of greatest difference in predictions can contribute the most information to the discriminatory power of a sampling design. Sensitivity of solute concentration to a change in a parameter contributes information on the relative variance of a parameter estimate. Inclusion of points in a sampling design with high sensitivities to parameters tends to reduce variance in parameter estimates. Cost minimization accounts for both the capital cost of well installation and the operating costs of collection and analysis of field samples. Sensitivities, discrimination information, and well installation and sampling costs are used to form coefficients in the multiobjective problem in which the decision variables are binary (zero/one), each corresponding to the selection of an observation point in time and space. The solution to the multiobjective problem is a noninferior set of designs. To gain insight into effective design strategies, a one-dimensional solute transport problem is hypothesized. Then, an approximation of the noninferior set is found by enumerating 120 designs and evaluating objective functions for each of the designs. Trade-offs between pairs of objectives are demonstrated among the models. The value of an objective function for a given design is shown to correspond to the ability of a design to actually meet an objective.
Infiltration and runoff generation processes in fire-affected soils
Moody, John A.; Ebel, Brian A.
2014-01-01
Post-wildfire runoff was investigated by combining field measurements and modelling of infiltration into fire-affected soils to predict time-to-start of runoff and peak runoff rate at the plot scale (1 m²). Time series of soil-water content, rainfall and runoff were measured on a hillslope burned by the 2010 Fourmile Canyon Fire west of Boulder, Colorado during cyclonic and convective rainstorms in the spring and summer of 2011. Some of the field measurements and measured soil physical properties were used to calibrate a one-dimensional post-wildfire numerical model, which was then used as a ‘virtual instrument’ to provide estimates of the saturated hydraulic conductivity and high-resolution (1 mm) estimates of the soil-water profile and water fluxes within the unsaturated zone. Field and model estimates of the wetting-front depth indicated that post-wildfire infiltration was on average confined to shallow depths less than 30 mm. Model estimates of the effective saturated hydraulic conductivity, Ks, near the soil surface ranged from 0.1 to 5.2 mm h−1. Because of the relatively small values of Ks, the time-to-start of runoff (measured from the start of rainfall), tp, was found to depend only on the initial soil-water saturation deficit (predicted by the model) and a measured characteristic of the rainfall profile (referred to as the average rainfall acceleration, equal to the initial rate of change in rainfall intensity). An analytical model was developed from the combined results and explained 92–97% of the variance of tp, and the numerical infiltration model explained 74–91% of the variance of the peak runoff rates. These results are from one burned site, but they strongly suggest that tp in fire-affected soils (which often have low values of Ks) is probably controlled more by the storm profile and the initial soil-water saturation deficit than by soil hydraulic properties.
Can, Dilara Deniz; Ginsburg-Block, Marika; Golinkoff, Roberta Michnick; Hirsh-Pasek, Kathryn
2013-09-01
This longitudinal study examined the predictive validity of the MacArthur Communicative Developmental Inventories-Short Form (CDI-SF), a parent report questionnaire about children's language development (Fenson, Pethick, Renda, Cox, Dale & Reznick, 2000). Data were first gathered from parents on the CDI-SF vocabulary scores for seventy-six children (mean age = 1;10). Four years later (mean age = 6;1), children were assessed on language outcomes (expressive vocabulary, syntax, semantics and pragmatics) and code-related skills, including phonemic awareness, word recognition and decoding skills. Hierarchical regression analyses revealed that early expressive vocabulary accounted for 17% of the variance in picture vocabulary, 11% of the variance in syntax, and 7% of the variance in semantics, while not accounting for any variance in pragmatics in kindergarten. CDI-SF scores did not predict code-related skills in kindergarten. The importance of early vocabulary skills for later language development and CDI-SF as a valuable research tool are discussed.
Technical Note: Introduction of variance component analysis to setup error analysis in radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matsuo, Yukinori, E-mail: ymatsuo@kuhp.kyoto-u.ac.
Purpose: The purpose of this technical note is to introduce variance component analysis to the estimation of systematic and random components in setup error of radiotherapy. Methods: Balanced data according to the one-factor random effect model were assumed. Results: Analysis-of-variance (ANOVA)-based computation was applied to estimate the values and their confidence intervals (CIs) for systematic and random errors and the population mean of setup errors. The conventional method overestimates systematic error, especially in hypofractionated settings. The CI for systematic error becomes much wider than that for random error. The ANOVA-based estimation can be extended to a multifactor model considering multiple causes of setup errors (e.g., interpatient, interfraction, and intrafraction). Conclusions: Variance component analysis may lead to novel applications to setup error analysis in radiotherapy.
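The balanced one-factor random-effects computation described above reduces to the classic ANOVA identities: MSB estimates n·σ²_systematic + σ²_random and MSW estimates σ²_random, so σ̂²_systematic = (MSB − MSW)/n. A minimal sketch with simulated setup errors (the patient counts, SDs and function name are our illustrative choices):

```python
import random
from statistics import fmean

def anova_setup_error(data):
    """Balanced one-factor random-effects ANOVA for setup errors.
    data: m patients x n fractions (equal n per patient). Returns the
    estimated systematic (inter-patient) and random (intra-patient)
    variance components."""
    m, n = len(data), len(data[0])
    patient_means = [fmean(p) for p in data]
    grand = fmean(patient_means)
    msb = n * sum((mu - grand) ** 2 for mu in patient_means) / (m - 1)
    msw = sum(sum((x - mu) ** 2 for x in p)
              for p, mu in zip(data, patient_means)) / (m * (n - 1))
    var_systematic = max((msb - msw) / n, 0.0)  # removes the msw/n bias
    var_random = msw
    return var_systematic, var_random

# Simulate 40 patients x 10 fractions: systematic SD 2 mm, random SD 3 mm
random.seed(5)
data = []
for _ in range(40):
    mu = random.gauss(0.0, 2.0)
    data.append([random.gauss(mu, 3.0) for _ in range(10)])
var_sys, var_rand = anova_setup_error(data)
```

The conventional estimator (the plain variance of the per-patient means) targets σ²_systematic + σ²_random/n, which is why it overestimates systematic error most severely when n is small, i.e., in hypofractionated settings.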
High-Throughput Models for Exposure-Based Chemical ...
The United States Environmental Protection Agency (U.S. EPA) must characterize potential risks to human health and the environment associated with manufacture and use of thousands of chemicals. High-throughput screening (HTS) for biological activity allows the ToxCast research program to prioritize chemical inventories for potential hazard. Similar capabilities for estimating exposure potential would support rapid risk-based prioritization for chemicals with limited information; here, we propose a framework for high-throughput exposure assessment. To demonstrate application, an analysis was conducted that predicts human exposure potential for chemicals and estimates uncertainty in these predictions by comparison to biomonitoring data. We evaluated 1936 chemicals using far-field mass balance human exposure models (USEtox and RAIDAR) and an indicator for indoor and/or consumer use. These predictions were compared to exposures inferred by Bayesian analysis from urine concentrations for 82 chemicals reported in the National Health and Nutrition Examination Survey (NHANES). Joint regression on all factors provided a calibrated consensus prediction, the variance of which serves as an empirical determination of uncertainty for prioritization on absolute exposure potential. Information on use was found to be most predictive; generally, chemicals above the limit of detection in NHANES had consumer/indoor use. Coupled with hazard HTS, exposure HTS can place risk earlie
Ortega Hinojosa, Alberto M; Davies, Molly M; Jarjour, Sarah; Burnett, Richard T; Mann, Jennifer K; Hughes, Edward; Balmes, John R; Turner, Michelle C; Jerrett, Michael
2014-10-01
Globally and in the United States, smoking and obesity are leading causes of death and disability. Reliable estimates of prevalence for these risk factors are often missing variables in public health surveillance programs. This may limit the capacity of public health surveillance to target interventions or to assess associations between other environmental risk factors (e.g., air pollution) and health because smoking and obesity are often important confounders. Our objective was to generate prevalence estimates of smoking and obesity over small areas of the United States (i.e., at the ZIP code and census tract levels). We predicted smoking and obesity prevalence using a combined approach: a lasso-based variable selection procedure followed by a two-level random effects regression with a Poisson link clustered on state and county. We used data from the Behavioral Risk Factor Surveillance System (BRFSS) from 1991 to 2010 to estimate the model. We used 10-fold cross-validated mean squared errors and the variance of the residuals to test our model. To downscale the estimates we combined the prediction equations with 1990 and 2000 U.S. Census data for each of the four five-year time periods in this time range at the ZIP code and census tract levels. Several sensitivity analyses were conducted using models that included only basic terms, that accounted for spatial autocorrelation, and that used Generalized Linear Models without random effects. The two-level random effects model produced improved estimates compared to the fixed effects-only models. Estimates were particularly improved for the two-thirds of the conterminous U.S. where BRFSS data were available to estimate the county level random effects. We downscaled the smoking and obesity rate predictions to derive ZIP code and census tract estimates. To our knowledge these smoking and obesity predictions are the first to be developed for the entire conterminous U.S. for census tracts and ZIP codes.
Our estimates could have significant utility for public health surveillance. Copyright © 2014. Published by Elsevier Inc.
van Aert, Robbie C M; Jackson, Dan
2018-04-26
A wide variety of estimators of the between-study variance are available in random-effects meta-analysis. Many, but not all, of these estimators are based on the method of moments. The DerSimonian-Laird estimator is widely used in applications, but the Paule-Mandel estimator is an alternative that is now recommended. Recently, DerSimonian and Kacker have developed two-step moment-based estimators of the between-study variance. We extend these two-step estimators so that multiple (more than two) steps are used. We establish the surprising result that the multistep estimator tends towards the Paule-Mandel estimator as the number of steps becomes large. Hence, the iterative scheme underlying our new multistep estimator provides a hitherto unknown relationship between two-step estimators and the Paule-Mandel estimator. Our analysis suggests that two-step estimators are not necessarily distinct estimators in their own right; instead, they are quantities that are closely related to the usual iterative scheme that is used to calculate the Paule-Mandel estimate. The relationship that we establish between the multistep and Paule-Mandel estimators is another justification for the use of the latter estimator. Two-step and multistep estimators are perhaps best conceptualized as approximate Paule-Mandel estimators. © 2018 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
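The relationship described above can be sketched numerically under assumed (hypothetical) study effects and within-study variances: iterating a generalized method-of-moments update, whose first step with fixed-effect weights is DerSimonian-Laird, drives the estimate toward the Paule-Mandel solution of Q(tau^2) = k - 1:

```python
import numpy as np

# hypothetical meta-analysis data: study effects and within-study variances
y = np.array([0.80, 0.10, 1.20, -0.30, 0.60, 0.20])
s2 = np.array([0.04, 0.09, 0.02, 0.06, 0.03, 0.08])

def mom_tau2(y, s2, w):
    """Generalized method-of-moments estimate of tau^2 for given weights w."""
    yw = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - yw) ** 2)
    num = q - (np.sum(w * s2) - np.sum(w**2 * s2) / np.sum(w))
    den = np.sum(w) - np.sum(w**2) / np.sum(w)
    return max(0.0, num / den)

# DerSimonian-Laird: a single step with fixed-effect weights 1/s2
tau2_dl = mom_tau2(y, s2, 1.0 / s2)

# multistep: re-weight with 1/(s2 + tau2) and iterate
tau2 = tau2_dl
for _ in range(500):
    tau2 = mom_tau2(y, s2, 1.0 / (s2 + tau2))

# Paule-Mandel: solve Q(tau2) = k - 1 directly by bisection (Q is decreasing)
def q_stat(tau2):
    w = 1.0 / (s2 + tau2)
    yw = np.sum(w * y) / np.sum(w)
    return np.sum(w * (y - yw) ** 2)

lo, hi = 0.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if q_stat(mid) > len(y) - 1 else (lo, mid)
tau2_pm = 0.5 * (lo + hi)
```

At a fixed point of the iteration the moment equation reduces exactly to Q(tau^2) = k - 1, which is why the multistep estimate converges to the Paule-Mandel value.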
Prediction of Imagined Single-Joint Movements in a Person with High Level Tetraplegia
Simeral, John D.; Donoghue, John P.; Hochberg, Leigh R.; Kirsch, Robert F.
2013-01-01
Cortical neuroprostheses for movement restoration require developing models for relating neural activity to desired movement. Previous studies have focused on correlating single-unit activities (SUA) in primary motor cortex to volitional arm movements in able-bodied primates. The extent of the cortical information relevant to arm movements remaining in severely paralyzed individuals is largely unknown. We record intracortical signals using a microelectrode array chronically implanted in the precentral gyrus of a person with tetraplegia, and estimate positions of imagined single-joint arm movements. Using visually guided motor imagery, the participant imagined performing eight distinct single-joint arm movements while SUA, multi-spike trains (MSP), multi-unit activity (MUA), and local field potential time (LFPrms) and frequency signals (LFPstft) were recorded. Using linear system identification, imagined joint trajectories were estimated with 20 – 60% variance explained, with wrist flexion/extension predicted the best and pronation/supination the poorest. Statistically, decoding of MSP and LFPstft yielded estimates that equaled those of SUA. Including multiple signal types in a decoder increased prediction accuracy in all cases. We conclude that signals recorded from a single restricted region of the precentral gyrus in this person with tetraplegia contained useful information regarding the intended movements of upper extremity joints. PMID:22851229
Boomer, Kathleen B; Weller, Donald E; Jordan, Thomas E
2008-01-01
The Universal Soil Loss Equation (USLE) and its derivatives are widely used for identifying watersheds with a high potential for degrading stream water quality. We compared sediment yields estimated from regional application of the USLE, the automated revised RUSLE2, and five sediment delivery ratio algorithms to measured annual average sediment delivery in 78 catchments of the Chesapeake Bay watershed. We did the same comparisons for another 23 catchments monitored by the USGS. Predictions exceeded observed sediment yields by more than 100% and were highly correlated with USLE erosion predictions (Pearson r range, 0.73-0.92; p < 0.001). RUSLE2-erosion estimates were highly correlated with USLE estimates (r = 0.87; p < 0.001), so the method of implementing the USLE model did not change the results. In ranked comparisons between observed and predicted sediment yields, the models failed to identify catchments with higher yields (r range, -0.28-0.00; p > 0.14). In a multiple regression analysis, soil erodibility, log (stream flow), basin shape (topographic relief ratio), the square-root transformed proportion of forest, and occurrence in the Appalachian Plateau province explained 55% of the observed variance in measured suspended sediment loads, but the model performed poorly (r(2) = 0.06) at predicting loads in the 23 USGS watersheds not used in fitting the model. The use of USLE or multiple regression models to predict sediment yields is not advisable despite their present widespread application. Integrated watershed models based on the USLE may also be unsuitable for making management decisions.
Mohammadi, Mohammad Hossein; Vanclooster, Marnik
2012-05-01
Solute transport in partially saturated soils is largely affected by fluid velocity distribution and pore size distribution within the solute transport domain. Hence, it is possible to describe the solute transport process in terms of the pore size distribution of the soil, and indirectly in terms of the soil hydraulic properties. In this paper, we present a conceptual approach that allows predicting the parameters of the Convective Lognormal Transfer model from knowledge of soil moisture and the Soil Moisture Characteristic (SMC), parameterized by means of the closed-form model of Kosugi (1996). It is assumed that in partially saturated conditions, the air filled pore volume act as an inert solid phase, allowing the use of the Arya et al. (1999) pragmatic approach to estimate solute travel time statistics from the saturation degree and SMC parameters. The approach is evaluated using a set of partially saturated transport experiments as presented by Mohammadi and Vanclooster (2011). Experimental results showed that the mean solute travel time, μ(t), increases proportionally with the depth (travel distance) and decreases with flow rate. The variance of solute travel time σ²(t) first decreases with flow rate up to 0.4-0.6 Ks and subsequently increases. For all tested BTCs predicted solute transport with μ(t) estimated from the conceptual model performed much better as compared to predictions with μ(t) and σ²(t) estimated from calibration of solute transport at shallow soil depths. The use of μ(t) estimated from the conceptual model therefore increases the robustness of the CLT model in predicting solute transport in heterogeneous soils at larger depths. In view of the fact that reasonable indirect estimates of the SMC can be made from basic soil properties using pedotransfer functions, the presented approach may be useful for predicting solute transport at field or watershed scales. Copyright © 2012 Elsevier B.V. All rights reserved.
Deflation as a method of variance reduction for estimating the trace of a matrix inverse
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gambhir, Arjun Singh; Stathopoulos, Andreas; Orginos, Kostas
2017-04-06
Many fields require computing the trace of the inverse of a large, sparse matrix. The typical method used for such computations is the Hutchinson method which is a Monte Carlo (MC) averaging over matrix quadratures. To improve its convergence, several variance reduction techniques have been proposed. In this paper, we study the effects of deflating the near null singular value space. We make two main contributions. First, we analyze the variance of the Hutchinson method as a function of the deflated singular values and vectors. Although this provides good intuition in general, by assuming additionally that the singular vectors are random unitary matrices, we arrive at concise formulas for the deflated variance that include only the variance and mean of the singular values. We make the remarkable observation that deflation may increase variance for Hermitian matrices but not for non-Hermitian ones. This is a rare, if not unique, property where non-Hermitian matrices outperform Hermitian ones. The theory can be used as a model for predicting the benefits of deflation. Second, we use deflation in the context of a large scale application of "disconnected diagrams" in Lattice QCD. On lattices, Hierarchical Probing (HP) has previously provided an order of magnitude of variance reduction over MC by removing "error" from neighboring nodes of increasing distance in the lattice. Although deflation used directly on MC yields a limited improvement of 30% in our problem, when combined with HP they reduce variance by a factor of over 150 compared to MC. For this, we pre-computed the 1000 smallest singular values of an ill-conditioned matrix of size 25 million. Furthermore, using PRIMME and a domain-specific Algebraic Multigrid preconditioner, we perform one of the largest eigenvalue computations in Lattice QCD at a fraction of the cost of our trace computation.
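The deflation idea can be illustrated on a small synthetic symmetric positive definite matrix (the size, spectrum, deflation rank, and sample count below are illustrative assumptions; the paper's lattice operators are far larger, and real applications solve a linear system per probe rather than forming the inverse). The exactly computed contribution of the deflated eigenpairs is added back, and Hutchinson probing is applied only to the deflated remainder:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, n_samples = 100, 20, 200

# small ill-conditioned SPD test matrix A = Q diag(eigs) Q^T
q, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigs = np.geomspace(1e-4, 1.0, n)
a_inv = (q * (1.0 / eigs)) @ q.T          # form A^{-1} directly for the demo

exact = np.sum(1.0 / eigs)

# deflate the k smallest eigenvalues of A (the dominant part of A^{-1})
vk = q[:, :k]                              # their eigenvectors
deflated_part = np.sum(1.0 / eigs[:k])     # this trace contribution is exact
a_inv_defl = a_inv - (vk * (1.0 / eigs[:k])) @ vk.T

def hutchinson(m, n_samples):
    """One z^T M z quadrature per Rademacher probe vector z."""
    z = rng.choice([-1.0, 1.0], size=(n_samples, n))
    return np.einsum('ij,jk,ik->i', z, m, z)

plain = hutchinson(a_inv, n_samples)                   # plain MC samples
defl = deflated_part + hutchinson(a_inv_defl, n_samples)  # deflated MC samples
```

Both sample means are unbiased for the exact trace, but the deflated samples have far smaller variance because the largest eigenvalues of A^{-1} no longer contribute to the stochastic part.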
Estimators for Two Measures of Association for Set Correlation.
ERIC Educational Resources Information Center
Cohen, Jacob; Nee, John C. M.
1984-01-01
Two measures of association between sets of variables have been proposed for set correlation: the proportion of generalized variance, and the proportion of additive variance. Because these measures are strongly positively biased, approximate expected values and estimators of these measures are derived and checked. (Author/BW)
Bernstein, Joshua G.W.; Mehraei, Golbarg; Shamma, Shihab; Gallun, Frederick J.; Theodoroff, Sarah M.; Leek, Marjorie R.
2014-01-01
Background A model that can accurately predict speech intelligibility for a given hearing-impaired (HI) listener would be an important tool for hearing-aid fitting or hearing-aid algorithm development. Existing speech-intelligibility models do not incorporate variability in suprathreshold deficits that are not well predicted by classical audiometric measures. One possible approach to the incorporation of such deficits is to base intelligibility predictions on sensitivity to simultaneously spectrally and temporally modulated signals. Purpose The likelihood of success of this approach was evaluated by comparing estimates of spectrotemporal modulation (STM) sensitivity to speech intelligibility and to psychoacoustic estimates of frequency selectivity and temporal fine-structure (TFS) sensitivity across a group of HI listeners. Research Design The minimum modulation depth required to detect STM applied to an 86 dB SPL four-octave noise carrier was measured for combinations of temporal modulation rate (4, 12, or 32 Hz) and spectral modulation density (0.5, 1, 2, or 4 cycles/octave). STM sensitivity estimates for individual HI listeners were compared to estimates of frequency selectivity (measured using the notched-noise method at 500, 1000, 2000, and 4000 Hz), TFS processing ability (2 Hz frequency-modulation detection thresholds for 500, 1000, 2000, and 4000 Hz carriers) and sentence intelligibility in noise (at a 0 dB signal-to-noise ratio) that were measured for the same listeners in a separate study. Study Sample Eight normal-hearing (NH) listeners and 12 listeners with a diagnosis of bilateral sensorineural hearing loss participated. Data Collection and Analysis STM sensitivity was compared between NH and HI listener groups using a repeated-measures analysis of variance.
A stepwise regression analysis compared STM sensitivity for individual HI listeners to audiometric thresholds, age, and measures of frequency selectivity and TFS processing ability. A second stepwise regression analysis compared speech intelligibility to STM sensitivity and the audiogram-based Speech Intelligibility Index. Results STM detection thresholds were elevated for the HI listeners, but only for low rates and high densities. STM sensitivity for individual HI listeners was well predicted by a combination of estimates of frequency selectivity at 4000 Hz and TFS sensitivity at 500 Hz but was unrelated to audiometric thresholds. STM sensitivity accounted for an additional 40% of the variance in speech intelligibility beyond the 40% accounted for by the audibility-based Speech Intelligibility Index. Conclusions Impaired STM sensitivity likely results from a combination of a reduced ability to resolve spectral peaks and a reduced ability to use TFS information to follow spectral-peak movements. Combining STM sensitivity estimates with audiometric threshold measures for individual HI listeners provided a more accurate prediction of speech intelligibility than audiometric measures alone. These results suggest a significant likelihood of success for an STM-based model of speech intelligibility for HI listeners. PMID:23636210
NASA Astrophysics Data System (ADS)
Fattoruso, Grazia; Longobardi, Antonia; Pizzuti, Alfredo; Molinara, Mario; Marocco, Claudio; De Vito, Saverio; Tortorella, Francesco; Di Francia, Girolamo
2017-06-01
Rainfall data collected continuously by a distributed rain gauge network are instrumental to more effective hydro-geological risk forecasting and management services, although the estimated rainfall fields used as input suffer from prediction uncertainty. Optimal rain gauge networks can generate accurate estimated rainfall fields. In this research work, a methodology has been investigated for evaluating an optimal rain gauge network aimed at robust hydrogeological hazard investigations. The rain gauge network of the Sarno River basin (Southern Italy) has been evaluated by optimizing a two-objective function that maximizes the estimation accuracy and minimizes the total metering cost, using the variance reduction algorithm together with the climatological (time-invariant) variogram. This problem has been solved by using an enumerative search algorithm that evaluates the exact Pareto front in an efficient computational time.
Paul, David R; McGrath, Ryan; Vella, Chantal A; Kramer, Matthew; Baer, David J; Moshfegh, Alanna J
2018-03-26
The National Health and Nutrition Examination Survey physical activity questionnaire (PAQ) is used to estimate activity energy expenditure (AEE) and moderate to vigorous physical activity (MVPA). Bias and variance in estimates of AEE and MVPA from the PAQ have not been described, nor the impact of measurement error when utilizing the PAQ to predict biomarkers and categorize individuals. The PAQ was administered to 385 adults to estimate AEE (AEE:PAQ) and MVPA (MVPA:PAQ), while simultaneously measuring AEE with doubly labeled water (DLW; AEE:DLW) and MVPA with an accelerometer (MVPA:A). Although AEE:PAQ [3.4 (2.2) MJ·d -1 ] was not significantly different from AEE:DLW [3.6 (1.6) MJ·d -1 ; P > .14], MVPA:PAQ [36.2 (24.4) min·d -1 ] was significantly higher than MVPA:A [8.0 (10.4) min·d -1 ; P < .0001]. AEE:PAQ regressed on AEE:DLW and MVPA:PAQ regressed on MVPA:A yielded not only significant positive relationships but also large residual variances. The relationships between AEE and MVPA, and 10 of the 12 biomarkers were underestimated by the PAQ. When compared with accelerometers, the PAQ overestimated the number of participants who met the Physical Activity Guidelines for Americans. Group-level bias in AEE:PAQ was small, but large for MVPA:PAQ. Poor within-participant estimates of AEE:PAQ and MVPA:PAQ lead to attenuated relationships with biomarkers and misclassifications of participants who met or who did not meet the Physical Activity Guidelines for Americans.
NASA Technical Reports Server (NTRS)
Chhikara, R. S.; Perry, C. R., Jr. (Principal Investigator)
1980-01-01
The problem of determining the stratum variances required for an optimum sample allocation for remotely sensed crop surveys is investigated with emphasis on an approach based on the concept of stratum variance as a function of the sampling unit size. A methodology using the existing and easily available information of historical statistics is developed for obtaining initial estimates of stratum variances. The procedure is applied to stratum variance estimation for wheat in the U.S. Great Plains and is evaluated based on the numerical results obtained. It is shown that the proposed technique is viable and performs satisfactorily with the use of a conservative value (smaller than the expected value) for the field size and with the use of crop statistics from the small political division level.
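Once stratum variances are in hand, the optimum sample allocation they enable is the standard Neyman allocation. A minimal sketch, with hypothetical strata, sizes, and standard deviations (not from the report):

```python
# hypothetical historical stratum statistics for a crop survey
strata = {                 # stratum: (number of units N_h, estimated SD S_h)
    'north': (1200, 8.0),
    'central': (800, 15.0),
    'south': (500, 5.0),
}
n_total = 100              # total sample size to allocate

# Neyman allocation: n_h proportional to N_h * S_h
weights = {h: N * S for h, (N, S) in strata.items()}
total = sum(weights.values())
alloc = {h: round(n_total * w / total) for h, w in weights.items()}
```

High-variance strata receive proportionally more sample. Note that in general the rounded n_h need not sum exactly to n_total; production code would add a largest-remainder adjustment.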
Optimal Tuner Selection for Kalman Filter-Based Aircraft Engine Performance Estimation
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Garg, Sanjay
2010-01-01
A linear point design methodology for minimizing the error in on-line Kalman filter-based aircraft engine performance estimation applications is presented. This technique specifically addresses the underdetermined estimation problem, where there are more unknown parameters than available sensor measurements. A systematic approach is applied to produce a model tuning parameter vector of appropriate dimension to enable estimation by a Kalman filter, while minimizing the estimation error in the parameters of interest. Tuning parameter selection is performed using a multi-variable iterative search routine which seeks to minimize the theoretical mean-squared estimation error. This paper derives theoretical Kalman filter estimation error bias and variance values at steady-state operating conditions, and presents the tuner selection routine applied to minimize these values. Results from the application of the technique to an aircraft engine simulation are presented and compared to the conventional approach of tuner selection. Experimental simulation results are found to be in agreement with theoretical predictions. The new methodology is shown to yield a significant improvement in on-line engine performance estimation accuracy.
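The theoretical steady-state error statistics used above come from the Kalman filter Riccati recursion. A scalar toy version (all system values illustrative, not the paper's engine model) shows the theoretical and simulated steady-state error variances agreeing:

```python
import numpy as np

# scalar system: x_{k+1} = a*x_k + w,  y_k = x_k + v  (illustrative values)
a, q, r = 0.95, 0.1, 1.0    # transition, process noise var, measurement noise var

# iterate the discrete Riccati recursion to steady state
p_prior = 1.0
for _ in range(1000):
    p_post = p_prior * r / (p_prior + r)      # measurement update
    p_prior = a * a * p_post + q              # time update
k_gain = p_prior / (p_prior + r)              # steady-state Kalman gain
p_post_theory = p_prior * r / (p_prior + r)   # theoretical posterior error variance

# simulate the steady-state (constant-gain) filter and measure the error variance
rng = np.random.default_rng(2)
x, xhat, errors = 0.0, 0.0, []
for t in range(20000):
    x = a * x + rng.normal(0.0, np.sqrt(q))   # true state
    y = x + rng.normal(0.0, np.sqrt(r))       # measurement
    xhat = a * xhat                           # predict
    xhat = xhat + k_gain * (y - xhat)         # update
    if t > 100:                               # discard the transient
        errors.append(x - xhat)
```

The empirical variance of the post-update error should match p_post_theory, which is the kind of agreement between theory and simulation the paper reports for the multivariate engine case.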
Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan
2016-12-01
The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.
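The lactation-curve random effects above are modeled on a Legendre polynomial basis. A sketch of the fourth-order basis and the implied per-day genetic variance, using a placeholder coefficient covariance G rather than the paper's estimates (the days-in-milk range is also an assumption):

```python
import numpy as np

# standardize days in milk to [-1, 1] for the Legendre basis
dim = np.arange(5, 306)                        # assumed test-day range 5..305
t = 2.0 * (dim - dim.min()) / (dim.max() - dim.min()) - 1.0

order = 4
# design matrix of Legendre polynomials P_0..P_4 evaluated at each day
phi = np.stack([np.polynomial.legendre.Legendre.basis(j)(t)
                for j in range(order + 1)], axis=1)   # shape (n_days, 5)

# with an estimated covariance G of the random regression coefficients,
# the additive genetic variance at each day is diag(Phi G Phi^T)
g = np.eye(order + 1) * 0.1                    # placeholder covariance, illustrative
var_add = np.einsum('ij,jk,ik->i', phi, g, phi)
```

Because |P_j(t)| is largest at t = -1 and t = 1, curves of this form naturally produce the larger variances at the extremes of lactation that the abstract reports.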
Advanced Communication Processing Techniques Held in Ruidoso, New Mexico on 14-17 May 1989
1990-01-01
Criteria: probability of detection and false alarm; variances of parameter estimators; probability of correct classification and rejection; standard Neyman-Pearson approach for detection.
Hudson, Nathan W.; Lucas, Richard E.; Donnellan, M. Brent; Kushlev, Kostadin
2017-01-01
Kushlev, Dunn, and Lucas (2015) found that income predicts less daily sadness—but not greater happiness—among Americans. The present study used longitudinal data from an approximately representative German sample to replicate and extend these findings. Our results largely replicated Kushlev and colleagues’: income predicted less daily sadness (albeit with a smaller effect size), but was unrelated to happiness. Moreover, the association between income and sadness could not be explained by demographics, stress, or daily time-use. Extending Kushlev and colleagues’ findings, new analyses indicated that only between-persons variance in income (but not within-persons variance) predicted daily sadness—perhaps because there was relatively little within-persons variance in income. Finally, income predicted less daily sadness and worry, but not less anger or frustration—potentially suggesting that income predicts less “internalizing” but not less “externalizing” negative emotions. Together, our study and Kushlev and colleagues’ provide evidence that income robustly predicts select daily negative emotions—but not positive ones. PMID:29250303
Lin, Chen-Yen; Halabi, Susan
2017-01-01
We propose a minimand perturbation method to derive the confidence regions for the regularized estimators for the Cox’s proportional hazards model. Although the regularized estimation procedure produces a more stable point estimate, it remains challenging to provide an interval estimator or an analytic variance estimator for the associated point estimate. Based on the sandwich formula, the current variance estimator provides a simple approximation, but its finite sample performance is not entirely satisfactory. Besides, the sandwich formula can only provide variance estimates for the non-zero coefficients. In this article, we present a generic description for the perturbation method and then introduce a computation algorithm using the adaptive least absolute shrinkage and selection operator (LASSO) penalty. Through simulation studies, we demonstrate that our method can better approximate the limiting distribution of the adaptive LASSO estimator and produces more accurate inference compared with the sandwich formula. The simulation results also indicate the possibility of extending the applications to the adaptive elastic-net penalty. We further demonstrate our method using data from a phase III clinical trial in prostate cancer. PMID:29326496
Rawlins, B G; Scheib, C; Tyler, A N; Beamish, D
2012-12-01
Regulatory authorities need ways to estimate natural terrestrial gamma radiation dose rates (nGy h⁻¹) across the landscape accurately, to assess its potential deleterious health effects. The primary method for estimating outdoor dose rate is to use an in situ detector supported 1 m above the ground, but such measurements are costly and cannot capture the landscape-scale variation in dose rates which are associated with changes in soil and parent material mineralogy. We investigate the potential for improving estimates of terrestrial gamma dose rates across Northern Ireland (13,542 km²) using measurements from 168 sites and two sources of ancillary data: (i) a map based on a simplified classification of soil parent material, and (ii) dose estimates from a national-scale, airborne radiometric survey. We used the linear mixed modelling framework in which the two ancillary variables were included in separate models as fixed effects, plus a correlation structure which captures the spatially correlated variance component. We used a cross-validation procedure to determine the magnitude of the prediction errors for the different models. We removed a random subset of 10 terrestrial measurements and formed the model from the remainder (n = 158), and then used the model to predict values at the other 10 sites. We repeated this procedure 50 times. The measurements of terrestrial dose vary between 1 and 103 (nGy h⁻¹). The median absolute model prediction errors (nGy h⁻¹) for the three models declined in the following order: no ancillary data (10.8) > simple geological classification (8.3) > airborne radiometric dose (5.4) as a single fixed effect. Estimates of airborne radiometric gamma dose rate can significantly improve the spatial prediction of terrestrial dose rate.
ERIC Educational Resources Information Center
Stapleton, Laura M.
2008-01-01
This article discusses replication sampling variance estimation techniques that are often applied in analyses using data from complex sampling designs: jackknife repeated replication, balanced repeated replication, and bootstrapping. These techniques are used with traditional analyses such as regression, but are currently not used with structural…
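A minimal sketch of the delete-one jackknife, the simplest of the replication variance estimation techniques mentioned, applied to a regression slope on synthetic data (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

# Delete-one jackknife: refit the statistic with each observation removed.
theta_full = slope(x, y)
jack = np.array([slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])

# Jackknife variance estimate of the slope.
var_jack = (n - 1) / n * np.sum((jack - jack.mean()) ** 2)
se_jack = np.sqrt(var_jack)
print("slope:", theta_full, "jackknife SE:", se_jack)
```

Jackknife repeated replication for complex designs deletes whole sampling units (e.g. PSUs within strata) rather than single observations, but the variance formula has the same shape.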
Nguyen, Trang Quynh; Webb-Vargas, Yenny; Koning, Ina M; Stuart, Elizabeth A
We investigate a method to estimate the combined effect of multiple continuous/ordinal mediators on a binary outcome: 1) fit a structural equation model with probit link for the outcome and identity/probit link for continuous/ordinal mediators, 2) predict potential outcome probabilities, and 3) compute natural direct and indirect effects. Step 2 involves rescaling the latent continuous variable underlying the outcome to address residual mediator variance/covariance. We evaluate the estimation of risk-difference- and risk-ratio-based effects (RDs, RRs) using the ML, WLSMV and Bayes estimators in Mplus. Across most variations in path-coefficient and mediator-residual-correlation signs and strengths, and confounding situations investigated, the method performs well with all estimators, but favors ML/WLSMV for RDs with continuous mediators, and Bayes for RRs with ordinal mediators. Bayes outperforms WLSMV/ML regardless of mediator type when estimating RRs with small potential outcome probabilities and in two other special cases. An adolescent alcohol prevention study is used for illustration.
Jensen's Inequality Predicts Effects of Environmental Variation
Jonathan J. Ruel; Matthew P. Ayres
1999-01-01
Many biologists now recognize that environmental variance can exert important effects on patterns and processes in nature that are independent of average conditions. Jensen's inequality is a mathematical proof that is seldom mentioned in the ecological literature but which provides a powerful tool for predicting some direct effects of environmental variance in...
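A quick numerical check of Jensen's inequality, the result the abstract refers to: for a convex function f, E[f(X)] >= f(E[X]), so variance in X alone shifts the mean response even when the mean of X is unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=2.0, scale=1.0, size=100_000)

f = np.exp  # a convex function

# Jensen's inequality for convex f: E[f(X)] >= f(E[X]).
lhs = f(x).mean()   # E[f(X)], approx. exp(2 + 0.5) for this lognormal
rhs = f(x.mean())   # f(E[X]), approx. exp(2)
print(lhs, rhs, lhs >= rhs)
```

For a concave f the inequality reverses, which is how the result predicts the sign of variance effects on, e.g., saturating ecological response curves.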
McGinitie, Teague M; Ebrahimi-Najafabadi, Heshmatollah; Harynuk, James J
2014-02-21
A new method for calibrating thermodynamic data to be used in the prediction of analyte retention times is presented. The method allows thermodynamic data collected on one column to be used in making predictions across columns of the same stationary phase but with varying geometries. This calibration is essential, as slight variations in the column inner diameter and stationary phase film thickness between columns, or as a column ages, will adversely affect the accuracy of predictions. The calibration technique uses a Grob standard mixture along with a Nelder-Mead simplex algorithm and a previously developed model of GC retention times, based on a three-parameter thermodynamic model, to estimate both inner diameter and stationary phase film thickness. The calibration method is highly successful, with the predicted retention times for a set of alkanes, ketones and alcohols having an average error of 1.6 s across three columns.
Jay, Sylvain; Guillaume, Mireille; Chami, Malik; Minghelli, Audrey; Deville, Yannick; Lafrance, Bruno; Serfaty, Véronique
2018-01-22
We present an analytical approach based on Cramer-Rao Bounds (CRBs) to investigate the uncertainties in estimated ocean color parameters resulting from the propagation of uncertainties in the bio-optical reflectance modeling through the inversion process. Based on given bio-optical and noise probabilistic models, CRBs can be computed efficiently for any set of ocean color parameters and any sensor configuration, directly providing the minimum estimation variance that can possibly be attained by any unbiased estimator of any targeted parameter. Here, CRBs are explicitly developed using (1) two water reflectance models corresponding to deep and shallow waters, respectively, and (2) four probabilistic models describing the environmental noises observed within four Sentinel-2 MSI, HICO, Sentinel-3 OLCI and MODIS images, respectively. For both deep and shallow waters, CRBs are shown to be consistent with the experimental estimation variances obtained using two published remote-sensing methods, while not requiring one to perform any inversion. CRBs are also used to investigate to what extent perfect a priori knowledge of one or several geophysical parameters can improve the estimation of the remaining unknown parameters. For example, using pre-existing knowledge of bathymetry (e.g., derived from LiDAR) within the inversion is shown to greatly improve the retrieval of bottom cover for shallow waters. Finally, CRBs are shown to provide valuable information on the best estimation performances that may be achieved with the MSI, HICO, OLCI and MODIS configurations for a variety of oceanic, coastal and inland waters. CRBs are thus demonstrated to be an informative and efficient tool for characterizing minimum uncertainties in inverted ocean color geophysical parameters.
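The core idea of a Cramer-Rao bound can be illustrated with the simplest case, estimating a Gaussian mean: the bound sigma^2/n is attained by the sample mean, as a Monte Carlo check confirms. This is a textbook example, not the bio-optical model of the paper:

```python
import numpy as np

# CRB for estimating the mean of N(mu, sigma^2) from n i.i.d. samples:
# Fisher information I(mu) = n / sigma^2, so CRB = sigma^2 / n.
rng = np.random.default_rng(6)
sigma, n = 2.0, 25
crb = sigma ** 2 / n   # = 0.16

# The sample mean is unbiased and attains the bound; check by simulation.
est = rng.normal(0.0, sigma, (20_000, n)).mean(axis=1)
print("CRB:", crb, "empirical variance of estimator:", est.var())
```

As in the paper, the bound is obtained from the probabilistic model alone, with no estimator or inversion actually run.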
A Formula to Calculate Standard Liver Volume Using Thoracoabdominal Circumference.
Shaw, Brian I; Burdine, Lyle J; Braun, Hillary J; Ascher, Nancy L; Roberts, John P
2017-12-01
With the use of split liver grafts as well as living donor liver transplantation (LDLT), it is imperative to know the minimum graft volume needed to avoid complications. Most current formulas for predicting standard liver volume (SLV) rely on weight-based measures that are likely inaccurate in the setting of cirrhosis. Therefore, we sought to create a formula for estimating SLV without weight-based covariates. LDLT donors underwent computed tomography volumetric evaluation of their livers. An optimal formula for calculating SLV using the anthropometric measure thoracoabdominal circumference (TAC) was determined using leave-one-out cross-validation. The ability of this formula to correctly predict liver volume was checked against other existing formulas by analysis of variance. The ability of the formula to predict small grafts in LDLT was evaluated by exact logistic regression. The optimal formula using TAC was determined to be SLV = (TAC × 3.5816) - (Age × 3.9844) - (Sex × 109.7386) - 934.5949. When compared with historic formulas, the current formula was the only one that was not significantly different from computed tomography-determined liver volumes by analysis of variance with Dunnett posttest. When evaluating the ability of the formula to predict small-for-size syndrome, many (10/16) of the formulas tested had significant results by exact logistic regression, with our formula predicting small-for-size syndrome with an odds ratio of 7.94 (95% confidence interval, 1.23-91.36; P = 0.025). We report a formula for calculating SLV that does not rely on weight-based variables and that has good ability to predict SLV and identify patients with potentially small grafts.
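The reported formula can be transcribed directly. Note that the abstract states neither the units of TAC nor which sex is coded as 1, so both are left as explicit assumptions here, and the inputs below are illustrative only:

```python
def standard_liver_volume(tac, age_years, sex):
    """Standard liver volume (mL) from the reported formula:
    SLV = TAC*3.5816 - Age*3.9844 - Sex*109.7386 - 934.5949.

    tac: thoracoabdominal circumference, in the units used by the
         authors (not stated in the abstract; assumed here).
    sex: 0/1 indicator; the coding is not stated in the abstract,
         so which sex is 1 is an assumption.
    """
    return (tac * 3.5816
            - age_years * 3.9844
            - sex * 109.7386
            - 934.5949)

# Purely illustrative inputs, not clinically meaningful values:
print(standard_liver_volume(1000, 40, 1))
```

Because the formula is linear, its sensitivity to each covariate is just the corresponding coefficient (e.g. each year of age lowers predicted SLV by about 4 mL).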
Vitezica, Zulma G; Varona, Luis; Elsen, Jean-Michel; Misztal, Ignacy; Herring, William; Legarra, Andrès
2016-01-29
Most developments in quantitative genetics theory focus on the study of intra-breed/line concepts. With the availability of massive genomic information, it becomes necessary to revisit the theory for crossbred populations. We propose methods to construct genomic covariances with additive and non-additive (dominance) inheritance in the case of pure lines and crossbred populations. We describe substitution effects and dominance deviations across two pure parental populations and the crossbred population. Gene effects are assumed to be independent of the origin of alleles, and allelic frequencies can differ between parental populations. Based on these assumptions, the theoretical variance components (additive and dominance) are obtained as a function of marker effects and allelic frequencies. The additive genetic variance in the crossbred population includes the biological additive and dominance effects of a gene and a covariance term. Dominance variance in the crossbred population is proportional to the product of the heterozygosity coefficients of both parental populations. A genomic BLUP (best linear unbiased prediction) equivalent model is presented. We illustrate this approach by using pig data (two pure lines and their cross, including 8265 phenotyped and genotyped sows). For the total number of piglets born, the dominance variance in the crossbred population represented about 13% of the total genetic variance. Dominance variation is only marginally important for litter size in the crossbred population. We present a coherent marker-based model that includes purebred and crossbred data and additive and dominance gene action. Using this model, it is possible to estimate breeding values, dominance deviations and variance components in a dataset that comprises data on purebred and crossbred individuals. These methods can be exploited to plan assortative mating in pig, maize or other species, in order to generate superior crossbred individuals in terms of performance.
Hu, Jianhua; Wright, Fred A
2007-03-01
The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
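A minimal sketch of the idea of shrinking gene-wise variance estimates before forming a differential-expression statistic, which guards against spuriously small variances when the number of arrays is tiny. The authors' actual mean-variance model is not reproduced; the fixed shrinkage weight below is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_per_group = 500, 3   # very few arrays per group

a = rng.normal(0, 1, (n_genes, n_per_group))
b = rng.normal(0, 1, (n_genes, n_per_group))
b[:25] += 2.0                   # first 25 genes truly differential

diff = a.mean(1) - b.mean(1)
s2 = (a.var(1, ddof=1) + b.var(1, ddof=1)) / 2

# Shrink each gene's variance toward the global mean variance; this
# prevents tiny variance estimates from inflating the statistic.
w = 0.5                         # shrinkage weight (an assumed constant)
s2_shrunk = w * s2 + (1 - w) * s2.mean()
t_mod = diff / np.sqrt(s2_shrunk * (2 / n_per_group))

# Truly differential genes should dominate the largest |t| values.
top = np.argsort(-np.abs(t_mod))[:25]
print("true positives among top 25:", np.sum(top < 25))
```

Empirical-Bayes methods estimate the shrinkage weight and target from the data rather than fixing them, but the mechanics are as above.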
NASA Astrophysics Data System (ADS)
Hernández, Mario R.; Francés, Félix
2015-04-01
One phase of the hydrological model implementation process that contributes significantly to the uncertainty of hydrological predictions is calibration, in which values of the unknown model parameters are tuned by optimizing an objective function. An unsuitable error model (e.g. Standard Least Squares, or SLS) introduces noise into the estimation of the parameters. The main sources of this noise are input errors and hydrological model structural deficiencies. The biased calibrated parameters thus cause the model divergence phenomenon, in which the error variance of the (spatially and temporally) forecasted flows far exceeds the error variance in the fitting period, and provoke the loss of part or all of the physical meaning of the modeled processes; in other words, they yield a calibrated hydrological model that works well, but not for the right reasons. In addition, an unsuitable error model yields an unreliable predictive uncertainty assessment. Hence, with the aim of preventing all these undesirable effects, this research focuses on the Bayesian joint inference (BJI) of both the hydrological and error model parameters, considering a general additive (GA) error model that allows for correlation, non-stationarity (in variance and bias) and non-normality of model residuals. As the hydrological model, a conceptual distributed model called TETIS was used, with a particular split structure of the effective model parameters. Bayesian inference was performed with the aid of a Markov Chain Monte Carlo (MCMC) algorithm called DREAM-ZS. The MCMC algorithm quantifies the uncertainty of the hydrological and error model parameters by obtaining the joint posterior probability distribution, conditioned on the observed flows. The BJI methodology is a very powerful and reliable tool, but it must be used correctly: if non-stationarity in error variance and bias is modeled, the Total Laws must be taken into account.
The results of this research show that the application of BJI with a GA error model improves the robustness of the hydrological parameters (diminishing the model divergence phenomenon) and the reliability of the streamflow predictive distribution, relative to the results obtained with an unsuitable error model such as SLS. Finally, the most likely predictions in a validation period show similar performance for the BJI+GA and SLS error models.
Maternal characteristics predicting young girls' disruptive behavior.
van der Molen, Elsa; Hipwell, Alison E; Vermeiren, Robert; Loeber, Rolf
2011-01-01
Little is known about the relative predictive utility of maternal characteristics and parenting skills on the development of girls' disruptive behavior. The current study used five waves of parent- and child-report data from the ongoing Pittsburgh Girls Study to examine these relationships in a sample of 1,942 girls from age 7 to 12 years. Multivariate generalized estimating equation analyses indicated that European American race, mother's prenatal nicotine use, maternal depression, maternal conduct problems prior to age 15, and low maternal warmth explained unique variance. Maladaptive parenting partly mediated the effects of maternal depression and maternal conduct problems. Both current and early maternal risk factors have an impact on young girls' disruptive behavior, providing support for the timing and focus of the prevention of girls' disruptive behavior.
Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal
2009-01-01
The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators.
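The phase-two IPW idea can be illustrated with a toy two-stratum cohort, where weights are the reciprocals of the known sampling fractions. This sketch omits the weight-calibration adjustment the paper studies:

```python
import numpy as np

rng = np.random.default_rng(4)

# Finite cohort with two strata sampled at known phase-two fractions.
N = 10_000
stratum = rng.integers(0, 2, N)          # 0 = controls, 1 = cases
y = rng.normal(50 + 10 * stratum, 5, N)  # variable of interest

frac = np.where(stratum == 1, 1.0, 0.10) # keep all cases, 10% of controls
sampled = rng.random(N) < frac

# Inverse probability weighting: each sampled unit stands in for
# 1/frac cohort units, removing the bias from oversampling cases.
w = 1.0 / frac[sampled]
ht_mean = np.sum(w * y[sampled]) / np.sum(w)
print("IPW estimate of cohort mean:", ht_mean, "truth:", y.mean())
```

Calibrating the weights to known cohort totals of auxiliary variables correlated with y would further reduce the design-based variance component described above.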
Field-Scale Evaluation of Infiltration Parameters From Soil Texture for Hydrologic Analysis
NASA Astrophysics Data System (ADS)
Springer, Everett P.; Cundy, Terrance W.
1987-02-01
Recent interest in predicting soil hydraulic properties from simple physical properties such as texture has major implications in the parameterization of physically based models of surface runoff. This study was undertaken to (1) compare, on a field scale, soil hydraulic parameters predicted from texture to those derived from field measurements and (2) compare simulated overland flow response using these two parameter sets. The parameters for the Green-Ampt infiltration equation were obtained from field measurements and using texture-based predictors for two agricultural fields, which were mapped as single soil units. Results of the analyses were that (1) the mean and variance of the field-based parameters were not preserved by the texture-based estimates, (2) spatial and cross correlations between parameters were induced by the texture-based estimation procedures, (3) the overland flow simulations using texture-based parameters were significantly different than those from field-based parameters, and (4) simulations using field-measured hydraulic conductivities and texture-based storage parameters were very close to simulations using only field-based parameters.
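For reference, the Green-Ampt equation used above relates cumulative infiltration F to time implicitly; a fixed-point solver with hypothetical loam-like parameter values (not taken from the study) might look like:

```python
import math

def green_ampt_cumulative(K, psi_dtheta, t, tol=1e-10):
    """Cumulative infiltration F(t) (cm) from the implicit Green-Ampt
    relation F = K*t + psi_dtheta * ln(1 + F/psi_dtheta), solved by
    fixed-point iteration (the map is a contraction, so it converges).
    K: saturated hydraulic conductivity (cm/h);
    psi_dtheta: wetting-front suction head times moisture deficit (cm)."""
    F = max(K * t, tol)
    for _ in range(200):
        F_new = K * t + psi_dtheta * math.log(1 + F / psi_dtheta)
        if abs(F_new - F) < tol:
            break
        F = F_new
    return F

# Hypothetical parameters: K = 1.0 cm/h, psi*dtheta = 4.0 cm, t = 2 h.
F = green_ampt_cumulative(1.0, 4.0, 2.0)
rate = 1.0 * (1 + 4.0 / F)   # infiltration rate f = K*(1 + psi_dtheta/F)
print("cumulative infiltration:", F, "rate:", rate)
```

The texture-based predictors the study evaluates supply K and the suction/deficit product from soil texture class instead of field measurement, which is exactly where the reported parameter discrepancies enter.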
Cai, C; Rodet, T; Legoupil, S; Mohammad-Djafari, A
2013-11-01
Dual-energy computed tomography (DECT) makes it possible to obtain two fractions of basis materials without segmentation: the soft-tissue-equivalent water fraction and the hard-matter-equivalent bone fraction. Practical DECT measurements are usually obtained with polychromatic x-ray beams. Existing reconstruction approaches based on linear forward models that do not account for the beam polychromaticity fail to estimate the correct decomposition fractions and result in beam-hardening artifacts (BHA). The existing BHA correction approaches either need to refer to calibration measurements or suffer from the noise amplification caused by the negative-log preprocessing and the ill-conditioned water and bone separation problem. To overcome these problems, statistical DECT reconstruction approaches based on nonlinear forward models that account for the beam polychromaticity show great potential for giving accurate fraction images. This work proposes a full-spectral Bayesian reconstruction approach which allows the reconstruction of high-quality fraction images from ordinary polychromatic measurements. This approach is based on a Gaussian noise model with unknown variance assigned directly to the projections, without taking the negative log. Following Bayesian inference, the decomposition fractions and observation variance are estimated using the joint maximum a posteriori (MAP) estimation method. Subject to an adaptive prior model assigned to the variance, the joint estimation problem is simplified into a single estimation problem, transforming the joint MAP estimation into a minimization problem with a nonquadratic cost function. To solve it, the use of a monotone conjugate gradient algorithm with suboptimal descent steps is proposed. The performance of the proposed approach is analyzed with both simulated and experimental data. The results show that the proposed Bayesian approach is robust to noise and materials.
Accurate spectrum information about the source-detector system is also necessary; when dealing with experimental data, the spectrum can be predicted by a Monte Carlo simulator. For materials between water and bone, separation errors of less than 5% are observed on the estimated decomposition fractions. The proposed approach is a statistical reconstruction approach based on a nonlinear forward model that accounts for the full beam polychromaticity and is applied directly to the projections without taking the negative log. Compared with approaches based on linear forward models and with the BHA correction approaches, it has advantages in noise robustness and reconstruction accuracy.
Low-dimensional Representation of Error Covariance
NASA Technical Reports Server (NTRS)
Tippett, Michael K.; Cohn, Stephen E.; Todling, Ricardo; Marchesin, Dan
2000-01-01
Ensemble and reduced-rank approaches to prediction and assimilation rely on low-dimensional approximations of the estimation error covariances. Here stability properties of the forecast/analysis cycle for linear, time-independent systems are used to identify factors that cause the steady-state analysis error covariance to admit a low-dimensional representation. A useful measure of forecast/analysis cycle stability is the bound matrix, a function of the dynamics, observation operator and assimilation method. Upper and lower estimates for the steady-state analysis error covariance matrix eigenvalues are derived from the bound matrix. The estimates generalize to time-dependent systems. If much of the steady-state analysis error variance is due to a few dominant modes, the leading eigenvectors of the bound matrix approximate those of the steady-state analysis error covariance matrix. The analytical results are illustrated in two numerical examples where the Kalman filter is carried to steady state. The first example uses the dynamics of a generalized advection equation exhibiting nonmodal transient growth. Failure to observe growing modes leads to increased steady-state analysis error variances. Leading eigenvectors of the steady-state analysis error covariance matrix are well approximated by leading eigenvectors of the bound matrix. The second example uses the dynamics of a damped baroclinic wave model. The leading eigenvectors of a lowest-order approximation of the bound matrix are shown to approximate well the leading eigenvectors of the steady-state analysis error covariance matrix.
Evaluation of surface renewal and flux-variance methods above agricultural and forest surfaces
NASA Astrophysics Data System (ADS)
Fischer, M.; Katul, G. G.; Noormets, A.; Poznikova, G.; Domec, J. C.; Trnka, M.; King, J. S.
2016-12-01
Measurements of turbulent surface energy fluxes are of high interest in agricultural and forest research. During the last decades, eddy covariance (EC) has been adopted as the most commonly used micrometeorological method for measuring fluxes of greenhouse gases, energy and other scalars at the surface-atmosphere interface. Despite its robustness and accuracy, the cost of EC hinders its deployment in some research experiments and in practical applications such as irrigation scheduling. Therefore, the testing and development of other cost-effective methods is of high interest. In our study, we tested the performance of the surface renewal (SR) and flux-variance (FV) methods for estimating sensible heat flux density. The surface renewal method is based on the concept of non-random transport of scalars via so-called coherent structures which, if accurately identified, can be used to compute the associated flux. The flux-variance method predicts the flux from the scalar variance following surface-layer similarity theory. We tested SR and FV against EC in three types of ecosystem with very distinct aerodynamic properties. The first site was an agricultural wheat field in the Czech Republic. The second site was a 20-m tall mixed deciduous wetland forest on the coast of North Carolina, USA. The third site was a pine-switchgrass intercropping agro-forestry system located in the coastal plain of North Carolina, USA. Apart from resolving the coherent structures in the SR framework from structure functions (the most common approach), we applied a ramp wavelet detection scheme to test the hypothesis that the durations and amplitudes of the coherent structures are normally distributed within particular 30-minute time intervals, so that estimates of their averages are sufficient for accurate flux determination.
Further, we tested whether orthonormal wavelet thresholding can be used to isolate the coherent-structure scales that are associated with flux transport. Finally, we tested whether low-pass filtering in the Fourier domain based on the integral length scale can improve the estimates of both SR and FV, as it supposedly removes the low-frequency portion of the signal not related to the investigated fluxes.
ERIC Educational Resources Information Center
Beauducel, Andre; Herzberg, Philipp Yorck
2006-01-01
This simulation study compared maximum likelihood (ML) estimation with weighted least squares means and variance adjusted (WLSMV) estimation. The study was based on confirmatory factor analyses with 1, 2, 4, and 8 factors, based on 250, 500, 750, and 1,000 cases, and on 5, 10, 20, and 40 variables with 2, 3, 4, 5, and 6 categories. There was no…
Estimation and Simulation of Slow Crack Growth Parameters from Constant Stress Rate Data
NASA Technical Reports Server (NTRS)
Salem, Jonathan A.; Weaver, Aaron S.
2003-01-01
Closed-form, approximate functions for estimating the variances and degrees of freedom associated with the slow crack growth parameters n, D, B, and A* as measured using constant stress rate ('dynamic fatigue') testing were derived by using propagation of errors. Estimates made with the resulting functions and slow crack growth data for a sapphire window were compared to the results of Monte Carlo simulations. The functions for estimating the variances of the parameters were derived both with and without logarithmic transformation of the initial slow crack growth equations. The transformation was performed to make the functions both more linear and more normal. Comparison of the Monte Carlo results and the closed-form expressions derived with propagation of errors indicated that linearization is not required for good estimates of the variances of parameters n and D by the propagation of errors method. However, good estimates of the variances of the parameters B and A* could only be made when the starting slow crack growth equation was transformed and the coefficients of variation of the input parameters were not too large. This was partially a result of the skewed distributions of B and A*. Parametric variation of the input parameters was used to determine an acceptable range for using the closed-form approximate equations derived from propagation of errors.
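First-order propagation of errors, the technique used above, can be sketched generically: the variance of f(X) is approximated from the gradient of f and the input variances, assuming independent inputs. The toy function below is not from the paper:

```python
import numpy as np

def propagate_variance(f, x, var_x, h=1e-6):
    """First-order propagation of errors for independent inputs:
    Var[f(X)] ~ sum_i (df/dx_i)^2 * var_x[i], with the gradient
    evaluated at x by central finite differences."""
    x = np.asarray(x, dtype=float)
    g = np.array([
        (f(x + h * e) - f(x - h * e)) / (2 * h)
        for e in np.eye(len(x))
    ])
    return np.sum(g ** 2 * np.asarray(var_x))

# Toy example: f(a, b) = a * b with independent inputs.
# Analytic result: b^2*var_a + a^2*var_b = 16*0.01 + 9*0.04 = 0.52
var_f = propagate_variance(lambda v: v[0] * v[1], [3.0, 4.0], [0.01, 0.04])
print(var_f)
```

The linearization step is exactly what breaks down for strongly nonlinear (log-transformed) parameters with large input coefficients of variation, as the abstract reports for B and A*.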
NASA Technical Reports Server (NTRS)
Wolf, Michael
2012-01-01
A document describes an algorithm created to estimate the mass placed on a sample verification sensor (SVS) designed for lunar or planetary robotic sample return missions. A novel SVS measures the capacitance between a rigid bottom plate and an elastic top membrane in seven locations. As additional sample material (soil and/or small rocks) is placed on the top membrane, the deformation of the membrane increases the capacitance. The mass estimation algorithm addresses both the calibration of each SVS channel and how to combine the capacitances read from each of the seven channels into a single mass estimate. The probabilistic approach combines the channels according to the variance observed during the training phase, and provides not only the mass estimate but also a value for the certainty of the estimate. SVS capacitance data is collected for known masses under a wide variety of possible loading scenarios, though in all cases the distribution of sample within the canister is expected to be approximately uniform. A capacitance-vs-mass curve is fitted to this data, and is subsequently used to determine the mass estimate for a single channel's capacitance reading during the measurement phase. This results in seven different mass estimates, one for each SVS channel. Moreover, the variance of the calibration data is used to place a Gaussian probability density function (pdf) around this mass estimate. To blend these seven estimates, the seven pdfs are combined into a single Gaussian distribution, providing the final mean and variance of the estimate. This blending technique essentially takes the final estimate as an average of the estimates of the seven channels, weighted by the inverse of each channel's variance.
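The blending step described above is inverse-variance weighting of Gaussian estimates; a sketch with hypothetical channel readings (the numbers are illustrative, not from the document):

```python
import numpy as np

def blend_gaussian_estimates(means, variances):
    """Combine per-channel Gaussian mass estimates into one estimate,
    weighting each channel by the inverse of its variance. Returns the
    blended mean and the variance of the blended estimate."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances
    mean = np.sum(w * means) / np.sum(w)
    var = 1.0 / np.sum(w)   # tighter than any single channel
    return mean, var

# Hypothetical mass estimates from seven SVS channels (grams).
means = [102, 98, 101, 99, 103, 97, 100]
variances = [4.0, 4.0, 9.0, 9.0, 4.0, 9.0, 4.0]
m, v = blend_gaussian_estimates(means, variances)
print("blended mass:", m, "variance:", v)
```

The blended variance, the reciprocal of the summed weights, is what provides the "certainty of the estimate" the abstract mentions.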
Han, Chang S; Dingemanse, Niels J
2017-10-11
Empirical studies imply that sex-specific genetic architectures can resolve evolutionary conflicts between males and females, and thereby facilitate the evolution of sexual dimorphism. Sex-specificity of behavioural genetic architectures has, however, rarely been considered. Moreover, as the expression of genetic (co)variances is often environment-dependent, general inferences on sex-specific genetic architectures require estimates of quantitative genetics parameters under multiple conditions. We measured exploration and aggression in pedigreed populations of southern field crickets (Gryllus bimaculatus) raised on either naturally balanced (free-choice) or imbalanced (protein-deprived) diets. For each dietary condition, we measured for each behavioural trait (i) the level of sexual dimorphism, (ii) the level of sex-specificity of survival selection gradients, (iii) the level of sex-specificity of additive genetic variance, and (iv) the strength of the cross-sex genetic correlation. We report here evidence for sexual dimorphism in behaviour as well as sex-specificity in the expression of genetic (co)variances, as predicted by theory. The additive genetic variances of exploration and aggression were significantly greater in males than in females. Cross-sex genetic correlations were highly positive for exploration but deviated significantly from one for aggression; findings were consistent across dietary treatments. This suggests that these genetic architectures characterize the sexually dimorphic focal behaviours across various key environmental conditions in the wild. Our finding also highlights that sexual conflict can be resolved by evolving sexually independent genetic architectures.
NASA Technical Reports Server (NTRS)
Daigle, Matthew John; Goebel, Kai Frank
2010-01-01
Model-based prognostics captures system knowledge in the form of physics-based models of components, and how they fail, in order to obtain accurate predictions of end of life (EOL). EOL is predicted based on the estimated current state distribution of a component and expected profiles of future usage. In general, this requires simulations of the component using the underlying models. In this paper, we develop a simulation-based prediction methodology that achieves computational efficiency by performing only the minimal number of simulations needed in order to accurately approximate the mean and variance of the complete EOL distribution. This is performed through the use of the unscented transform, which predicts the means and covariances of a distribution passed through a nonlinear transformation. In this case, the EOL simulation acts as that nonlinear transformation. In this paper, we review the unscented transform, and describe how this concept is applied to efficient EOL prediction. As a case study, we develop a physics-based model of a solenoid valve, and perform simulation experiments to demonstrate improved computational efficiency without sacrificing prediction accuracy.
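The unscented transform underlying the method can be sketched in the scalar case: sigma points are propagated through the nonlinear function and reweighted to approximate the output mean and variance, in place of many Monte Carlo simulations:

```python
import numpy as np

def unscented_moments(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(X), X ~ N(mean, var),
    using the scalar unscented transform (2n + 1 = 3 sigma points)."""
    n = 1
    spread = np.sqrt((n + kappa) * var)
    points = np.array([mean, mean + spread, mean - spread])
    w = np.array([kappa / (n + kappa),
                  0.5 / (n + kappa),
                  0.5 / (n + kappa)])
    y = np.array([f(p) for p in points])
    y_mean = np.sum(w * y)
    y_var = np.sum(w * (y - y_mean) ** 2)
    return y_mean, y_var

# Example: quadratic transformation of a standard normal; with kappa = 2
# the transform recovers E[X^2] = 1 and Var[X^2] = 2 exactly here.
m, v = unscented_moments(lambda x: x ** 2, 0.0, 1.0)
print(m, v)
```

In the paper the "nonlinear function" is the full EOL simulation, so only 2n + 1 simulations are needed instead of a large Monte Carlo ensemble.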
Mullan, Barbara; Wong, Cara; Kothe, Emily
2013-03-01
The aim of this study was to investigate whether the theory of planned behaviour (TPB), with the addition of risk awareness, could predict breakfast consumption in a sample of adolescents from the UK and Australia. It was hypothesised that the TPB variables of attitudes, subjective norm and perceived behavioural control (PBC) would significantly predict intentions, and that inclusion of risk perception would increase the proportion of variance explained. Secondly, it was hypothesised that intention and PBC would predict behaviour. Participants were recruited from secondary schools in Australia and the UK. A total of 613 participants completed the study (448 females, 165 males; mean age = 14 ± 1.1 years). The TPB predicted 42.2% of the variance in intentions to eat breakfast. All variables significantly predicted intention, with PBC as the strongest component. The addition of risk made a small but significant contribution to the prediction of intention. Together, intention and PBC predicted 57.8% of the variance in breakfast consumption.
Characterizing nonconstant instrumental variance in emerging miniaturized analytical techniques.
Noblitt, Scott D; Berg, Kathleen E; Cate, David M; Henry, Charles S
2016-04-07
Measurement variance is a crucial aspect of quantitative chemical analysis. Variance directly affects important analytical figures of merit, including detection limit, quantitation limit, and confidence intervals. Most reported analyses for emerging analytical techniques implicitly assume constant variance (homoskedasticity) by using unweighted regression calibrations. Despite the assumption of constant variance, it is known that most instruments exhibit heteroskedasticity, where variance changes with signal intensity. Ignoring nonconstant variance results in suboptimal calibrations, invalid uncertainty estimates, and incorrect detection limits. Three techniques where homoskedasticity is often assumed were covered in this work to evaluate whether heteroskedasticity had a significant quantitative impact: naked-eye, distance-based detection using paper-based analytical devices (PADs), cathodic stripping voltammetry (CSV) with disposable carbon-ink electrode devices, and microchip electrophoresis (MCE) with conductivity detection. Despite these techniques representing a wide range of chemistries and precision, heteroskedastic behavior was confirmed for each. The general variance forms were analyzed, and recommendations for accounting for nonconstant variance are discussed. Monte Carlo simulations of instrument responses were performed to quantify the benefits of weighted regression, and the sensitivity to uncertainty in the variance function was tested. Results show that heteroskedasticity should be considered during development of new techniques; even with moderate uncertainty (30%) in the variance function, weighted regression still outperforms unweighted regression. We recommend the power model of variance because it is easy to apply, requires little additional experimentation, and produces higher-precision results and more reliable uncertainty estimates than assuming homoskedasticity.
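Weighted regression under a power model of variance, as recommended above, can be sketched on synthetic calibration data. The variance function is assumed known here, whereas in practice its parameters would be estimated from replicate measurements:

```python
import numpy as np

rng = np.random.default_rng(5)

# Heteroskedastic calibration data: standard deviation grows with
# signal following a power model, sd = a * signal^b (assumed values).
conc = np.linspace(1, 100, 200)
true_signal = 3.0 * conc
noise_sd = 0.5 * true_signal ** 0.6
signal = true_signal + rng.normal(0, noise_sd)

# Weighted least squares with weights 1/variance from the power model;
# this downweights the noisy high-signal points.
w = 1.0 / noise_sd ** 2
X = np.column_stack([np.ones_like(conc), conc])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ signal)
print("intercept, slope:", beta)
```

Compared with unweighted regression, the weighted fit gives markedly more precise low-concentration predictions, which is where detection and quantitation limits are set.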
Situation awareness measures for simulated submarine track management.
Loft, Shayne; Bowden, Vanessa; Braithwaite, Janelle; Morrell, Daniel B; Huf, Samuel; Durso, Francis T
2015-03-01
The aim of this study was to examine whether the Situation Present Assessment Method (SPAM) and the Situation Awareness Global Assessment Technique (SAGAT) predict incremental variance in performance on a simulated submarine track management task and to measure the potential disruptive effect of these situation awareness (SA) measures. Submarine track managers use various displays to localize and track contacts detected by own-ship sensors. The measurement of SA is crucial for designing effective submarine display interfaces and training programs. Participants monitored a tactical display and sonar bearing-history display to track the cumulative behaviors of contacts in relationship to own-ship position and landmarks. SPAM (or SAGAT) and the Air Traffic Workload Input Technique (ATWIT) were administered during each scenario, and the NASA Task Load Index (NASA-TLX) and Situation Awareness Rating Technique were administered postscenario. SPAM and SAGAT predicted variance in performance after controlling for subjective measures of SA and workload, and SA for past information was a stronger predictor than SA for current/future information. The NASA-TLX predicted performance on some tasks. Only SAGAT predicted variance in performance on all three tasks but marginally increased subjective workload. SPAM, SAGAT, and the NASA-TLX can predict unique variance in submarine track management performance. SAGAT marginally increased subjective workload, but this increase did not lead to any performance decrement. Defense researchers have identified SPAM as an alternative to SAGAT because it would not require field exercises involving submarines to be paused. SPAM was not disruptive, but it is potentially problematic that SPAM did not predict variance in all three performance tasks. © 2014, Human Factors and Ergonomics Society.
Transforming RNA-Seq data to improve the performance of prognostic gene signatures.
Zwiener, Isabella; Frisch, Barbara; Binder, Harald
2014-01-01
Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance dependency, or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
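A rank-based inverse normal transform of the kind compared in this study can be sketched as follows. The Blom-style offset c = 0.5 and the absence of tie handling are simplifying assumptions for illustration:

```python
import numpy as np
from statistics import NormalDist

def rank_inverse_normal(values, c=0.5):
    """Rank-based inverse normal transform: map each value's rank to a
    standard-normal quantile, removing skew, extreme values, and the
    mean-variance dependency of the raw covariate."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    ranks = v.argsort().argsort() + 1          # ranks 1..n (ties not handled)
    probs = (ranks - c) / (n - 2 * c + 1)      # Blom-style plotting positions
    nd = NormalDist()
    return np.array([nd.inv_cdf(p) for p in probs])

# A heavily skewed covariate with one extreme value:
z = rank_inverse_normal([3.0, 1.0, 100.0, 7.0])
```

The transform preserves the ordering of the original values while forcing an approximately standard-normal marginal distribution, which is why rank-based transforms are robust across the scenarios in the study.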
Tiezzi, Francesco; Maltecca, Christian
2015-04-02
Genomic BLUP (GBLUP) can predict breeding values for non-phenotyped individuals based on the identity-by-state genomic relationship matrix (G). The G matrix can be constructed from thousands of markers spread across the genome. The strongest assumption of G and consequently of GBLUP is that all markers contribute equally to the genetic variance of a trait. This assumption is violated for traits that are controlled by a small number of quantitative trait loci (QTL) or individual QTL with large effects. In this paper, we investigate the performance of using a weighted genomic relationship matrix (wG) that takes into consideration the genetic architecture of the trait in order to improve predictive ability for a wide range of traits. Multiple methods were used to calculate weights for several economically relevant traits in US Holstein dairy cattle. Predictive performance was tested by k-means cross-validation. Relaxing the GBLUP assumption of equal marker contribution by increasing the weight that is given to a specific marker in the construction of the trait-specific G resulted in increased predictive performance. The increase was strongest for traits that are controlled by a small number of QTL (e.g. fat and protein percentage). Furthermore, bias in prediction estimates was reduced compared to that resulting from the use of regular G. Even for traits with low heritability and lower general predictive performance (e.g. calving ease traits), weighted G still yielded a gain in accuracy. Genomic relationship matrices weighted by marker realized variance yielded more accurate and less biased predictions for traits regulated by few QTL. Genome-wide association analyses were used to derive marker weights for creating weighted genomic relationship matrices. However, this can be cumbersome and prone to low stability over generations because of erosion of linkage disequilibrium between markers and QTL. Future studies may include other sources of information, such as functional annotation and gene networks, to better exploit the genetic architecture of traits and produce more stable predictions.
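A minimal sketch of a marker-weighted genomic relationship matrix in the spirit of wG, assuming VanRaden-style allele-frequency centering; the genotype coding and toy data are illustrative, not the study's dataset:

```python
import numpy as np

def weighted_G(M, p, d):
    """Marker-weighted genomic relationship matrix.
    M: n_individuals x n_markers genotypes coded 0/1/2;
    p: allele frequencies per marker;
    d: per-marker weights (d = all ones recovers the regular G)."""
    Z = M - 2.0 * p                              # centre by allele frequency
    D = np.diag(d)
    denom = np.sum(2.0 * p * (1.0 - p) * d)      # weighted normalization
    return Z @ D @ Z.T / denom

# Toy genotypes for 3 individuals at 3 markers:
M = np.array([[0, 1, 2], [2, 1, 0], [1, 1, 1]], dtype=float)
p = M.mean(axis=0) / 2.0
G_equal = weighted_G(M, p, np.ones(3))           # regular G
G_wtd = weighted_G(M, p, np.array([3.0, 1.0, 1.0]))  # up-weight marker 1
```

Up-weighting a marker concentrates the relationship information on that locus, which is the mechanism by which trait-specific weights help for traits driven by few large-effect QTL.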
Impact of the Fano Factor on Position and Energy Estimation in Scintillation Detectors.
Bora, Vaibhav; Barrett, Harrison H; Jha, Abhinav K; Clarkson, Eric
2015-02-01
The Fano factor for an integer-valued random variable is defined as the ratio of its variance to its mean. Light from various scintillation crystals has been reported to have Fano factors ranging from sub-Poisson (Fano factor < 1) to super-Poisson (Fano factor > 1). For a given mean, a smaller Fano factor implies a smaller variance and thus less noise. We investigated whether lower noise in the scintillation light results in better spatial and energy resolutions. The impact of the Fano factor on the estimation of position of interaction and energy deposited in simple gamma-camera geometries is estimated by two methods: calculating the Cramér-Rao bound and estimating the variance of a maximum likelihood estimator. The methods are consistent with each other and indicate that when estimating the position of interaction and energy deposited by a gamma-ray photon, the Fano factor of a scintillator does not affect the spatial resolution. A smaller Fano factor results in a better energy resolution.
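The Fano factor itself is a one-line computation from the definition above; this sketch uses the sample mean and unbiased sample variance:

```python
import numpy as np

def fano_factor(counts):
    """Fano factor of an integer-valued sample: variance / mean.
    < 1 is sub-Poisson (less noisy than Poisson); > 1 is super-Poisson;
    a Poisson process has Fano factor exactly 1."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# A constant sample has zero variance, hence Fano factor 0
# (the maximally sub-Poisson case).
f_const = fano_factor([5, 5, 5, 5])
```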
Rubio-Aparicio, María; Sánchez-Meca, Julio; López-López, José Antonio; Botella, Juan; Marín-Martínez, Fulgencio
2017-11-01
Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta-analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between-studies variance on the statistical performance of the Q_B(P) and Q_B(S) tests for subgroup analyses assuming a mixed-effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between-studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between-studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates. © 2017 The British Psychological Society.
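One simple way to pool subgroup residual between-studies variances is a degrees-of-freedom-weighted average. This is an illustrative pooling scheme, not necessarily the exact estimator evaluated in the study:

```python
def pooled_tau2(tau2_list, k_list):
    """Pool per-subgroup residual between-studies variance estimates,
    weighting each subgroup by its degrees of freedom (k_j - 1, where
    k_j is the number of studies in subgroup j)."""
    num = sum((k - 1) * t for t, k in zip(tau2_list, k_list))
    den = sum(k - 1 for k in k_list)
    return num / den

# Two subgroups: 21 studies with tau^2 = 0.04, 11 studies with tau^2 = 0.10.
t_pooled = pooled_tau2([0.04, 0.10], [21, 11])
```

The pooled value sits between the separate estimates, pulled toward the larger subgroup; when the subgroup variances truly differ and subgroups are unbalanced, this averaging is exactly what degrades the tests' performance.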
Scaling impacts on environmental controls and spatial heterogeneity of soil organic carbon stocks
NASA Astrophysics Data System (ADS)
Mishra, U.; Riley, W. J.
2015-01-01
The spatial heterogeneity of land surfaces affects energy, moisture, and greenhouse gas exchanges with the atmosphere. However, representing heterogeneity of terrestrial hydrological and biogeochemical processes in earth system models (ESMs) remains a critical scientific challenge. We report the impact of spatial scaling on environmental controls, spatial structure, and statistical properties of soil organic carbon (SOC) stocks across the US state of Alaska. We used soil profile observations and environmental factors such as topography, climate, land cover types, and surficial geology to predict the SOC stocks at a 50 m spatial scale. These spatially heterogeneous estimates provide a dataset with reasonable fidelity to the observations at a sufficiently high resolution to examine the environmental controls on the spatial structure of SOC stocks. We upscaled both the predicted SOC stocks and environmental variables from finer to coarser spatial scales (s = 100, 200, 500 m, 1, 2, 5, 10 km) and generated various statistical properties of SOC stock estimates. We found different environmental factors to be statistically significant predictors at different spatial scales. Only elevation, temperature, potential evapotranspiration, and scrub land cover types were significant predictors at all scales. The strengths of control (the median value of geographically weighted regression coefficients) of these four environmental variables on SOC stocks decreased with increasing scale and were accurately represented using mathematical functions (R2 = 0.83-0.97). The spatial structure of SOC stocks across Alaska changed with spatial scale. Although the variance (sill) and unstructured variability (nugget) of the calculated variograms of SOC stocks decreased exponentially with scale, the correlation length (range) remained relatively constant across scale. The variance of predicted SOC stocks decreased with spatial scale over the range of 50 m to ~500 m, and remained constant beyond this scale. The fitted exponential function accounted for 98% of variability in the variance of SOC stocks. We found moderately accurate linear relationships between mean and higher-order moments of predicted SOC stocks (R2 ~ 0.55-0.63). Current ESMs operate at coarse spatial scales (50-100 km), and are therefore unable to represent environmental controllers and spatial heterogeneity of high-latitude SOC stocks consistent with observations. We conclude that improved understanding of the scaling behavior of environmental controls and statistical properties of SOC stocks can improve ESM land model benchmarking and perhaps allow representation of spatial heterogeneity of biogeochemistry at scales finer than those currently resolved by ESMs.
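The sill and nugget discussed above come from an empirical semivariogram. A minimal sketch of the classical Matheron estimator follows; the bin edges and toy data are illustrative:

```python
import numpy as np

def empirical_variogram(coords, values, bins):
    """Matheron estimator of the semivariogram: for each lag bin,
    gamma(h) = mean of 0.5*(z_i - z_j)^2 over pairs at distance ~h.
    The sill is the plateau of gamma(h); the nugget is its value as
    h -> 0; the range is the lag at which the plateau is reached."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    dists, semis = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(coords[i] - coords[j]))
            semis.append(0.5 * (values[i] - values[j]) ** 2)
    dists, semis = np.array(dists), np.array(semis)
    gamma = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (dists >= lo) & (dists < hi)
        gamma.append(semis[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# Four points on a unit square with values that increase with distance:
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
vals = [1.0, 2.0, 2.0, 3.0]
g = empirical_variogram(coords, vals, bins=[0.0, 1.1, 2.0])
```

Fitting a model (e.g. exponential) to gamma(h) yields the sill, nugget, and range parameters whose scaling behavior the study tracks.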
Estimating Variances of Horizontal Wind Fluctuations in Stable Conditions
NASA Astrophysics Data System (ADS)
Luhar, Ashok K.
2010-05-01
Information concerning the average wind speed and the variances of lateral and longitudinal wind velocity fluctuations is required by dispersion models to characterise turbulence in the atmospheric boundary layer. When the winds are weak, the scalar average wind speed and the vector average wind speed need to be clearly distinguished and both lateral and longitudinal wind velocity fluctuations assume equal importance in dispersion calculations. We examine commonly used methods of estimating these variances from wind-speed and wind-direction statistics measured separately, for example, by a cup anemometer and a wind vane, and evaluate the implied relationship between the scalar and vector wind speeds, using measurements taken under low-wind stable conditions. We highlight several inconsistencies inherent in the existing formulations and show that the widely used assumption that the lateral velocity variance is equal to the longitudinal velocity variance is not necessarily true. We derive improved relations for the two variances, and although data under stable stratification are considered for comparison, our analysis is applicable more generally.
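The scalar/vector wind-speed distinction and the along-/cross-wind variances can be illustrated directly from component time series. This sketch assumes east/north components u and v and is an illustration, not the paper's derivation:

```python
import numpy as np

def wind_stats(u, v):
    """Scalar vs vector mean wind speed, and longitudinal (along-wind)
    and lateral (cross-wind) velocity variances, from instantaneous
    east/north wind components u, v."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    scalar_mean = np.hypot(u, v).mean()   # mean of the instantaneous speeds
    ub, vb = u.mean(), v.mean()
    vector_mean = np.hypot(ub, vb)        # speed of the mean wind vector
    # Rotate into the mean-wind frame: x' along-wind, y' cross-wind.
    theta = np.arctan2(vb, ub)
    along = u * np.cos(theta) + v * np.sin(theta)
    cross = -u * np.sin(theta) + v * np.cos(theta)
    return scalar_mean, vector_mean, along.var(), cross.var()

# Meandering weak wind that reverses direction: the scalar mean speed
# stays at 2, while the vector mean collapses toward zero.
s, vm, var_long, var_lat = wind_stats([2.0, 2.0, -2.0], [0.0, 0.0, 0.0])
```

The gap between the scalar and vector means is largest exactly in the weak-wind, meandering conditions the paper addresses, which is why the two must not be used interchangeably.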
Predicting the Cost per Flying Hour for the F-16 Using Programmatic and Operational Variables
2005-06-01
Testing of the constant variance assumption is accomplished using the Breusch-Pagan test; the results are listed in Table 12, and Figures 19 and 20 add to the discussion by plotting the residuals against predicted values for both models. (Table 12 reports Breusch-Pagan constant-variance test statistics for Models A and B.)
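The Breusch-Pagan test referenced above can be sketched in its Lagrange-multiplier form; the simulated data below are illustrative, not the thesis's F-16 cost data:

```python
import numpy as np

def breusch_pagan(x, y):
    """LM form of the Breusch-Pagan test for heteroskedasticity:
    regress the squared OLS residuals on the regressors; under the null
    of constant variance, LM = n * R^2 is ~chi^2 with k degrees of
    freedom (k = number of regressors, here 1)."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid2 = (y - X @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(X, resid2, rcond=None)   # auxiliary regression
    ss_res = np.sum((resid2 - X @ gamma) ** 2)
    ss_tot = np.sum((resid2 - resid2.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)              # LM statistic

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y_hom = 2 * x + rng.normal(0, 1, 200)        # constant error variance
y_het = 2 * x + rng.normal(0, 1, 200) * x    # error SD grows with x
lm_hom = breusch_pagan(x, y_hom)
lm_het = breusch_pagan(x, y_het)
```

A large LM statistic (compared against the chi-squared critical value) rejects homoskedasticity, which is the decision being made in the thesis's Table 12.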
Sanjak, Jaleal S.; Long, Anthony D.; Thornton, Kevin R.
2017-01-01
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin- and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation. PMID:28103232
Audio-visual speech cue combination.
Arnold, Derek H; Tear, Morgan; Schindel, Ryan; Roseboom, Warrick
2010-04-16
Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to single sensory modality based decisions. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or a probability summation. Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.
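The Bayesian maximum-likelihood (inverse-variance) combination the study benchmarks against has a closed form; a sketch with illustrative numbers:

```python
def mle_combine(est_a, var_a, est_v, var_v):
    """Inverse-variance (maximum-likelihood) combination of two
    independent cue estimates, e.g. auditory and visual. Each cue is
    weighted by the reliability (inverse variance) of the other; the
    combined variance is never larger than the smaller single-cue
    variance, which sets the Bayesian benchmark for sensitivity gains."""
    w_a = var_v / (var_a + var_v)   # weight on cue A
    w_v = var_a / (var_a + var_v)   # weight on cue V
    combined = w_a * est_a + w_v * est_v
    combined_var = (var_a * var_v) / (var_a + var_v)
    return combined, combined_var

# A noisy auditory estimate (variance 4) and a reliable visual one (variance 1):
est, var = mle_combine(1.0, 4.0, 3.0, 1.0)
```

Improvements beyond this combined-variance bound, as reported for audio-visual speech, imply that the two cues are not first estimated independently and then merged, but share an earlier common encoding stage.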