sampling error estimates: Topics by Science.gov

Sample records for sampling error estimates

Physical Validation of TRMM TMI and PR Monthly Rain Products Over Oklahoma

NASA Technical Reports Server (NTRS)

Fisher, Brad L.

2004-01-01

The Tropical Rainfall Measuring Mission (TRMM) provides monthly rainfall estimates using data collected by the TRMM satellite. These estimates cover a substantial fraction of the earth's surface. The physical validation of TRMM estimates involves corroborating the accuracy of spaceborne estimates of areal rainfall by inferring errors and biases from ground-based rain estimates. The TRMM error budget consists of two major sources of error: retrieval and sampling. Sampling errors are intrinsic to the process of estimating monthly rainfall and occur because the satellite extrapolates monthly rainfall from a small subset of measurements collected only during satellite overpasses. Retrieval errors, on the other hand, are related to the process of collecting measurements while the satellite is overhead. One of the big challenges confronting the TRMM validation effort is how to best estimate these two main components of the TRMM error budget, which are not easily decoupled. This four-year study computed bulk sampling and retrieval errors for the TRMM microwave imager (TMI) and the precipitation radar (PR) by applying a technique that sub-samples gauge data at TRMM overpass times. Gridded monthly rain estimates are then computed from the monthly bulk statistics of the collected samples, providing a sensor-dependent gauge rain estimate that is assumed to include a TRMM equivalent sampling error. The sub-sampled gauge rain estimates are then used in conjunction with the monthly satellite and gauge (without sub-sampling) estimates to decouple retrieval and sampling errors. The computed mean sampling errors for the TMI and PR were 5.9% and 7.7%, respectively, in good agreement with theoretical predictions. The PR year-to-year retrieval biases exceeded corresponding TMI biases, but it was found that these differences were partially due to negative TMI biases during cold months and positive TMI biases during warm months.
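
The sub-sampling idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual procedure: it assumes a simple additive split in which the sampling error is the sub-sampled gauge mean minus the full gauge mean, and the retrieval error is the satellite mean minus the sub-sampled gauge mean. The function name `decompose_errors`, the synthetic gauge series, and the overpass schedule are all hypothetical.

```python
import random


def decompose_errors(gauge_rates, overpass_idx, satellite_monthly_mean):
    """Split the satellite-minus-gauge difference into two additive parts.

    Assumed (not from the source) definitions:
      sampling error  = gauge mean at overpass times - gauge mean of all samples
      retrieval error = satellite monthly mean       - gauge mean at overpass times
    """
    full_mean = sum(gauge_rates) / len(gauge_rates)
    sub_sampled = [gauge_rates[i] for i in overpass_idx]
    sub_mean = sum(sub_sampled) / len(sub_sampled)
    sampling_error = sub_mean - full_mean
    retrieval_error = satellite_monthly_mean - sub_mean
    return sampling_error, retrieval_error


# Hypothetical month of hourly gauge rain rates (mm/h): mostly dry, some rain.
random.seed(0)
gauge = [random.expovariate(1.0) if random.random() < 0.1 else 0.0
         for _ in range(720)]

# Hypothetical overpass hours (roughly one visit per day).
overpasses = list(range(5, 720, 23))

# Pretend the satellite retrieval reads 10% low at overpass times.
sat_mean = 0.9 * sum(gauge[i] for i in overpasses) / len(overpasses)

s_err, r_err = decompose_errors(gauge, overpasses, sat_mean)
total_err = sat_mean - sum(gauge) / len(gauge)
print(f"sampling={s_err:+.4f}  retrieval={r_err:+.4f}  total={total_err:+.4f}")
```

Under this additive definition the two components always sum to the total satellite-minus-gauge difference, which is what makes the sub-sampled gauge estimate useful as an intermediate reference.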
Satellite Sampling and Retrieval Errors in Regional Monthly Rain Estimates from TMI, AMSR-E, SSM/I, AMSU-B and the TRMM PR

NASA Technical Reports Server (NTRS)

Fisher, Brad; Wolff, David B.

2010-01-01

Passive and active microwave rain sensors onboard earth-orbiting satellites estimate monthly rainfall from the instantaneous rain statistics collected during satellite overpasses. It is well known that climate-scale rain estimates from meteorological satellites incur sampling errors resulting from the process of discrete temporal sampling and statistical averaging. Sampling and retrieval errors ultimately become entangled in the estimation of the mean monthly rain rate. The sampling component of the error budget effectively introduces statistical noise into climate-scale rain estimates that obscures the error component associated with the instantaneous rain retrieval. Estimating the accuracy of the retrievals on monthly scales therefore necessitates a decomposition of the total error budget into sampling and retrieval error quantities. This paper presents results from a statistical evaluation of the sampling and retrieval errors for five different space-borne rain sensors on board nine orbiting satellites. Using an error decomposition methodology developed by one of the authors, sampling and retrieval errors were estimated at 0.25° resolution within 150 km of ground-based weather radars located at Kwajalein, Marshall Islands and Melbourne, Florida. Error and bias statistics were calculated according to the land, ocean and coast classifications of the surface terrain mask developed for the Goddard Profiling (GPROF) rain algorithm. Variations in the comparative error statistics are attributed to various factors related to differences in the swath geometry of each rain sensor, the orbital and instrument characteristics of the satellite and the regional climatology. The most significant result of this study was that each of the satellites incurred negative long-term oceanic retrieval biases of 10 to 30%.
Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

NASA Technical Reports Server (NTRS)

Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

2000-01-01

Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates caused by changes in rain statistics due to 1) evolution of the official algorithms used to process the data, and 2) differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.
Moments and Root-Mean-Square Error of the Bayesian MMSE Estimator of Classification Error in the Gaussian Model.

PubMed

Zollanvari, Amin; Dougherty, Edward R

2014-06-01

The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.
Adaptive control of theophylline therapy: importance of blood sampling times.

PubMed

D'Argenio, D Z; Khakmahd, K

1983-10-01

A two-observation protocol for estimating theophylline clearance during a constant-rate intravenous infusion is used to examine the importance of blood sampling schedules with regard to the information content of resulting concentration data. Guided by a theory for calculating maximally informative sample times, population simulations are used to assess the effect of specific sampling times on the precision of resulting clearance estimates and subsequent predictions of theophylline plasma concentrations. The simulations incorporated noise terms for intersubject variability, dosing errors, sample collection errors, and assay error. Clearance was estimated using Chiou's method, least squares, and a Bayesian estimation procedure. The results of these simulations suggest that clinically significant estimation and prediction errors may result when using the above two-point protocol for estimating theophylline clearance if the time separating the two blood samples is less than one population mean elimination half-life.
Using the Sampling Margin of Error to Assess the Interpretative Validity of Student Evaluations of Teaching

ERIC Educational Resources Information Center

James, David E.; Schraw, Gregory; Kuch, Fred

2015-01-01

We present an equation, derived from standard statistical theory, that can be used to estimate sampling margin of error for student evaluations of teaching (SETs). We use the equation to examine the effect of sample size, response rates and sample variability on the estimated sampling margin of error, and present results in four tables that allow…
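
As an illustration of how sample size, response rate and variability interact, here is a minimal sketch of a margin-of-error calculation for a class mean. It uses the textbook formula with a finite population correction, z * (s / sqrt(n)) * sqrt((N - n) / (N - 1)); this is an assumption and may differ from the authors' equation, which the truncated abstract does not show. The function name and the example numbers are hypothetical.

```python
import math


def sampling_margin_of_error(sd, n_respondents, class_size, z=1.96):
    """Margin of error for a mean rating, with finite population correction.

    sd            : standard deviation of the ratings that were returned
    n_respondents : number of students who responded (n)
    class_size    : number of students enrolled (N)
    z             : normal critical value (1.96 for roughly 95% confidence)
    """
    if not 0 < n_respondents <= class_size:
        raise ValueError("need 0 < n_respondents <= class_size")
    if class_size == 1:
        return 0.0  # a census of one student has no sampling error
    fpc = math.sqrt((class_size - n_respondents) / (class_size - 1))
    return z * sd / math.sqrt(n_respondents) * fpc


# Hypothetical class of 100 students, ratings with SD 1.0 on a 5-point scale.
for n in (10, 25, 50, 100):
    moe = sampling_margin_of_error(1.0, n, 100)
    print(f"response rate {n:3d}%  margin of error = {moe:.3f}")
```

The correction term drives the margin to zero as the response rate approaches 100%, which is why response rate matters as much as raw sample size in small classes.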
Measuring coverage in MNCH: total survey error and the interpretation of intervention coverage estimates from household surveys.

PubMed

Eisele, Thomas P; Rhoda, Dale A; Cutts, Felicity T; Keating, Joseph; Ren, Ruilin; Barros, Aluisio J D; Arnold, Fred

2013-01-01

Nationally representative household surveys are increasingly relied upon to measure maternal, newborn, and child health (MNCH) intervention coverage at the population level in low- and middle-income countries. Surveys are the best tool we have for this purpose and are central to national and global decision making. However, all survey point estimates have a certain level of error (total survey error) comprising sampling and non-sampling error, both of which must be considered when interpreting survey results for decision making. In this review, we discuss the importance of considering these errors when interpreting MNCH intervention coverage estimates derived from household surveys, using relevant examples from national surveys to provide context. Sampling error is usually thought of as the precision of a point estimate and is represented by 95% confidence intervals, which are measurable. Confidence intervals can inform judgments about whether estimated parameters are likely to be different from the real value of a parameter. We recommend, therefore, that confidence intervals for key coverage indicators should always be provided in survey reports. By contrast, the direction and magnitude of non-sampling error is almost always unmeasurable, and therefore unknown. Information error and bias are the most common sources of non-sampling error in household survey estimates and we recommend that they should always be carefully considered when interpreting MNCH intervention coverage based on survey data. Overall, we recommend that future research on measuring MNCH intervention coverage should focus on refining and improving survey-based coverage estimates to develop a better understanding of how results should be interpreted and used.
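
A 95% confidence interval for a coverage indicator can be sketched as below. This is a simplified normal-approximation (Wald) interval, with an assumed design effect `deff` standing in for the variance inflation of a cluster survey; real survey reports would normally derive the standard error from the actual sample design. The function name, the `deff` value and the counts are hypothetical.

```python
import math


def coverage_ci(successes, n, deff=1.0, z=1.96):
    """Normal-approximation confidence interval for a coverage proportion.

    deff inflates the simple-random-sampling variance to mimic clustering
    (an assumption for illustration, not a value from the source).
    """
    p = successes / n
    se = math.sqrt(deff * p * (1.0 - p) / n)
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return lower, upper


# Hypothetical survey: 500 of 1000 children received the intervention.
print(coverage_ci(500, 1000))            # simple random sampling
print(coverage_ci(500, 1000, deff=2.0))  # same data, clustered design
```

The interval only reflects sampling error. Non-sampling error such as recall bias is not captured by it, which is the distinction the abstract draws.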
Measuring Coverage in MNCH: Total Survey Error and the Interpretation of Intervention Coverage Estimates from Household Surveys

PubMed Central

Eisele, Thomas P.; Rhoda, Dale A.; Cutts, Felicity T.; Keating, Joseph; Ren, Ruilin; Barros, Aluisio J. D.; Arnold, Fred

2013-01-01

Nationally representative household surveys are increasingly relied upon to measure maternal, newborn, and child health (MNCH) intervention coverage at the population level in low- and middle-income countries. Surveys are the best tool we have for this purpose and are central to national and global decision making. However, all survey point estimates have a certain level of error (total survey error) comprising sampling and non-sampling error, both of which must be considered when interpreting survey results for decision making. In this review, we discuss the importance of considering these errors when interpreting MNCH intervention coverage estimates derived from household surveys, using relevant examples from national surveys to provide context. Sampling error is usually thought of as the precision of a point estimate and is represented by 95% confidence intervals, which are measurable. Confidence intervals can inform judgments about whether estimated parameters are likely to be different from the real value of a parameter. We recommend, therefore, that confidence intervals for key coverage indicators should always be provided in survey reports. By contrast, the direction and magnitude of non-sampling error is almost always unmeasurable, and therefore unknown. Information error and bias are the most common sources of non-sampling error in household survey estimates and we recommend that they should always be carefully considered when interpreting MNCH intervention coverage based on survey data. Overall, we recommend that future research on measuring MNCH intervention coverage should focus on refining and improving survey-based coverage estimates to develop a better understanding of how results should be interpreted and used. PMID:23667331
Comparing interval estimates for small sample ordinal CFA models

PubMed Central

Natesan, Prathiba

2015-01-01

Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis models (CFA) for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative were studied. Undercoverage of confidence intervals and underestimation of standard errors was common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more positive biased than negatively biased, that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates. The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. 
Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research. PMID:26579002
Comparing interval estimates for small sample ordinal CFA models.

PubMed

Natesan, Prathiba

2015-01-01

Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis models (CFA) for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative were studied. Undercoverage of confidence intervals and underestimation of standard errors was common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more positive biased than negatively biased, that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates. The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. 
Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research.
Sampling Errors of SSM/I and TRMM Rainfall Averages: Comparison with Error Estimates from Surface Data and a Sample Model

NASA Technical Reports Server (NTRS)

Bell, Thomas L.; Kundu, Prasun K.; Kummerow, Christian D.; Einaudi, Franco (Technical Monitor)

2000-01-01

Quantitative use of satellite-derived maps of monthly rainfall requires some measure of the accuracy of the satellite estimates. The rainfall estimate for a given map grid box is subject to both remote-sensing error and, in the case of low-orbiting satellites, sampling error due to the limited number of observations of the grid box provided by the satellite. A simple model of rain behavior predicts that Root-mean-square (RMS) random error in grid-box averages should depend in a simple way on the local average rain rate, and the predicted behavior has been seen in simulations using surface rain-gauge and radar data. This relationship was examined using satellite SSM/I data obtained over the western equatorial Pacific during TOGA COARE. RMS error inferred directly from SSM/I rainfall estimates was found to be larger than predicted from surface data, and to depend less on local rain rate than was predicted. Preliminary examination of TRMM microwave estimates shows better agreement with surface data. A simple method of estimating rms error in satellite rainfall estimates is suggested, based on quantities that can be directly computed from the satellite data.
Mass load estimation errors utilizing grab sampling strategies in a karst watershed

USGS Publications Warehouse

Fogle, A.W.; Taraba, J.L.; Dinger, J.S.

2003-01-01

Developing a mass load estimation method appropriate for a given stream and constituent is difficult due to inconsistencies in hydrologic and constituent characteristics. The difficulty may be increased in flashy flow conditions such as karst. Many projects undertaken are constrained by budget and manpower and do not have the luxury of sophisticated sampling strategies. The objectives of this study were to: (1) examine two grab sampling strategies with varying sampling intervals and determine the error in mass load estimates, and (2) determine the error that can be expected when a grab sample is collected at a time of day when the diurnal variation is most divergent from the daily mean. Results show grab sampling with continuous flow to be a viable data collection method for estimating mass load in the study watershed. Comparing weekly, biweekly, and monthly grab sampling, monthly sampling produces the best results with this method. However, the time of day the sample is collected is important. Failure to account for diurnal variability when collecting a grab sample may produce unacceptable error in mass load estimates. The best time to collect a sample is when the diurnal cycle is nearest the daily mean.
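
The effect of the grab-sampling interval on a load estimate can be sketched as follows. This is a minimal illustration, assuming the load is the sum of concentration times flow times the time step, and that each grab-sample concentration is simply held until the next sample while flow is recorded continuously. The function names and the numbers are hypothetical; the study's actual estimation method is not given in the abstract.

```python
def mass_load(conc_mg_per_l, flow_l_per_s, dt_s):
    """Mass load in mg as the sum of C * Q * dt over all time steps."""
    return sum(c * q * dt_s for c, q in zip(conc_mg_per_l, flow_l_per_s))


def grab_sample_load(conc_mg_per_l, flow_l_per_s, dt_s, interval):
    """Load estimate when concentration is only sampled every `interval` steps.

    Each sampled concentration is held constant until the next grab sample;
    flow is assumed to be measured at every step.
    """
    held = [conc_mg_per_l[(i // interval) * interval]
            for i in range(len(conc_mg_per_l))]
    return mass_load(held, flow_l_per_s, dt_s)


# Hypothetical record: four time steps of one second each.
conc = [1.0, 2.0, 3.0, 4.0]   # mg/L
flow = [1.0, 1.0, 1.0, 1.0]   # L/s
true_load = mass_load(conc, flow, 1.0)
estimate = grab_sample_load(conc, flow, 1.0, interval=2)
print(true_load, estimate, (estimate - true_load) / true_load)
```

Shifting the index of the held sample within each interval is one way to explore the time-of-day effect the abstract describes.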
A comparison of two estimates of standard error for a ratio-of-means estimator for a mapped-plot sample design in southeast Alaska.

Treesearch

Willem W.S. van Hees

2002-01-01

Comparisons of estimated standard error for a ratio-of-means (ROM) estimator are presented for forest resource inventories conducted in southeast Alaska between 1995 and 2000. Estimated standard errors for the ROM were generated by using a traditional variance estimator and also approximated by bootstrap methods. Estimates of standard error generated by both...
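
The two kinds of standard error compared here can be sketched for a generic ratio-of-means estimator. This is a minimal illustration that assumes simple random sampling of plots and ignores any finite population correction or mapped-plot design details; the linearized formula is the common textbook approximation, not necessarily the variance estimator used in the study. All function names and data are hypothetical.

```python
import math
import random


def ratio_of_means(y, x):
    """Ratio-of-means estimate R = sum(y) / sum(x)."""
    return sum(y) / sum(x)


def rom_se_linearized(y, x):
    """Textbook linearization SE: sqrt(s_e^2 / n) / xbar, with e = y - R*x."""
    n = len(y)
    r = ratio_of_means(y, x)
    x_bar = sum(x) / n
    resid = [yi - r * xi for yi, xi in zip(y, x)]
    s2 = sum(e * e for e in resid) / (n - 1)
    return math.sqrt(s2 / n) / x_bar


def rom_se_bootstrap(y, x, n_boot=2000, seed=0):
    """Bootstrap SE: resample plots with replacement, recompute the ratio."""
    rng = random.Random(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(sum(y[i] for i in idx) / sum(x[i] for i in idx))
    mean_rep = sum(reps) / n_boot
    return math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1))


# Hypothetical plots: y = attribute total per plot, x = plot area (positive).
y = [2.1, 3.9, 6.2, 8.1, 9.8]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ratio_of_means(y, x), rom_se_linearized(y, x), rom_se_bootstrap(y, x))
```

With only a handful of plots the two estimates can differ noticeably, which is the kind of comparison the abstract reports.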
Precipitation and Latent Heating Distributions from Satellite Passive Microwave Radiometry. Part 1; Improved Method and Uncertainties

NASA Technical Reports Server (NTRS)

Olson, William S.; Kummerow, Christian D.; Yang, Song; Petty, Grant W.; Tao, Wei-Kuo; Bell, Thomas L.; Braun, Scott A.; Wang, Yansen; Lang, Stephen E.; Johnson, Daniel E.

2006-01-01

A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and nonconvective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud-resolving model simulations, and from the Bayesian formulation itself. Synthetic rain-rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in TMI instantaneous rain-rate estimates at 0.5° resolution range from approximately 50% at 1 mm/h to 20% at 14 mm/h. Errors in collocated spaceborne radar rain-rate estimates are roughly 50%-80% of the TMI errors at this resolution. The estimated algorithm random error in TMI rain rates at monthly, 2.5° resolution is relatively small (less than 6% at 5 mm day⁻¹) in comparison with the random error resulting from infrequent satellite temporal sampling (8%-35% at the same rain rate). Percentage errors resulting from sampling decrease with increasing rain rate, and sampling errors in latent heating rates follow the same trend. Averaging over 3 months reduces sampling errors in rain rates to 6%-15% at 5 mm day⁻¹, with proportionate reductions in latent heating sampling errors.

A simulation test of the effectiveness of several methods for error-checking non-invasive genetic data

USGS Publications Warehouse

Roon, David A.; Waits, L.P.; Kendall, K.C.

2005-01-01

Non-invasive genetic sampling (NGS) is becoming a popular tool for population estimation. However, multiple NGS studies have demonstrated that polymerase chain reaction (PCR) genotyping errors can bias demographic estimates. These errors can be detected by comprehensive data filters such as the multiple-tubes approach, but this approach is expensive and time consuming as it requires three to eight PCR replicates per locus. Thus, researchers have attempted to correct PCR errors in NGS datasets using non-comprehensive error checking methods, but these approaches have not been evaluated for reliability. We simulated NGS studies with and without PCR error and 'filtered' datasets using non-comprehensive approaches derived from published studies and calculated mark-recapture estimates using CAPTURE. In the absence of data-filtering, simulated error resulted in serious inflations in CAPTURE estimates; some estimates exceeded N by ≥ 200%. When data filters were used, CAPTURE estimate reliability varied with per-locus error (Eμ). At Eμ = 0.01, CAPTURE estimates from filtered data displayed < 5% deviance from error-free estimates. When Eμ was 0.05 or 0.09, some CAPTURE estimates from filtered data displayed biases in excess of 10%. Biases were positive at high sampling intensities; negative biases were observed at low sampling intensities. We caution researchers against using non-comprehensive data filters in NGS studies, unless they can achieve baseline per-locus error rates below 0.05 and, ideally, near 0.01. However, we suggest that data filters can be combined with careful technique and thoughtful NGS study design to yield accurate demographic information. © 2005 The Zoological Society of London.
Estimation of population mean in the presence of measurement error and non response under stratified random sampling

PubMed Central

Shabbir, Javid

2018-01-01

In the present paper we propose an improved class of estimators in the presence of measurement error and non-response under stratified random sampling for estimating the finite population mean. The theoretical and numerical studies reveal that the proposed class of estimators performs better than other existing estimators. PMID:29401519
Distribution of the two-sample t-test statistic following blinded sample size re-estimation.

PubMed

Lu, Kaifeng

2016-05-01

We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
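
The blinded re-estimation step can be sketched as below. This is a minimal illustration, assuming the usual normal-approximation sample size formula for a two-arm comparison, n per arm = 2 (z_{1-alpha/2} + z_{power})^2 s^2 / delta^2, with s^2 taken as the one-sample variance of the pooled interim data (treatment labels ignored). It does not reproduce the exact-distribution results or the adjusted significance levels derived in the paper. The function name and the inputs are hypothetical.

```python
import math
import statistics


def blinded_reestimated_n(interim_values, delta, alpha=0.05, power=0.9):
    """Re-estimated sample size per arm from blinded interim data.

    interim_values : all interim observations, arms pooled (blinded)
    delta          : assumed treatment difference used for planning
    """
    s2 = statistics.variance(interim_values)  # one-sample variance estimator
    normal = statistics.NormalDist()
    z_alpha = normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = normal.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 * s2 / delta ** 2)


# Hypothetical interim data with unit variance and a planned difference of 1.
print(blinded_reestimated_n([-1.0, 0.0, 1.0], delta=1.0))
```

Because the pooled variance also absorbs any true treatment difference, the one-sample estimator tends to overstate the within-arm variance, which is one reason the operating characteristics of the final test need the careful study the abstract describes.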
Evaluation of monthly rainfall estimates derived from the special sensor microwave/imager (SSM/I) over the tropical Pacific

NASA Technical Reports Server (NTRS)

Berg, Wesley; Avery, Susan K.

1995-01-01

Estimates of monthly rainfall have been computed over the tropical Pacific using passive microwave satellite observations from the special sensor microwave/imager (SSM/I) for the period from July 1987 through December 1990. These monthly estimates are calibrated using data from a network of Pacific atoll rain gauges in order to account for systematic biases and are then compared with several visible and infrared satellite-based rainfall estimation techniques for the purpose of evaluating the performance of the microwave-based estimates. Although several key differences among the various techniques are observed, the general features of the monthly rainfall time series agree very well. Finally, the significant error sources contributing to uncertainties in the monthly estimates are examined and an estimate of the total error is produced. The sampling error characteristics are investigated using data from two SSM/I sensors and a detailed analysis of the characteristics of the diurnal cycle of rainfall over the oceans and its contribution to sampling errors in the monthly SSM/I estimates is made using geosynchronous satellite data. Based on the analysis of the sampling and other error sources the total error was estimated to be of the order of 30 to 50% of the monthly rainfall for estimates averaged over 2.5 deg x 2.5 deg latitude/longitude boxes, with a contribution due to diurnal variability of the order of 10%.
Sample sizes needed for specified margins of relative error in the estimates of the repeatability and reproducibility standard deviations.

PubMed

McClure, Foster D; Lee, Jung K

2005-01-01

Sample size formulas are developed to estimate the repeatability and reproducibility standard deviations (Sr and SR) such that the actual error in Sr and SR relative to their respective true values, σr and σR, is at predefined levels. The statistical consequences associated with the AOAC INTERNATIONAL required sample size to validate an analytical method are discussed. In addition, formulas to estimate the uncertainties of Sr and SR were derived and are provided as supporting documentation. Formula for the Number of Replicates Required for a Specified Margin of Relative Error in the Estimate of the Repeatability Standard Deviation.
Accounting for sampling error when inferring population synchrony from time-series data: a Bayesian state-space modelling approach with applications.

PubMed

Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique

2014-01-01

Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony as well as to underestimating the extinction risk of a metapopulation. The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we presented a space-state modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplify our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compared our results to those of a standard approach neglecting sampling variance. We showed that ignoring sampling variance can mask a synchrony pattern whatever its true value and that the common practice of averaging few replicates of population size estimates poorly performed at decreasing the bias of the classical estimator of the synchrony strength. The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provided a user-friendly R-program and a tutorial example to encourage further studies aiming at quantifying the strength of population synchrony to account for uncertainty in population size estimates.
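
The downward bias of the zero-lag correlation under independent sampling error can be illustrated with a small simulation. This is a minimal sketch under simple assumptions (Gaussian fluctuations with unit variance, a shared component that induces correlation `rho`, and additive independent sampling error); it is not the Bayesian state-space model of the paper or its R program. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_synchrony(n_years, rho, sampling_sd, seed=0):
    """Return (true correlation, observed correlation) for two populations."""
    rng = random.Random(seed)
    true_a, true_b, obs_a, obs_b = [], [], [], []
    for _ in range(n_years):
        shared = rng.gauss(0.0, 1.0)  # common environmental forcing
        a = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        b = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        true_a.append(a)
        true_b.append(b)
        # Independent sampling error added to each population's index.
        obs_a.append(a + rng.gauss(0.0, sampling_sd))
        obs_b.append(b + rng.gauss(0.0, sampling_sd))
    return pearson(true_a, true_b), pearson(obs_a, obs_b)


true_r, obs_r = simulate_synchrony(n_years=5000, rho=0.8, sampling_sd=1.0)
print(f"true synchrony = {true_r:.2f}, observed = {obs_r:.2f}")
```

In this setup the expected observed correlation is rho / (1 + sampling_sd^2), so a sampling variance equal to the process variance roughly halves the apparent synchrony.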

Accounting for Sampling Error When Inferring Population Synchrony from Time-Series Data: A Bayesian State-Space Modelling Approach with Applications

PubMed Central

Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique

2014-01-01

Background: Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony as well as to underestimating the extinction risk of a metapopulation. Methodology/Principal findings: The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we presented a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplify our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compared our results to those of a standard approach neglecting sampling variance. We showed that ignoring sampling variance can mask a synchrony pattern whatever its true value and that the common practice of averaging few replicates of population size estimates poorly performed at decreasing the bias of the classical estimator of the synchrony strength. Conclusion/Significance: The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provided a user-friendly R-program and a tutorial example to encourage further studies aiming at quantifying the strength of population synchrony to account for uncertainty in population size estimates. PMID:24489839
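The downward bias described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' Bayesian state-space model: the two series are independent draws in time (no population dynamics), both have unit variance, and the names `simulate`, `rho`, and `obs_sd` are hypothetical. Under these assumptions the expected observed correlation is roughly rho / (1 + obs_sd**2).

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_years=5000, rho=0.8, obs_sd=1.0, seed=1):
    """Return (correlation of true states, correlation of observations).

    Assumed toy setup: two unit-variance series share a common signal so
    that their true correlation is rho; each series is then observed with
    independent Gaussian sampling error of standard deviation obs_sd.
    """
    rng = random.Random(seed)
    true_a, true_b, obs_a, obs_b = [], [], [], []
    for _ in range(n_years):
        shared = rng.gauss(0, 1)
        a = math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
        b = math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
        true_a.append(a)
        true_b.append(b)
        # Independent sampling error is added to each population separately.
        obs_a.append(a + rng.gauss(0, obs_sd))
        obs_b.append(b + rng.gauss(0, obs_sd))
    return pearson(true_a, true_b), pearson(obs_a, obs_b)


r_true, r_obs = simulate()
print(f"true-state correlation: {r_true:.2f}, observed correlation: {r_obs:.2f}")
```

With sampling variance equal to the process variance, the observed zero-lag correlation should come out near half the true value, which is the attenuation the abstract warns about.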
Estimating tree biomass regressions and their error, proceedings of the workshop on tree biomass regression functions and their contribution to the error

Treesearch

Eric H. Wharton; Tiberius Cunia

1987-01-01

Proceedings of a workshop co-sponsored by the USDA Forest Service, the State University of New York, and the Society of American Foresters. Presented were papers on the methodology of sample tree selection, tree biomass measurement, construction of biomass tables and estimation of their error, and combining the error of biomass tables with that of the sample plots or...
Utilizing Adjoint-Based Error Estimates for Surrogate Models to Accurately Predict Probabilities of Events

DOE PAGES

Butler, Troy; Wildey, Timothy

2018-01-01

In this study, we develop a procedure to utilize error estimates for samples of a surrogate model to compute robust upper and lower bounds on estimates of probabilities of events. We show that these error estimates can also be used in an adaptive algorithm to simultaneously reduce the computational cost and increase the accuracy in estimating probabilities of events using computationally expensive high-fidelity models. Specifically, we introduce the notion of reliability of a sample of a surrogate model, and we prove that utilizing the surrogate model for the reliable samples and the high-fidelity model for the unreliable samples gives precisely the same estimate of the probability of the output event as would be obtained by evaluation of the original model for each sample. The adaptive algorithm uses the additional evaluations of the high-fidelity model for the unreliable samples to locally improve the surrogate model near the limit state, which significantly reduces the number of high-fidelity model evaluations as the limit state is resolved. Numerical results based on a recently developed adjoint-based approach for estimating the error in samples of a surrogate are provided to demonstrate (1) the robustness of the bounds on the probability of an event, and (2) that the adaptive enhancement algorithm provides a more accurate estimate of the probability of the QoI event than standard response surface approximation methods at a lower computational cost.
Utilizing Adjoint-Based Error Estimates for Surrogate Models to Accurately Predict Probabilities of Events

DOE Office of Scientific and Technical Information (OSTI.GOV)

Butler, Troy; Wildey, Timothy

In this study, we develop a procedure to utilize error estimates for samples of a surrogate model to compute robust upper and lower bounds on estimates of probabilities of events. We show that these error estimates can also be used in an adaptive algorithm to simultaneously reduce the computational cost and increase the accuracy in estimating probabilities of events using computationally expensive high-fidelity models. Specifically, we introduce the notion of reliability of a sample of a surrogate model, and we prove that utilizing the surrogate model for the reliable samples and the high-fidelity model for the unreliable samples gives precisely the same estimate of the probability of the output event as would be obtained by evaluation of the original model for each sample. The adaptive algorithm uses the additional evaluations of the high-fidelity model for the unreliable samples to locally improve the surrogate model near the limit state, which significantly reduces the number of high-fidelity model evaluations as the limit state is resolved. Numerical results based on a recently developed adjoint-based approach for estimating the error in samples of a surrogate are provided to demonstrate (1) the robustness of the bounds on the probability of an event, and (2) that the adaptive enhancement algorithm provides a more accurate estimate of the probability of the QoI event than standard response surface approximation methods at a lower computational cost.
Small Body GN and C Research Report: G-SAMPLE - An In-Flight Dynamical Method for Identifying Sample Mass [External Release Version]

NASA Technical Reports Server (NTRS)

Carson, John M., III; Bayard, David S.

2006-01-01

G-SAMPLE is an in-flight dynamical method for use by sample collection missions to identify the presence and quantity of collected sample material. The G-SAMPLE method implements a maximum-likelihood estimator to identify the collected sample mass, based on onboard force sensor measurements, thruster firings, and a dynamics model of the spacecraft. With G-SAMPLE, sample mass identification becomes a computation rather than an extra hardware requirement; the added cost of cameras or other sensors for sample mass detection is avoided. Realistic simulation examples are provided for a spacecraft configuration with a sample collection device mounted on the end of an extended boom. In one representative example, a 1000 gram sample mass is estimated to within 110 grams (95% confidence) under realistic assumptions of thruster profile error, spacecraft parameter uncertainty, and sensor noise. For convenience to future mission design, an overall sample-mass estimation error budget is developed to approximate the effect of model uncertainty, sensor noise, data rate, and thrust profile error on the expected estimate of collected sample mass.
Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA

NASA Astrophysics Data System (ADS)

Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz

2018-04-01

External testing (ET) is preferred over auto-prediction (AP) or k-fold cross-validation for estimating a more realistic predictive ability of a statistical model. With IR spectra, the Kennard-Stone (KS) sampling algorithm is often used to split the data into training and test sets, i.e. respectively for model construction and for model testing. On the other hand, iterative random sampling (IRS) has not been the favored choice though it is theoretically more likely to produce reliable estimation. The aim of this preliminary work is to compare performances of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The 'best' performance achievable from the dataset is estimated with AP on the full dataset (APF, error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). The error rate of each model was estimated repeatedly via: (a) AP on full data (APF, error); (b) AP on training set (APS, error); and (c) ET on the respective test set (ETS, error). A good PLS2-DA model is expected to produce APS, error and ETS, error that are similar to the APF, error. Bearing that in mind, the similarities between (a) APS, error vs. APF, error; (b) ETS, error vs. APF, error; and (c) APS, error vs. ETS, error were evaluated using correlation tests (i.e. Pearson and Spearman's rank test), using series of PLS2-DA models computed from KS-set and IRS-set, respectively. Overall, models constructed from IRS-set exhibit more similarities between the internal and external error rates than the respective KS-set, i.e. less risk of overfitting. In conclusion, IRS is more reliable than KS in sampling a representative training set.
Population size estimation in Yellowstone wolves with error-prone noninvasive microsatellite genotypes.

PubMed

Creel, Scott; Spong, Goran; Sands, Jennifer L; Rotella, Jay; Zeigle, Janet; Joe, Lawrence; Murphy, Kerry M; Smith, Douglas

2003-07-01

Determining population sizes can be difficult, but is essential for conservation. By counting distinct microsatellite genotypes, DNA from noninvasive samples (hair, faeces) allows estimation of population size. Problems arise because genotypes from noninvasive samples are error-prone, but genotyping errors can be reduced by multiple polymerase chain reaction (PCR). For faecal genotypes from wolves in Yellowstone National Park, error rates varied substantially among samples, often above the 'worst-case threshold' suggested by simulation. Consequently, a substantial proportion of multilocus genotypes held one or more errors, despite multiple PCR. These genotyping errors created several genotypes per individual and caused overestimation (up to 5.5-fold) of population size. We propose a 'matching approach' to eliminate this overestimation bias.
On-line estimation of error covariance parameters for atmospheric data assimilation

NASA Technical Reports Server (NTRS)

Dee, Dick P.

1995-01-01

A simple scheme is presented for on-line estimation of covariance parameters in statistical data assimilation systems. The scheme is based on a maximum-likelihood approach in which estimates are produced on the basis of a single batch of simultaneous observations. Single-sample covariance estimation is reasonable as long as the number of available observations exceeds the number of tunable parameters by two or three orders of magnitude. Not much is known at present about model error associated with actual forecast systems. Our scheme can be used to estimate some important statistical model error parameters such as regionally averaged variances or characteristic correlation length scales. The advantage of the single-sample approach is that it does not rely on any assumptions about the temporal behavior of the covariance parameters: time-dependent parameter estimates can be continuously adjusted on the basis of current observations. This is of practical importance since it is likely to be the case that both model error and observation error strongly depend on the actual state of the atmosphere. The single-sample estimation scheme can be incorporated into any four-dimensional statistical data assimilation system that involves explicit calculation of forecast error covariances, including optimal interpolation (OI) and the simplified Kalman filter (SKF). The computational cost of the scheme is high but not prohibitive; on-line estimation of one or two covariance parameters in each analysis box of an operational boxed-OI system is currently feasible. A number of numerical experiments performed with an adaptive SKF and an adaptive version of OI, using a linear two-dimensional shallow-water model and artificially generated model error are described. The performance of the nonadaptive versions of these methods turns out to depend rather strongly on correct specification of model error parameters. These parameters are estimated under a variety of conditions, including uniformly distributed model error and time-dependent model error statistics.
Accounting for nonsampling error in estimates of HIV epidemic trends from antenatal clinic sentinel surveillance

PubMed Central

Eaton, Jeffrey W.; Bao, Le

2017-01-01

Objectives: The aim of the study was to propose and demonstrate an approach to allow additional nonsampling uncertainty about HIV prevalence measured at antenatal clinic sentinel surveillance (ANC-SS) in model-based inferences about trends in HIV incidence and prevalence. Design: Mathematical model fitted to surveillance data with Bayesian inference. Methods: We introduce a variance inflation parameter $\sigma_{\mathrm{infl}}^2$ that accounts for the uncertainty of nonsampling errors in ANC-SS prevalence. It is additive to the sampling error variance. Three approaches are tested for estimating $\sigma_{\mathrm{infl}}^2$ using ANC-SS and household survey data from 40 subnational regions in nine countries in sub-Saharan Africa, as defined in UNAIDS 2016 estimates. Methods were compared using in-sample fit and out-of-sample prediction of ANC-SS data, fit to household survey prevalence data, and the computational implications. Results: Introducing the additional variance parameter $\sigma_{\mathrm{infl}}^2$ increased the error variance around ANC-SS prevalence observations by a median of 2.7 times (interquartile range 1.9–3.8). Using only sampling error in ANC-SS prevalence ($\sigma_{\mathrm{infl}}^2 = 0$), coverage of 95% prediction intervals was 69% in out-of-sample prediction tests. This increased to 90% after introducing the additional variance parameter $\sigma_{\mathrm{infl}}^2$. The revised probabilistic model improved model fit to household survey prevalence and increased epidemic uncertainty intervals most during the early epidemic period before 2005. Estimating $\sigma_{\mathrm{infl}}^2$ did not increase the computational cost of model fitting. Conclusions: We recommend estimating nonsampling error in ANC-SS as an additional parameter in Bayesian inference using the Estimation and Projection Package model. This approach may prove useful for incorporating other data sources such as routine prevalence from prevention of mother-to-child transmission testing into future epidemic estimates. PMID:28296801
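Because the abstract above states that the inflation parameter is additive to the sampling error variance, its effect on an interval can be sketched in a few lines. The numbers below are hypothetical, the scale on which prevalence is modelled is an assumption (the abstract does not name it), and the helper names `total_se` and `interval95` are illustrative rather than part of the Estimation and Projection Package.

```python
import math


def total_se(sampling_se, sigma_infl):
    """Standard error after adding nonsampling variance to sampling variance."""
    return math.sqrt(sampling_se ** 2 + sigma_infl ** 2)


def interval95(estimate, se):
    """Normal-approximation 95% interval around an estimate."""
    half_width = 1.96 * se
    return estimate - half_width, estimate + half_width


# Hypothetical values on some transformed prevalence scale (assumption).
estimate, sampling_se, sigma_infl = -1.20, 0.05, 0.07

narrow = interval95(estimate, sampling_se)
wide = interval95(estimate, total_se(sampling_se, sigma_infl))
# Ratio of total error variance to sampling-only variance.
inflation_factor = total_se(sampling_se, sigma_infl) ** 2 / sampling_se ** 2

print("sampling error only:", narrow)
print("with inflation term:", wide)
print("variance inflation factor:", round(inflation_factor, 2))
```

Setting `sigma_infl` to zero recovers the sampling-error-only interval, which corresponds to the baseline case the abstract compares against.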
Dynamic Method for Identifying Collected Sample Mass

NASA Technical Reports Server (NTRS)

Carson, John

2008-01-01

G-Sample is designed for sample collection missions to identify the presence and quantity of sample material gathered by spacecraft equipped with end effectors. The software method uses a maximum-likelihood estimator to identify the collected sample's mass based on onboard force-sensor measurements, thruster firings, and a dynamics model of the spacecraft. This makes sample mass identification a computation rather than a process requiring additional hardware. Simulation examples of G-Sample are provided for spacecraft model configurations with a sample collection device mounted on the end of an extended boom. In the absence of thrust knowledge errors, the results indicate that G-Sample can identify the amount of collected sample mass to within 10 grams (with 95-percent confidence) by using a force sensor with a noise and quantization floor of 50 micrometers. These results hold even in the presence of realistic parametric uncertainty in actual spacecraft inertia, center-of-mass offset, and first flexibility modes. Thrust profile knowledge is shown to be a dominant sensitivity for G-Sample, entering in a nearly one-to-one relationship with the final mass estimation error. This means thrust profiles should be well characterized with onboard accelerometers prior to sample collection. An overall sample-mass estimation error budget has been developed to approximate the effect of model uncertainty, sensor noise, data rate, and thrust profile error on the expected estimate of collected sample mass.
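The idea of treating mass identification as a computation can be shown with a deliberately simplified stand-in. The sketch assumes a one-dimensional model in which measured force equals mass times a known acceleration plus independent Gaussian sensor noise; under that assumption the maximum-likelihood estimate reduces to least squares. It is not the G-Sample estimator, which also uses thruster firings and a full spacecraft dynamics model, and all names and values here are hypothetical.

```python
import math
import random


def ml_mass(accels, forces):
    """ML estimate of m in force = m * accel + Gaussian noise (least squares)."""
    saa = sum(a * a for a in accels)
    saf = sum(a * f for a, f in zip(accels, forces))
    return saf / saa


def mass_se(accels, noise_sd):
    """Standard error of the estimate when the noise s.d. is known."""
    return noise_sd / math.sqrt(sum(a * a for a in accels))


rng = random.Random(0)
true_mass = 1.0   # kg, hypothetical collected sample mass
noise_sd = 0.05   # N, hypothetical force-sensor noise
# Hypothetical known acceleration profile (m/s^2) over 1000 samples.
accels = [0.5 * math.sin(0.1 * k) for k in range(1000)]
forces = [true_mass * a + rng.gauss(0, noise_sd) for a in accels]

m_hat = ml_mass(accels, forces)
se = mass_se(accels, noise_sd)
print(f"estimated mass: {m_hat:.4f} kg, 95% half-width: {1.96 * se:.4f} kg")
```

In this toy model an error in the assumed acceleration profile scales the estimate almost proportionally, which is consistent with the abstract's remark that thrust profile knowledge is a dominant sensitivity.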
Eigenvector method for umbrella sampling enables error analysis

PubMed Central

Thiede, Erik H.; Van Koten, Brian; Weare, Jonathan; Dinner, Aaron R.

2016-01-01

Umbrella sampling efficiently yields equilibrium averages that depend on exploring rare states of a model by biasing simulations to windows of coordinate values and then combining the resulting data with physical weighting. Here, we introduce a mathematical framework that casts the step of combining the data as an eigenproblem. The advantage to this approach is that it facilitates error analysis. We discuss how the error scales with the number of windows. Then, we derive a central limit theorem for averages that are obtained from umbrella sampling. The central limit theorem suggests an estimator of the error contributions from individual windows, and we develop a simple and computationally inexpensive procedure for implementing it. We demonstrate this estimator for simulations of the alanine dipeptide and show that it emphasizes low free energy pathways between stable states in comparison to existing approaches for assessing error contributions. Our work suggests the possibility of using the estimator and, more generally, the eigenvector method for umbrella sampling to guide adaptation of the simulation parameters to accelerate convergence. PMID:27586912
Unscented predictive variable structure filter for satellite attitude estimation with model errors when using low precision sensors

NASA Astrophysics Data System (ADS)

Cao, Lu; Li, Hengnian

2016-10-01

For the satellite attitude estimation problem, serious model errors always exist and hinder the estimation performance of the Attitude Determination and Control System (ADCS), especially for a small satellite with low precision sensors. To deal with this problem, a new algorithm for attitude estimation, referred to as the unscented predictive variable structure filter (UPVSF), is presented. This strategy is based on the variable structure control concept and the unscented transform (UT) sampling method. It can be implemented in real time with an ability to estimate the model errors on-line, in order to improve the state estimation precision. In addition, the model errors in this filter are not restricted to Gaussian noise; therefore, it has the advantage of dealing with various kinds of model errors or noise. It is anticipated that the UT sampling strategy can further enhance the robustness and accuracy of the novel UPVSF. Numerical simulations show that the proposed UPVSF is more effective and robust in dealing with model errors and low precision sensors than the traditional unscented Kalman filter (UKF).
Decorrelation of the true and estimated classifier errors in high-dimensional settings.

PubMed

Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R

2007-01-01

The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity which refers to the precision of error estimation is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three error estimators commonly used (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known-feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes. We will observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.
Optimum nonparametric estimation of population density based on ordered distances

USGS Publications Warehouse

Patil, S.A.; Kovner, J.L.; Burnham, Kenneth P.

1982-01-01

The asymptotic mean and error mean square are determined for the nonparametric estimator of plant density by distance sampling proposed by Patil, Burnham and Kovner (1979, Biometrics 35, 597-604). On the basis of these formulae, a bias-reduced version of this estimator is given, and its specific form is determined which gives minimum mean square error under varying assumptions about the true probability density function of the sampled data. Extension is given to line-transect sampling.
Evaluation of Bayesian Sequential Proportion Estimation Using Analyst Labels

NASA Technical Reports Server (NTRS)

Lennington, R. K.; Abotteen, K. M. (Principal Investigator)

1980-01-01

The author has identified the following significant results. A total of ten Large Area Crop Inventory Experiment Phase 3 blind sites and analyst-interpreter labels were used in a study to compare proportional estimates obtained by the Bayes sequential procedure with estimates obtained from simple random sampling and from Procedure 1. The analyst error rate using the Bayes technique was shown to be no greater than that for the simple random sampling. Also, the segment proportion estimates produced using this technique had smaller bias and mean squared errors than the estimates produced using either simple random sampling or Procedure 1.
78 FR 28597 - State Median Income Estimates for a Four-Person Household: Notice of the Federal Fiscal Year (FFY...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-05-15

....gov/acs/www/ or contact the Census Bureau's Social, Economic, and Housing Statistics Division at (301...) Sampling Error, which consists of the error that arises from the use of probability sampling to create the... direction; and (2) Sampling Error, which consists of the error that arises from the use of probability...
Nematode Damage Functions: The Problems of Experimental and Sampling Error

PubMed Central

Ferris, H.

1984-01-01

The development and use of pest damage functions involves measurement and experimental errors associated with cultural, environmental, and distributional factors. Damage predictions are more valuable if considered with associated probability. Collapsing population densities into a geometric series of population classes allows a pseudo-replication removal of experimental and sampling error in damage function development. Recognition of the nature of sampling error for aggregated populations allows assessment of probability associated with the population estimate. The product of the probabilities incorporated in the damage function and in the population estimate provides a basis for risk analysis of the yield loss prediction and the ensuing management decision. PMID:19295865
Estimating regression coefficients from clustered samples: Sampling errors and optimum sample allocation

NASA Technical Reports Server (NTRS)

Kalton, G.

1983-01-01

A number of surveys were conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and ratio of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, they also determine the optimum allocation of the sample across the stages of the sample design for the estimation of a regression coefficient.
Estimating age of sea otters with cementum layers in the first premolar

USGS Publications Warehouse

Bodkin, James L.; Ames, J.A.; Jameson, R.J.; Johnson, A.M.; Matson, G.M.

1997-01-01

We assessed sources of variation in the use of tooth cementum layers to determine age by comparing counts in premolar tooth sections to known ages of 20 sea otters (Enhydra lutris). Three readers examined each sample 3 times, and the 3 readings of each sample were averaged by reader to provide the mean estimated age. The mean (SE) of known age sample was 5.2 years (1.0) and the 3 mean estimated ages were 7.0 (1.0), 5.9 (1.1), and 4.4 (0.8). The proportion of estimates accurate to within ±1 year were 0.25, 0.55, and 0.65 and to within ±2 years 0.65, 0.80, and 0.70, by reader. The proportions of samples estimated with >3 years error were 0.20, 0.10, and 0.05. Errors as large as 7, 6, and 5 years were made among readers. In few instances did all readers uniformly provide either accurate (error 1 yr) counts. In most cases (0.85), 1 or 2 of the readers provided accurate counts. Coefficients of determination (R2) between known ages and mean estimated ages were 0.81, 0.87, and 0.87, by reader. The results of this study suggest that cementum layers within sea otter premolar teeth likely are deposited annually and can be used for age estimation. However, criteria used in interpreting layers apparently varied by reader, occasionally resulting in large errors, which were not consistent among readers. While large errors were evident for some individual otters, there were no differences between the known and estimated age-class distribution generated by each reader. Until accuracy can be improved, application of this ageing technique should be limited to sample sizes of at least 6-7 individuals within age classes of ≥1 year.
Comparison of structural and least-squares lines for estimating geologic relations

USGS Publications Warehouse

Williams, G.P.; Troutman, B.M.

1990-01-01

Two different goals in fitting straight lines to data are to estimate a "true" linear relation (physical law) and to predict values of the dependent variable with the smallest possible error. Regarding the first goal, a Monte Carlo study indicated that the structural-analysis (SA) method of fitting straight lines to data is superior to the ordinary least-squares (OLS) method for estimating "true" straight-line relations. Number of data points, slope and intercept of the true relation, and variances of the errors associated with the independent (X) and dependent (Y) variables influence the degree of agreement. For example, differences between the two line-fitting methods decrease as error in X becomes small relative to error in Y. Regarding the second goal, predicting the dependent variable, OLS is better than SA. Again, the difference diminishes as X takes on less error relative to Y. With respect to estimation of slope and intercept and prediction of Y, agreement between Monte Carlo results and large-sample theory was very good for sample sizes of 100, and fair to good for sample sizes of 20. The procedures and error measures are illustrated with two geologic examples. © 1990 International Association for Mathematical Geology.
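The contrast drawn in the abstract above, between recovering a "true" relation and predicting Y, can be sketched with a small Monte Carlo experiment. As a stand-in for the structural-analysis method, the sketch uses Deming regression with a known error-variance ratio `delta`; that substitution, the simulated data, and all function names are assumptions, not the authors' procedure.

```python
import math
import random


def moments(x, y):
    """Sample means, variances, and covariance of paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return mx, my, sxx, syy, sxy


def ols_line(x, y):
    """Ordinary least squares of Y on X: returns (slope, intercept)."""
    mx, my, sxx, _, sxy = moments(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def deming_line(x, y, delta=1.0):
    """Deming regression; delta = var(error in Y) / var(error in X), assumed known."""
    mx, my, sxx, syy, sxy = moments(x, y)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return slope, my - slope * mx


# Simulated data: true relation y = 2x + 1, unit-variance error in both X and Y.
rng = random.Random(7)
true_x = [rng.uniform(0, 10) for _ in range(2000)]
obs_x = [t + rng.gauss(0, 1) for t in true_x]
obs_y = [2 * t + 1 + rng.gauss(0, 1) for t in true_x]

ols_slope, ols_intercept = ols_line(obs_x, obs_y)
dem_slope, dem_intercept = deming_line(obs_x, obs_y, delta=1.0)
print(f"OLS slope: {ols_slope:.3f}, errors-in-variables slope: {dem_slope:.3f}")
```

With error in X, the OLS slope is pulled toward zero while the errors-in-variables slope stays near the true value of 2; shrinking the X error in the simulation brings the two fits together, in line with the abstract.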

Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

PubMed Central

Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

2010-01-01

It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140
Generalized Variance Function Applications in Forestry

Treesearch

James Alegria; Charles T. Scott

1991-01-01

Adequately predicting the sampling errors of tabular data can reduce printing costs by eliminating the need to publish separate sampling error tables. Two generalized variance functions (GVFs) found in the literature and three GVFs derived for this study were evaluated for their ability to predict the sampling error of tabular forestry estimates. The recommended GVFs...
75 FR 26780 - State Median Income Estimate for a Four-Person Family: Notice of the Federal Fiscal Year (FFY...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-12

... Household Economic Statistics Division at (301) 763-3243. Under the advice of the Census Bureau, HHS..., which consists of the error that arises from the use of probability sampling to create the sample. For...) Sampling Error, which consists of the error that arises from the use of probability sampling to create the...
Quantifying errors without random sampling.

PubMed

Phillips, Carl V; LaPole, Luwanna M

2003-06-12

All quantifications of mortality, morbidity, and other health measures involve numerous sources of error. The routine quantification of random sampling error makes it easy to forget that other sources of error can and should be quantified. When a quantification does not involve sampling, error is almost never quantified and results are often reported in ways that dramatically overstate their precision. We argue that the precision implicit in typical reporting is problematic and sketch methods for quantifying the various sources of error, building up from simple examples that can be solved analytically to more complex cases. There are straightforward ways to partially quantify the uncertainty surrounding a parameter that is not characterized by random sampling, such as limiting reported significant figures. We present simple methods for doing such quantifications, and for incorporating them into calculations. More complicated methods become necessary when multiple sources of uncertainty must be combined. We demonstrate that Monte Carlo simulation, using available software, can estimate the uncertainty resulting from complicated calculations with many sources of uncertainty. We apply the method to the current estimate of the annual incidence of foodborne illness in the United States. Quantifying uncertainty from systematic errors is practical. Reporting this uncertainty would more honestly represent study results, help show the probability that estimated values fall within some critical range, and facilitate better targeting of further research.
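The Monte Carlo approach mentioned in the abstract above can be sketched as follows. The calculation multiplies three uncertain inputs; the input names, their distributions, and every number are hypothetical placeholders chosen for illustration, not the foodborne-illness figures analysed by the authors.

```python
import random


def percentile(sorted_vals, p):
    """Nearest-rank percentile (p in [0, 100]) of a pre-sorted list."""
    k = round(p / 100 * (len(sorted_vals) - 1))
    return sorted_vals[max(0, min(len(sorted_vals) - 1, k))]


def simulate_cases(n_draws=20000, seed=42):
    """Draw the product of three uncertain inputs; returns sorted draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # random.triangular takes (low, high, mode).
        reported = rng.triangular(9000, 11000, 10000)   # reported cases
        multiplier = rng.triangular(10, 40, 20)         # under-reporting factor
        fraction = rng.uniform(0.3, 0.5)                # attributable fraction
        draws.append(reported * multiplier * fraction)
    draws.sort()
    return draws


point_estimate = 10000 * 20 * 0.4   # product of the "best guess" inputs
draws = simulate_cases()
low, high = percentile(draws, 2.5), percentile(draws, 97.5)
print(f"point estimate: {point_estimate:.0f}")
print(f"95% simulation interval: {low:.0f} to {high:.0f}")
```

Reporting the interval alongside the point estimate avoids the overstated precision the abstract criticizes, even though none of the inputs comes from random sampling.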
The Infinitesimal Jackknife with Exploratory Factor Analysis

ERIC Educational Resources Information Center

Zhang, Guangjian; Preacher, Kristopher J.; Jennrich, Robert I.

2012-01-01

The infinitesimal jackknife, a nonparametric method for estimating standard errors, has been used to obtain standard error estimates in covariance structure analysis. In this article, we adapt it for obtaining standard errors for rotated factor loadings and factor correlations in exploratory factor analysis with sample correlation matrices. Both…
Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation

NASA Technical Reports Server (NTRS)

Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume

2013-01-01

Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
Evaluating the accuracy of sampling to estimate central line-days: simplification of the National Healthcare Safety Network surveillance methods.

PubMed

Thompson, Nicola D; Edwards, Jonathan R; Bamberg, Wendy; Beldavs, Zintars G; Dumyati, Ghinwa; Godine, Deborah; Maloney, Meghan; Kainer, Marion; Ray, Susan; Thompson, Deborah; Wilson, Lucy; Magill, Shelley S

2013-03-01

To evaluate the accuracy of weekly sampling of central line-associated bloodstream infection (CLABSI) denominator data to estimate central line-days (CLDs). Obtained CLABSI denominator logs showing daily counts of patient-days and CLD for 6-12 consecutive months from participants and CLABSI numerators and facility and location characteristics from the National Healthcare Safety Network (NHSN). Convenience sample of 119 inpatient locations in 63 acute care facilities within 9 states participating in the Emerging Infections Program. Actual CLD and estimated CLD obtained from sampling denominator data on all single-day and 2-day (day-pair) samples were compared by assessing the distributions of the CLD percentage error. Facility and location characteristics associated with increased precision of estimated CLD were assessed. The impact of using estimated CLD to calculate CLABSI rates was evaluated by measuring the change in CLABSI decile ranking. The distribution of CLD percentage error varied by the day and number of days sampled. On average, day-pair samples provided more accurate estimates than did single-day samples. For several day-pair samples, approximately 90% of locations had CLD percentage error of less than or equal to ±5%. A lower number of CLD per month was most significantly associated with poor precision in estimated CLD. Most locations experienced no change in CLABSI decile ranking, and no location's CLABSI ranking changed by more than 2 deciles. Sampling to obtain estimated CLD is a valid alternative to daily data collection for a large proportion of locations. Development of a sampling guideline for NHSN users is underway.
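The comparison of actual and estimated CLD described above can be illustrated with a short sketch. The estimator here (mean of the sampled daily counts scaled by the number of days in the period) is an assumption made for illustration; the abstract does not state the exact formula used, and the function names are hypothetical.

```python
def estimate_cld(daily_counts, sample_days):
    """Estimate total central line-days from a subset of days.

    daily_counts: daily CLD counts for the whole period (e.g., one month).
    sample_days:  zero-based indices of the days that were sampled.
    Assumed estimator: mean sampled count times number of days in the period.
    """
    sampled = [daily_counts[d] for d in sample_days]
    return sum(sampled) / len(sampled) * len(daily_counts)


def percentage_error(estimated, actual):
    """Signed percentage error of the estimate relative to the actual total."""
    return 100.0 * (estimated - actual) / actual


if __name__ == "__main__":
    daily = list(range(1, 29))            # hypothetical 28-day log
    single_day = [0, 7, 14, 21]           # same weekday each week
    day_pair = [0, 3, 7, 10, 14, 17, 21, 24]  # two weekdays each week
    actual = sum(daily)
    for label, days in (("single-day", single_day), ("day-pair", day_pair)):
        est = estimate_cld(daily, days)
        print(label, round(percentage_error(est, actual), 1))
```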
Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST)

PubMed Central

Xu, Chonggang; Gertner, George

2013-01-01

Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements. PMID:24143037
Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST).

PubMed

Xu, Chonggang; Gertner, George

2011-01-01

Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, the FAST analysis is mainly confined to the estimation of partial variances contributed by the main effects of model parameters, but does not allow for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements.
Crop area estimation based on remotely-sensed data with an accurate but costly subsample

NASA Technical Reports Server (NTRS)

Gunst, R. F.

1983-01-01

Alternatives to sampling-theory stratified and regression estimators of crop production and timber biomass were examined. An alternative estimator which is viewed as especially promising is the errors-in-variable regression estimator. Investigations established the need for caution with this estimator when the ratio of two error variances is not precisely known.
Estimating the Uncertainty In Diameter Growth Model Predictions and Its Effects On The Uncertainty of Annual Inventory Estimates

Treesearch

Ronald E. McRoberts; Veronica C. Lessard

2001-01-01

Uncertainty in diameter growth predictions is attributed to three general sources: measurement error or sampling variability in predictor variables, parameter covariances, and residual or unexplained variation around model expectations. Using measurement error and sampling variability distributions obtained from the literature and Monte Carlo simulation methods, the...
Spatial and temporal variability of the overall error of National Atmospheric Deposition Program measurements determined by the USGS collocated-sampler program, water years 1989-2001

USGS Publications Warehouse

Wetherbee, G.A.; Latysh, N.E.; Gordon, J.D.

2005-01-01

Data from the U.S. Geological Survey (USGS) collocated-sampler program for the National Atmospheric Deposition Program/National Trends Network (NADP/NTN) are used to estimate the overall error of NADP/NTN measurements. Absolute errors are estimated by comparison of paired measurements from collocated instruments. Spatial and temporal differences in absolute error were identified and are consistent with longitudinal distributions of NADP/NTN measurements and spatial differences in precipitation characteristics. The magnitude of error for calcium, magnesium, ammonium, nitrate, and sulfate concentrations, specific conductance, and sample volume is of minor environmental significance to data users. Data collected after a 1994 sample-handling protocol change are prone to less absolute error than data collected prior to 1994. Absolute errors are smaller during non-winter months than during winter months for selected constituents at sites where frozen precipitation is common. Minimum resolvable differences are estimated for different regions of the USA to aid spatial and temporal watershed analyses.
Estimating Uncertainty in Annual Forest Inventory Estimates

Treesearch

Ronald E. McRoberts; Veronica C. Lessard

1999-01-01

The precision of annual forest inventory estimates may be negatively affected by uncertainty from a variety of sources including: (1) sampling error; (2) procedures for updating plots not measured in the current year; and (3) measurement errors. The impact of these sources of uncertainty on final inventory estimates is investigated using Monte Carlo simulation...
A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

PubMed Central

Jacob, Benjamin G; Griffith, Daniel A; Muturi, Ephantus J; Caamano, Erick X; Githure, John I; Novak, Robert J

2009-01-01

Background Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitat clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extend a normal regression analysis previously considered in the literature. Methods Field and remote-sampled data were collected from July 2006 to December 2007 in the Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e. negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means was defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analysis was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion An autocorrelation error covariance matrix and a spatial filter analysis can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity. PMID:19772590
High dimensional linear regression models under long memory dependence and measurement error

NASA Astrophysics Data System (ADS)

Kaul, Abhishek

This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the consistency, $n^{1/2-d}$-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
GUM Analysis for SIMS Isotopic Ratios in BEP0 Graphite Qualification Samples, Round 2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerlach, David C.; Heasler, Patrick G.; Reid, Bruce D.

2009-01-01

This report describes GUM calculations for TIMS and SIMS isotopic ratio measurements of reactor graphite samples. These isotopic ratios are used to estimate reactor burn-up, and currently consist of various ratios of U, Pu, and Boron impurities in the graphite samples. The GUM calculation is a propagation of error methodology that assigns uncertainties (in the form of standard error and confidence bound) to the final estimates.
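For orientation, the general first-order propagation formula from the GUM, and its special case for a ratio, can be written as below. This is the textbook form only; the symbols A, B, and R = A/B are illustrative, and the report's actual measurement model for the isotopic ratios is not given in the abstract.

```latex
% General law of propagation of uncertainty for y = f(x_1, ..., x_N)
u_c^2(y) = \sum_{i=1}^{N} \left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i)
         + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N}
           \frac{\partial f}{\partial x_i}\,\frac{\partial f}{\partial x_j}\, u(x_i, x_j)

% Special case: a ratio R = A / B (illustrative symbols)
\left(\frac{u_c(R)}{R}\right)^2
  = \left(\frac{u(A)}{A}\right)^2
  + \left(\frac{u(B)}{B}\right)^2
  - 2\,\frac{u(A, B)}{A\,B}
```

When the numerator and denominator are positively correlated, the covariance term reduces the relative uncertainty of the ratio.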
Elimination of Emergency Department Medication Errors Due To Estimated Weights.

PubMed

Greenwalt, Mary; Griffen, David; Wilkerson, Jim

2017-01-01

From 7/2014 through 6/2015, 10 emergency department (ED) medication dosing errors were reported through the electronic incident reporting system of an urban academic medical center. Analysis of these medication errors identified inaccurate estimated weight on patients as the root cause. The goal of this project was to reduce weight-based dosing medication errors due to inaccurate estimated weights on patients presenting to the ED. Chart review revealed that 13.8% of estimated weights documented on admitted ED patients varied more than 10% from subsequent actual admission weights recorded. A random sample of 100 charts containing estimated weights revealed 2 previously unreported significant medication dosage errors (.02 significant error rate). Key improvements included removing barriers to weighing ED patients, storytelling to engage staff and change culture, and removal of the estimated weight documentation field from the ED electronic health record (EHR) forms. With these improvements estimated weights on ED patients, and the resulting medication errors, were eliminated.
Simulation techniques for estimating error in the classification of normal patterns

NASA Technical Reports Server (NTRS)

Whitsitt, S. J.; Landgrebe, D. A.

1974-01-01

Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.
Sampling design optimization for spatial functions

USGS Publications Warehouse

Olea, R.A.

1984-01-01

A new procedure is presented for minimizing the sampling requirements necessary to estimate a mappable spatial function at a specified level of accuracy. The technique is based on universal kriging, an estimation method within the theory of regionalized variables. Neither actual implementation of the sampling nor universal kriging estimations are necessary to make an optimal design. The average standard error and maximum standard error of estimation over the sampling domain are used as global indices of sampling efficiency. The procedure optimally selects those parameters controlling the magnitude of the indices, including the density and spatial pattern of the sample elements and the number of nearest sample elements used in the estimation. As an illustration, the network of observation wells used to monitor the water table in the Equus Beds of Kansas is analyzed and an improved sampling pattern suggested. This example demonstrates the practical utility of the procedure, which can be applied equally well to other spatial sampling problems, as the procedure is not limited by the nature of the spatial function. © 1984 Plenum Publishing Corporation.
Estimation of sampling error uncertainties in observed surface air temperature change in China

NASA Astrophysics Data System (ADS)

Hua, Wei; Shen, Samuel S. P.; Weithmann, Alexander; Wang, Huijun

2017-08-01

This study examines the sampling error uncertainties in the monthly surface air temperature (SAT) change in China over recent decades, focusing on the uncertainties of gridded data, national averages, and linear trends. Results indicate that large sampling error variances appear at the station-sparse area of northern and western China with the maximum value exceeding 2.0 K² while small sampling error variances are found at the station-dense area of southern and eastern China with most grid values being less than 0.05 K². In general, the negative temperature existed in each month prior to the 1980s, and a warming in temperature began thereafter, which accelerated in the early and mid-1990s. The increasing trend in the SAT series was observed for each month of the year with the largest temperature increase and highest uncertainty of 0.51 ± 0.29 K (10 year)⁻¹ occurring in February and the weakest trend and smallest uncertainty of 0.13 ± 0.07 K (10 year)⁻¹ in August. The sampling error uncertainties in the national average annual mean SAT series are not sufficiently large to alter the conclusion of the persistent warming in China. In addition, the sampling error uncertainties in the SAT series show a clear variation compared with other uncertainty estimation methods, which is a plausible reason for the inconsistent variations between our estimate and other studies during this period.

Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample

PubMed Central

Wang, Ching-Yun; Song, Xiao

2017-01-01

SUMMARY Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women’s Health Initiative. PMID:27546625
Synthetic Aperture Sonar Processing with MMSE Estimation of Image Sample Values

DTIC Science & Technology

2016-12-01

MMSE (minimum mean-square error) target sample estimation using non-orthogonal basis...orthogonal, they can still be used in a minimum mean-square error (MMSE) estimator that models the object echo as a weighted sum of the multi-aspect basis...problem. Introduction: Minimum mean-square error (MMSE) estimation is applied to target imaging with synthetic aperture
Improved Margin of Error Estimates for Proportions in Business: An Educational Example

ERIC Educational Resources Information Center

Arzumanyan, George; Halcoussis, Dennis; Phillips, G. Michael

2015-01-01

This paper presents the Agresti & Coull "Adjusted Wald" method for computing confidence intervals and margins of error for common proportion estimates. The presented method is easily implementable by business students and practitioners and provides more accurate estimates of proportions particularly in extreme samples and small…
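A minimal sketch of the Agresti and Coull "adjusted Wald" interval named above, next to the ordinary Wald interval for comparison. The function names and the example counts are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import NormalDist


def wald(successes, n, confidence=0.95):
    """Ordinary Wald interval: returns (center, margin of error)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    return p, z * math.sqrt(p * (1 - p) / n)


def agresti_coull(successes, n, confidence=0.95):
    """Agresti-Coull adjusted Wald interval: returns (center, margin of error).

    Adds z^2/2 pseudo-successes and z^2/2 pseudo-failures before applying
    the Wald formula (about 2 of each at 95% confidence).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    return p_adj, z * math.sqrt(p_adj * (1 - p_adj) / n_adj)


if __name__ == "__main__":
    # Extreme small sample (hypothetical): 0 successes out of 10 trials.
    print("Wald:          center=%.3f margin=%.3f" % wald(0, 10))
    print("Agresti-Coull: center=%.3f margin=%.3f" % agresti_coull(0, 10))
```

With zero observed successes the ordinary Wald margin collapses to zero, which is the kind of extreme-sample behavior the adjusted method is meant to avoid.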
LACIE performance predictor FOC users manual

NASA Technical Reports Server (NTRS)

1976-01-01

The LACIE Performance Predictor (LPP) is a computer simulation of the LACIE process for predicting worldwide wheat production. The simulation provides for the introduction of various errors into the system and provides estimates based on these errors, thus allowing the user to determine the impact of selected error sources. The FOC LPP simulates the acquisition of the sample segment data by the LANDSAT Satellite (DAPTS), the classification of the agricultural area within the sample segment (CAMS), the estimation of the wheat yield (YES), and the production estimation and aggregation (CAS). These elements include data acquisition characteristics, environmental conditions, classification algorithms, the LACIE aggregation and data adjustment procedures. The operational structure for simulating these elements consists of the following key programs: (1) LACIE Utility Maintenance Process, (2) System Error Executive, (3) Ephemeris Generator, (4) Access Generator, (5) Acquisition Selector, (6) LACIE Error Model (LEM), and (7) Post Processor.
Per-pixel bias-variance decomposition of continuous errors in data-driven geospatial modeling: A case study in environmental remote sensing

NASA Astrophysics Data System (ADS)

Gao, Jing; Burt, James E.

2017-12-01

This study investigates the usefulness of a per-pixel bias-variance error decomposition (BVD) for understanding and improving spatially-explicit data-driven models of continuous variables in environmental remote sensing (ERS). BVD is a model evaluation method that originated in machine learning and has not been examined for ERS applications. Demonstrated with a showcase regression tree model mapping land imperviousness (0-100%) using Landsat images, our results showed that BVD can reveal sources of estimation errors, map how these sources vary across space, reveal the effects of various model characteristics on estimation accuracy, and enable in-depth comparison of different error metrics. Specifically, BVD bias maps can help analysts identify and delineate model spatial non-stationarity; BVD variance maps can indicate potential effects of ensemble methods (e.g. bagging), and inform efficient training sample allocation: training samples should capture the full complexity of the modeled process, and more samples should be allocated to regions with more complex underlying processes rather than regions covering larger areas. Through examining the relationships between model characteristics and their effects on estimation accuracy revealed by BVD for both absolute and squared errors (i.e. error is the absolute or the squared value of the difference between observation and estimate), we found that the two error metrics embody different diagnostic emphases, can lead to different conclusions about the same model, and may suggest different solutions for performance improvement. We emphasize BVD's strength in revealing the connection between model characteristics and estimation accuracy, as understanding this relationship empowers analysts to effectively steer performance through model adjustments.
SU-E-T-769: T-Test Based Prior Error Estimate and Stopping Criterion for Monte Carlo Dose Calculation in Proton Therapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, X; Gao, H; Schuemann, J

2015-06-15

Purpose: The Monte Carlo (MC) method is a gold standard for dose calculation in radiotherapy. However, it is not a priori clear how many particles need to be simulated to achieve a given dose accuracy. Prior error estimate and stopping criterion are not well established for MC. This work aims to fill this gap. Methods: Due to the statistical nature of MC, our approach is based on the one-sample t-test. We design the prior error estimate method based on the t-test, and then use this t-test based error estimate for developing a simulation stopping criterion. The three major components are as follows. First, the source particles are randomized in energy, space and angle, so that the dose deposition from a particle to the voxel is independent and identically distributed (i.i.d.). Second, a sample under consideration in the t-test is the mean value of dose deposition to the voxel by a sufficiently large number of source particles. Then, according to the central limit theorem, the sample as the mean value of i.i.d. variables is normally distributed with the expectation equal to the true deposited dose. Third, the t-test is performed with the null hypothesis that the difference between sample expectation (the same as true deposited dose) and on-the-fly calculated mean sample dose from MC is larger than a given error threshold, in addition to which users have the freedom to specify confidence probability and region of interest in the t-test based stopping criterion. Results: The method is validated for proton dose calculation. The difference between the MC result based on the t-test prior error estimate and the statistical result obtained by repeating numerous MC simulations is within 1%. Conclusion: The t-test based prior error estimate and stopping criterion are developed for MC and validated for proton dose calculation. Xiang Hong and Hao Gao were partially supported by the NSFC (#11405105), the 973 Program (#2015CB856000) and the Shanghai Pujiang Talent Program (#14PJ1404500).
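The general idea of stopping a simulation once the estimated error falls below a threshold can be sketched as below. This is a simplified stand-in, not the authors' t-test procedure: it uses a normal quantile in place of the Student-t quantile (a reasonable approximation once there are a few dozen batches), and `simulate_batch`, the tolerance, and the toy exponential "dose" model are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def run_until_precise(simulate_batch, rel_tol=0.01, confidence=0.95,
                      min_batches=30, max_batches=10000):
    """Add batches until the confidence half-width is within rel_tol of the mean.

    simulate_batch: callable returning the mean deposit of one batch of
    particles (batch means are treated as approximately normal by the CLT).
    Returns (estimated mean, number of batches used).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # normal approximation to t
    batch_means = []
    while len(batch_means) < max_batches:
        batch_means.append(simulate_batch())
        n = len(batch_means)
        if n >= min_batches:
            half_width = z * stdev(batch_means) / math.sqrt(n)
            if half_width <= rel_tol * abs(mean(batch_means)):
                break
    return mean(batch_means), len(batch_means)


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_batch():
        # Hypothetical model: 1000 i.i.d. exponential deposits with mean 2.0.
        return sum(rng.expovariate(0.5) for _ in range(1000)) / 1000

    estimate, batches = run_until_precise(toy_batch)
    print(f"estimate={estimate:.3f} after {batches} batches")
```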
Jackknife Estimation of Sampling Variance of Ratio Estimators in Complex Samples: Bias and the Coefficient of Variation. Research Report. ETS RR-06-19

ERIC Educational Resources Information Center

Oranje, Andreas

2006-01-01

A multitude of methods has been proposed to estimate the sampling variance of ratio estimates in complex samples (Wolter, 1985). Hansen and Tepping (1985) studied some of those variance estimators and found that a high coefficient of variation (CV) of the denominator of a ratio estimate is indicative of a biased estimate of the standard error of a…
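A delete-one jackknife for the variance of a ratio estimate, together with the coefficient of variation of the denominator, might look like the sketch below. It treats the observations as a simple random sample, which is a simplifying assumption; in a complex sample the replicates would normally drop whole primary sampling units within strata, and the function names here are hypothetical.

```python
import math
from statistics import mean, stdev


def jackknife_ratio_variance(y, x):
    """Delete-one jackknife variance of the ratio sum(y) / sum(x).

    Returns (full-sample ratio, jackknife variance estimate).
    Assumes simple random sampling, unlike a real complex design.
    """
    n = len(y)
    total_y, total_x = sum(y), sum(x)
    full = total_y / total_x
    # Ratio recomputed with each observation left out in turn.
    replicates = [(total_y - yi) / (total_x - xi) for yi, xi in zip(y, x)]
    rep_mean = sum(replicates) / n
    variance = (n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates)
    return full, variance


def denominator_cv(x):
    """Coefficient of variation of the estimated denominator mean."""
    return stdev(x) / math.sqrt(len(x)) / mean(x)


if __name__ == "__main__":
    y = [3, 4, 7, 8]   # hypothetical numerator values
    x = [1, 2, 3, 4]   # hypothetical denominator values
    ratio, var = jackknife_ratio_variance(y, x)
    print(f"ratio={ratio:.3f} se={math.sqrt(var):.3f} cv_x={denominator_cv(x):.3f}")
```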
Estimates and Standard Errors for Ratios of Normalizing Constants from Multiple Markov Chains via Regeneration

PubMed Central

Doss, Hani; Tan, Aixin

2017-01-01

In the classical biased sampling problem, we have k densities $\pi_1(\cdot), \ldots, \pi_k(\cdot)$, each known up to a normalizing constant, i.e. for $l = 1, \ldots, k$, $\pi_l(\cdot) = \nu_l(\cdot)/m_l$, where $\nu_l(\cdot)$ is a known function and $m_l$ is an unknown constant. For each $l$, we have an iid sample from $\pi_l$, and the problem is to estimate the ratios $m_l/m_s$ for all $l$ and all $s$. This problem arises frequently in several situations in both frequentist and Bayesian inference. An estimate of the ratios was developed and studied by Vardi and his co-workers over two decades ago, and there has been much subsequent work on this problem from many different perspectives. In spite of this, there are no rigorous results in the literature on how to estimate the standard error of the estimate. We present a class of estimates of the ratios of normalizing constants that are appropriate for the case where the samples from the $\pi_l$'s are not necessarily iid sequences, but are Markov chains. We also develop an approach based on regenerative simulation for obtaining standard errors for the estimates of ratios of normalizing constants. These standard error estimates are valid for both the iid case and the Markov chain case. PMID:28706463
Estimates and Standard Errors for Ratios of Normalizing Constants from Multiple Markov Chains via Regeneration.

PubMed

Doss, Hani; Tan, Aixin

2014-09-01

In the classical biased sampling problem, we have k densities π1(·), …, πk(·), each known up to a normalizing constant, i.e. for l = 1, …, k, πl(·) = νl(·)/ml, where νl(·) is a known function and ml is an unknown constant. For each l, we have an iid sample from πl, and the problem is to estimate the ratios ml/ms for all l and all s. This problem arises frequently in several situations in both frequentist and Bayesian inference. An estimate of the ratios was developed and studied by Vardi and his co-workers over two decades ago, and there has been much subsequent work on this problem from many different perspectives. In spite of this, there are no rigorous results in the literature on how to estimate the standard error of the estimate. We present a class of estimates of the ratios of normalizing constants that are appropriate for the case where the samples from the πl's are not necessarily iid sequences, but are Markov chains. We also develop an approach based on regenerative simulation for obtaining standard errors for the estimates of ratios of normalizing constants. These standard error estimates are valid for both the iid case and the Markov chain case.
Computation of Standard Errors

PubMed Central

Dowd, Bryan E; Greene, William H; Norton, Edward C

2014-01-01

Objectives We discuss the problem of computing the standard errors of functions involving estimated parameters and provide the relevant computer code for three different computational approaches using two popular computer packages. Study Design We show how to compute the standard errors of several functions of interest: the predicted value of the dependent variable for a particular subject, and the effect of a change in an explanatory variable on the predicted value of the dependent variable for an individual subject and average effect for a sample of subjects. Empirical Application Using a publicly available dataset, we explain three different methods of computing standard errors: the delta method, Krinsky–Robb, and bootstrapping. We provide computer code for Stata 12 and LIMDEP 10/NLOGIT 5. Conclusions In most applications, choice of the computational method for standard errors of functions of estimated parameters is a matter of convenience. However, when computing standard errors of the sample average of functions that involve both estimated parameters and nonstochastic explanatory variables, it is important to consider the sources of variation in the function's values. PMID:24800304
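The three approaches named above can be compared on a deliberately small example. The sketch below is in Python rather than the Stata or LIMDEP code the article provides, and it makes its own assumptions: the "estimated parameter" is just a sample mean, the function of interest is g(μ) = exp(μ), and all function names are hypothetical.

```python
import math
import random
import statistics


def delta_method_se(data):
    """Delta method: SE[g(mean)] is approximately |g'(mean)| * SE(mean), with g = exp."""
    mean = statistics.fmean(data)
    se_mean = statistics.stdev(data) / math.sqrt(len(data))
    return math.exp(mean) * se_mean


def krinsky_robb_se(data, draws, rng):
    """Krinsky-Robb: simulate the parameter from its estimated normal sampling
    distribution and take the standard deviation of g over the draws."""
    mean = statistics.fmean(data)
    se_mean = statistics.stdev(data) / math.sqrt(len(data))
    sims = [math.exp(rng.gauss(mean, se_mean)) for _ in range(draws)]
    return statistics.stdev(sims)


def bootstrap_se(data, replicates, rng):
    """Nonparametric bootstrap: resample the data with replacement and recompute g."""
    n = len(data)
    stats = [math.exp(statistics.fmean(rng.choices(data, k=n)))
             for _ in range(replicates)]
    return statistics.stdev(stats)


rng = random.Random(42)
data = [rng.gauss(0.5, 1.0) for _ in range(400)]  # synthetic data, not from the article
se_delta = delta_method_se(data)
se_kr = krinsky_robb_se(data, draws=2000, rng=rng)
se_boot = bootstrap_se(data, replicates=2000, rng=rng)
print(f"delta={se_delta:.4f}  krinsky_robb={se_kr:.4f}  bootstrap={se_boot:.4f}")
```

With a smooth function and a moderately large sample, the three estimates should be close, which is consistent with the abstract's remark that the choice is often a matter of convenience.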
USGS Blind Sample Project: monitoring and evaluating laboratory analytical quality

USGS Publications Warehouse

Ludtke, Amy S.; Woodworth, Mark T.

1997-01-01

The U.S. Geological Survey (USGS) collects and disseminates information about the Nation's water resources. Surface- and ground-water samples are collected and sent to USGS laboratories for chemical analyses. The laboratories identify and quantify the constituents in the water samples. Random and systematic errors occur during sample handling, chemical analysis, and data processing. Although all errors cannot be eliminated from measurements, the magnitude of their uncertainty can be estimated and tracked over time. Since 1981, the USGS has operated an independent, external, quality-assurance project called the Blind Sample Project (BSP). The purpose of the BSP is to monitor and evaluate the quality of laboratory analytical results through the use of double-blind quality-control (QC) samples. The information provided by the BSP assists the laboratories in detecting and correcting problems in the analytical procedures. The information also can aid laboratory users in estimating the extent that laboratory errors contribute to the overall errors in their environmental data.
Monitoring forest areas from continental to territorial levels using a sample of medium spatial resolution satellite imagery

NASA Astrophysics Data System (ADS)

Eva, Hugh; Carboni, Silvia; Achard, Frédéric; Stach, Nicolas; Durieux, Laurent; Faure, Jean-François; Mollicone, Danilo

A global systematic sampling scheme has been developed by the UN FAO and the EC TREES project to estimate rates of deforestation at global or continental levels at intervals of 5 to 10 years. This global scheme can be intensified to produce results at the national level. In this paper, using surrogate observations, we compare the deforestation estimates derived from these two levels of sampling intensities (one, the global, for the Brazilian Amazon the other, national, for French Guiana) to estimates derived from the official inventories. We also report the precisions that are achieved due to sampling errors and, in the case of French Guiana, compare such precision with the official inventory precision. We extract nine sample data sets from the official wall-to-wall deforestation map derived from satellite interpretations produced for the Brazilian Amazon for the year 2002 to 2003. This global sampling scheme estimate gives 2.81 million ha of deforestation (mean from nine simulated replicates) with a standard error of 0.10 million ha. This compares with the full population estimate from the wall-to-wall interpretations of 2.73 million ha deforested, which is within one standard error of our sampling test estimate. The relative difference between the mean estimate from sampling approach and the full population estimate is 3.1%, and the standard error represents 4.0% of the full population estimate. This global sampling is then intensified to a territorial level with a case study over French Guiana to estimate deforestation between the years 1990 and 2006. For the historical reference period, 1990, Landsat-5 Thematic Mapper data were used. A coverage of SPOT-HRV imagery at 20 m × 20 m resolution acquired at the Cayenne receiving station in French Guiana was used for year 2006. 
Our estimates from the intensified global sampling scheme over French Guiana are compared with those produced by the national authority to report on deforestation rates under the Kyoto protocol rules for its overseas department. The latter estimates come from a sample of nearly 17,000 plots analyzed from same spatial imagery acquired between year 1990 and year 2006. This sampling scheme is derived from the traditional forest inventory methods carried out by IFN (Inventaire Forestier National). Our intensified global sampling scheme leads to an estimate of 96,650 ha deforested between 1990 and 2006, which is within the 95% confidence interval of the IFN sampling scheme, which gives an estimate of 91,722 ha, representing a relative difference from the IFN of 5.4%. These results demonstrate that the intensification of the global sampling scheme can provide forest area change estimates close to those achieved by official forest inventories (<6%), with precisions of between 4% and 7%, although we only estimate errors from sampling, not from the use of surrogate data. Such methods could be used by developing countries to demonstrate that they are fulfilling requirements for reducing emissions from deforestation in the framework of an REDD (Reducing Emissions from Deforestation in Developing Countries) mechanism under discussion within the United Nations Framework Convention on Climate Change (UNFCCC). Monitoring systems at national levels in tropical countries can also benefit from pan-tropical and regional observations, to ensure consistency between different national monitoring systems.
Accounting for Sampling Error in Genetic Eigenvalues Using Random Matrix Theory.

PubMed

Sztepanacz, Jacqueline L; Blows, Mark W

2017-07-01

The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, where large eigenvalues are biased upward, and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices has been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not to be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error as genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine if the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix. Copyright © 2017 by the Genetics Society of America.
Enhancing adaptive sparse grid approximations and improving refinement strategies using adjoint-based a posteriori error estimates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jakeman, J.D., E-mail: jdjakem@sandia.gov; Wildey, T.

2015-01-01

In this paper we present an algorithm for adaptive sparse grid approximations of quantities of interest computed from discretized partial differential equations. We use adjoint-based a posteriori error estimates of the physical discretization error and the interpolation error in the sparse grid to enhance the sparse grid approximation and to drive adaptivity of the sparse grid. Utilizing these error estimates provides significantly more accurate functional values for random samples of the sparse grid approximation. We also demonstrate that alternative refinement strategies based upon a posteriori error estimates can lead to further increases in accuracy in the approximation over traditional hierarchical surplus based strategies. Throughout this paper we also provide and test a framework for balancing the physical discretization error with the stochastic interpolation error of the enhanced sparse grid approximation.
Enhancing adaptive sparse grid approximations and improving refinement strategies using adjoint-based a posteriori error estimates

DOE PAGES

Jakeman, J. D.; Wildey, T.

2015-01-01

In this paper we present an algorithm for adaptive sparse grid approximations of quantities of interest computed from discretized partial differential equations. We use adjoint-based a posteriori error estimates of the interpolation error in the sparse grid to enhance the sparse grid approximation and to drive adaptivity. We show that utilizing these error estimates provides significantly more accurate functional values for random samples of the sparse grid approximation. We also demonstrate that alternative refinement strategies based upon a posteriori error estimates can lead to further increases in accuracy in the approximation over traditional hierarchical surplus based strategies. Throughout this paper we also provide and test a framework for balancing the physical discretization error with the stochastic interpolation error of the enhanced sparse grid approximation.
Sampling for mercury at subnanogram per litre concentrations for load estimation in rivers

USGS Publications Warehouse

Colman, J.A.; Breault, R.F.

2000-01-01

Estimation of constituent loads in streams requires collection of stream samples that are representative of constituent concentrations, that is, composites of isokinetic multiple verticals collected along a stream transect. An all-Teflon isokinetic sampler (DH-81) cleaned in 75°C, 4 N HCl was tested using blank, split, and replicate samples to assess systematic and random sample contamination by mercury species. Mean mercury concentrations in field-equipment blanks were low: 0.135 ng·L⁻¹ for total mercury (ΣHg) and 0.0086 ng·L⁻¹ for monomethyl mercury (MeHg). Mean square errors (MSE) for ΣHg and MeHg duplicate samples collected at eight sampling stations were not statistically different from MSE of samples split in the laboratory, which represent the analytical and splitting error. Low field-blank concentrations and statistically equal duplicate- and split-sample MSE values indicate that no measurable contamination was occurring during sampling. Standard deviations associated with example mercury load estimations were four to five times larger, on a relative basis, than standard deviations calculated from duplicate samples, indicating that error of the load determination was primarily a function of the loading model used, not of sampling or analytical methods.
Modified fast frequency acquisition via adaptive least squares algorithm

NASA Technical Reports Server (NTRS)

Kumar, Rajendra (Inventor)

1992-01-01

A method and the associated apparatus for estimating the amplitude, frequency, and phase of a signal of interest are presented. The method comprises the following steps: (1) inputting the signal of interest; (2) generating a reference signal with adjustable amplitude, frequency and phase at an output thereof; (3) mixing the signal of interest with the reference signal and a signal 90 deg out of phase with the reference signal to provide a pair of quadrature sample signals comprising respectively a difference between the signal of interest and the reference signal and a difference between the signal of interest and the signal 90 deg out of phase with the reference signal; (4) using the pair of quadrature sample signals to compute estimates of the amplitude, frequency, and phase of an error signal comprising the difference between the signal of interest and the reference signal employing a least squares estimation; (5) adjusting the amplitude, frequency, and phase of the reference signal from the numerically controlled oscillator in a manner which drives the error signal towards zero; and (6) outputting the estimates of the amplitude, frequency, and phase of the error signal in combination with the reference signal to produce a best estimate of the amplitude, frequency, and phase of the signal of interest. The preferred method includes the step of providing the error signal as a real time confidence measure as to the accuracy of the estimates wherein the closer the error signal is to zero, the higher the probability that the estimates are accurate. A matrix in the estimation algorithm provides an estimate of the variance of the estimation error.
(How) do we learn from errors? A prospective study of the link between the ward's learning practices and medication administration errors.

PubMed

Drach-Zahavy, A; Somech, A; Admi, H; Peterfreund, I; Peker, H; Priente, O

2014-03-01

Attention in the ward should shift from preventing medication administration errors to managing them. Nevertheless, little is known regarding the practices nursing wards apply to learn from medication administration errors as a means of limiting them. To test the effectiveness of four types of learning practices, namely, non-integrated, integrated, supervisory and patchy learning practices in limiting medication administration errors. Data were collected from a convenience sample of 4 hospitals in Israel by multiple methods (observations and self-report questionnaires) at two time points. The sample included 76 wards (360 nurses). Medication administration error was defined as any deviation from prescribed medication processes and measured by a validated structured observation sheet. Wards' use of medication administration technologies, location of the medication station, and workload were observed; learning practices and demographics were measured by validated questionnaires. Results of the mixed linear model analysis indicated that the use of technology and quiet location of the medication cabinet were significantly associated with reduced medication administration errors (estimate=.03, p<.05 and estimate=-.17, p<.01 respectively), while workload was significantly linked to inflated medication administration errors (estimate=.04, p<.05). Of the learning practices, supervisory learning was the only practice significantly linked to reduced medication administration errors (estimate=-.04, p<.05). Integrated and patchy learning were significantly linked to higher levels of medication administration errors (estimate=-.03, p<.05 and estimate=-.04, p<.01 respectively). Non-integrated learning was not associated with it (p>.05). How wards manage errors might have implications for medication administration errors beyond the effects of typical individual, organizational and technology risk factors. 
Head nurses can facilitate learning from errors by "management by walking around" and monitoring nurses' medication administration behaviors. Copyright © 2013 Elsevier Ltd. All rights reserved.
Sources of error in estimating truck traffic from automatic vehicle classification data

DOT National Transportation Integrated Search

1998-10-01

Truck annual average daily traffic estimation errors resulting from sample classification counts are computed in this paper under two scenarios. One scenario investigates an improper factoring procedure that may be used by highway agencies. The study...
Field evaluation of distance-estimation error during wetland-dependent bird surveys

USGS Publications Warehouse

Nadeau, Christopher P.; Conway, Courtney J.

2012-01-01

Context: The most common methods to estimate detection probability during avian point-count surveys involve recording a distance between the survey point and individual birds detected during the survey period. Accurately measuring or estimating distance is an important assumption of these methods; however, this assumption is rarely tested in the context of aural avian point-count surveys. Aims: We expand on recent bird-simulation studies to document the error associated with estimating distance to calling birds in a wetland ecosystem. Methods: We used two approaches to estimate the error associated with five surveyors' distance estimates between the survey point and calling birds, and to determine the factors that affect a surveyor's ability to estimate distance. Key results: We observed biased and imprecise distance estimates when estimating distance to simulated birds in a point-count scenario (x̄error = -9 m, s.d.error = 47 m) and when estimating distances to real birds during field trials (x̄error = 39 m, s.d.error = 79 m). The amount of bias and precision in distance estimates differed among surveyors; surveyors with more training and experience were less biased and more precise when estimating distance to both real and simulated birds. Three environmental factors were important in explaining the error associated with distance estimates, including the measured distance from the bird to the surveyor, the volume of the call and the species of bird. Surveyors tended to make large overestimations to birds close to the survey point, which is an especially serious error in distance sampling. Conclusions: Our results suggest that distance-estimation error is prevalent, but surveyor training may be the easiest way to reduce distance-estimation error. 
Implications: The present study has demonstrated how relatively simple field trials can be used to estimate the error associated with distance estimates used to estimate detection probability during avian point-count surveys. Evaluating distance-estimation errors will allow investigators to better evaluate the accuracy of avian density and trend estimates. Moreover, investigators who evaluate distance-estimation errors could employ recently developed models to incorporate distance-estimation error into analyses. We encourage further development of such models, including the inclusion of such models into distance-analysis software.

Uncertainty in predicting soil hydraulic properties at the hillslope scale with indirect methods

NASA Astrophysics Data System (ADS)

Chirico, G. B.; Medina, H.; Romano, N.

2007-02-01

Summary: Several hydrological applications require the characterisation of the soil hydraulic properties at large spatial scales. Pedotransfer functions (PTFs) are being developed as simplified methods to estimate soil hydraulic properties as an alternative to direct measurements, which are unfeasible for most practical circumstances. The objective of this study is to quantify the uncertainty in PTF spatial predictions at the hillslope scale as related to the sampling density, due to: (i) the error in estimated soil physico-chemical properties and (ii) PTF model error. The analysis is carried out on a 2-km-long experimental hillslope in South Italy. The method adopted is based on a stochastic generation of patterns of soil variables using sequential Gaussian simulation, conditioned to the observed sample data. The following PTFs are applied: Vereecken's PTF [Vereecken, H., Diels, J., van Orshoven, J., Feyen, J., Bouma, J., 1992. Functional evaluation of pedotransfer functions for the estimation of soil hydraulic properties. Soil Sci. Soc. Am. J. 56, 1371-1378] and HYPRES PTF [Wösten, J.H.M., Lilly, A., Nemes, A., Le Bas, C., 1999. Development and use of a database of hydraulic properties of European soils. Geoderma 90, 169-185]. The two PTFs estimate reliably the soil water retention characteristic even for a relatively coarse sampling resolution, with prediction uncertainties comparable to the uncertainties in direct laboratory or field measurements. The uncertainty of soil water retention prediction due to the model error is as much as or more significant than the uncertainty associated with the estimated input, even for a relatively coarse sampling resolution. Prediction uncertainties are much more important when PTFs are applied to estimate the saturated hydraulic conductivity. In this case model error dominates the overall prediction uncertainties, making negligible the effect of the input error.
Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample.

PubMed

Wang, Ching-Yun; Song, Xiao

2016-11-01

Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Detecting genotyping errors and describing black bear movement in northern Idaho

Treesearch

Michael K. Schwartz; Samuel A. Cushman; Kevin S. McKelvey; Jim Hayden; Cory Engkjer

2006-01-01

Non-invasive genetic sampling has become a favored tool to enumerate wildlife. Genetic errors, caused by poor quality samples, can lead to substantial biases in numerical estimates of individuals. We demonstrate how the computer program DROPOUT can detect amplification errors (false alleles and allelic dropout) in a black bear (Ursus americanus) dataset collected in...
GUM Analysis for TIMS and SIMS Isotopic Ratios in Graphite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Heasler, Patrick G.; Gerlach, David C.; Cliff, John B.

2007-04-01

This report describes GUM calculations for TIMS and SIMS isotopic ratio measurements of reactor graphite samples. These isotopic ratios are used to estimate reactor burn-up, and currently consist of various ratios of U, Pu, and Boron impurities in the graphite samples. The GUM calculation is a propagation of error methodology that assigns uncertainties (in the form of standard error and confidence bound) to the final estimates.
How Much Confidence Can We Have in EU-SILC? Complex Sample Designs and the Standard Error of the Europe 2020 Poverty Indicators

ERIC Educational Resources Information Center

Goedeme, Tim

2013-01-01

If estimates are based on samples, they should be accompanied by appropriate standard errors and confidence intervals. This is true for scientific research in general, and is even more important if estimates are used to inform and evaluate policy measures such as those aimed at attaining the Europe 2020 poverty reduction target. In this article I…
Sampling Error in Relation to Cyst Nematode Population Density Estimation in Small Field Plots.

PubMed

Župunski, Vesna; Jevtić, Radivoje; Jokić, Vesna Spasić; Župunski, Ljubica; Lalošević, Mirjana; Ćirić, Mihajlo; Ćurčić, Živko

2017-06-01

Cyst nematodes are serious plant-parasitic pests which could cause severe yield losses and extensive damage. Since there is still very little information about error of population density estimation in small field plots, this study contributes to the broad issue of population density assessment. It was shown that there was no significant difference between cyst counts of five or seven bulk samples taken per each 1-m² plot, if average cyst count per examined plot exceeds 75 cysts per 100 g of soil. Goodness of fit of data to probability distribution tested with χ² test confirmed a negative binomial distribution of cyst counts for 21 out of 23 plots. The recommended measure of sampling precision of 17% expressed through coefficient of variation (cv) was achieved if the plots of 1 m² contaminated with more than 90 cysts per 100 g of soil were sampled with 10-core bulk samples taken in five repetitions. If plots were contaminated with less than 75 cysts per 100 g of soil, 10-core bulk samples taken in seven repetitions gave cv higher than 23%. This study indicates that more attention should be paid to estimation of sampling error in experimental field plots to ensure more reliable estimation of population density of cyst nematodes.
A New Stratified Sampling Procedure which Decreases Error Estimation of Varroa Mite Number on Sticky Boards.

PubMed

Kretzschmar, A; Durand, E; Maisonnasse, A; Vallon, J; Le Conte, Y

2015-06-01

A new procedure of stratified sampling is proposed in order to establish an accurate estimation of Varroa destructor populations on sticky bottom boards of the hive. It is based on the spatial sampling theory that recommends using regular grid stratification in the case of spatially structured process. The distribution of varroa mites on sticky board being observed as spatially structured, we designed a sampling scheme based on a regular grid with circles centered on each grid element. This new procedure is then compared with a former method using partially random sampling. Relative error improvements are exposed on the basis of a large sample of simulated sticky boards (n=20,000) which provides a complete range of spatial structures, from a random structure to a highly frame driven structure. The improvement of varroa mite number estimation is then measured by the percentage of counts with an error greater than a given level. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Improving the accuracy of hyaluronic acid molecular weight estimation by conventional size exclusion chromatography.

PubMed

Shanmuga Doss, Sreeja; Bhatt, Nirav Pravinbhai; Jayaraman, Guhan

2017-08-15

There is an unreasonably high variation in the literature reports on molecular weight of hyaluronic acid (HA) estimated using conventional size exclusion chromatography (SEC). This variation is most likely due to errors in estimation. Working with commercially available HA molecular weight standards, this work examines the extent of error in molecular weight estimation due to two factors: use of non-HA based calibration and concentration of sample injected into the SEC column. We develop a multivariate regression correlation to correct for concentration effect. Our analysis showed that SEC calibration based on non-HA standards like polyethylene oxide and pullulan led to approximately 2 and 10 times overestimation, respectively, when compared to HA-based calibration. Further, we found that injected sample concentration has an effect on molecular weight estimation. Even at 1 g/l injected sample concentration, HA molecular weight standards of 0.7 and 1.64 MDa showed appreciable underestimation of 11-24%. The multivariate correlation developed was found to reduce error in estimations at 1 g/l to <4%. The correlation was also successfully applied to accurately estimate the molecular weight of HA produced by a recombinant Lactococcus lactis fermentation. Copyright © 2017 Elsevier B.V. All rights reserved.
A technique for evaluating the influence of spatial sampling on the determination of global mean total columnar ozone

NASA Technical Reports Server (NTRS)

Tolson, R. H.

1981-01-01

A technique is described for providing a means of evaluating the influence of spatial sampling on the determination of global mean total columnar ozone. A finite number of coefficients in the expansion are determined, and the truncated part of the expansion is shown to contribute an error to the estimate, which depends strongly on the spatial sampling and is relatively insensitive to data noise. First and second order statistics are derived for each term in a spherical harmonic expansion which represents the ozone field, and the statistics are used to estimate systematic and random errors in the estimates of total ozone.
State of charge monitoring of vanadium redox flow batteries using half cell potentials and electrolyte density

NASA Astrophysics Data System (ADS)

Ressel, Simon; Bill, Florian; Holtz, Lucas; Janshen, Niklas; Chica, Antonio; Flower, Thomas; Weidlich, Claudia; Struckmann, Thorsten

2018-02-01

The operation of vanadium redox flow batteries requires reliable in situ state of charge (SOC) monitoring. In this study, two SOC estimation approaches for the negative half cell are investigated. First, in situ open circuit potential measurements are combined with Coulomb counting in a one-step calibration of SOC and Nernst potential which doesn't need additional reference SOCs. In-sample and out-of-sample SOCs are estimated and analyzed, estimation errors ≤ 0.04 are obtained. In the second approach, temperature corrected in situ electrolyte density measurements are used for the first time in vanadium redox flow batteries for SOC estimation. In-sample and out-of-sample SOC estimation errors ≤ 0.04 demonstrate the feasibility of this approach. Both methods allow recalibration during battery operation. The actual capacity obtained from SOC calibration can be used in a state of health model.
An Empirical State Error Covariance Matrix for Batch State Estimation

NASA Technical Reports Server (NTRS)

Frisbee, Joseph H., Jr.

2011-01-01

State estimation techniques serve effectively to provide mean state estimates. However, the state error covariance matrices provided as part of these techniques suffer from some degree of lack of confidence in their ability to adequately describe the uncertainty in the estimated states. A specific problem with the traditional form of state error covariance matrices is that they represent only a mapping of the assumed observation error characteristics into the state space. Any errors that arise from other sources (environment modeling, precision, etc.) are not directly represented in a traditional, theoretical state error covariance matrix. Consider that an actual observation contains only measurement error and that an estimated observation contains all other errors, known and unknown. It then follows that a measurement residual (the difference between expected and observed measurements) contains all errors for that measurement. Therefore, a direct and appropriate inclusion of the actual measurement residuals in the state error covariance matrix will result in an empirical state error covariance matrix. This empirical state error covariance matrix will fully account for the error in the state estimate. By way of a literal reinterpretation of the equations involved in the weighted least squares estimation algorithm, it is possible to arrive at an appropriate, and formally correct, empirical state error covariance matrix. The first specific step of the method is to use the average form of the weighted measurement residual variance performance index rather than its usual total weighted residual form. Next it is helpful to interpret the solution to the normal equations as the average of a collection of sample vectors drawn from a hypothetical parent population. From here, using a standard statistical analysis approach, it directly follows as to how to determine the standard empirical state error covariance matrix. 
This matrix will contain the total uncertainty in the state estimate, regardless of the source of the uncertainty. Also, in its most straightforward form, the technique only requires supplemental calculations to be added to existing batch algorithms. The generation of this direct, empirical form of the state error covariance matrix is independent of the dimensionality of the observations. Mixed degrees of freedom for an observation set are allowed. As is the case with any simple, empirical sample variance problem, the presented approach offers an opportunity (at least in the case of weighted least squares) to investigate confidence interval estimates for the error covariance matrix elements. The diagonal or variance terms of the error covariance matrix have a particularly simple form to associate with either a multiple degree of freedom chi-square distribution (more approximate) or with a gamma distribution (less approximate). The off diagonal or covariance terms of the matrix are less clear in their statistical behavior. However, the off diagonal covariance matrix elements still lend themselves to standard confidence interval error analysis. The distributional forms associated with the off diagonal terms are more varied and, perhaps, more approximate than those associated with the diagonal terms. Using a simple weighted least squares sample problem, results obtained through use of the proposed technique are presented. The example consists of a simple, two observer, triangulation problem with range only measurements. Variations of this problem reflect an ideal case (perfect knowledge of the range errors) and a mismodeled case (incorrect knowledge of the range errors).
Optimal estimation of suspended-sediment concentrations in streams

USGS Publications Warehouse

Holtschlag, D.J.

2001-01-01

Optimal estimators are developed for computation of suspended-sediment concentrations in streams. The estimators are a function of parameters, computed by use of generalized least squares, which simultaneously account for effects of streamflow, seasonal variations in average sediment concentrations, a dynamic error component, and the uncertainty in concentration measurements. The parameters are used in a Kalman filter for on-line estimation and an associated smoother for off-line estimation of suspended-sediment concentrations. The accuracies of the optimal estimators are compared with alternative time-averaging interpolators and flow-weighting regression estimators by use of long-term daily-mean suspended-sediment concentration and streamflow data from 10 sites within the United States. For sampling intervals from 3 to 48 days, the standard errors of on-line and off-line optimal estimators ranged from 52.7 to 107%, and from 39.5 to 93.0%, respectively. The corresponding standard errors of linear and cubic-spline interpolators ranged from 48.8 to 158%, and from 50.6 to 176%, respectively. The standard errors of simple and multiple regression estimators, which did not vary with the sampling interval, were 124 and 105%, respectively. Thus, the optimal off-line estimator (Kalman smoother) had the lowest error characteristics of those evaluated. Because suspended-sediment concentrations are typically measured at less than 3-day intervals, use of optimal estimators will likely result in significant improvements in the accuracy of continuous suspended-sediment concentration records. Additional research on the integration of direct suspended-sediment concentration measurements and optimal estimators applied at hourly or shorter intervals is needed.
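The on-line estimation step can be illustrated with a minimal sketch of a scalar Kalman filter. It assumes a plain random-walk state with a fixed process variance and measurement variance, which is far simpler than the model in the abstract (no streamflow or seasonal terms); the function name and all parameters are hypothetical and chosen only for illustration.

```python
def kalman_filter(observations, process_var, obs_var, x0, p0):
    """Scalar random-walk Kalman filter (illustrative assumption, not the
    report's model). `None` in `observations` marks a day without a sample."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: the state persists, its variance grows by the process variance.
        p = p + process_var
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            gain = p / (p + obs_var)
            x = x + gain * (z - x)
            p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


# Two days, with a measurement only on the second day.
for state, variance in kalman_filter([None, 10.0], 1.0, 1.0, x0=0.0, p0=1.0):
    print(state, variance)
```

Between samples the error variance keeps growing, which is one way to see why standard errors rise with the sampling interval.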
Combining inferences from models of capture efficiency, detectability, and suitable habitat to classify landscapes for conservation of threatened bull trout

USGS Publications Warehouse

Peterson, J.; Dunham, J.B.

2003-01-01

Effective conservation efforts for at-risk species require knowledge of the locations of existing populations. Species presence can be estimated directly by conducting field-sampling surveys or alternatively by developing predictive models. Direct surveys can be expensive and inefficient, particularly for rare and difficult-to-sample species, and models of species presence may produce biased predictions. We present a Bayesian approach that combines sampling and model-based inferences for estimating species presence. The accuracy and cost-effectiveness of this approach were compared to those of sampling surveys and predictive models for estimating the presence of the threatened bull trout ( Salvelinus confluentus ) via simulation with existing models and empirical sampling data. Simulations indicated that a sampling-only approach would be the most effective and would result in the lowest presence and absence misclassification error rates for three thresholds of detection probability. When sampling effort was considered, however, the combined approach resulted in the lowest error rates per unit of sampling effort. Hence, lower probability-of-detection thresholds can be specified with the combined approach, resulting in lower misclassification error rates and improved cost-effectiveness.
Errors in radial velocity variance from Doppler wind lidar

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, H.; Barthelmie, R. J.; Doubrawa, P.

A high-fidelity lidar turbulence measurement technique relies on accurate estimates of radial velocity variance that are subject to both systematic and random errors determined by the autocorrelation function of radial velocity, the sampling rate, and the sampling duration. Our paper quantifies the effect of the volumetric averaging in lidar radial velocity measurements on the autocorrelation function and the dependence of the systematic and random errors on the sampling duration, using both statistically simulated and observed data. For current-generation scanning lidars and sampling durations of about 30 min and longer, during which the stationarity assumption is valid for atmospheric flows, the systematic error is negligible but the random error exceeds about 10%.
Errors in radial velocity variance from Doppler wind lidar

DOE PAGES

Wang, H.; Barthelmie, R. J.; Doubrawa, P.; ...

2016-08-29

A high-fidelity lidar turbulence measurement technique relies on accurate estimates of radial velocity variance that are subject to both systematic and random errors determined by the autocorrelation function of radial velocity, the sampling rate, and the sampling duration. Our paper quantifies the effect of the volumetric averaging in lidar radial velocity measurements on the autocorrelation function and the dependence of the systematic and random errors on the sampling duration, using both statistically simulated and observed data. For current-generation scanning lidars and sampling durations of about 30 min and longer, during which the stationarity assumption is valid for atmospheric flows, the systematic error is negligible but the random error exceeds about 10%.
Impact of sampling strategy on stream load estimates in till landscape of the Midwest

USGS Publications Warehouse

Vidon, P.; Hubbard, L.E.; Soyeux, E.

2009-01-01

Accurately estimating various solute loads in streams during storms is critical to accurately determine maximum daily loads for regulatory purposes. This study investigates the impact of sampling strategy on solute load estimates in streams in the US Midwest. Three different solute types (nitrate, magnesium, and dissolved organic carbon (DOC)) and three sampling strategies are assessed. Regardless of the method, the average error on nitrate loads is higher than for magnesium or DOC loads, and all three methods generally underestimate DOC loads and overestimate magnesium loads. Increasing sampling frequency only slightly improves the accuracy of solute load estimates but generally improves the precision of load calculations. This type of investigation is critical for water management and environmental assessment so error on solute load calculations can be taken into account by landscape managers, and sampling strategies optimized as a function of monitoring objectives. © 2008 Springer Science+Business Media B.V.
Verification of Satellite Rainfall Estimates from the Tropical Rainfall Measuring Mission over Ground Validation Sites

NASA Astrophysics Data System (ADS)

Fisher, B. L.; Wolff, D. B.; Silberstein, D. S.; Marks, D. M.; Pippitt, J. L.

2007-12-01

The Tropical Rainfall Measuring Mission's (TRMM) Ground Validation (GV) Program was originally established with the principal long-term goal of determining the random errors and systematic biases stemming from the application of the TRMM rainfall algorithms. The GV Program has been structured around two validation strategies: 1) determining the quantitative accuracy of the integrated monthly rainfall products at GV regional sites over large areas of about 500 km² using integrated ground measurements and 2) evaluating the instantaneous satellite and GV rain rate statistics at spatio-temporal scales compatible with the satellite sensor resolution (Simpson et al. 1988, Thiele 1988). The GV Program has continued to evolve since the launch of the TRMM satellite on November 27, 1997. This presentation will discuss current GV methods of validating TRMM operational rain products in conjunction with ongoing research. The challenge facing TRMM GV has been how to best utilize rain information from the GV system to infer the random and systematic error characteristics of the satellite rain estimates. A fundamental problem of validating space-borne rain estimates is that the true mean areal rainfall is an ideal, scale-dependent parameter that cannot be directly measured. Empirical validation uses ground-based rain estimates to determine the error characteristics of the satellite-inferred rain estimates, but ground estimates also incur measurement errors and contribute to the error covariance. Furthermore, sampling errors, associated with the discrete, discontinuous temporal sampling by the rain sensors aboard the TRMM satellite, become statistically entangled in the monthly estimates. Sampling errors complicate the task of linking biases in the rain retrievals to the physics of the satellite algorithms. The TRMM Satellite Validation Office (TSVO) has made key progress towards effective satellite validation.
For disentangling the sampling and retrieval errors, TSVO has developed and applied a methodology that statistically separates the two error sources. Using TRMM monthly estimates and high-resolution radar and gauge data, this method has been used to estimate sampling and retrieval error budgets over GV sites. More recently, a multi-year data set of instantaneous rain rates from the TRMM microwave imager (TMI), the precipitation radar (PR), and the combined algorithm was spatio-temporally matched and inter-compared to GV radar rain rates collected during satellite overpasses of select GV sites at the scale of the TMI footprint. The analysis provided a more direct probe of the satellite rain algorithms using ground data as an empirical reference. TSVO has also made significant advances in radar quality control through the development of the Relative Calibration Adjustment (RCA) technique. The RCA is currently being used to provide a long-term record of radar calibration for the radar at Kwajalein, a strategically important GV site in the tropical Pacific. The RCA technique has revealed previously undetected alterations in the radar sensitivity due to engineering changes (e.g., system modifications, antenna offsets, alterations of the receiver, or the data processor), making possible the correction of the radar rainfall measurements and ensuring the integrity of nearly a decade of TRMM GV observations and resources.
Prediction and standard error estimation for a finite universe total when a stratum is not sampled

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wright, T.

1994-01-01

In the context of a universe of trucks operating in the United States in 1990, this paper presents statistical methodology for estimating a finite universe total on a second occasion when a part of the universe is sampled and the remainder of the universe is not sampled. Prediction is used to compensate for the lack of data from the unsampled portion of the universe. The sample is assumed to be a subsample of an earlier sample where stratification is used on both occasions before sample selection. Accounting for births and deaths in the universe between the two points in time, the detailed sampling plan, estimator, standard error, and optimal sample allocation are presented with a focus on the second occasion. If prior auxiliary information is available, the methodology is also applicable to a first occasion.
Quantifying Adventitious Error in a Covariance Structure as a Random Effect

PubMed Central

Wu, Hao; Browne, Michael W.

2017-01-01

We present an approach to quantifying errors in covariance structures in which adventitious error, identified as the process underlying the discrepancy between the population and the structured model, is explicitly modeled as a random effect with a distribution, and the dispersion parameter of this distribution to be estimated gives a measure of misspecification. Analytical properties of the resultant procedure are investigated and the measure of misspecification is found to be related to the RMSEA. An algorithm is developed for numerical implementation of the procedure. The consistency and asymptotic sampling distributions of the estimators are established under a new asymptotic paradigm and an assumption weaker than the standard Pitman drift assumption. Simulations validate the asymptotic sampling distributions and demonstrate the importance of accounting for the variations in the parameter estimates due to adventitious error. Two examples are also given as illustrations. PMID:25813463
Statistical theory and methodology for remote sensing data analysis

NASA Technical Reports Server (NTRS)

Odell, P. L.

1974-01-01

A model is developed for the evaluation of acreages (proportions) of different crop types over a geographical area using a classification approach, and methods for estimating the crop acreages are given. In estimating the acreage of a specific crop type such as wheat, it is suggested to treat the problem as a two-crop problem, wheat vs. nonwheat, since this simplifies the estimation problem considerably. The error analysis and the sample size problem are investigated for the two-crop approach. Certain numerical results for sample sizes are given for a JSC-ERTS-1 data example on wheat identification performance in Hill County, Montana and Burke County, North Dakota. Lastly, for a large area crop acreage inventory, a sampling scheme is suggested for acquiring sample data, and the problem of crop acreage estimation and the error analysis are discussed.

Validation of proton stopping power ratio estimation based on dual energy CT using fresh tissue samples

NASA Astrophysics Data System (ADS)

Taasti, Vicki T.; Michalak, Gregory J.; Hansen, David C.; Deisher, Amanda J.; Kruse, Jon J.; Krauss, Bernhard; Muren, Ludvig P.; Petersen, Jørgen B. B.; McCollough, Cynthia H.

2018-01-01

Dual energy CT (DECT) has been shown, in theoretical and phantom studies, to improve the stopping power ratio (SPR) determination used for proton treatment planning compared to the use of single energy CT (SECT). However, it has not been shown that this also extends to organic tissues. The purpose of this study was therefore to investigate the accuracy of SPR estimation for fresh pork and beef tissue samples used as surrogates of human tissues. The reference SPRs for fourteen tissue samples, which included fat, muscle and femur bone, were measured using proton pencil beams. The tissue samples were subsequently CT scanned using four different scanners with different dual energy acquisition modes, giving in total six DECT-based SPR estimations for each sample. The SPR was estimated using a proprietary algorithm (syngo.via DE Rho/Z Maps, Siemens Healthcare, Forchheim, Germany) for extracting the electron density and the effective atomic number. SECT images were also acquired and SECT-based SPR estimations were performed using a clinical Hounsfield look-up table. The mean and standard deviation of the SPR over large volumes of interest were calculated. For the six different DECT acquisition methods, the root-mean-square errors (RMSEs) for the SPR estimates over all tissue samples were between 0.9% and 1.5%. For the SECT-based SPR estimation the RMSE was 2.8%. For one DECT acquisition method, a positive bias was seen in the SPR estimates, having a mean error of 1.3%. The largest errors were found in the very dense cortical bone from a beef femur. This study confirms the advantages of DECT-based SPR estimation although good results were also obtained using SECT for most tissues.
Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques

USGS Publications Warehouse

Gilliom, Robert J.; Helsel, Dennis R.

1986-01-01

A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
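The log-probability regression idea described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a single censoring level, uses Weibull plotting positions rank/(n + 1) to obtain z scores (the abstract does not state which plotting positions were used), and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_probability_regression(uncensored, n_censored):
    """Estimate mean and standard deviation of a sample in which `n_censored`
    values fell below a single detection limit (illustrative assumptions)."""
    n = len(uncensored) + n_censored
    # z scores of the plotting positions; censored values take the lowest ranks.
    z = [NormalDist().inv_cdf(rank / (n + 1)) for rank in range(1, n + 1)]
    logs = sorted(math.log(c) for c in uncensored)
    z_obs = z[n_censored:]
    # Least squares regression of log concentration on z score.
    z_bar, y_bar = mean(z_obs), mean(logs)
    sxy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z_obs, logs))
    sxx = sum((a - z_bar) ** 2 for a in z_obs)
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    # Extrapolate the fitted lognormal line to fill in the censored values.
    filled = [math.exp(intercept + slope * zi) for zi in z[:n_censored]]
    full = filled + sorted(uncensored)
    return mean(full), stdev(full)


est_mean, est_sd = log_probability_regression([2.0, 3.0, 5.0, 8.0], n_censored=2)
print(est_mean, est_sd)
```

The summary statistics are computed on the combined sample, so the uncensored observations enter at their measured values and only the censored ones are modeled.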
Estimation of distributional parameters for censored trace level water quality data. 1. Estimation Techniques

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilliom, R.J.; Helsel, D.R.

1986-02-01

A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
Validation of a Sampling Method to Collect Exposure Data for Central-Line-Associated Bloodstream Infections.

PubMed

Hammami, Naïma; Mertens, Karl; Overholser, Rosanna; Goetghebeur, Els; Catry, Boudewijn; Lambert, Marie-Laurence

2016-05-01

Surveillance of central-line-associated bloodstream infections requires the labor-intensive counting of central-line days (CLDs). This workload could be reduced by sampling. Our objective was to evaluate the accuracy of various sampling strategies in the estimation of CLDs in intensive care units (ICUs) and to establish a set of rules to identify optimal sampling strategies depending on ICU characteristics. We analyzed existing data collected according to the European protocol for patient-based surveillance of ICU-acquired infections in Belgium between 2004 and 2012. CLD data were reported by 56 ICUs in 39 hospitals during 364 trimesters. We compared estimated CLD data obtained from weekly and monthly sampling schemes with the observed exhaustive CLD data over the trimester by assessing the CLD percentage error (ie, [observed CLDs - estimated CLDs]/observed CLDs). We identified predictors of improved accuracy using linear mixed models. When sampling once per week or 3 times per month, 80% of ICU trimesters had a CLD percentage error within 10%. When sampling twice per week, this was >90% of ICU trimesters. Sampling on Tuesdays provided the best estimations. In the linear mixed model, the observed CLD count was the best predictor for a smaller percentage error. The following sampling strategies provided an estimate within 10% of the actual CLD for 97% of the ICU trimesters with 90% confidence: 3 times per month in an ICU with >650 CLDs per trimester or each Tuesday in an ICU with >480 CLDs per trimester. Sampling of CLDs provides an acceptable alternative to daily collection of CLD data.
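The percentage error used in this comparison can be worked through with a short sketch. The extrapolation rule (mean count on the sampled days multiplied by the number of days in the period) is an assumption made here for illustration, since the abstract does not state how sampled counts were scaled up; the function names and the toy data are hypothetical.

```python
def estimate_cld(daily_counts, sample_days):
    """Extrapolate total central-line days from counts on the sampled days
    (assumed rule: mean sampled count times number of days)."""
    sampled = [daily_counts[d] for d in sample_days]
    return sum(sampled) / len(sampled) * len(daily_counts)


def percentage_error(observed, estimated):
    """CLD percentage error: (observed - estimated) / observed."""
    return (observed - estimated) / observed


# Toy 14-day period in which the daily count alternates between 4 and 6.
daily = [4, 6] * 7
observed = sum(daily)                    # exhaustive count
estimated = estimate_cld(daily, [0, 2])  # both sampled days happen to be low
print(observed, estimated, percentage_error(observed, estimated))
```

In this toy case the two sampled days both fall on low-count days, so the estimate undershoots the exhaustive count.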
Estimation of Rainfall Sampling Uncertainty: A Comparison of Two Diverse Approaches

NASA Technical Reports Server (NTRS)

Steiner, Matthias; Zhang, Yu; Baeck, Mary Lynn; Wood, Eric F.; Smith, James A.; Bell, Thomas L.; Lau, William K. M. (Technical Monitor)

2002-01-01

The spatial and temporal intermittence of rainfall causes the averages of satellite observations of rain rate to differ from the "true" average rain rate over any given area and time period, even if the satellite observations are perfectly accurate. The difference of satellite averages based on occasional observation by satellite systems and the continuous-time average of rain rate is referred to as sampling error. In this study, rms sampling error estimates are obtained for average rain rates over boxes 100 km, 200 km, and 500 km on a side, for averaging periods of 1 day, 5 days, and 30 days. The study uses a multi-year, merged radar data product provided by Weather Services International Corp. at a resolution of 2 km in space and 15 min in time, over an area of the central U.S. extending from 35N to 45N in latitude and 100W to 80W in longitude. The intervals between satellite observations are assumed to be equal, and similar in size to what present and future satellite systems are able to provide (from 1 h to 12 h). The sampling error estimates are obtained using a resampling method called "resampling by shifts," and are compared to sampling error estimates proposed by Bell based on earlier work by Laughlin. The resampling estimates are found to scale with areal size and time period as the theory predicts. The dependence on average rain rate and time interval between observations is also similar to what the simple theory suggests.
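A rough sketch of the general idea behind estimating sampling error by shifting a regular sampling pattern is given below. It is an interpretation of the abstract rather than the authors' exact procedure: a densely sampled rain-rate series stands in for the "true" record, it is subsampled at a fixed interval starting from every possible offset, and the rms difference between subsample means and the full mean is reported. The function name and the toy series are hypothetical.

```python
import math


def rms_sampling_error(rain_rates, step):
    """RMS difference between the full-record mean and the means of regular
    subsamples taken every `step` points, over all starting shifts."""
    true_mean = sum(rain_rates) / len(rain_rates)
    squared_errors = []
    for shift in range(step):
        subsample = rain_rates[shift::step]  # one hypothetical overpass schedule
        sub_mean = sum(subsample) / len(subsample)
        squared_errors.append((sub_mean - true_mean) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Intermittent toy series: every second interval has rain.
series = [0.0, 2.0, 0.0, 2.0]
print(rms_sampling_error(series, 1))  # sampling every interval: no error
print(rms_sampling_error(series, 2))  # sampling every other interval
```

The sketch assumes `step` does not exceed the length of the series, so that every shift yields at least one sample.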
Comparison of Efficiency of Jackknife and Variance Component Estimators of Standard Errors. Program Statistics Research. Technical Report.

ERIC Educational Resources Information Center

Longford, Nicholas T.

Large scale surveys usually employ a complex sampling design and as a consequence, no standard methods for estimation of the standard errors associated with the estimates of population means are available. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…
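For orientation, a minimal sketch of the delete-one jackknife standard error is shown here. It assumes a simple random sample, not the complex survey designs the report is concerned with (where whole sampling units would typically be deleted), and the function name is hypothetical.

```python
import math


def jackknife_se(values, statistic):
    """Delete-one jackknife standard error of `statistic` (simple random
    sample assumed; illustrative only)."""
    n = len(values)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


def sample_mean(xs):
    return sum(xs) / len(xs)


print(jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean))
```

For the sample mean, this jackknife estimate coincides with the textbook standard error s/sqrt(n), which makes it a convenient check.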
Uncertainties in the cluster-cluster correlation function

NASA Astrophysics Data System (ADS)

Ling, E. N.; Frenk, C. S.; Barrow, J. D.

1986-12-01

The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in xi(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R greater than or equal to 1, distance class D less than or equal to 4) to that of CfA galaxies is 4.2 + 1.4 or - 1.0 (68 percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
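The bootstrap idea can be sketched in a few lines. This is a generic illustration for an arbitrary statistic of a one-dimensional sample, not the correlation-function analysis of the paper; the function name, the number of resamples, and the seed are assumptions.

```python
import random
import statistics


def bootstrap_se(values, statistic, n_resamples=1000, seed=0):
    """Bootstrap standard error: spread of the statistic over resamples
    drawn with replacement from the observed sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    replicates = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


sample = [1.0, 2.0, 4.0, 7.0, 11.0]
print(bootstrap_se(sample, statistics.mean))
```

As the abstract notes for its own estimates, a spread obtained this way reflects resampling variability only and does not capture systematic biases in the underlying catalog.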
On using summary statistics from an external calibration sample to correct for covariate measurement error.

PubMed

Guo, Ying; Little, Roderick J; McConnell, Daniel S

2012-01-01

Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.
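The combining step can be illustrated with a short sketch, assuming the "simple multiple imputation combining rules" are the standard Rubin rules: the pooled estimate is the mean across imputations, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the numbers are hypothetical.

```python
import math


def combine_imputations(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Three imputations of the same regression coefficient and its variance.
coef, se = combine_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(coef, se)
```

The between-imputation term is what carries the extra uncertainty due to the unobserved covariate into the pooled standard error.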
A pharmacometric case study regarding the sensitivity of structural model parameter estimation to error in patient reported dosing times.

PubMed

Knights, Jonathan; Rohatagi, Shashank

2015-12-01

Although there is a body of literature focused on minimizing the effect of dosing inaccuracies on pharmacokinetic (PK) parameter estimation, most of the work centers on missing doses. No attempt has been made to specifically characterize the effect of error in reported dosing times. Additionally, existing work has largely dealt with cases in which the compound of interest is dosed at an interval no less than its terminal half-life. This work provides a case study investigating how error in patient reported dosing times might affect the accuracy of structural model parameter estimation under sparse sampling conditions when the dosing interval is less than the terminal half-life of the compound, and the underlying kinetics are monoexponential. Additional effects due to noncompliance with dosing events are not explored and it is assumed that the structural model and reasonable initial estimates of the model parameters are known. Under the conditions of our simulations, with structural model CV % ranging from ~20 to 60 %, parameter estimation inaccuracy derived from error in reported dosing times was largely controlled around 10 % on average. Given that no observed dosing was included in the design and sparse sampling was utilized, we believe these error results represent a practical ceiling given the variability and parameter estimates for the one-compartment model. The findings suggest additional investigations may be of interest and are noteworthy given the inability of current PK software platforms to accommodate error in dosing times.
Lognormal kriging for the assessment of reliability in groundwater quality control observation networks

USGS Publications Warehouse

Candela, L.; Olea, R.A.; Custodio, E.

1988-01-01

Groundwater quality observation networks are examples of discontinuous sampling on variables presenting spatial continuity and highly skewed frequency distributions. Anywhere in the aquifer, lognormal kriging provides estimates of the variable being sampled and a standard error of the estimate. The average and the maximum standard error within the network can be used to dynamically improve the network sampling efficiency or find a design able to assure a given reliability level. The approach does not require the formulation of any physical model for the aquifer or any actual sampling of hypothetical configurations. A case study is presented using the network monitoring salty water intrusion into the Llobregat delta confined aquifer, Barcelona, Spain. The variable chloride concentration used to trace the intrusion exhibits sudden changes within short distances which make the standard error fairly invariable to changes in sampling pattern and to substantial fluctuations in the number of wells. © 1988.
Measurement variability error for estimates of volume change

Treesearch

James A. Westfall; Paul L. Patterson

2007-01-01

Using quality assurance data, measurement variability distributions were developed for attributes that affect tree volume prediction. Random deviations from the measurement variability distributions were applied to 19381 remeasured sample trees in Maine. The additional error due to measurement variation and measurement bias was estimated via a simulation study for...
Classification based upon gene expression data: bias and precision of error rates.

PubMed

Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L

2007-06-01

Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
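The recommendation to report class error rates together with a conditional risk can be illustrated with a short Python sketch. It takes one common reading of "conditional risk", namely the expected misclassification cost weighted by prior class probabilities; the confusion matrix, priors, and costs below are hypothetical and are not from the paper.

```python
def class_error_rates(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    return [1 - row[i] / sum(row) for i, row in enumerate(confusion)]

def conditional_risk(confusion, priors, cost):
    """Expected cost: sum over i, j of prior_i * P(predict j | true i) * cost[i][j]."""
    risk = 0.0
    for i, row in enumerate(confusion):
        n_i = sum(row)
        for j, count in enumerate(row):
            risk += priors[i] * (count / n_i) * cost[i][j]
    return risk

# Hypothetical two-class example: class 0 = non-diseased, class 1 = diseased
confusion = [[40, 10],
             [5, 45]]
priors = [0.9, 0.1]     # assumed population class probabilities
cost = [[0, 1],         # cost of calling a non-diseased sample diseased
        [5, 0]]         # cost of missing a diseased sample

rates = class_error_rates(confusion)
risk = conditional_risk(confusion, priors, cost)
print(rates, risk)
```

Reporting the per-class rates alongside the risk makes it visible when a low overall error rate hides poor performance on a rare class.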
Estimation of genetic parameters and their sampling variances of quantitative traits in the type 2 modified augmented design

USDA-ARS's Scientific Manuscript database

We proposed a method to estimate the error variance among non-replicated genotypes, thus to estimate the genetic parameters by using replicated controls. We derived formulas to estimate sampling variances of the genetic parameters. Computer simulation indicated that the proposed methods of estimatin...
Errors in causal inference: an organizational schema for systematic error and random error.

PubMed

Suzuki, Etsuji; Tsuda, Toshihide; Mitsuhashi, Toshiharu; Mansournia, Mohammad Ali; Yamamoto, Eiji

2016-11-01

To provide an organizational schema for systematic error and random error in estimating causal measures, aimed at clarifying the concept of errors from the perspective of causal inference. We propose to divide systematic error into structural error and analytic error. With regard to random error, our schema shows its four major sources: nondeterministic counterfactuals, sampling variability, a mechanism that generates exposure events and measurement variability. Structural error is defined from the perspective of counterfactual reasoning and divided into nonexchangeability bias (which comprises confounding bias and selection bias) and measurement bias. Directed acyclic graphs are useful to illustrate this kind of error. Nonexchangeability bias implies a lack of "exchangeability" between the selected exposed and unexposed groups. A lack of exchangeability is not a primary concern of measurement bias, justifying its separation from confounding bias and selection bias. Many forms of analytic errors result from the small-sample properties of the estimator used and vanish asymptotically. Analytic error also results from wrong (misspecified) statistical models and inappropriate statistical methods. Our organizational schema is helpful for understanding the relationship between systematic error and random error from a previously less investigated aspect, enabling us to better understand the relationship between accuracy, validity, and precision. Copyright © 2016 Elsevier Inc. All rights reserved.
Reliable estimation of orbit errors in spaceborne SAR interferometry. The network approach

NASA Astrophysics Data System (ADS)

Bähr, Hermann; Hanssen, Ramon F.

2012-12-01

An approach to improve orbital state vectors by orbit error estimates derived from residual phase patterns in synthetic aperture radar interferograms is presented. For individual interferograms, an error representation by two parameters is motivated: the baseline error in cross-range and the rate of change of the baseline error in range. For their estimation, two alternatives are proposed: a least squares approach that requires prior unwrapping and a less reliable gridsearch method handling the wrapped phase. In both cases, reliability is enhanced by mutual control of error estimates in an overdetermined network of linearly dependent interferometric combinations of images. Thus, systematic biases, e.g., due to unwrapping errors, can be detected and iteratively eliminated. Regularising the solution by a minimum-norm condition results in quasi-absolute orbit errors that refer to particular images. For the 31 images of a sample ENVISAT dataset, orbit corrections with a mutual consistency on the millimetre level have been inferred from 163 interferograms. The method itself qualifies by reliability and rigorous geometric modelling of the orbital error signal but does not consider interfering large scale deformation effects. However, a separation may be feasible in a combined processing with persistent scatterer approaches or by temporal filtering of the estimates.
Quantifying uncertainty in carbon and nutrient pools of coarse woody debris

NASA Astrophysics Data System (ADS)

See, C. R.; Campbell, J. L.; Fraver, S.; Domke, G. M.; Harmon, M. E.; Knoepp, J. D.; Woodall, C. W.

2016-12-01

Woody detritus constitutes a major pool of both carbon and nutrients in forested ecosystems. Estimating coarse wood stocks relies on many assumptions, even when full surveys are conducted. Researchers rarely report error in coarse wood pool estimates, despite the importance to ecosystem budgets and modelling efforts. To date, no study has attempted a comprehensive assessment of error rates and uncertainty inherent in the estimation of this pool. Here, we use Monte Carlo analysis to propagate the error associated with the major sources of uncertainty present in the calculation of coarse wood carbon and nutrient (i.e., N, P, K, Ca, Mg, Na) pools. We also evaluate individual sources of error to identify the importance of each source of uncertainty in our estimates. We quantify sampling error by comparing the three most common field methods used to survey coarse wood (two transect methods and a whole-plot survey). We quantify the measurement error associated with length and diameter measurement, and technician error in species identification and decay class using plots surveyed by multiple technicians. We use previously published values of model error for the four most common methods of volume estimation: Smalian's, conical frustum, conic paraboloid, and average-of-ends. We also use previously published values for error in the collapse ratio (cross-sectional height/width) of decayed logs that serves as a surrogate for the volume remaining. We consider sampling error in chemical concentration and density for all decay classes, using distributions from both published and unpublished studies. Analytical uncertainty is calculated using standard reference plant material from the National Institute of Standards. Our results suggest that technician error in decay classification can have a large effect on uncertainty, since many of the error distributions included in the calculation (e.g. density, chemical concentration, volume-model selection, collapse ratio) are decay-class specific.
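Monte Carlo propagation of error, as described in this abstract, can be sketched in a few lines of Python. The sketch assumes a carbon pool computed as volume × density × carbon fraction, with each input drawn from an independent normal distribution. The distributions, their parameters, and the name `simulate_pool` are hypothetical and much simpler than the decay-class-specific distributions the authors describe.

```python
import random
import statistics

def simulate_pool(n_draws=10000, seed=0):
    """Propagate input uncertainty through pool = volume * density * carbon fraction."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        volume = rng.gauss(10.0, 1.0)        # m^3 per ha (hypothetical mean, sd)
        density = rng.gauss(0.30, 0.03)      # Mg per m^3 (hypothetical)
        carbon_frac = rng.gauss(0.48, 0.02)  # unitless (hypothetical)
        draws.append(volume * density * carbon_frac)
    return statistics.mean(draws), statistics.stdev(draws)

pool_mean, pool_sd = simulate_pool()
print(pool_mean, pool_sd)
```

Re-running the simulation with one input held at its mean shows how much of the total spread that input contributes, which is one way to rank the individual sources of error.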
Inventory and mapping of flood inundation using interactive digital image analysis techniques

USGS Publications Warehouse

Rohde, Wayne G.; Nelson, Charles A.; Taranik, J.V.

1979-01-01

LANDSAT digital data and color infra-red photographs were used in a multiphase sampling scheme to estimate the area of agricultural land affected by a flood. The LANDSAT data were classified with a maximum likelihood algorithm. Stratification of the LANDSAT data, prior to classification, greatly reduced misclassification errors. The classification results were used to prepare a map overlay showing the areal extent of flooding. These data also provided statistics required to estimate sample size in a two phase sampling scheme, and provided quick, accurate estimates of areas flooded for the first phase. The measurements made in the second phase, based on ground data and photo-interpretation, were used with two phase sampling statistics to estimate the area of agricultural land affected by flooding. These results show that LANDSAT digital data can be used to prepare map overlays showing the extent of flooding on agricultural land and, with two phase sampling procedures, can provide acreage estimates with sampling errors of about 5 percent. This procedure provides a technique for rapidly assessing the areal extent of flood conditions on agricultural land and would provide a basis for designing a sampling framework to estimate the impact of flooding on crop production.
Density dependence and climate effects in Rocky Mountain elk: an application of regression with instrumental variables for population time series with sampling error.

PubMed

Creel, Scott; Creel, Michael

2009-11-01

1. Sampling error in annual estimates of population size creates two widely recognized problems for the analysis of population growth. First, if sampling error is mistakenly treated as process error, one obtains inflated estimates of the variation in true population trajectories (Staples, Taper & Dennis 2004). Second, treating sampling error as process error is thought to overestimate the importance of density dependence in population growth (Viljugrein et al. 2005; Dennis et al. 2006). 2. In ecology, state-space models are used to account for sampling error when estimating the effects of density and other variables on population growth (Staples et al. 2004; Dennis et al. 2006). In econometrics, regression with instrumental variables is a well-established method that addresses the problem of correlation between regressors and the error term, but requires fewer assumptions than state-space models (Davidson & MacKinnon 1993; Cameron & Trivedi 2005). 3. We used instrumental variables to account for sampling error and fit a generalized linear model to 472 annual observations of population size for 35 Elk Management Units in Montana, from 1928 to 2004. We compared this model with state-space models fit with the likelihood function of Dennis et al. (2006). We discuss the general advantages and disadvantages of each method. Briefly, regression with instrumental variables is valid with fewer distributional assumptions, but state-space models are more efficient when their distributional assumptions are met. 4. Both methods found that population growth was negatively related to population density and winter snow accumulation. Summer rainfall and wolf (Canis lupus) presence had much weaker effects on elk (Cervus elaphus) dynamics [though limitation by wolves is strong in some elk populations with well-established wolf populations (Creel et al. 2007; Creel & Christianson 2008)]. 5. Coupled with predictions for Montana from global and regional climate models, our results predict a substantial reduction in the limiting effect of snow accumulation on Montana elk populations in the coming decades. If other limiting factors do not operate with greater force, population growth rates would increase substantially.
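Why instrumental variables help when a regressor carries sampling error can be seen in a small simulation. The Python sketch below is a simplified linear, single-instrument case and not the generalized linear model fitted in the study. It assumes the true density `x` is observed only through a noisy count `w`, and that a second, independently noisy measurement `z` is available as the instrument. All names and parameter values are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta=-0.5, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]              # true (unobserved) density
    w = [xi + rng.gauss(0, 1) for xi in x]               # observed density with sampling error
    z = [xi + rng.gauss(0, 1) for xi in x]               # instrument: independent noisy measurement
    y = [beta * xi + rng.gauss(0, 0.5) for xi in x]      # growth rate
    ols = cov(w, y) / cov(w, w)   # ordinary least squares slope, biased toward zero
    iv = cov(z, y) / cov(z, w)    # instrumental-variable slope
    return ols, iv

ols_slope, iv_slope = simulate()
print(ols_slope, iv_slope)
```

In this setup the ordinary least squares slope settles near half the true value of -0.5, while the instrumental-variable slope stays close to it, because the noise in `z` is uncorrelated with the noise in `w`.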
Reducing sampling error in faecal egg counts from black rhinoceros (Diceros bicornis).

PubMed

Stringer, Andrew P; Smith, Diane; Kerley, Graham I H; Linklater, Wayne L

2014-04-01

Faecal egg counts (FECs) are commonly used for the non-invasive assessment of parasite load within hosts. Sources of error, however, have been identified in laboratory techniques and sample storage. Here we focus on sampling error. We test whether a delay in sample collection can affect FECs, and estimate the number of samples needed to reliably assess mean parasite abundance within a host population. Two commonly found parasite eggs in black rhinoceros (Diceros bicornis) dung, strongyle-type nematodes and Anoplocephala gigantea, were used. We find that collection of dung from the centre of faecal boluses up to six hours after defecation does not affect FECs. More than nine samples were needed to greatly improve confidence intervals of the estimated mean parasite abundance within a host population. These results should improve the cost-effectiveness and efficiency of sampling regimes, and support the usefulness of FECs when used for the non-invasive assessment of parasite abundance in black rhinoceros populations.
Youth Attitude Tracking Study II Wave 17 -- Fall 1986.

DTIC Science & Technology

1987-06-01

[Abstract not available; the record contains only fragments of the report's table of contents: Preface; Segmentation Analyses; 3. Methodology of YATS II; A. Sampling Design Overview; Appendix A: Sampling Design, Estimation Procedures and Estimated Sampling Errors; Appendix B: Data Collection Procedures.]

ESTIMATING SAMPLE REQUIREMENTS FOR FIELD EVALUATIONS OF PESTICIDE LEACHING

EPA Science Inventory

A method is presented for estimating the number of samples needed to evaluate pesticide leaching threats to ground water at a desired level of precision. Sample size projections are based on desired precision (exhibited as relative tolerable error), level of confidence (90 or 95%...
Estimation of distributional parameters for censored trace-level water-quality data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilliom, R.J.; Helsel, D.R.

1984-01-01

A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water-sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best-performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least-squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification. 6 figs., 6 tabs.
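The log-probability regression idea described in this abstract can be sketched in Python as follows. The sketch assumes a single censoring level that lies below every detected value, and it assigns z scores using Blom plotting positions. The abstract does not specify a plotting-position formula, so that choice, the function name, and the example data are assumptions.

```python
import math
from statistics import NormalDist

def log_probability_regression(detected, n_censored):
    """Estimate mean and sd when n_censored values fall below one detection limit."""
    values = sorted(detected)
    n = len(values) + n_censored
    nd = NormalDist()
    # z scores for ranks 1..n; censored observations occupy the lowest ranks
    z = [nd.inv_cdf((rank - 0.375) / (n + 0.25)) for rank in range(1, n + 1)]
    z_cens, z_det = z[:n_censored], z[n_censored:]
    logs = [math.log(v) for v in values]
    # Least-squares line: log(concentration) = intercept + slope * z
    mz = sum(z_det) / len(z_det)
    ml = sum(logs) / len(logs)
    slope = (sum((a - mz) * (b - ml) for a, b in zip(z_det, logs))
             / sum((a - mz) ** 2 for a in z_det))
    intercept = ml - slope * mz
    # Extrapolate the fitted line to fill in the censored observations
    imputed = [math.exp(intercept + slope * zc) for zc in z_cens]
    full = imputed + values
    mean = sum(full) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in full) / (n - 1))
    return mean, sd, imputed

# Hypothetical data: four detected concentrations, two non-detects
est_mean, est_sd, filled = log_probability_regression([1.0, 2.0, 4.0, 8.0], n_censored=2)
print(est_mean, est_sd, filled)
```

Only the detected values enter the regression; the censored observations contribute through their ranks, which is what lets the method work without knowing their actual concentrations.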
A method to correct sampling ghosts in historic near-infrared Fourier transform spectrometer (FTS) measurements

NASA Astrophysics Data System (ADS)

Dohe, S.; Sherlock, V.; Hase, F.; Gisi, M.; Robinson, J.; Sepúlveda, E.; Schneider, M.; Blumenstock, T.

2013-08-01

The Total Carbon Column Observing Network (TCCON) has been established to provide ground-based remote sensing measurements of the column-averaged dry air mole fractions (DMF) of key greenhouse gases. To ensure network-wide consistency, biases between Fourier transform spectrometers at different sites have to be well controlled. Errors in interferogram sampling can introduce significant biases in retrievals. In this study we investigate a two-step scheme to correct these errors. In the first step the laser sampling error (LSE) is estimated by determining the sampling shift which minimises the magnitude of the signal intensity in selected, fully absorbed regions of the solar spectrum. The LSE is estimated for every day with measurements which meet certain selection criteria to derive the site-specific time series of the LSEs. In the second step, this sequence of LSEs is used to resample all the interferograms acquired at the site, and hence correct the sampling errors. Measurements acquired at the Izaña and Lauder TCCON sites are used to demonstrate the method. At both sites the sampling error histories show changes in LSE due to instrument interventions (e.g. realignment). Estimated LSEs are in good agreement with sampling errors inferred from the ratio of primary and ghost spectral signatures in optically bandpass-limited tungsten lamp spectra acquired at Lauder. The original time series of Xair and XCO2 (XY: column-averaged DMF of the target gas Y) at both sites show discrepancies of 0.2-0.5% due to changes in the LSE associated with instrument interventions or changes in the measurement sample rate. After resampling, discrepancies are reduced to 0.1% or less at Lauder and 0.2% at Izaña. In the latter case, coincident changes in interferometer alignment may also have contributed to the residual difference. In the future the proposed method will be used to correct historical spectra at all TCCON sites.
Increasing point-count duration increases standard error

USGS Publications Warehouse

Smith, W.P.; Twedt, D.J.; Hamel, P.B.; Ford, R.P.; Wiedenfeld, D.A.; Cooper, R.J.

1998-01-01

We examined data from point counts of varying duration in bottomland forests of west Tennessee and the Mississippi Alluvial Valley to determine if counting interval influenced sampling efficiency. Estimates of standard error increased as point count duration increased both for cumulative number of individuals and species in both locations. Although point counts appear to yield data with standard errors proportional to means, a square root transformation of the data may stabilize the variance. Using long (>10 min) point counts may reduce sample size and increase sampling error, both of which diminish statistical power and thereby the ability to detect meaningful changes in avian populations.
[Practical aspects regarding sample size in clinical research].

PubMed

Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

1996-01-01

Knowing the correct sample size lets us judge whether results published in medical papers rest on a suitable design and whether their conclusions are supported by the statistical analysis. To estimate the sample size, we must consider the type I error, the type II error, the variance, the effect size, and the significance level and power of the test. To decide which formula to use, we must first define the kind of study: a prevalence study, a study of mean values, or a comparative study. In this paper we explain some basic statistical concepts and describe four simple examples of sample size estimation.
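As an illustration of how these ingredients combine, the Python sketch below uses the standard normal-approximation formula for comparing two means with equal group sizes, n = 2 (z_{1-α/2} + z_{power})² σ² / δ². This is a textbook formula and not necessarily one of the four examples in the paper; the function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means.

    sigma: common standard deviation (assumed known)
    delta: smallest difference in means worth detecting (effect size)
    alpha: type I error rate; power = 1 - type II error rate
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

n80 = n_per_group(sigma=10, delta=5)              # 80% power
n90 = n_per_group(sigma=10, delta=5, power=0.90)  # 90% power
print(n80, n90)
```

Halving the detectable difference `delta` roughly quadruples the required sample size, which is why the effect size usually dominates the calculation.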
A method for estimating radioactive cesium concentrations in cattle blood using urine samples.

PubMed

Sato, Itaru; Yamagishi, Ryoma; Sasaki, Jun; Satoh, Hiroshi; Miura, Kiyoshi; Kikuchi, Kaoru; Otani, Kumiko; Okada, Keiji

2017-12-01

In the region contaminated by the Fukushima nuclear accident, radioactive contamination of live cattle should be checked before slaughter. In this study, we establish a precise method for estimating radioactive cesium concentrations in cattle blood using urine samples. Blood and urine samples were collected from a total of 71 cattle on two farms in the 'difficult-to-return zone'. Urine 137Cs, specific gravity, electrical conductivity, pH, sodium, potassium, calcium, and creatinine were measured and various estimation methods for blood 137Cs were tested. The average error rate of the estimation was 54.2% without correction. Correcting for urine creatinine, specific gravity, electrical conductivity, or potassium improved the precision of the estimation. Correcting for specific gravity using the following formula gave the most precise estimate (average error rate = 16.9%): [blood 137Cs] = [urinary 137Cs]/([specific gravity] - 1)/329. Urine samples are faster to measure than blood samples because urine can be obtained in larger quantities and has a higher 137Cs concentration than blood. These advantages of urine, and the estimation precision demonstrated in our study, indicate that estimation of blood 137Cs using urine samples is a practical means of monitoring radioactive contamination in live cattle. © 2017 Japanese Society of Animal Science.
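The specific-gravity correction quoted in this abstract translates directly into code. In the Python sketch below, the function name and the example inputs are hypothetical. The abstract does not state measurement units, so the result is simply in whatever concentration units the urine value uses.

```python
def blood_cs137(urine_cs137, specific_gravity):
    """Estimate blood 137Cs from urine 137Cs using the formula in the abstract:
    [blood 137Cs] = [urinary 137Cs] / ([specific gravity] - 1) / 329
    """
    if specific_gravity <= 1:
        raise ValueError("specific gravity must exceed 1 for the correction to apply")
    return urine_cs137 / (specific_gravity - 1) / 329

# Hypothetical reading: urine 137Cs of 329 (arbitrary units), specific gravity 1.025
estimate = blood_cs137(329, 1.025)
print(estimate)
```

The guard against a specific gravity of 1 or less is an added safety check, since the formula divides by the excess over 1.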
Maximum inflation of the type 1 error rate when sample size and allocation rate are adapted in a pre-planned interim look.

PubMed

Graf, Alexandra C; Bauer, Peter

2011-06-30

We calculate the maximum type 1 error rate of the pre-planned conventional fixed sample size test for comparing the means of independent normal distributions (with common known variance) which can be yielded when sample size and allocation rate to the treatment arms can be modified in an interim analysis. Thereby it is assumed that the experimenter fully exploits knowledge of the unblinded interim estimates of the treatment effects in order to maximize the conditional type 1 error rate. The 'worst-case' strategies require knowledge of the unknown common treatment effect under the null hypothesis. Although this is a rather hypothetical scenario it may be approached in practice when using a standard control treatment for which precise estimates are available from historical data. The maximum inflation of the type 1 error rate is substantially larger than derived by Proschan and Hunsberger (Biometrics 1995; 51:1315-1324) for design modifications applying balanced samples before and after the interim analysis. Corresponding upper limits for the maximum type 1 error rate are calculated for a number of situations arising from practical considerations (e.g. restricting the maximum sample size, not allowing sample size to decrease, allowing only increase in the sample size in the experimental treatment). The application is discussed for a motivating example. Copyright © 2011 John Wiley & Sons, Ltd.
Accuracy and sampling error of two age estimation techniques using rib histomorphometry on a modern sample.

PubMed

García-Donas, Julieta G; Dyke, Jeffrey; Paine, Robert R; Nathena, Despoina; Kranioti, Elena F

2016-02-01

Most age estimation methods are proven problematic when applied in highly fragmented skeletal remains. Rib histomorphometry is advantageous in such cases; yet it is vital to test and revise existing techniques particularly when used in legal settings (Crowder and Rosella, 2007). This study tested Stout & Paine (1992) and Stout et al. (1994) histological age estimation methods on a Modern Greek sample using different sampling sites. Six left 4th ribs of known age and sex were selected from a modern skeletal collection. Each rib was cut into three equal segments. Two thin sections were acquired from each segment. A total of 36 thin sections were prepared and analysed. Four variables (cortical area, intact and fragmented osteon density and osteon population density) were calculated for each section and age was estimated according to Stout & Paine (1992) and Stout et al. (1994). The results showed that both methods produced a systemic underestimation of the individuals (to a maximum of 43 years) although a general improvement in accuracy levels was observed when applying the Stout et al. (1994) formula. There is an increase of error rates with increasing age with the oldest individual showing extreme differences between real age and estimated age. Comparison of the different sampling sites showed small differences between the estimated ages suggesting that any fragment of the rib could be used without introducing significant error. Yet, a larger sample should be used to confirm these results. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Sensor Analytics: Radioactive gas Concentration Estimation and Error Propagation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, Dale N.; Fagan, Deborah K.; Suarez, Reynold

2007-04-15

This paper develops the mathematical statistics of a radioactive gas quantity measurement and associated error propagation. The probabilistic development is a different approach to deriving attenuation equations and offers easy extensions to more complex gas analysis components through simulation. The mathematical development assumes a sequential process of three components; I) the collection of an environmental sample, II) component gas extraction from the sample through the application of gas separation chemistry, and III) the estimation of radioactivity of component gases.
A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification.

PubMed

Jiang, Wenyu; Simon, Richard

2007-12-20

This paper first provides a critical review on some existing methods for estimating the prediction error in classifying microarray data where the number of genes greatly exceeds the number of specimens. Special attention is given to the bootstrap-related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We introduce a repeated leave-one-out bootstrap (RLOOB) method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap (ABS) method that fits a learning curve to the RLOOB estimates calculated with different bootstrap learning set sizes. The ABS method is robust across the situations we investigate and provides a slightly conservative estimate for the prediction error. Even with small samples, it does not suffer from large upward bias as the leave-one-out bootstrap and the 0.632+ bootstrap, and it does not suffer from large variability as the leave-one-out cross-validation in microarray applications. Copyright (c) 2007 John Wiley & Sons, Ltd.
Inventory implications of using sampling variances in estimation of growth model coefficients

Treesearch

Albert R. Stage; William R. Wykoff

2000-01-01

Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...
Sampling errors for satellite-derived tropical rainfall - Monte Carlo study using a space-time stochastic model

NASA Technical Reports Server (NTRS)

Bell, Thomas L.; Abdullah, A.; Martin, Russell L.; North, Gerald R.

1990-01-01

Estimates of monthly average rainfall based on satellite observations from a low earth orbit will differ from the true monthly average because the satellite observes a given area only intermittently. This sampling error inherent in satellite monitoring of rainfall would occur even if the satellite instruments could measure rainfall perfectly. The size of this error is estimated for a satellite system being studied at NASA, the Tropical Rainfall Measuring Mission (TRMM). First, the statistical description of rainfall on scales from 1 to 1000 km is examined in detail, based on rainfall data from the Global Atmospheric Research Project Atlantic Tropical Experiment (GATE). A TRMM-like satellite is flown over a two-dimensional time-evolving simulation of rainfall using a stochastic model with statistics tuned to agree with GATE statistics. The distribution of sampling errors found from many months of simulated observations is found to be nearly normal, even though the distribution of area-averaged rainfall is far from normal. For a range of orbits likely to be employed in TRMM, sampling error is found to be less than 10 percent of the mean for rainfall averaged over a 500 x 500 sq km area.
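The effect of intermittent observation can be reproduced with a toy Monte Carlo experiment in Python. The sketch below is not the GATE-tuned space-time model used in the study. It assumes an hourly, skewed rain-like series built by exponentiating an AR(1) process, with a "satellite" that samples it every `revisit_hours`. All parameters and names are hypothetical.

```python
import math
import random

def sampling_error(revisit_hours, n_months=200, hours=720, seed=0):
    """Relative RMS error of the monthly mean estimated from intermittent samples."""
    rng = random.Random(seed)
    sq_err, total = 0.0, 0.0
    for _ in range(n_months):
        x, series = 0.0, []
        for _ in range(hours):
            x = 0.9 * x + rng.gauss(0, 1)      # AR(1) driver with hour-to-hour persistence
            series.append(math.exp(x / 2))     # skewed, non-negative "rain rate"
        true_mean = sum(series) / hours
        sub = series[::revisit_hours]          # observations at overpass times only
        estimate = sum(sub) / len(sub)
        sq_err += (estimate - true_mean) ** 2
        total += true_mean
    return math.sqrt(sq_err / n_months) / (total / n_months)

err_6h = sampling_error(6)
err_24h = sampling_error(24)
print(err_6h, err_24h)
```

Because the instrument is treated as perfect, all of the error here comes from the gaps between overpasses, which mirrors the definition of sampling error in the abstract.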
Comparison of Optimal Design Methods in Inverse Problems

PubMed Central

Banks, H. T.; Holm, Kathleen; Kappel, Franz

2011-01-01

Typical optimal design methods for inverse or parameter estimation problems are designed to choose optimal sampling distributions through minimization of a specific cost function related to the resulting error in parameter estimates. It is hoped that the inverse problem will produce parameter estimates with increased accuracy using data collected according to the optimal sampling distribution. Here we formulate the classical optimal design problem in the context of general optimization problems over distributions of sampling times. We present a new Prohorov metric based theoretical framework that permits one to treat succinctly and rigorously any optimal design criteria based on the Fisher Information Matrix (FIM). A fundamental approximation theory is also included in this framework. A new optimal design, SE-optimal design (standard error optimal design), is then introduced in the context of this framework. We compare this new design criteria with the more traditional D-optimal and E-optimal designs. The optimal sampling distributions from each design are used to compute and compare standard errors; the standard errors for parameters are computed using asymptotic theory or bootstrapping and the optimal mesh. We use three examples to illustrate ideas: the Verhulst-Pearl logistic population model [13], the standard harmonic oscillator model [13] and a popular glucose regulation model [16, 19, 29]. PMID:21857762
Joint nonparametric correction estimator for excess relative risk regression in survival analysis with exposure measurement error

PubMed Central

Wang, Ching-Yun; Cullings, Harry; Song, Xiao; Kopecky, Kenneth J.

2017-01-01

SUMMARY Observational epidemiological studies often confront the problem of estimating exposure-disease relationships when the exposure is not measured exactly. In the paper, we investigate exposure measurement error in excess relative risk regression, which is a widely used model in radiation exposure effect research. In the study cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies a generalized version of the classical additive measurement error model, but it may or may not have repeated measurements. In addition, an instrumental variable is available for individuals in a subset of the whole cohort. We develop a nonparametric correction (NPC) estimator using data from the subcohort, and further propose a joint nonparametric correction (JNPC) estimator using all observed data to adjust for exposure measurement error. An optimal linear combination estimator of JNPC and NPC is further developed. The proposed estimators are nonparametric, which are consistent without imposing a covariate or error distribution, and are robust to heteroscedastic errors. Finite sample performance is examined via a simulation study. We apply the developed methods to data from the Radiation Effects Research Foundation, in which chromosome aberration is used to adjust for the effects of radiation dose measurement error on the estimation of radiation dose responses. PMID:29354018
Analyzing self-controlled case series data when case confirmation rates are estimated from an internal validation sample.

PubMed

Xu, Stanley; Clarke, Christina L; Newcomer, Sophia R; Daley, Matthew F; Glanz, Jason M

2018-05-16

Vaccine safety studies are often electronic health record (EHR)-based observational studies. These studies often face significant methodological challenges, including confounding and misclassification of adverse events. Vaccine safety researchers use the self-controlled case series (SCCS) study design to handle confounding and employ medical chart review to ascertain cases that are identified using EHR data. However, for common adverse events, limited resources often make it impossible to adjudicate all adverse events observed in electronic data. In this paper, we considered four approaches for analyzing SCCS data with confirmation rates estimated from an internal validation sample: (1) observed cases, (2) confirmed cases only, (3) known confirmation rate, and (4) multiple imputation (MI). We conducted a simulation study to evaluate these four approaches using type I error rates, percent bias, and empirical power. Our simulation results suggest that when misclassification of adverse events is present, approaches such as observed cases, confirmed cases only, and known confirmation rate may inflate the type I error, yield biased point estimates, and affect statistical power. The multiple imputation approach considers the uncertainty of estimated confirmation rates from an internal validation sample and yields a proper type I error rate, a largely unbiased point estimate, a proper variance estimate, and statistical power. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Estimation of population mean under systematic sampling

NASA Astrophysics Data System (ADS)

Noor-ul-amin, Muhammad; Javaid, Amjad

2017-11-01

In this study we propose a generalized ratio estimator under non-response for systematic random sampling. We also generate a class of estimators through special cases of generalized estimator using different combinations of coefficients of correlation, kurtosis and variation. The mean square errors and mathematical conditions are also derived to prove the efficiency of proposed estimators. Numerical illustration is included using three populations to support the results.
Statistical approaches to account for false-positive errors in environmental DNA samples.

PubMed

Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid

2016-05-01

Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies. © 2015 John Wiley & Sons Ltd.
Grouping methods for estimating the prevalences of rare traits from complex survey data that preserve confidentiality of respondents.

PubMed

Hyun, Noorie; Gastwirth, Joseph L; Graubard, Barry I

2018-03-26

Originally, 2-stage group testing was developed for efficiently screening individuals for a disease. In response to the HIV/AIDS epidemic, 1-stage group testing was adopted for estimating prevalences of a single or multiple traits from testing groups of size q, so individuals were not tested. This paper extends the methodology of 1-stage group testing to surveys with sample weighted complex multistage-cluster designs. Sample weighted-generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests. Two difficulties arise when using group testing in complex samples: (1) How does one weight the results of the test on each group as the sample weights will differ among observations in the same group. Furthermore, if the sample weights are related to positivity of the diagnostic test, then group-level weighting is needed to reduce bias in the prevalence estimation; (2) How does one form groups that will allow accurate estimation of the standard errors of prevalence estimates under multistage-cluster sampling allowing for intracluster correlation of the test results. We study 5 different grouping methods to address the weighting and cluster sampling aspects of complex designed samples. Finite sample properties of the estimators of prevalences, variances, and confidence interval coverage for these grouping methods are studied using simulations. National Health and Nutrition Examination Survey data are used to illustrate the methods. Copyright © 2018 John Wiley & Sons, Ltd.
Evaluating mixed samples as a source of error in non-invasive genetic studies using microsatellites

USGS Publications Warehouse

Roon, David A.; Thomas, M.E.; Kendall, K.C.; Waits, L.P.

2005-01-01

The use of noninvasive genetic sampling (NGS) for surveying wild populations is increasing rapidly. Currently, only a limited number of studies have evaluated potential biases associated with NGS. This paper evaluates the potential errors associated with analysing mixed samples drawn from multiple animals. Most NGS studies assume that mixed samples will be identified and removed during the genotyping process. We evaluated this assumption by creating 128 mixed samples of extracted DNA from brown bear (Ursus arctos) hair samples. These mixed samples were genotyped and screened for errors at six microsatellite loci according to protocols consistent with those used in other NGS studies. Five mixed samples produced acceptable genotypes after the first screening. However, all mixed samples produced multiple alleles at one or more loci, amplified as only one of the source samples, or yielded inconsistent electropherograms by the final stage of the error-checking process. These processes could potentially reduce the number of individuals observed in NGS studies, but errors should be conservative within demographic estimates. Researchers should be aware of the potential for mixed samples and carefully design gel analysis criteria and error checking protocols to detect mixed samples.
Distortion correction of echo planar images applying the concept of finite rate of innovation to point spread function mapping (FRIP).

PubMed

Nunes, Rita G; Hajnal, Joseph V

2018-06-01

Point spread function (PSF) mapping enables estimating the displacement fields required for distortion correction of echo planar images. Recently, a highly accelerated approach was introduced for estimating displacements from the phase slope of under-sampled PSF mapping data. Sampling schemes with varying spacing were proposed requiring stepwise phase unwrapping. To avoid unwrapping errors, an alternative approach applying the concept of finite rate of innovation to PSF mapping (FRIP) is introduced, using a pattern search strategy to locate the PSF peak, and the two methods are compared. Fully sampled PSF data was acquired in six subjects at 3.0 T, and distortion maps were estimated after retrospective under-sampling. The two methods were compared for both previously published and newly optimized sampling patterns. Prospectively under-sampled data were also acquired. Shift maps were estimated and deviations relative to the fully sampled reference map were calculated. The best performance was achieved when using FRIP with a previously proposed sampling scheme. The two methods were comparable for the remaining schemes. The displacement field errors tended to be lower as the number of samples or their spacing increased. A robust method for estimating the position of the PSF peak has been introduced.

Estimation in a discrete tail rate family of recapture sampling models

NASA Technical Reports Server (NTRS)

Gupta, Rajan; Lee, Larry D.

1990-01-01

In the context of recapture sampling design for debugging experiments the problem of estimating the error or hitting rate of the faults remaining in a system is considered. Moment estimators are derived for a family of models in which the rate parameters are assumed proportional to the tail probabilities of a discrete distribution on the positive integers. The estimators are shown to be asymptotically normal and fully efficient. Their fixed sample properties are compared, through simulation, with those of the conditional maximum likelihood estimators.
Adventures in Uncertainty: An Empirical Investigation of the Use of a Taylor's Series Approximation for the Assessment of Sampling Errors in Educational Research.

ERIC Educational Resources Information Center

Wilson, Mark

This study investigates the accuracy of the Woodruff-Causey technique for estimating sampling errors for complex statistics. The technique may be applied when data are collected by using multistage clustered samples. The technique was chosen for study because of its relevance to the correct use of multivariate analyses in educational survey…
Strengths and weaknesses of temporal stability analysis for monitoring and estimating grid-mean soil moisture in a high-intensity irrigated agricultural landscape

NASA Astrophysics Data System (ADS)

Ran, Youhua; Li, Xin; Jin, Rui; Kang, Jian; Cosh, Michael H.

2017-01-01

Monitoring and estimating grid-mean soil moisture is very important for assessing many hydrological, biological, and biogeochemical processes and for validating remotely sensed surface soil moisture products. Temporal stability analysis (TSA) is a valuable tool for identifying a small number of representative sampling points to estimate the grid-mean soil moisture content. This analysis was evaluated and improved using high-quality surface soil moisture data that were acquired by a wireless sensor network in a high-intensity irrigated agricultural landscape in an arid region of northwestern China. The performance of the TSA was limited in areas where the representative error was dominated by random events, such as irrigation events. This shortcoming can be effectively mitigated by using a stratified TSA (STSA) method, proposed in this paper. In addition, the following methods were proposed for rapidly and efficiently identifying representative sampling points when using TSA. (1) Instantaneous measurements can be used to identify representative sampling points to some extent; however, the error resulting from this method is significant when validating remotely sensed soil moisture products. Thus, additional representative sampling points should be considered to reduce this error. (2) The calibration period can be determined from the time span of the full range of the grid-mean soil moisture content during the monitoring period. (3) The representative error is sensitive to the number of calibration sampling points, especially when only a few representative sampling points are used. Multiple sampling points are recommended to reduce data loss and improve the likelihood of representativeness at two scales.
Sample Size Calculations for Population Size Estimation Studies Using Multiplier Methods With Respondent-Driven Sampling Surveys.

PubMed

Fearon, Elizabeth; Chabata, Sungai T; Thompson, Jennifer A; Cowan, Frances M; Hargreaves, James R

2017-09-14

While guidance exists for obtaining population size estimates using multiplier methods with respondent-driven sampling surveys, we lack specific guidance for making sample size decisions. Our objective was to guide the design of multiplier method population size estimation studies using respondent-driven sampling surveys so as to reduce the random error around the estimate obtained. The population size estimate is obtained by dividing the number of individuals receiving a service or the number of unique objects distributed (M) by the proportion of individuals in a representative survey who report receipt of the service or object (P). We have developed an approach to sample size calculation, interpreting methods to estimate the variance around estimates obtained using multiplier methods in conjunction with research into design effects and respondent-driven sampling. We describe an application to estimate the number of female sex workers in Harare, Zimbabwe. There is high variance in estimates. Random error around the size estimate reflects uncertainty from M and P, particularly when the estimate of P in the respondent-driven sampling survey is low. As expected, sample size requirements are higher when the design effect of the survey is assumed to be greater. We suggest a method for investigating the effects of sample size on the precision of a population size estimate obtained using multiplier methods and respondent-driven sampling. Uncertainty in the size estimate is high, particularly when P is small, so balancing against other potential sources of bias, we advise researchers to consider longer service attendance reference periods and to distribute more unique objects, which is likely to result in a higher estimate of P in the respondent-driven sampling survey. ©Elizabeth Fearon, Sungai T Chabata, Jennifer A Thompson, Frances M Cowan, James R Hargreaves. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 14.09.2017.
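The multiplier estimate described in this abstract is M divided by P. The sketch below shows one way its precision might be explored; it is not the authors' procedure. It assumes M is known without error, inflates the binomial variance of P by an assumed design effect `deff`, and uses a first-order (delta method) approximation for the standard error. The function name and default values are hypothetical.

```python
import math


def multiplier_estimate(m, p, n, deff=2.0, z=1.96):
    """Population size estimate N = M / P with an approximate interval.

    m    : count of service users or unique objects distributed (M),
           treated here as fixed (an assumption of this sketch)
    p    : survey proportion reporting receipt (P), with 0 < p < 1
    n    : survey sample size
    deff : assumed design effect of the respondent-driven sampling survey
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    n_hat = m / p
    var_p = deff * p * (1.0 - p) / n       # design-effect-inflated variance of P
    se_n = m * math.sqrt(var_p) / p ** 2   # delta method: |dN/dP| = M / P^2
    return n_hat, se_n, (n_hat - z * se_n, n_hat + z * se_n)
```

For example, `multiplier_estimate(1000, 0.25, 400)` gives an estimate of 4000 with a standard error of roughly 490, while the same call with `p=0.05` gives a much larger relative standard error, consistent with the abstract's point that uncertainty is high when P is small.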
Skylab water balance error analysis

NASA Technical Reports Server (NTRS)

Leonard, J. I.

1977-01-01

Estimates of the precision of the net water balance were obtained for the entire Skylab preflight and inflight phases as well as for the first two weeks of flight. Quantitative estimates of both total sampling errors and instrumentation errors were obtained. It was shown that measurement error is minimal in comparison to biological variability and little can be gained from improvement in analytical accuracy. In addition, a propagation of error analysis demonstrated that total water balance error could be accounted for almost entirely by the errors associated with body mass changes. Errors due to interaction between terms in the water balance equation (covariances) represented less than 10% of the total error. Overall, the analysis provides evidence that daily measurements of body water changes obtained from the indirect balance technique are reasonable, precise, and reliable. The method is not biased toward net retention or loss.
Application of Lamendin's adult dental aging technique to a diverse skeletal sample.

PubMed

Prince, Debra A; Ubelaker, Douglas H

2002-01-01

Lamendin et al. (1) proposed a technique to estimate age at death for adults by analyzing single-rooted teeth. They expressed age as a function of two factors: translucency of the tooth root and periodontosis (gingival regression). In their study, they analyzed 306 single-rooted teeth that were extracted at autopsy from 208 individuals of known age at death, all of whom were considered as having a French ancestry. Their sample consisted of 135 males, 73 females, 198 whites, and 10 blacks. The sample ranged in age from 22 to 90 years of age. By using a simple formula (A = 0.18 x P + 0.42 x T + 25.53, where A = Age in years, P = Periodontosis height x 100/root height, and T = Transparency height x 100/root height), Lamendin et al. were able to estimate age at death with a mean error of +/- 10 years on their working sample and +/- 8.4 years on a forensic control sample. Lamendin found this technique to work well with a French population, but did not test it outside of that sample area. This study tests the accuracy of this adult aging technique on a more diverse skeletal population, the Terry Collection housed at the Smithsonian's National Museum of Natural History. Our sample consists of 400 teeth from 94 black females, 72 white females, 98 black males, and 95 white males, ranging from 25 to 99 years. Lamendin's technique was applied to this sample to test its applicability to a population not of French origin. Providing results from a diverse skeletal population will aid in establishing the validity of this method to be used in forensic cases, its ideal purpose. Our results suggest that Lamendin's method estimates age fairly accurately outside of the French sample yielding a mean error of 8.2 years, standard deviation 6.9 years, and standard error of the mean 0.34 years. In addition, when ancestry and sex are accounted for, the mean errors are reduced for each group (black females, white females, black males, and white males). Lamendin et al. reported an inter-observer error of 9+/-1.8 and 10+/-2 years from two independent observers. Forty teeth were randomly remeasured from the Terry Collection in order to assess an intra-observer error. From this retest, an intra-observer error of 6.5 years was detected.
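The regression formula quoted in this abstract (A = 0.18 x P + 0.42 x T + 25.53) can be written as a short function. This is a minimal sketch: the coefficients come from the abstract, while the function name, the use of millimetres, and the example measurements are assumptions made for illustration.

```python
def lamendin_age(periodontosis_height, translucency_height, root_height):
    """Estimate adult age at death (years) from one single-rooted tooth.

    All three heights must be in the same unit (millimetres assumed here).
    P and T are the periodontosis and translucency heights expressed as a
    percentage of root height, as defined in the abstract.
    """
    if root_height <= 0:
        raise ValueError("root height must be positive")
    p = periodontosis_height * 100.0 / root_height
    t = translucency_height * 100.0 / root_height
    return 0.18 * p + 0.42 * t + 25.53
```

With hypothetical measurements of 2 mm periodontosis and 5 mm translucency on a 15 mm root, `lamendin_age(2.0, 5.0, 15.0)` evaluates to about 41.9 years. The constant term means the formula cannot return an age below 25.53 years.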
Reducing Bias and Error in the Correlation Coefficient Due to Nonnormality.

PubMed

Bishara, Anthony J; Hittner, James B

2015-10-01

It is more common for educational and psychological data to be nonnormal than to be approximately normal. This tendency may lead to bias and error in point estimates of the Pearson correlation coefficient. In a series of Monte Carlo simulations, the Pearson correlation was examined under conditions of normal and nonnormal data, and it was compared with its major alternatives, including the Spearman rank-order correlation, the bootstrap estimate, the Box-Cox transformation family, and a general normalizing transformation (i.e., rankit), as well as to various bias adjustments. Nonnormality caused the correlation coefficient to be inflated by up to +.14, particularly when the nonnormality involved heavy-tailed distributions. Traditional bias adjustments worsened this problem, further inflating the estimate. The Spearman and rankit correlations eliminated this inflation and provided conservative estimates. Rankit also minimized random error for most sample sizes, except for the smallest samples ( n = 10), where bootstrapping was more effective. Overall, results justify the use of carefully chosen alternatives to the Pearson correlation when normality is violated.
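The rankit approach mentioned in this abstract replaces each observation by a normal score computed from its rank, then applies the ordinary Pearson formula to the scores. The sketch below assumes the common rankit definition, the standard normal quantile of (rank - 0.5)/n, with tied values sharing their average rank. The function names are illustrative and are not taken from the study.

```python
from statistics import NormalDist, mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rankit(values):
    """Map values to normal scores: inverse normal CDF of (rank - 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rankit_correlation(x, y):
    """Pearson correlation computed on rankit-transformed data."""
    return pearson(rankit(x), rankit(y))
```

Because the transformation depends only on ranks, a single extreme value (such as the 100 in `[1, 2, 3, 4, 100]`) has no more influence on `rankit_correlation` than any other largest observation, whereas it can dominate `pearson` on the raw data.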
Reducing Bias and Error in the Correlation Coefficient Due to Nonnormality

PubMed Central

Hittner, James B.

2014-01-01

It is more common for educational and psychological data to be nonnormal than to be approximately normal. This tendency may lead to bias and error in point estimates of the Pearson correlation coefficient. In a series of Monte Carlo simulations, the Pearson correlation was examined under conditions of normal and nonnormal data, and it was compared with its major alternatives, including the Spearman rank-order correlation, the bootstrap estimate, the Box–Cox transformation family, and a general normalizing transformation (i.e., rankit), as well as to various bias adjustments. Nonnormality caused the correlation coefficient to be inflated by up to +.14, particularly when the nonnormality involved heavy-tailed distributions. Traditional bias adjustments worsened this problem, further inflating the estimate. The Spearman and rankit correlations eliminated this inflation and provided conservative estimates. Rankit also minimized random error for most sample sizes, except for the smallest samples (n = 10), where bootstrapping was more effective. Overall, results justify the use of carefully chosen alternatives to the Pearson correlation when normality is violated. PMID:29795841
Achieving Accuracy Requirements for Forest Biomass Mapping: A Data Fusion Method for Estimating Forest Biomass and LiDAR Sampling Error with Spaceborne Data

NASA Technical Reports Server (NTRS)

Montesano, P. M.; Cook, B. D.; Sun, G.; Simard, M.; Zhang, Z.; Nelson, R. F.; Ranson, K. J.; Lutchke, S.; Blair, J. B.

2012-01-01

The synergistic use of active and passive remote sensing (i.e., data fusion) demonstrates the ability of spaceborne light detection and ranging (LiDAR), synthetic aperture radar (SAR) and multispectral imagery for achieving the accuracy requirements of a global forest biomass mapping mission. This data fusion approach also provides a means to extend 3D information from discrete spaceborne LiDAR measurements of forest structure across scales much larger than that of the LiDAR footprint. For estimating biomass, these measurements mix a number of errors including those associated with LiDAR footprint sampling over regional - global extents. A general framework for mapping above ground live forest biomass (AGB) with a data fusion approach is presented and verified using data from NASA field campaigns near Howland, ME, USA, to assess AGB and LiDAR sampling errors across a regionally representative landscape. We combined SAR and Landsat-derived optical (passive optical) image data to identify forest patches, and used image and simulated spaceborne LiDAR data to compute AGB and estimate LiDAR sampling error for forest patches and 100m, 250m, 500m, and 1km grid cells. Forest patches were delineated with Landsat-derived data and airborne SAR imagery, and simulated spaceborne LiDAR (SSL) data were derived from orbit and cloud cover simulations and airborne data from NASA's Laser Vegetation Imaging Sensor (LVIS). At both the patch and grid scales, we evaluated differences in AGB estimation and sampling error from the combined use of LiDAR with both SAR and passive optical and with either SAR or passive optical alone. This data fusion approach demonstrates that incorporating forest patches into the AGB mapping framework can provide sub-grid forest information for coarser grid-level AGB reporting, and that combining simulated spaceborne LiDAR with SAR and passive optical data is most useful for estimating AGB when measurements from LiDAR are limited, because doing so reduced forest AGB sampling errors by 15 - 38%. Furthermore, spaceborne global scale accuracy requirements were achieved. At least 80% of the grid cells at 100m, 250m, 500m, and 1km grid levels met AGB density accuracy requirements using a combination of passive optical and SAR along with machine learning methods to predict vegetation structure metrics for forested areas without LiDAR samples. Finally, using either passive optical or SAR, accuracy requirements were met at the 500m and 250m grid level, respectively.
A fully redundant double difference algorithm for obtaining minimum variance estimates from GPS observations

NASA Technical Reports Server (NTRS)

Melbourne, William G.

1986-01-01

In double differencing a regression system obtained from concurrent Global Positioning System (GPS) observation sequences, one either undersamples the system to avoid introducing colored measurement statistics, or one fully samples the system incurring the resulting non-diagonal covariance matrix for the differenced measurement errors. A suboptimal estimation result will be obtained in the undersampling case and will also be obtained in the fully sampled case unless the color noise statistics are taken into account. The latter approach requires a least squares weighting matrix derived from inversion of a non-diagonal covariance matrix for the differenced measurement errors instead of inversion of the customary diagonal one associated with white noise processes. Presented is the so-called fully redundant double differencing algorithm for generating a weighted double differenced regression system that yields equivalent estimation results, but features for certain cases a diagonal weighting matrix even though the differenced measurement error statistics are highly colored.
Analysis of spatial correlation in predictive models of forest variables that use LiDAR auxiliary information

Treesearch

F. Mauro; Vicente J. Monleon; H. Temesgen; L.A. Ruiz

2017-01-01

Accounting for spatial correlation of LiDAR model errors can improve the precision of model-based estimators. To estimate spatial correlation, sample designs that provide close observations are needed, but their implementation might be prohibitively expensive. To quantify the gains obtained by accounting for the spatial correlation of model errors, we examined (
Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

ERIC Educational Resources Information Center

Cui, Zhongmin; Kolen, Michael J.

2008-01-01

This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Estimation of Standard Error of Regression Effects in Latent Regression Models Using Binder's Linearization. Research Report. ETS RR-07-09

ERIC Educational Resources Information Center

Li, Deping; Oranje, Andreas

2007-01-01

Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Machine learning approaches for estimation of prediction interval for the model output.

PubMed

Shrestha, Durga L; Solomatine, Dimitri P

2006-03-01

A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of empirical distributions of the errors associated with all instances belonging to the cluster under consideration and propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods estimating the prediction interval. A new method for evaluating performance for estimating prediction interval is proposed as well.
Evaluation of sampling methods used to estimate irrigation pumpage in Chase, Dundy, and Perkins counties, Nebraska

USGS Publications Warehouse

Heimes, F.J.; Luckey, R.R.; Stephens, D.M.

1986-01-01

Combining estimates of applied irrigation water, determined for selected sample sites, with information on irrigated acreage provides one alternative for developing areal estimates of groundwater pumpage for irrigation. The reliability of this approach was evaluated by comparing estimated pumpage with metered pumpage for two years for a three-county area in southwestern Nebraska. Meters on all irrigation wells in the three counties provided a complete data set for evaluation of equipment and comparison with pumpage estimates. Regression analyses were conducted on discharge, time-of-operation, and pumpage data collected at 52 irrigation sites in 1983 and at 57 irrigation sites in 1984 using data from inline flowmeters as the independent variable. The standard error of the estimate for regression analysis of discharge measurements made using a portable flowmeter was 6.8% of the mean discharge metered by inline flowmeters. The standard error of the estimate for regression analysis of time of operation determined from electric meters was 8.1% of the mean time of operation determined from in-line and 15.1% for engine-hour meters. Sampled pumpage, calculated by multiplying the average discharge obtained from the portable flowmeter by the time of operation obtained from energy or hour meters, was compared with metered pumpage from in-line flowmeters at sample sites. The standard error of the estimate for the regression analysis of sampled pumpage was 10.3% of the mean of the metered pumpage for 1983 and 1984 combined. The difference in the mean of the sampled pumpage and the mean of the metered pumpage was only 1.8% for 1983 and 2.3% for 1984. Estimated pumpage, for each county and for the study area, was calculated by multiplying application (sampled pumpage divided by irrigated acreages at sample sites) by irrigated acreage compiled from Landsat (Land satellite) imagery. Estimated pumpage was compared with total metered pumpage for each county and the study area. 
Estimated pumpage by county varied from 9% less, to 20% more, than metered pumpage in 1983 and from 0 to 15% more than metered pumpage in 1984. Estimated pumpage for the study area was 11% more than metered pumpage in 1983 and 5% more than metered pumpage in 1984. (Author's abstract)
Flux Sampling Errors for Aircraft and Towers

NASA Technical Reports Server (NTRS)

Mahrt, Larry

1998-01-01

Various errors and influences leading to differences between tower- and aircraft-measured fluxes are surveyed. This survey is motivated by reports in the literature that aircraft fluxes are sometimes smaller than tower-measured fluxes. Both tower and aircraft flux errors are larger with surface heterogeneity due to several independent effects. Surface heterogeneity may cause tower flux errors to increase with decreasing wind speed. Techniques to assess flux sampling error are reviewed. Such error estimates suffer various degrees of inapplicability in real geophysical time series due to nonstationarity of tower time series (or inhomogeneity of aircraft data). A new measure for nonstationarity is developed that eliminates assumptions on the form of the nonstationarity inherent in previous methods. When this nonstationarity measure becomes large, the surface energy imbalance increases sharply. Finally, strategies for obtaining adequate flux sampling using repeated aircraft passes and grid patterns are outlined.
High variability in strain estimation errors when using a commercial ultrasound speckle tracking algorithm on tendon tissue.

PubMed

Fröberg, Åsa; Mårtensson, Mattias; Larsson, Matilda; Janerot-Sjöberg, Birgitta; D'Hooge, Jan; Arndt, Anton

2016-10-01

Ultrasound speckle tracking offers a non-invasive way of studying strain in the free Achilles tendon where no anatomical landmarks are available for tracking. This provides new possibilities for studying injury mechanisms during sport activity and the effects of shoes, orthotic devices, and rehabilitation protocols on tendon biomechanics. To investigate the feasibility of using a commercial ultrasound speckle tracking algorithm for assessing strain in tendon tissue. A polyvinyl alcohol (PVA) phantom, three porcine tendons, and a human Achilles tendon were mounted in a materials testing machine and loaded to 4% peak strain. Ultrasound long-axis cine-loops of the samples were recorded. Speckle tracking analysis of axial strain was performed using a commercial speckle tracking software. Estimated strain was then compared to reference strain known from the materials testing machine. Two frame rates and two region of interest (ROI) sizes were evaluated. Best agreement between estimated strain and reference strain was found in the PVA phantom (absolute error in peak strain: 0.21 ± 0.08%). The absolute error in peak strain varied between 0.72 ± 0.65% and 10.64 ± 3.40% in the different tendon samples. Strain determined with a frame rate of 39.4 Hz had lower errors than 78.6 Hz as was the case with a 22 mm compared to an 11 mm ROI. Errors in peak strain estimation showed high variability between tendon samples and were large in relation to strain levels previously described in the Achilles tendon. © The Foundation Acta Radiologica 2016.
SPECIAL SESSION: (H21) on Global Precipitation Mission for Hydrology and Hydrometeorology. Sampling-Error Considerations for GPM-Era Rainfall Products

NASA Technical Reports Server (NTRS)

Bell, Thomas L.; Lau, William K. M. (Technical Monitor)

2002-01-01

The proposed Global Precipitation Mission (GPM) builds on the success of the Tropical Rainfall Measuring Mission (TRMM), offering a constellation of microwave-sensor-equipped smaller satellites in addition to a larger, multiply-instrumented "mother" satellite that will include an improved precipitation radar system to which the precipitation estimates of the smaller satellites can be tuned. Coverage by the satellites will be nearly global rather than being confined as TRMM was to lower latitudes. It is hoped that the satellite constellation can provide observations at most places on the earth at least once every three hours, though practical considerations may force some compromises. The GPM system offers the possibility of providing precipitation maps with much better time resolution than the monthly averages around which TRMM was planned, and therefore opens up new possibilities for hydrology and data assimilation into models. In this talk, methods that were developed for estimating sampling error in the rainfall averages that TRMM is providing will be used to estimate sampling error levels for GPM-era configurations. Possible impacts on GPM products of compromises in the sampling frequency will be discussed.
Two-step estimation in ratio-of-mediator-probability weighted causal mediation analysis.

PubMed

Bein, Edward; Deutsch, Jonah; Hong, Guanglei; Porter, Kristin E; Qin, Xu; Yang, Cheng

2018-04-15

This study investigates appropriate estimation of estimator variability in the context of causal mediation analysis that employs propensity score-based weighting. Such an analysis decomposes the total effect of a treatment on the outcome into an indirect effect transmitted through a focal mediator and a direct effect bypassing the mediator. Ratio-of-mediator-probability weighting estimates these causal effects by adjusting for the confounding impact of a large number of pretreatment covariates through propensity score-based weighting. In step 1, a propensity score model is estimated. In step 2, the causal effects of interest are estimated using weights derived from the prior step's regression coefficient estimates. Statistical inferences obtained from this 2-step estimation procedure are potentially problematic if the estimated standard errors of the causal effect estimates do not reflect the sampling uncertainty in the estimation of the weights. This study extends to ratio-of-mediator-probability weighting analysis a solution to the 2-step estimation problem by stacking the score functions from both steps. We derive the asymptotic variance-covariance matrix for the indirect effect and direct effect 2-step estimators, provide simulation results, and illustrate with an application study. Our simulation results indicate that the sampling uncertainty in the estimated weights should not be ignored. The standard error estimation using the stacking procedure offers a viable alternative to bootstrap standard error estimation. We discuss broad implications of this approach for causal analysis involving propensity score-based weighting. Copyright © 2018 John Wiley & Sons, Ltd.
An internal pilot design for prospective cancer screening trials with unknown disease prevalence.

PubMed

Brinton, John T; Ringham, Brandy M; Glueck, Deborah H

2015-10-13

For studies that compare the diagnostic accuracy of two screening tests, the sample size depends on the prevalence of disease in the study population, and on the variance of the outcome. Both parameters may be unknown during the design stage, which makes finding an accurate sample size difficult. To solve this problem, we propose adapting an internal pilot design. In this adapted design, researchers will accrue some percentage of the planned sample size, then estimate both the disease prevalence and the variances of the screening tests. The updated estimates of the disease prevalence and variance are used to conduct a more accurate power and sample size calculation. We demonstrate that in large samples, the adapted internal pilot design produces no Type I inflation. For small samples (N less than 50), we introduce a novel adjustment of the critical value to control the Type I error rate. We apply the method to two proposed prospective cancer screening studies: 1) a small oral cancer screening study in individuals with Fanconi anemia and 2) a large oral cancer screening trial. Conducting an internal pilot study without adjusting the critical value can cause Type I error rate inflation in small samples, but not in large samples. An internal pilot approach usually achieves goal power and, for most studies with sample size greater than 50, requires no Type I error correction. Further, we have provided a flexible and accurate approach to bound Type I error below a goal level for studies with small sample size.

Magnitude error bounds for sampled-data frequency response obtained from the truncation of an infinite series, and compensator improvement program

NASA Technical Reports Server (NTRS)

Mitchell, J. R.

1972-01-01

The frequency response method of analyzing control system performance is discussed, and the difficulty of obtaining the sampled frequency response of the continuous system is considered. An upper bound magnitude error equation is obtained which yields reasonable estimates of the actual error. Finalization of the compensator improvement program is also reported, and the program was used to design compensators for Saturn 5/S1-C dry workshop and Saturn 5/S1-C Skylab.
Selected Oral Health Indicators in the United States, 2005-2008

MedlinePlus

... errors of the percentages were estimated using Taylor series linearization, to take into account the complex sampling design. The statistical significance of differences between estimates were ...
Unifying error structures in commonly used biotracer mixing models.

PubMed

Stock, Brian C; Semmens, Brice X

2016-10-01

Mixing models are statistical tools that use biotracers to probabilistically estimate the contribution of multiple sources to a mixture. These biotracers may include contaminants, fatty acids, or stable isotopes, the latter of which are widely used in trophic ecology to estimate the mixed diet of consumers. Bayesian implementations of mixing models using stable isotopes (e.g., MixSIR, SIAR) are regularly used by ecologists for this purpose, but basic questions remain about when each is most appropriate. In this study, we describe the structural differences between common mixing model error formulations in terms of their assumptions about the predation process. We then introduce a new parameterization that unifies these mixing model error structures, as well as implicitly estimates the rate at which consumers sample from source populations (i.e., consumption rate). Using simulations and previously published mixing model datasets, we demonstrate that the new error parameterization outperforms existing models and provides an estimate of consumption. Our results suggest that the error structure introduced here will improve future mixing model estimates of animal diet. © 2016 by the Ecological Society of America.
Toward Robust Estimation of the Components of Forest Population Change

Treesearch

Francis A. Roesch

2014-01-01

Multiple levels of simulation are used to test the robustness of estimators of the components of change. I first created a variety of spatial-temporal populations based on, but more variable than, an actual forest monitoring data set and then sampled those populations under a variety of sampling error structures. The performance of each of four estimation approaches is...
Reference-free error estimation for multiple measurement methods.

PubMed

Madan, Hennadii; Pernuš, Franjo; Špiclin, Žiga

2018-01-01

We present a computational framework to select the most accurate and precise method of measurement of a certain quantity, when there is no access to the true value of the measurand. A typical use case is when several image analysis methods are applied to measure the value of a particular quantitative imaging biomarker from the same images. The accuracy of each measurement method is characterized by systematic error (bias), which is modeled as a polynomial in true values of measurand, and the precision as random error modeled with a Gaussian random variable. In contrast to previous works, the random errors are modeled jointly across all methods, thereby enabling the framework to analyze measurement methods based on similar principles, which may have correlated random errors. Furthermore, the posterior distribution of the error model parameters is estimated from samples obtained by Markov chain Monte-Carlo and analyzed to estimate the parameter values and the unknown true values of the measurand. The framework was validated on six synthetic and one clinical dataset containing measurements of total lesion load, a biomarker of neurodegenerative diseases, which was obtained with four automatic methods by analyzing brain magnetic resonance images. The estimates of bias and random error were in a good agreement with the corresponding least squares regression estimates against a reference.
Bayes Error Rate Estimation Using Classifier Ensembles

NASA Technical Reports Server (NTRS)

Tumer, Kagan; Ghosh, Joydeep

2003-01-01

The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
Estimating and comparing microbial diversity in the presence of sequencing errors

PubMed Central

Chiu, Chun-Huo

2016-01-01

Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. 
This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872
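The Hill numbers named in this abstract (taxa richness at q = 0, the exponential of Shannon entropy at q = 1, the inverse Simpson index at q = 2) can be sketched as below. This is a plain plug-in calculation from observed counts, not the nonparametric estimator with the corrected singleton count that the authors develop; the function name and the example counts are hypothetical.

```python
import math


def hill_number(counts, q):
    """Plug-in Hill number (effective number of taxa) of order q.

    counts: observed abundance of each taxon; zero counts are ignored.
    """
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # General case: (sum of p_i^q) raised to the power 1 / (1 - q).
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical community with a few common taxa and several singletons.
counts = [50, 30, 10, 1, 1, 1, 1]
profile = {q: hill_number(counts, q) for q in (0, 1, 2)}
print(profile)
```

Evaluating the function over a range of q gives the diversity profile mentioned in the abstract; rare taxa such as singletons weigh most heavily at q = 0, which is where spurious singletons do the most damage.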
Cluster-sample surveys and lot quality assurance sampling to evaluate yellow fever immunisation coverage following a national campaign, Bolivia, 2007.

PubMed

Pezzoli, Lorenzo; Pineda, Silvia; Halkyer, Percy; Crespo, Gladys; Andrews, Nick; Ronveaux, Olivier

2009-03-01

To estimate the yellow fever (YF) vaccine coverage for the endemic and non-endemic areas of Bolivia and to determine whether selected districts had acceptable levels of coverage (>70%). We conducted two surveys of 600 individuals (25 x 12 clusters) to estimate coverage in the endemic and non-endemic areas. We assessed 11 districts using lot quality assurance sampling (LQAS). The lot (district) sample was 35 individuals with six as decision value (alpha error 6% if true coverage 70%; beta error 6% if true coverage 90%). To increase feasibility, we divided the lots into five clusters of seven individuals; to investigate the effect of clustering, we calculated alpha and beta by conducting simulations where each cluster's true coverage was sampled from a normal distribution with a mean of 70% or 90% and standard deviations of 5% or 10%. Estimated coverage was 84.3% (95% CI: 78.9-89.7) in endemic areas, 86.8% (82.5-91.0) in non-endemic and 86.0% (82.8-89.1) nationally. LQAS showed that four lots had unacceptable coverage levels. In six lots, results were inconsistent with the estimated administrative coverage. The simulations suggested that the effect of clustering the lots is unlikely to have significantly increased the risk of making incorrect accept/reject decisions. Estimated YF coverage was high. Discrepancies between administrative coverage and LQAS results may be due to incorrect population data. Even allowing for clustering in LQAS, the statistical errors would remain low. Catch-up campaigns are recommended in districts with unacceptable coverage.
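The stated alpha and beta errors can be checked approximately with a binomial calculation. The sketch below assumes that a decision value of six means a lot is accepted when at most six of the 35 sampled individuals are unvaccinated; the abstract does not spell out that rule, and the function names are hypothetical. It also ignores the clustering of lots into five groups of seven, which the authors examined separately by simulation.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_errors(n=35, decision_value=6, low_coverage=0.70, high_coverage=0.90):
    """Misclassification risks for an 'accept if unvaccinated <= d' rule.

    alpha: probability of accepting a lot whose true coverage is low_coverage.
    beta:  probability of rejecting a lot whose true coverage is high_coverage.
    """
    alpha = binom_cdf(decision_value, n, 1 - low_coverage)
    beta = 1 - binom_cdf(decision_value, n, 1 - high_coverage)
    return alpha, beta


alpha, beta = lqas_errors()
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Under this assumed rule and simple random sampling within the lot, both risks come out close to the 6% quoted in the abstract.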
On the predictivity of pore-scale simulations: Estimating uncertainties with multilevel Monte Carlo

NASA Astrophysics Data System (ADS)

Icardi, Matteo; Boccardo, Gianluca; Tempone, Raúl

2016-09-01

A fast method with tunable accuracy is proposed to estimate errors and uncertainties in pore-scale and Digital Rock Physics (DRP) problems. The overall predictivity of these studies can be, in fact, hindered by many factors including sample heterogeneity, computational and imaging limitations, model inadequacy and not perfectly known physical parameters. The typical objective of pore-scale studies is the estimation of macroscopic effective parameters such as permeability, effective diffusivity and hydrodynamic dispersion. However, these are often non-deterministic quantities (i.e., results obtained for specific pore-scale sample and setup are not totally reproducible by another "equivalent" sample and setup). The stochastic nature can arise due to the multi-scale heterogeneity, the computational and experimental limitations in considering large samples, and the complexity of the physical models. These approximations, in fact, introduce an error that, being dependent on a large number of complex factors, can be modeled as random. We propose a general simulation tool, based on multilevel Monte Carlo, that can reduce drastically the computational cost needed for computing accurate statistics of effective parameters and other quantities of interest, under any of these random errors. This is, to our knowledge, the first attempt to include Uncertainty Quantification (UQ) in pore-scale physics and simulation. The method can also provide estimates of the discretization error and it is tested on three-dimensional transport problems in heterogeneous materials, where the sampling procedure is done by generation algorithms able to reproduce realistic consolidated and unconsolidated random sphere and ellipsoid packings and arrangements. A totally automatic workflow is developed in an open-source code [1], that includes rigid body physics and random packing algorithms, unstructured mesh discretization, finite volume solvers, extrapolation and post-processing techniques. 
The proposed method can be efficiently used in many porous media applications for problems such as stochastic homogenization/upscaling, propagation of uncertainty from microscopic fluid and rock properties to macro-scale parameters, robust estimation of Representative Elementary Volume size for arbitrary physics.
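For orientation, the generic multilevel Monte Carlo identity behind this kind of tool can be written as follows. The notation is an assumption introduced here and is not taken from the paper: Q_l is the quantity of interest computed at discretization level l (l = 0 coarsest, l = L finest), N_l is the number of random samples drawn at level l, and V_l is the variance of the level-l correction.

```latex
% Telescoping decomposition of the finest-level expectation
\mathbb{E}[Q_L] = \mathbb{E}[Q_0] + \sum_{l=1}^{L} \mathbb{E}[Q_l - Q_{l-1}]

% Multilevel estimator, with Q_{-1} \equiv 0 and independent samples per level
\hat{Q}^{\mathrm{ML}} = \sum_{l=0}^{L} \frac{1}{N_l} \sum_{i=1}^{N_l}
    \left( Q_l^{(i)} - Q_{l-1}^{(i)} \right)

% Sampling variance of the estimator, V_l = \mathrm{Var}[Q_l - Q_{l-1}]
\mathrm{Var}\!\left[ \hat{Q}^{\mathrm{ML}} \right] = \sum_{l=0}^{L} \frac{V_l}{N_l}
```

Because V_l typically shrinks as the levels are refined, most samples can be taken on the cheap coarse levels and only a few on the expensive fine ones, which is where the cost reduction comes from.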
Mapping from disease-specific measures to health-state utility values in individuals with migraine.

PubMed

Gillard, Patrick J; Devine, Beth; Varon, Sepideh F; Liu, Lei; Sullivan, Sean D

2012-05-01

The objective of this study was to develop empirical algorithms that estimate health-state utility values from disease-specific quality-of-life scores in individuals with migraine. Data from a cross-sectional, multicountry study were used. Individuals with episodic and chronic migraine were randomly assigned to training or validation samples. Spearman's correlation coefficients between paired EuroQol five-dimensional (EQ-5D) questionnaire utility values and both Headache Impact Test (HIT-6) scores and Migraine-Specific Quality-of-Life Questionnaire version 2.1 (MSQ) domain scores (role restrictive, role preventive, and emotional function) were examined. Regression models were constructed to estimate EQ-5D questionnaire utility values from the HIT-6 score or the MSQ domain scores. Preferred algorithms were confirmed in the validation samples. In episodic migraine, the preferred HIT-6 and MSQ algorithms explained 22% and 25% of the variance (R(2)) in the training samples, respectively, and had similar prediction errors (root mean square errors of 0.30). In chronic migraine, the preferred HIT-6 and MSQ algorithms explained 36% and 45% of the variance in the training samples, respectively, and had similar prediction errors (root mean square errors 0.31 and 0.29). In episodic and chronic migraine, no statistically significant differences were observed between the mean observed and the mean estimated EQ-5D questionnaire utility values for the preferred HIT-6 and MSQ algorithms in the validation samples. The relationship between the EQ-5D questionnaire and the HIT-6 or the MSQ is adequate to use regression equations to estimate EQ-5D questionnaire utility values. The preferred HIT-6 and MSQ algorithms will be useful in estimating health-state utilities in migraine trials in which no preference-based measure is present. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Analytic score distributions for a spatially continuous tridirectional Monte Carlo transport problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Booth, T.E.

1996-01-01

The interpretation of the statistical error estimates produced by Monte Carlo transport codes is still somewhat of an art. Empirically, there are variance reduction techniques whose error estimates are almost always reliable, and there are variance reduction techniques whose error estimates are often unreliable. Unreliable error estimates usually result from inadequate large-score sampling from the score distribution's tail. Statisticians believe that more accurate confidence interval statements are possible if the general nature of the score distribution can be characterized. Here, the analytic score distribution for the exponential transform applied to a simple, spatially continuous Monte Carlo transport problem is provided. Anisotropic scattering and implicit capture are included in the theory. In large part, the analytic score distributions that are derived provide the basis for the ten new statistical quality checks in MCNP.
Accelerating Convergence in Molecular Dynamics Simulations of Solutes in Lipid Membranes by Conducting a Random Walk along the Bilayer Normal.

PubMed

Neale, Chris; Madill, Chris; Rauscher, Sarah; Pomès, Régis

2013-08-13

All molecular dynamics simulations are susceptible to sampling errors, which degrade the accuracy and precision of observed values. The statistical convergence of simulations containing atomistic lipid bilayers is limited by the slow relaxation of the lipid phase, which can exceed hundreds of nanoseconds. These long conformational autocorrelation times are exacerbated in the presence of charged solutes, which can induce significant distortions of the bilayer structure. Such long relaxation times represent hidden barriers that induce systematic sampling errors in simulations of solute insertion. To identify optimal methods for enhancing sampling efficiency, we quantitatively evaluate convergence rates using generalized ensemble sampling algorithms in calculations of the potential of mean force for the insertion of the ionic side chain analog of arginine in a lipid bilayer. Umbrella sampling (US) is used to restrain solute insertion depth along the bilayer normal, the order parameter commonly used in simulations of molecular solutes in lipid bilayers. When US simulations are modified to conduct random walks along the bilayer normal using a Hamiltonian exchange algorithm, systematic sampling errors are eliminated more rapidly and the rate of statistical convergence of the standard free energy of binding of the solute to the lipid bilayer is increased 3-fold. We compute the ratio of the replica flux transmitted across a defined region of the order parameter to the replica flux that entered that region in Hamiltonian exchange simulations. We show that this quantity, the transmission factor, identifies sampling barriers in degrees of freedom orthogonal to the order parameter. 
The transmission factor is used to estimate the depth-dependent conformational autocorrelation times of the simulation system, some of which exceed the simulation time, and thereby identify solute insertion depths that are prone to systematic sampling errors and estimate the lower bound of the amount of sampling that is required to resolve these sampling errors. Finally, we extend our simulations and verify that the conformational autocorrelation times estimated by the transmission factor accurately predict correlation times that exceed the simulation time scale, something that, to our knowledge, has never before been achieved.
A MIMO radar quadrature and multi-channel amplitude-phase error combined correction method based on cross-correlation

NASA Astrophysics Data System (ADS)

Yun, Lingtong; Zhao, Hongzhong; Du, Mengyuan

2018-04-01

Quadrature error and multi-channel amplitude-phase error must both be compensated in I/Q quadrature sampling and in multi-channel signal paths. This paper presents a new method that requires neither a filter nor a standard signal and that estimates the quadrature and multi-channel amplitude-phase errors jointly. The method uses the cross-correlation and the amplitude ratio between signals to estimate the two amplitude-phase errors simply and effectively. Its advantages are verified by computer simulation, and its superiority is also verified with measured data from outfield experiments.
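As a rough illustration of how cross-correlation and an amplitude ratio can expose quadrature error, the sketch below uses a textbook moment-based estimator for a single I/Q pair. It is not the authors' combined method and does not cover the multi-channel part. The signal model (I = cos θ, Q = g·sin(θ + φ)) and all function names are assumptions made for this example.

```python
import math


def estimate_iq_imbalance(i_samples, q_samples):
    """Estimate gain ratio g and phase error phi of Q relative to I.

    Assumed model: I = cos(theta), Q = g * sin(theta + phi), with theta
    covering whole cycles so that time averages equal their ideal values.
    """
    n = len(i_samples)
    p_i = sum(x * x for x in i_samples) / n                      # mean power of I
    p_q = sum(y * y for y in q_samples) / n                      # mean power of Q
    r_iq = sum(x * y for x, y in zip(i_samples, q_samples)) / n  # cross-correlation
    gain = math.sqrt(p_q / p_i)                   # amplitude ratio
    phase = math.asin(r_iq / math.sqrt(p_i * p_q))  # normalized correlation = sin(phi)
    return gain, phase


def synth_tone(n=1000, cycles=7, gain=1.1, phase=0.05):
    """Synthetic I/Q tone with a known imbalance, over an integer number of cycles."""
    theta = [2 * math.pi * cycles * k / n for k in range(n)]
    i_samples = [math.cos(t) for t in theta]
    q_samples = [gain * math.sin(t + phase) for t in theta]
    return i_samples, q_samples


g, phi = estimate_iq_imbalance(*synth_tone())
print(g, phi)
```

With ideal quadrature the I and Q channels are uncorrelated, so any nonzero normalized cross-correlation indicates a phase error, while the power ratio gives the amplitude mismatch; no reference signal or filter is needed in this simplified setting.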
Errors in the estimation of approximate entropy and other recurrence-plot-derived indices due to the finite resolution of RR time series.

PubMed

García-González, Miguel A; Fernández-Chimeno, Mireya; Ramos-Castro, Juan

2009-02-01

An analysis of the errors due to the finite resolution of RR time series in the estimation of the approximate entropy (ApEn) is described. The quantification errors in the discrete RR time series produce considerable errors in the ApEn estimation (bias and variance) when the signal variability or the sampling frequency is low. Similar errors can be found in indices related to the quantification of recurrence plots. An easy way to calculate a figure of merit [the signal to resolution of the neighborhood ratio (SRN)] is proposed in order to predict when the bias in the indices could be high. When SRN is close to an integer value n, the bias is higher than when near n - 1/2 or n + 1/2. Moreover, if SRN is close to an integer value, the lower this value, the greater the bias is.
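For readers who want to experiment with the effect described here, a minimal sketch of the standard ApEn(m, r) definition follows, together with a helper that rounds a series to a finite resolution. The SRN figure of merit proposed in the paper is not implemented. The function names, the use of r as an absolute tolerance, and the example values are assumptions.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) using the Chebyshev distance between length-m templates.

    r is an absolute tolerance in the units of the series (an assumption;
    it is often chosen as a fraction of the series' standard deviation).
    """
    n = len(series)

    def phi(dim):
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r of a (self-match included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


def quantize(series, resolution):
    """Round each value to the nearest multiple of the given resolution."""
    return [round(x / resolution) * resolution for x in series]


# Hypothetical RR intervals in seconds, compared at two resolutions.
rr = [0.80 + 0.02 * math.sin(0.7 * k) + 0.01 * math.sin(2.3 * k) for k in range(200)]
print(approximate_entropy(rr, r=0.005),
      approximate_entropy(quantize(rr, 0.008), r=0.005))
```

Comparing the two printed values for different resolutions and tolerances gives a feel for how quantization can shift the index when the signal variability is small relative to the resolution.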
Robust estimation of adaptive tensors of curvature by tensor voting.

PubMed

Tong, Wai-Shun; Tang, Chi-Keung

2005-03-01

Although curvature estimation from a given mesh or regularly sampled point set is a well-studied problem, it is still challenging when the input consists of a cloud of unstructured points corrupted by misalignment error and outlier noise. Such input is ubiquitous in computer vision. In this paper, we propose a three-pass tensor voting algorithm to robustly estimate curvature tensors, from which accurate principal curvatures and directions can be calculated. Our quantitative estimation is an improvement over the previous two-pass algorithm, where only qualitative curvature estimation (sign of Gaussian curvature) is performed. To overcome misalignment errors, our improved method automatically corrects input point locations at subvoxel precision, which also rejects outliers that are uncorrectable. To adapt to different scales locally, we define the RadiusHit of a curvature tensor to quantify estimation accuracy and applicability. Our curvature estimation algorithm has been proven with detailed quantitative experiments, performing better in a variety of standard error metrics (percentage error in curvature magnitudes, absolute angle difference in curvature direction) in the presence of a large amount of misalignment noise.
A semiempirical error estimation technique for PWV derived from atmospheric radiosonde data

NASA Astrophysics Data System (ADS)

Castro-Almazán, Julio A.; Pérez-Jordán, Gabriel; Muñoz-Tuñón, Casiana

2016-09-01

A semiempirical method for estimating the error and optimum number of sampled levels in precipitable water vapour (PWV) determinations from atmospheric radiosoundings is proposed. Two terms have been considered: the uncertainties in the measurements and the sampling error. Also, the uncertainty has been separated in the variance and covariance components. The sampling and covariance components have been modelled from an empirical dataset of 205 high-vertical-resolution radiosounding profiles, equipped with Vaisala RS80 and RS92 sondes at four different locations: Güímar (GUI) in Tenerife, at sea level, and the astronomical observatory at Roque de los Muchachos (ORM, 2300 m a.s.l.) on La Palma (both on the Canary Islands, Spain), Lindenberg (LIN) in continental Germany, and Ny-Ålesund (NYA) in the Svalbard Islands, within the Arctic Circle. The balloons at the ORM were launched during intensive and unique site-testing runs carried out in 1990 and 1995, while the data for the other sites were obtained from radiosounding stations operating for a period of 1 year (2013-2014). The PWV values ranged between ~0.9 and ~41 mm. The method sub-samples the profile for error minimization. The result is the minimum error and the optimum number of levels. The results obtained in the four sites studied showed that the ORM is the driest of the four locations and the one with the fastest vertical decay of PWV. The exponential autocorrelation pressure lags ranged from 175 hPa (ORM) to 500 hPa (LIN). The results show a coherent behaviour with no biases as a function of the profile. The final error is roughly proportional to PWV whereas the optimum number of levels (N0) is the reverse. The value of N0 is less than 400 for 77 % of the profiles and the absolute errors are always < 0.6 mm. The median relative error is 2.0 ± 0.7 % and the 90th percentile P90 = 4.6 %. 
Therefore, provided that a radiosounding samples at least N0 uniform vertical levels, depending on the water vapour content and distribution of the atmosphere, the error in the PWV estimate is likely to stay below ≈ 3 %, even for dry conditions.
Approaches to stream solute load estimation for solutes with varying dynamics from five diverse small watersheds

USGS Publications Warehouse

Aulenbach, Brent T.; Burns, Douglas A.; Shanley, James B.; Yanai, Ruth D.; Bae, Kikang; Wild, Adam; Yang, Yang; Yi, Dong

2016-01-01

Estimating streamwater solute loads is a central objective of many water-quality monitoring and research studies, as loads are used to compare with atmospheric inputs, to infer biogeochemical processes, and to assess whether water quality is improving or degrading. In this study, we evaluate loads and associated errors to determine the best load estimation technique among three methods (a period-weighted approach, the regression-model method, and the composite method) based on a solute's concentration dynamics and sampling frequency. We evaluated a broad range of varying concentration dynamics with stream flow and season using four dissolved solutes (sulfate, silica, nitrate, and dissolved organic carbon) at five diverse small watersheds (Sleepers River Research Watershed, VT; Hubbard Brook Experimental Forest, NH; Biscuit Brook Watershed, NY; Panola Mountain Research Watershed, GA; and Río Mameyes Watershed, PR) with fairly high-frequency sampling during a 10- to 11-yr period. Data sets with three different sampling frequencies were derived from the full data set at each site (weekly plus storm/snowmelt events, weekly, and monthly) and errors in loads were assessed for the study period, annually, and monthly. For solutes that had a moderate to strong concentration–discharge relation, the composite method performed best, unless the autocorrelation of the model residuals was <0.2, in which case the regression-model method was most appropriate. For solutes that had a nonexistent or weak concentration–discharge relation (model R² < about 0.3), the period-weighted approach was most appropriate. The lowest errors in loads were achieved for solutes with the strongest concentration–discharge relations. Sample and regression model diagnostics could be used to approximate overall accuracies and annual precisions.
For the period-weighted approach, errors were lower when the variance in concentrations was lower, the degree of autocorrelation in the concentrations was higher, and sampling frequency was higher. The period-weighted approach was most sensitive to sampling frequency. For the regression-model and composite methods, errors were lower when the variance in model residuals was lower. For the composite method, errors were lower when the autocorrelation in the residuals was higher. Guidelines to determine the best load estimation method based on solute concentration–discharge dynamics and diagnostics are presented, and should be applicable to other studies.
Effects of tree-to-tree variations on sap flux-based transpiration estimates in a forested watershed

NASA Astrophysics Data System (ADS)

Kume, Tomonori; Tsuruta, Kenji; Komatsu, Hikaru; Kumagai, Tomo'omi; Higashi, Naoko; Shinohara, Yoshinori; Otsuki, Kyoichi

2010-05-01

To estimate forest stand-scale water use, we assessed how sample sizes affect confidence of stand-scale transpiration (E) estimates calculated from sap flux (Fd) and sapwood area (AS_tree) measurements of individual trees. In a Japanese cypress plantation, we measured Fd and AS_tree in all trees (n = 58) within a 20 × 20 m study plot, which was divided into four 10 × 10 m subplots. We calculated E from stand AS_tree (AS_stand) and mean stand Fd (JS) values. Using Monte Carlo analyses, we examined potential errors associated with sample sizes in E, AS_stand, and JS by using the original AS_tree and Fd data sets. Consequently, we defined optimal sample sizes of 10 and 15 for AS_stand and JS estimates, respectively, in the 20 × 20 m plot. Sample sizes greater than the optimal sample sizes did not decrease potential errors. The optimal sample sizes for JS changed according to plot size (e.g., 10 × 10 m and 10 × 20 m), while the optimal sample sizes for AS_stand did not. As well, the optimal sample sizes for JS did not change in different vapor pressure deficit conditions. In terms of E estimates, these results suggest that the tree-to-tree variations in Fd vary among different plots, and that plot size to capture tree-to-tree variations in Fd is an important factor. This study also discusses planning balanced sampling designs to extrapolate stand-scale estimates to catchment-scale estimates.
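The Monte Carlo idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes random draws of trees without replacement, uses the coefficient of variation of the subsample mean as the "potential error", and runs on synthetic sap-flux values. The function name `potential_error` and all numbers are hypothetical.

```python
import random
import statistics

def potential_error(values, sample_size, n_draws=2000, seed=0):
    """Relative spread of the subsample mean over random draws without replacement."""
    rng = random.Random(seed)
    full_mean = statistics.fmean(values)
    means = [
        statistics.fmean(rng.sample(values, sample_size))
        for _ in range(n_draws)
    ]
    return statistics.pstdev(means) / full_mean

# Synthetic sap-flux values for 58 trees (NOT data from the study).
data_rng = random.Random(42)
fd = [data_rng.lognormvariate(0.0, 0.4) for _ in range(58)]

# Potential error for a range of sample sizes, up to the full plot.
errors = {n: potential_error(fd, n) for n in (5, 10, 15, 30, 58)}
for n, err in errors.items():
    print(n, round(err, 3))
```

In a sketch like this the error shrinks as the sample size grows and vanishes when every tree is sampled. Where the curve flattens enough to call a sample size "optimal" depends on a threshold that the abstract does not state.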
Sampling plantations to determine white-pine weevil injury

Treesearch

Robert L. Talerico; Robert W., Jr. Wilson

1973-01-01

Use of 1/10-acre square plots to obtain estimates of the proportion of never-weeviled trees necessary for evaluating and scheduling white-pine weevil control is described. The optimum number of trees to observe per plot is estimated from data obtained from sample plantations in the Northeast, and a table is given of the sample size required to achieve a standard error of...
Quantification of errors in ordinal outcome scales using Shannon entropy: effect on sample size calculations.

PubMed

Mandava, Pitchaiah; Krumpelman, Chase S; Shah, Jharna N; White, Donna L; Kent, Thomas A

2013-01-01

Clinical trial outcomes often involve an ordinal scale of subjective functional assessments but the optimal way to quantify results is not clear. In stroke, the most commonly used scale, the modified Rankin Score (mRS), a range of scores ("Shift") is proposed as superior to dichotomization because of greater information transfer. The influence of known uncertainties in mRS assessment has not been quantified. We hypothesized that errors caused by uncertainties could be quantified by applying information theory. Using Shannon's model, we quantified errors of the "Shift" compared to dichotomized outcomes using published distributions of mRS uncertainties and applied this model to clinical trials. We identified 35 randomized stroke trials that met inclusion criteria. Each trial's mRS distribution was multiplied with the noise distribution from published mRS inter-rater variability to generate an error percentage for "shift" and dichotomized cut-points. For the SAINT I neuroprotectant trial, considered positive by "shift" mRS while the larger follow-up SAINT II trial was negative, we recalculated sample size required if classification uncertainty was taken into account. Considering the full mRS range, error rate was 26.1%±5.31 (Mean±SD). Error rates were lower for all dichotomizations tested using cut-points (e.g. mRS 1; 6.8%±2.89; overall p<0.001). Taking errors into account, SAINT I would have required 24% more subjects than were randomized. We show when uncertainty in assessments is considered, the lowest error rates are with dichotomization. While using the full range of mRS is conceptually appealing, a gain of information is counter-balanced by a decrease in reliability. The resultant errors need to be considered since sample size may otherwise be underestimated. In principle, we have outlined an approach to error estimation for any condition in which there are uncertainties in outcome assessment. 
We provide the user with programs to calculate and incorporate errors into sample size estimation.

A measurement error model for physical activity level as measured by a questionnaire with application to the 1999-2006 NHANES questionnaire.

PubMed

Tooze, Janet A; Troiano, Richard P; Carroll, Raymond J; Moshfegh, Alanna J; Freedman, Laurence S

2013-06-01

Systematic investigations into the structure of measurement error of physical activity questionnaires are lacking. We propose a measurement error model for a physical activity questionnaire that uses physical activity level (the ratio of total energy expenditure to basal energy expenditure) to relate questionnaire-based reports of physical activity level to true physical activity levels. The 1999-2006 National Health and Nutrition Examination Survey physical activity questionnaire was administered to 433 participants aged 40-69 years in the Observing Protein and Energy Nutrition (OPEN) Study (Maryland, 1999-2000). Valid estimates of participants' total energy expenditure were also available from doubly labeled water, and basal energy expenditure was estimated from an equation; the ratio of those measures estimated true physical activity level ("truth"). We present a measurement error model that accommodates the mixture of errors that arise from assuming a classical measurement error model for doubly labeled water and a Berkson error model for the equation used to estimate basal energy expenditure. The method was then applied to the OPEN Study. Correlations between the questionnaire-based physical activity level and truth were modest (r = 0.32-0.41); attenuation factors (0.43-0.73) indicate that the use of questionnaire-based physical activity level would lead to attenuated estimates of effect size. Results suggest that sample sizes for estimating relationships between physical activity level and disease should be inflated, and that regression calibration can be used to provide measurement error-adjusted estimates of relationships between physical activity and disease.
Combining wrist age and third molars in forensic age estimation: how to calculate the joint age estimate and its error rate in age diagnostics.

PubMed

Gelbrich, Bianca; Frerking, Carolin; Weiss, Sandra; Schwerdt, Sebastian; Stellzig-Eisenhauer, Angelika; Tausche, Eve; Gelbrich, Götz

2015-01-01

Forensic age estimation in living adolescents is based on several methods, e.g. the assessment of skeletal and dental maturation. Combination of several methods is mandatory, since age estimates from a single method are too imprecise due to biological variability. The correlation of the errors of the methods being combined must be known to calculate the precision of combined age estimates. The aims were to examine the correlation of the errors of the hand and the third molar method and to demonstrate how to calculate the combined age estimate. Clinical routine radiographs of the hand and dental panoramic images of 383 patients (aged 7.8-19.1 years, 56% female) were assessed. Lack of correlation (r = -0.024, 95% CI = -0.124 to +0.076, p = 0.64) allows calculating the combined age estimate as the weighted average of the estimates from hand bones and third molars. Combination improved the standard deviations of errors (hand = 0.97, teeth = 1.35 years) to 0.79 years. Uncorrelated errors of the age estimates obtained from both methods allow straightforward determination of the common estimate and its variance. This is also possible when reference data for the hand and the third molar method are established independently from each other, using different samples.
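As a rough check of the combination step, the sketch below assumes inverse-variance weights and uncorrelated errors. The abstract says only "weighted average", so the weighting scheme is an assumption. The function name `combine_estimates` and the two example ages are hypothetical; the standard deviations are the values quoted in the abstract.

```python
import math

def combine_estimates(est_a, sd_a, est_b, sd_b):
    """Inverse-variance weighted average of two estimates with uncorrelated errors."""
    w_a = 1.0 / sd_a ** 2
    w_b = 1.0 / sd_b ** 2
    combined = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    combined_sd = math.sqrt(1.0 / (w_a + w_b))
    return combined, combined_sd

# Hypothetical age estimates in years; SDs as reported (hand 0.97, teeth 1.35).
age, sd = combine_estimates(16.0, 0.97, 17.0, 1.35)
print(round(age, 2), round(sd, 2))
```

Under these assumptions the combined standard deviation comes out at about 0.79 years, in line with the value reported in the abstract.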
Generalized site occupancy models allowing for false positive and false negative errors

USGS Publications Warehouse

Royle, J. Andrew; Link, W.A.

2006-01-01

Site occupancy models have been developed that allow for imperfect species detection or "false negative" observations. Such models have become widely adopted in surveys of many taxa. The most fundamental assumption underlying these models is that "false positive" errors are not possible. That is, one cannot detect a species where it does not occur. However, such errors are possible in many sampling situations for a number of reasons, and even low false positive error rates can induce extreme bias in estimates of site occupancy when they are not accounted for. In this paper, we develop a model for site occupancy that allows for both false negative and false positive error rates. This model can be represented as a two-component finite mixture model and can be easily fitted using freely available software. We provide an analysis of avian survey data using the proposed model and present results of a brief simulation study evaluating the performance of the maximum-likelihood estimator and the naive estimator in the presence of false positive errors.
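The two-component mixture mentioned in this abstract can be sketched as a log-likelihood over per-site detection counts. This is a minimal sketch, assuming each site is visited the same number of times and that detections are binomial within each component. The parameter names `psi`, `p11`, and `p10` are illustrative labels, and the counts are hypothetical.

```python
import math

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)

def log_likelihood(counts, n_visits, psi, p11, p10):
    """Log-likelihood of detection counts under a two-component binomial mixture.

    psi: probability that a site is occupied
    p11: per-visit detection probability at occupied sites
    p10: per-visit false-positive probability at unoccupied sites
    """
    total = 0.0
    for y in counts:
        occupied = psi * binom_pmf(y, n_visits, p11)
        unoccupied = (1.0 - psi) * binom_pmf(y, n_visits, p10)
        total += math.log(occupied + unoccupied)
    return total

# Hypothetical detection counts at 8 sites, each visited 5 times.
counts = [0, 0, 1, 4, 5, 3, 0, 1]
ll_mixture = log_likelihood(counts, 5, psi=0.4, p11=0.8, p10=0.1)
ll_no_fp = log_likelihood(counts, 5, psi=0.4, p11=0.8, p10=1e-9)
print(ll_mixture, ll_no_fp)
```

Setting `p10` close to zero approximates the standard model without false positives. Maximizing this function over the three parameters, for example with a general-purpose optimizer, would give maximum-likelihood estimates; that step is left out here.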
Discrepancy-based error estimates for Quasi-Monte Carlo III. Error distributions and central limits

NASA Astrophysics Data System (ADS)

Hoogland, Jiri; Kleiss, Ronald

1997-04-01

In Quasi-Monte Carlo integration, the integration error is believed to be generally smaller than in classical Monte Carlo with the same number of integration points. Using an appropriate definition of an ensemble of quasi-random point sets, we derive various results on the probability distribution of the integration error, which can be compared to the standard Central Limit Theorem for normal stochastic sampling. In many cases, a Gaussian error distribution is obtained.
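To make the comparison concrete, the sketch below integrates a smooth test function over the unit square with a two-dimensional Halton sequence and with pseudo-random points. The Halton sequence is only one common quasi-random point set, chosen here as an assumption; the paper's ensemble definition is not reproduced. The integrand and sample size are likewise illustrative.

```python
import math
import random

def van_der_corput(i, base):
    """i-th element (i >= 1) of the van der Corput sequence in the given base."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def halton_2d(n):
    """First n points of the 2D Halton sequence (bases 2 and 3)."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, n + 1)]

def integrand(x, y):
    # Smooth test function whose exact integral over the unit square is 1.
    return (math.pi / 2) ** 2 * math.sin(math.pi * x) * math.sin(math.pi * y)

def estimate(points):
    """Equal-weight average of the integrand over the given points."""
    return sum(integrand(x, y) for x, y in points) / len(points)

n = 4096
qmc_error = abs(estimate(halton_2d(n)) - 1.0)

rng = random.Random(1)
mc_points = [(rng.random(), rng.random()) for _ in range(n)]
mc_error = abs(estimate(mc_points) - 1.0)

print(qmc_error, mc_error)
```

A single run gives one error value per method. Studying the distribution of the error, as the paper does, would require repeating the estimate over an ensemble of point sets.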
Fish fins as non-lethal surrogates for muscle tissues in freshwater food web studies using stable isotopes.

PubMed

Hette Tronquart, Nicolas; Mazeas, Laurent; Reuilly-Manenti, Liana; Zahm, Amandine; Belliard, Jérôme

2012-07-30

Dorsal white muscle is the standard tissue analysed in fish trophic studies using stable isotope analyses. However, sampling white muscle often implies the sacrifice of fish. Thus, we examined whether the non-lethal sampling of fin tissue can substitute muscle sampling in food web studies. Analysing muscle and fin δ(15)N and δ(13)C values of 466 European freshwater fish (14 species) with an elemental analyser coupled with an isotope ratio mass spectrometer, we compared the isotope values of the two tissues. Correlations between fin and muscle isotope ratios were examined for all fish together and specifically for 12 species. We further proposed four methods of assessing muscle from fin isotope ratios and estimated the errors made using these muscle surrogates. Despite significant differences between isotope values of the two tissues, fin and muscle isotopic signals are strongly correlated. Muscle values, estimated with raw fin isotope ratios (1st method), induce an error of ca. 1‰ for both isotopes. In comparison, specific (2nd method) or general (3rd method) correlations provide meaningful corrections of fin isotope ratios (errors <0.6‰). On the other hand, relationships, established for Australian tropical fish, only give poor muscle estimates (errors >0.8‰). There is little chance that a global model can be created. However, the 2nd and 3rd methods of estimating muscle values from fin isotope ratios should provide an acceptable level of error for the studies of European freshwater food web. We thus recommend that future studies use fin tissue as a non-lethal surrogate for muscle. Copyright © 2012 John Wiley & Sons, Ltd.
Nonparametric probability density estimation by optimization theoretic techniques

NASA Technical Reports Server (NTRS)

Scott, D. W.

1976-01-01

Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm proposed to choose the scaling factor automatically. The second nonparametric probability estimate uses penalty function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.
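The role of the kernel scaling factor can be illustrated with a small Gaussian kernel estimator. This is a minimal sketch under stated assumptions: the Gaussian kernel is a choice made here, the sample is hypothetical, and the automatic scaling-factor algorithm from the report is not reproduced.

```python
import math

def gaussian_kde(sample, h):
    """Return a function that evaluates a Gaussian kernel density estimate.

    h is the kernel scaling factor (bandwidth).
    """
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density

# Hypothetical random sample.
sample = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]

f_narrow = gaussian_kde(sample, h=0.2)  # small h: spiky, follows each point
f_wide = gaussian_kde(sample, h=1.0)    # large h: smooth, may blur structure

for x in (-1.0, 0.0, 1.0):
    print(x, round(f_narrow(x), 3), round(f_wide(x), 3))
```

Both estimates integrate to one; they differ only in how much each observation is smeared out, which is why the choice of scaling factor matters.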
Micro-organism distribution sampling for bioassays

NASA Technical Reports Server (NTRS)

Nelson, B. A.

1975-01-01

Purpose of sampling distribution is to characterize sample-to-sample variation so statistical tests may be applied, to estimate error due to sampling (confidence limits) and to evaluate observed differences between samples. Distribution could be used for bioassays taken in hospitals, breweries, food-processing plants, and pharmaceutical plants.
Assessment of ecologic regression in the study of lung cancer and indoor radon.

PubMed

Stidley, C A; Samet, J M

1994-02-01

Ecologic regression studies conducted to assess the cancer risk of indoor radon to the general population are subject to methodological limitations, and they have given seemingly contradictory results. The authors use simulations to examine the effects of two major methodological problems that affect these studies: measurement error and misspecification of the risk model. In a simulation study of the effect of measurement error caused by the sampling process used to estimate radon exposure for a geographic unit, both the effect of radon and the standard error of the effect estimate were underestimated, with greater bias for smaller sample sizes. In another simulation study, which addressed the consequences of uncontrolled confounding by cigarette smoking, even small negative correlations between county geometric mean annual radon exposure and the proportion of smokers resulted in negative average estimates of the radon effect. A third study considered consequences of using simple linear ecologic models when the true underlying model relation between lung cancer and radon exposure is nonlinear. These examples quantify potential biases and demonstrate the limitations of estimating risks from ecologic studies of lung cancer and indoor radon.
Three-dimensional FLASH Laser Radar Range Estimation via Blind Deconvolution

DTIC Science & Technology

2009-10-01

scene can result in errors due to several factors including the optical spatial impulse response, detector blurring, photon noise, timing jitter, and...estimation error include spatial blur, detector blurring, noise, timing jitter, and inter-sample targets. Unlike previous research, this paper accounts...for pixel coupling by defining the range image mathematical model as a 2D convolution between the system spatial impulse response and the object (target
Quantifying uncertainty in geoacoustic inversion. II. Application to broadband, shallow-water data.

PubMed

Dosso, Stan E; Nielsen, Peter L

2002-01-01

This paper applies the new method of fast Gibbs sampling (FGS) to estimate the uncertainties of seabed geoacoustic parameters in a broadband, shallow-water acoustic survey, with the goal of interpreting the survey results and validating the method for experimental data. FGS applies a Bayesian approach to geoacoustic inversion based on sampling the posterior probability density to estimate marginal probability distributions and parameter covariances. This requires knowledge of the statistical distribution of the data errors, including both measurement and theory errors, which is generally not available. Invoking the simplifying assumption of independent, identically distributed Gaussian errors allows a maximum-likelihood estimate of the data variance and leads to a practical inversion algorithm. However, it is necessary to validate these assumptions, i.e., to verify that the parameter uncertainties obtained represent meaningful estimates. To this end, FGS is applied to a geoacoustic experiment carried out at a site off the west coast of Italy where previous acoustic and geophysical studies have been performed. The parameter uncertainties estimated via FGS are validated by comparison with: (i) the variability in the results of inverting multiple independent data sets collected during the experiment; (ii) the results of FGS inversion of synthetic test cases designed to simulate the experiment and data errors; and (iii) the available geophysical ground truth. Comparisons are carried out for a number of different source bandwidths, ranges, and levels of prior information, and indicate that FGS provides reliable and stable uncertainty estimates for the geoacoustic inverse problem.
Selecting SNPs informative for African, American Indian and European Ancestry: application to the Family Investigation of Nephropathy and Diabetes (FIND).

PubMed

Williams, Robert C; Elston, Robert C; Kumar, Pankaj; Knowler, William C; Abboud, Hanna E; Adler, Sharon; Bowden, Donald W; Divers, Jasmin; Freedman, Barry I; Igo, Robert P; Ipp, Eli; Iyengar, Sudha K; Kimmel, Paul L; Klag, Michael J; Kohn, Orly; Langefeld, Carl D; Leehey, David J; Nelson, Robert G; Nicholas, Susanne B; Pahl, Madeleine V; Parekh, Rulan S; Rotter, Jerome I; Schelling, Jeffrey R; Sedor, John R; Shah, Vallabh O; Smith, Michael W; Taylor, Kent D; Thameem, Farook; Thornley-Brown, Denyse; Winkler, Cheryl A; Guo, Xiuqing; Zager, Phillip; Hanson, Robert L

2016-05-04

The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), European Americans, Mexican Americans, African Americans, and American Indians are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full heritage Pima Indians). We estimate IGA and present an algorithm for their standard errors, compare IGA to principal components, emphasize the importance of balancing information in the ancestry informative markers (AIMs), and test the association of IGA with diabetic nephropathy in the combined sample. A fixed parental allele maximum likelihood algorithm was applied to the FIND to estimate IGA in four samples: 869 American Indians; 1385 African Americans; 1451 Mexican Americans; and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are incorrect with large error. Individual genetic admixture is highly correlated with principal components for capturing population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy. The identified set of AIMs, which include American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas.
Failure to balance information in maximum likelihood, poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA using the Bayesian clustering method as implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.
Latin hypercube approach to estimate uncertainty in ground water vulnerability

USGS Publications Warehouse

Gurdak, J.J.; McCray, J.E.; Thyne, G.; Qi, S.L.

2007-01-01

A methodology is proposed to quantify prediction uncertainty associated with ground water vulnerability models that were developed through an approach that coupled multivariate logistic regression with a geographic information system (GIS). This method uses Latin hypercube sampling (LHS) to illustrate the propagation of input error and estimate uncertainty associated with the logistic regression predictions of ground water vulnerability. Central to the proposed method is the assumption that prediction uncertainty in ground water vulnerability models is a function of input error propagation from uncertainty in the estimated logistic regression model coefficients (model error) and the values of explanatory variables represented in the GIS (data error). Input probability distributions that represent both model and data error sources of uncertainty were simultaneously sampled using a Latin hypercube approach with logistic regression calculations of probability of elevated nonpoint source contaminants in ground water. The resulting probability distribution represents the prediction intervals and associated uncertainty of the ground water vulnerability predictions. The method is illustrated through a ground water vulnerability assessment of the High Plains regional aquifer. Results of the LHS simulations reveal significant prediction uncertainties that vary spatially across the regional aquifer. Additionally, the proposed method enables a spatial deconstruction of the prediction uncertainty that can lead to improved prediction of ground water vulnerability. © 2007 National Ground Water Association.
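The propagation step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: a one-variable logistic model, independent normal input distributions with made-up means and standard deviations, and a basic stratify-and-shuffle Latin hypercube. None of the numbers or names come from the study.

```python
import math
import random
from statistics import NormalDist

def latin_hypercube(n_samples, n_dims, rng):
    """Points in the unit hypercube with exactly one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(strata)
        columns.append(strata)
    return list(zip(*columns))

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical input distributions: intercept and slope (model error),
# and one explanatory variable (data error).
dists = [NormalDist(-1.0, 0.2), NormalDist(0.8, 0.1), NormalDist(2.0, 0.5)]

def to_inputs(point):
    # Map each uniform coordinate through the inverse CDF of its distribution;
    # the clamp keeps inv_cdf away from the endpoints 0 and 1.
    return [d.inv_cdf(min(max(u, 1e-12), 1 - 1e-12)) for d, u in zip(dists, point)]

rng = random.Random(7)
points = latin_hypercube(500, len(dists), rng)
probs = sorted(logistic(b0 + b1 * x) for b0, b1, x in map(to_inputs, points))

# Approximate 2.5th and 97.5th percentiles of the predicted probability.
lower, upper = probs[12], probs[487]
print(round(lower, 3), round(upper, 3))
```

Repeating this at each grid cell of a GIS layer would give a map of interval widths, which is one way to read the "spatial deconstruction" mentioned above.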
A General Linear Method for Equating with Small Samples

ERIC Educational Resources Information Center

Albano, Anthony D.

2015-01-01

Research on equating with small samples has shown that methods with stronger assumptions and fewer statistical estimates can lead to decreased error in the estimated equating function. This article introduces a new approach to linear observed-score equating, one which provides flexible control over how form difficulty is assumed versus estimated…
Blinded versus unblinded estimation of a correlation coefficient to inform interim design adaptations.

PubMed

Kunz, Cornelia U; Stallard, Nigel; Parsons, Nicholas; Todd, Susan; Friede, Tim

2017-03-01

Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under- or overestimation of the sample size. Both situations are unfavourable as the first one decreases the power and the latter one leads to a waste of resources. Hence, designs have been suggested that allow a re-assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to the unblinded estimator. We simulated two different settings: one assuming that all group means are the same and one assuming that different groups have different means. Simulation results show that the naïve (one-sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance depending on the sample size per group and the number of groups. © 2016 The Authors. Biometrical Journal Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
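The contrast between blinded and unblinded estimation can be illustrated with a small simulation. This is a sketch only: it compares the naive pooled (one-sample) correlation, which ignores group labels, with a within-group correlation that needs them. The alternative blinded estimators developed in the paper are not reproduced, and the group sizes, means, and true correlation of 0.5 are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_group(n, mean, rho, rng):
    """Draw n pairs with unit variances, correlation rho, and both means equal to mean."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((mean + z1, mean + rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return pairs

def blinded_r(groups):
    """One-sample estimator: pool all pairs, ignoring group labels."""
    pooled = [pair for group in groups for pair in group]
    return pearson([x for x, _ in pooled], [y for _, y in pooled])

def unblinded_r(groups):
    """Within-group estimator: centre each group on its own means before pooling."""
    xs, ys = [], []
    for group in groups:
        mx = sum(x for x, _ in group) / len(group)
        my = sum(y for _, y in group) / len(group)
        xs += [x - mx for x, _ in group]
        ys += [y - my for _, y in group]
    return pearson(xs, ys)

rng = random.Random(3)
equal_means = [simulate_group(2000, 0.0, 0.5, rng) for _ in range(2)]
shifted_means = [simulate_group(2000, m, 0.5, rng) for m in (0.0, 2.0)]

print(blinded_r(equal_means), unblinded_r(equal_means))
print(blinded_r(shifted_means), unblinded_r(shifted_means))
```

With equal group means the two estimators should agree closely. When the group means differ in both variables, the pooled estimator also picks up the between-group covariance and drifts away from the within-group correlation.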
Blinded versus unblinded estimation of a correlation coefficient to inform interim design adaptations

PubMed Central

Stallard, Nigel; Parsons, Nicholas; Todd, Susan; Friede, Tim

2016-01-01

Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under‐ or overestimation of the sample size. Both situations are unfavourable as the first one decreases the power and the latter one leads to a waste of resources. Hence, designs have been suggested that allow a re‐assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to the unblinded estimator. We simulated two different settings: one assuming that all group means are the same and one assuming that different groups have different means. Simulation results show that the naïve (one‐sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance depending on the sample size per group and the number of groups. PMID:27886393
Comparison of point counts and territory mapping for detecting effects of forest management on songbirds

USGS Publications Warehouse

Newell, Felicity L.; Sheehan, James; Wood, Petra Bohall; Rodewald, Amanda D.; Buehler, David A.; Keyser, Patrick D.; Larkin, Jeffrey L.; Beachy, Tiffany A.; Bakermans, Marja H.; Boves, Than J.; Evans, Andrea; George, Gregory A.; McDermott, Molly E.; Perkins, Kelly A.; White, Matthew; Wigley, T. Bently

2013-01-01

Point counts are commonly used to assess changes in bird abundance, including analytical approaches such as distance sampling that estimate density. Point-count methods have come under increasing scrutiny because effects of detection probability and field error are difficult to quantify. For seven forest songbirds, we compared fixed-radii counts (50 m and 100 m) and density estimates obtained from distance sampling to known numbers of birds determined by territory mapping. We applied point-count analytic approaches to a typical forest management question and compared results to those obtained by territory mapping. We used a before–after control impact (BACI) analysis with a data set collected across seven study areas in the central Appalachians from 2006 to 2010. Using a 50-m fixed radius, variance in error was at least 1.5 times that of the other methods, whereas a 100-m fixed radius underestimated actual density by >3 territories per 10 ha for the most abundant species. Distance sampling improved accuracy and precision compared to fixed-radius counts, although estimates were affected by birds counted outside 10-ha units. In the BACI analysis, territory mapping detected an overall treatment effect for five of the seven species, and effects were generally consistent each year. In contrast, all point-count methods failed to detect two treatment effects due to variance and error in annual estimates. Overall, our results highlight the need for adequate sample sizes to reduce variance, and skilled observers to reduce the level of error in point-count data. Ultimately, the advantages and disadvantages of different survey methods should be considered in the context of overall study design and objectives, allowing for trade-offs among effort, accuracy, and power to detect treatment effects.
Estimation of time averages from irregularly spaced observations - With application to coastal zone color scanner estimates of chlorophyll concentration

NASA Technical Reports Server (NTRS)

Chelton, Dudley B.; Schlax, Michael G.

1991-01-01

The sampling error of an arbitrary linear estimate of a time-averaged quantity constructed from a time series of irregularly spaced observations at a fixed location is quantified through a formalism. The method is applied to satellite observations of chlorophyll from the coastal zone color scanner. The two specific linear estimates under consideration are the composite average formed from the simple average of all observations within the averaging period and the optimal estimate formed by minimizing the mean squared error of the temporal average based on all the observations in the time series. The resulting suboptimal estimates are shown to be more accurate than composite averages. Suboptimal estimates are also found to be nearly as accurate as optimal estimates using the correct signal and measurement error variances and correlation functions for realistic ranges of these parameters, which makes them a viable practical alternative to the composite average method generally employed at present.
External quality-assurance results for the National Atmospheric Deposition Program/National Trends Network during 1991

USGS Publications Warehouse

Nilles, M.A.; Gordon, J.D.; Schroder, L.J.; Paulin, C.E.

1995-01-01

The U.S. Geological Survey used four programs in 1991 to provide external quality assurance for the National Atmospheric Deposition Program/National Trends Network (NADP/NTN). An intersite-comparison program was used to evaluate onsite pH and specific-conductance determinations. The effects of routine sample handling, processing, and shipping of wet-deposition samples on analyte determinations and an estimated precision of analyte values and concentrations were evaluated in the blind-audit program. Differences between analytical results and an estimate of the analytical precision of four laboratories routinely measuring wet deposition were determined by an interlaboratory-comparison program. Overall precision estimates for the precipitation-monitoring system were determined for selected sites by a collocated-sampler program. Results of the intersite-comparison program indicated that 93 and 86 percent of the site operators met the NADP/NTN accuracy goal for pH determinations during the two intersite-comparison studies completed during 1991. The results also indicated that 96 and 97 percent of the site operators met the NADP/NTN accuracy goal for specific-conductance determinations during the two 1991 studies. The effects of routine sample handling, processing, and shipping, determined in the blind-audit program, indicated significant positive bias (α = 0.01) for calcium, magnesium, sodium, potassium, chloride, nitrate, and sulfate. Significant negative bias (α = 0.01) was determined for hydrogen ion and specific conductance. Only ammonium determinations were not biased. A Kruskal-Wallis test indicated that there were no significant (α = 0.01) differences in analytical results from the four laboratories participating in the interlaboratory-comparison program.
Results from the collocated-sampler program indicated the median relative error for cation concentration and deposition exceeded eight percent at most sites, whereas the median relative error for sample volume, sulfate, and nitrate concentration at all sites was less than four percent. The median relative error for hydrogen ion concentration and deposition ranged from 4.6 to 18.3 percent at the four sites and as indicated in previous years of the study, was inversely proportional to the acidity of the precipitation at a given site. Overall, collocated-sampling error typically was five times that of laboratory error estimates for most analytes.
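The median relative error reported for collocated samplers can be illustrated with a short sketch. The abstract does not give the formula, so the definition used here (absolute difference of each pair divided by the pair mean, as a percentage) is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import median


def median_relative_error(primary, collocated):
    """Median absolute percent difference between paired sampler results.

    Assumed definition: |a - b| / mean(a, b) * 100 for each pair.
    """
    relative_errors = []
    for a, b in zip(primary, collocated):
        pair_mean = (a + b) / 2.0
        if pair_mean == 0:
            continue  # skip pairs with no measurable analyte
        relative_errors.append(abs(a - b) / pair_mean * 100.0)
    return median(relative_errors)


# Hypothetical sulfate concentrations (mg/L) from two collocated samplers.
primary = [1.0, 2.0, 4.0]
collocated = [1.0, 2.2, 3.6]
print(round(median_relative_error(primary, collocated), 2))
```

Using the median rather than the mean keeps a few badly mismatched pairs from dominating the precision estimate.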
Improving the accuracy of livestock distribution estimates through spatial interpolation.

PubMed

Bryssinckx, Ward; Ducheyne, Els; Muhwezi, Bernard; Godfrey, Sunday; Mintiens, Koen; Leirs, Herwig; Hendrickx, Guy

2012-11-01

Animal distribution maps serve many purposes such as estimating transmission risk of zoonotic pathogens to both animals and humans. The reliability and usability of such maps is highly dependent on the quality of the input data. However, decisions on how to perform livestock surveys are often based on previous work without considering possible consequences. A better understanding of the impact of using different sample designs and processing steps on the accuracy of livestock distribution estimates was acquired through iterative experiments using detailed survey data. The importance of sample size, sample design and aggregation is demonstrated and spatial interpolation is presented as a potential way to improve cattle number estimates. As expected, results show that an increasing sample size increased the precision of cattle number estimates but these improvements were mainly seen when the initial sample size was relatively low (e.g. a median relative error decrease of 0.04% per sampled parish for sample sizes below 500 parishes). For higher sample sizes, the added value of further increasing the number of samples declined rapidly (e.g. a median relative error decrease of 0.01% per sampled parish for sample sizes above 500 parishes). When a two-stage stratified sample design was applied to yield more evenly distributed samples, accuracy levels were higher for low sample densities and stabilised at lower sample sizes compared to one-stage stratified sampling. Aggregating the resulting cattle number estimates yielded significantly more accurate results because of averaging under- and over-estimates (e.g. when aggregating cattle number estimates from subcounty to district level, P <0.009 based on a sample of 2,077 parishes using one-stage stratified samples). During aggregation, area-weighted mean values were assigned to higher administrative unit levels.
However, when this step is preceded by a spatial interpolation to fill in missing values in non-sampled areas, accuracy is improved remarkably. This holds especially for low sample sizes and spatially evenly distributed samples (e.g. P <0.001 for a sample of 170 parishes using one-stage stratified sampling and aggregation on district level). Whether the same observations apply on a lower spatial scale should be further investigated.
An algebraic aspect of Pareto mixture parameter estimation using censored sample: A Bayesian approach.

PubMed

Saleem, Muhammad; Sharif, Kashif; Fahmi, Aliya

2018-04-27

Applications of the Pareto distribution are common in reliability, survival and financial studies. In this paper, a Pareto mixture distribution is considered to model a heterogeneous population comprising two subgroups. Each of the two subgroups is characterized by the same functional form with unknown distinct shape and scale parameters. Bayes estimators have been derived using flat and conjugate priors under a squared error loss function. Standard errors have also been derived for the Bayes estimators. An interesting feature of this study is the preparation of components of the Fisher information matrix.

Measures of Muscular Strength in U.S. Children and Adolescents, 2012

MedlinePlus

... errors of the percentages were estimated using Taylor series linearization, a method that incorporates the sample weights and sample design. Differences between groups were evaluated using a t ...
Total elimination of sampling errors in polarization imagery obtained with integrated microgrid polarimeters.

PubMed

Tyo, J Scott; LaCasse, Charles F; Ratliff, Bradley M

2009-10-15

Microgrid polarimeters operate by integrating a focal plane array with an array of micropolarizers. The Stokes parameters are estimated by comparing polarization measurements from pixels in a neighborhood around the point of interest. The main drawback is that the measurements used to estimate the Stokes vector are made at different locations, leading to a false polarization signature owing to instantaneous field-of-view (IFOV) errors. We demonstrate for the first time, to our knowledge, that spatially band limited polarization images can be ideally reconstructed with no IFOV error by using a linear system framework.
Variance of discharge estimates sampled using acoustic Doppler current profilers from moving boats

USGS Publications Warehouse

Garcia, Carlos M.; Tarrab, Leticia; Oberg, Kevin; Szupiany, Ricardo; Cantero, Mariano I.

2012-01-01

This paper presents a model for quantifying the random errors (i.e., variance) of acoustic Doppler current profiler (ADCP) discharge measurements from moving boats for different sampling times. The model focuses on the random processes in the sampled flow field and has been developed using statistical methods currently available for uncertainty analysis of velocity time series. Analysis of field data collected using ADCPs from moving boats on three natural rivers of varying sizes and flow conditions shows that, even though the estimate of the integral time scale of the actual turbulent flow field is larger than the sampling interval, the integral time scale of the sampled flow field is on the order of the sampling interval. Thus, an equation for computing the variance error in discharge measurements associated with different sampling times, assuming uncorrelated flow fields, is appropriate. The approach is used to help define optimal sampling strategies by choosing the exposure time required for ADCPs to accurately measure flow discharge.
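The consequence of treating successive samples as uncorrelated can be sketched with the textbook variance of a mean. This is not the authors' equation, only the standard result that the variance of an average of N independent samples is the sample variance divided by N, with N taken as exposure time over sampling interval. All function and parameter names are hypothetical.

```python
import math


def relative_standard_error(sigma_q, mean_q, exposure_time_s, sampling_interval_s):
    """Relative standard error of the mean discharge.

    Assumes the samples are uncorrelated, so Var(mean) = sigma_q**2 / N
    with N = exposure_time_s / sampling_interval_s.
    """
    n_samples = exposure_time_s / sampling_interval_s
    return (sigma_q / math.sqrt(n_samples)) / mean_q


def required_exposure_time(sigma_q, mean_q, sampling_interval_s, target_rel_error):
    """Exposure time needed to reach a target relative standard error."""
    n_samples = (sigma_q / (mean_q * target_rel_error)) ** 2
    return n_samples * sampling_interval_s


# Hypothetical values: discharge of 100 m^3/s with a 10 m^3/s sample scatter.
print(relative_standard_error(10.0, 100.0, 100.0, 1.0))
print(required_exposure_time(10.0, 100.0, 1.0, 0.01))
```

If the sampled series were correlated, the effective number of independent samples would be smaller than the raw count and the required exposure time correspondingly longer.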
Assessment of pollutant mean concentrations in the Yangtze estuary based on MSN theory.

PubMed

Ren, Jing; Gao, Bing-Bo; Fan, Hai-Mei; Zhang, Zhi-Hong; Zhang, Yao; Wang, Jin-Feng

2016-12-15

Reliable assessment of water quality is a critical issue for estuaries. Nutrient concentrations show significant spatial distinctions between areas under the influence of fresh-sea water interaction and anthropogenic effects. For this situation, given the limitations of general mean estimation approaches, a new method for surfaces with non-homogeneity (MSN) was applied to obtain optimized linear unbiased estimations of the mean nutrient concentrations in the study area in the Yangtze estuary from 2011 to 2013. Other mean estimation methods, including block Kriging (BK), simple random sampling (SS) and stratified sampling (ST) inference, were applied simultaneously for comparison. Their performance was evaluated by estimation error. The results show that MSN had the highest accuracy, while SS had the highest estimation error. ST and BK were intermediate in terms of their performance. Thus, MSN is an appropriate method that can be adopted to reduce the uncertainty of mean pollutant estimation in estuaries. Copyright © 2016 Elsevier Ltd. All rights reserved.
The theory precision analyse of RFM localization of satellite remote sensing imagery

NASA Astrophysics Data System (ADS)

Zhang, Jianqing; Xv, Biao

2009-11-01

The traditional method of assessing the precision of the Rational Function Model (RFM) uses a large number of check points and calculates the mean square error by comparing computed coordinates with known coordinates. This method rests on probability theory: the mean square error is estimated statistically from a large number of samples, and the estimate can be assumed to approach the true value when the sample is large enough. This paper instead takes the viewpoint of survey adjustment, using the law of propagation of error as its theoretical basis to calculate the theoretical precision of RFM localization. With SPOT5 three array imagery as experimental data, the results of the traditional method and of the method described in this paper are compared; the comparison confirms that the traditional method is feasible and addresses the question of its theoretical precision from the viewpoint of survey adjustment.
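The law of propagation of error that this abstract takes as its basis can be written in its standard first-order form as below. The symbols are assumptions for illustration and do not come from the paper: $x$ stands for the quantities carrying the input errors, with covariance $\Sigma_{x}$; $f$ is the mapping to ground coordinates; and $J$ is its Jacobian evaluated at the estimate $\hat{x}$.

```latex
y = f(x), \qquad
\Sigma_{y} \approx J \, \Sigma_{x} \, J^{\mathsf{T}}, \qquad
J = \left. \frac{\partial f}{\partial x} \right|_{x = \hat{x}}
```

The square roots of the diagonal entries of $\Sigma_{y}$ give a theoretical standard error per coordinate, which is the kind of figure a check-point mean square error would be compared against.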
A path integral methodology for obtaining thermodynamic properties of nonadiabatic systems using Gaussian mixture distributions

NASA Astrophysics Data System (ADS)

Raymond, Neil; Iouchtchenko, Dmitri; Roy, Pierre-Nicholas; Nooijen, Marcel

2018-05-01

We introduce a new path integral Monte Carlo method for investigating nonadiabatic systems in thermal equilibrium and demonstrate an approach to reducing stochastic error. We derive a general path integral expression for the partition function in a product basis of continuous nuclear and discrete electronic degrees of freedom without the use of any mapping schemes. We separate our Hamiltonian into a harmonic portion and a coupling portion; the partition function can then be calculated as the product of a Monte Carlo estimator (of the coupling contribution to the partition function) and a normalization factor (that is evaluated analytically). A Gaussian mixture model is used to evaluate the Monte Carlo estimator in a computationally efficient manner. Using two model systems, we demonstrate our approach to reduce the stochastic error associated with the Monte Carlo estimator. We show that the selection of the harmonic oscillators comprising the sampling distribution directly affects the efficiency of the method. Our results demonstrate that our path integral Monte Carlo method's deviation from exact Trotter calculations is dominated by the choice of the sampling distribution. By improving the sampling distribution, we can drastically reduce the stochastic error leading to lower computational cost.
Quantifying and correcting motion artifacts in MRI

NASA Astrophysics Data System (ADS)

Bones, Philip J.; Maclaren, Julian R.; Millane, Rick P.; Watts, Richard

2006-08-01

Patient motion during magnetic resonance imaging (MRI) can produce significant artifacts in a reconstructed image. Since measurements are made in the spatial frequency domain ('k-space'), rigid-body translational motion results in phase errors in the data samples while rotation causes location errors. A method is presented to detect and correct these errors via a modified sampling strategy, thereby achieving more accurate image reconstruction. The strategy involves sampling vertical and horizontal strips alternately in k-space and employs phase correlation within the overlapping segments to estimate translational motion. An extension, also based on correlation, is employed to estimate rotational motion. Results from simulations with computer-generated phantoms suggest that the algorithm is robust up to realistic noise levels. The work is being extended to physical phantoms. Provided that a reference image is available and the object is of limited extent, it is shown that a measure related to the amount of energy outside the support can be used to objectively compare the severity of motion-induced artifacts.
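Phase correlation, the technique named in this abstract for estimating translation, can be illustrated generically. The sketch below is the ordinary image-domain form for integer circular shifts, not the authors' strip-based k-space procedure, and it assumes NumPy is available. The function name is hypothetical.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) circular shift taking `reference` to `moved`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    # The cross-power spectrum of a shifted copy has a pure linear phase ramp.
    cross = f_mov * np.conj(f_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep phase, discard magnitude

    # The inverse transform of a phase ramp is a peak located at the shift.
    correlation = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint of an axis correspond to negative shifts.
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, correlation.shape)]
    return tuple(int(s) for s in shift)


rng = np.random.default_rng(0)
image = rng.random((64, 64))
shifted = np.roll(image, shift=(3, -5), axis=(0, 1))
print(phase_correlation_shift(image, shifted))
```

Normalizing the cross-power spectrum to unit magnitude is what separates phase correlation from plain cross-correlation: it sharpens the peak and makes the estimate less sensitive to overall intensity differences.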
Previous Estimates of Mitochondrial DNA Mutation Level Variance Did Not Account for Sampling Error: Comparing the mtDNA Genetic Bottleneck in Mice and Humans

PubMed Central

Wonnapinij, Passorn; Chinnery, Patrick F.; Samuels, David C.

2010-01-01

In cases of inherited pathogenic mitochondrial DNA (mtDNA) mutations, a mother and her offspring generally have large and seemingly random differences in the amount of mutated mtDNA that they carry. Comparisons of measured mtDNA mutation level variance values have become an important issue in determining the mechanisms that cause these large random shifts in mutation level. These variance measurements have been made with samples of quite modest size, which should be a source of concern because higher-order statistics, such as variance, are poorly estimated from small sample sizes. We have developed an analysis of the standard error of variance from a sample of size n, and we have defined error bars for variance measurements based on this standard error. We calculate variance error bars for several published sets of measurements of mtDNA mutation level variance and show how the addition of the error bars alters the interpretation of these experimental results. We compare variance measurements from human clinical data and from mouse models and show that the mutation level variance is clearly higher in the human data than it is in the mouse models at both the primary oocyte and offspring stages of inheritance. We discuss how the standard error of variance can be used in the design of experiments measuring mtDNA mutation level variance. Our results show that variance measurements based on fewer than 20 measurements are generally unreliable and ideally more than 50 measurements are required to reliably compare variances with less than a 2-fold difference. PMID:20362273
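A rough sense of why small samples give unreliable variances comes from the textbook standard error of a sample variance. The sketch below assumes normally distributed data, for which SE(s^2) = s^2 * sqrt(2 / (n - 1)); the authors' own analysis may use a different expression, and the function names are hypothetical.

```python
import math
from statistics import variance


def variance_with_standard_error(values):
    """Sample variance and its standard error under a normality assumption."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    s2 = variance(values)  # unbiased sample variance (divides by n - 1)
    se = s2 * math.sqrt(2.0 / (n - 1))
    return s2, se


def relative_standard_error(n):
    """Standard error of the variance as a fraction of the variance itself."""
    return math.sqrt(2.0 / (n - 1))


for n in (5, 20, 50):
    print(n, round(relative_standard_error(n), 3))
```

Under this assumption the relative standard error falls only as the square root of the sample size, so error bars on a variance stay wide at sample sizes that would be adequate for estimating a mean.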
Analysis of methods commonly used in biomedicine for treatment versus control comparison of very small samples.

PubMed

Ristić-Djurović, Jasna L; Ćirković, Saša; Mladenović, Pavle; Romčević, Nebojša; Trbovich, Alexander M

2018-04-01

A rough estimate indicated that use of samples of size not larger than ten is not uncommon in biomedical research and that many of such studies are limited to strong effects due to sample sizes smaller than six. For data collected from biomedical experiments it is also often unknown if mathematical requirements incorporated in the sample comparison methods are satisfied. Computer simulated experiments were used to examine performance of methods for qualitative sample comparison and its dependence on the effectiveness of exposure, effect intensity, distribution of studied parameter values in the population, and sample size. The Type I and Type II errors, their average, as well as the maximal errors were considered. The sample size 9 and the t-test method with p = 5% ensured error smaller than 5% even for weak effects. For sample sizes 6-8 the same method enabled detection of weak effects with errors smaller than 20%. If the sample sizes were 3-5, weak effects could not be detected with an acceptable error; however, the smallest maximal error in the most general case that includes weak effects is granted by the standard error of the mean method. The increase of sample size from 5 to 9 led to seven times more accurate detection of weak effects. Strong effects were detected regardless of the sample size and method used. The minimal recommended sample size for biomedical experiments is 9. Use of smaller sizes and the method of their comparison should be justified by the objective of the experiment. Copyright © 2018 Elsevier B.V. All rights reserved.
Sampling error in timber surveys

Treesearch

Austin Hasel

1938-01-01

Various sampling strategies are evaluated for efficiency in an interior ponderosa pine forest. In a 5760 acre tract, efficiency was gained by stratifying into quarter acre blocks and sampling randomly from within. A systematic cruise was found to be superior for volume estimation.
Precipitation and Latent Heating Distributions from Satellite Passive Microwave Radiometry. Part 1; Method and Uncertainties

NASA Technical Reports Server (NTRS)

Olson, William S.; Kummerow, Christian D.; Yang, Song; Petty, Grant W.; Tao, Wei-Kuo; Bell, Thomas L.; Braun, Scott A.; Wang, Yansen; Lang, Stephen E.; Johnson, Daniel E.

2004-01-01

A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating/drying profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and non-convective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud resolving model simulations, and from the Bayesian formulation itself. Synthetic rain rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in instantaneous rain rate estimates at 0.5 deg resolution range from approximately 50% at 1 mm/h to 20% at 14 mm/h. These errors represent about 70-90% of the mean random deviation between collocated passive microwave and spaceborne radar rain rate estimates. The cumulative algorithm error in TMI estimates at monthly, 2.5 deg resolution is relatively small (less than 6% at 5 mm/day) compared to the random error due to infrequent satellite temporal sampling (8-35% at the same rain rate).
An evaluation of inferential procedures for adaptive clinical trial designs with pre-specified rules for modifying the sample size.

PubMed

Levin, Gregory P; Emerson, Sarah C; Emerson, Scott S

2014-09-01

Many papers have introduced adaptive clinical trial methods that allow modifications to the sample size based on interim estimates of treatment effect. There has been extensive commentary on type I error control and efficiency considerations, but little research on estimation after an adaptive hypothesis test. We evaluate the reliability and precision of different inferential procedures in the presence of an adaptive design with pre-specified rules for modifying the sampling plan. We extend group sequential orderings of the outcome space based on the stage at stopping, likelihood ratio statistic, and sample mean to the adaptive setting in order to compute median-unbiased point estimates, exact confidence intervals, and P-values uniformly distributed under the null hypothesis. The likelihood ratio ordering is found to average shorter confidence intervals and produce higher probabilities of P-values below important thresholds than alternative approaches. The bias adjusted mean demonstrates the lowest mean squared error among candidate point estimates. A conditional error-based approach in the literature has the benefit of being the only method that accommodates unplanned adaptations. We compare the performance of this and other methods in order to quantify the cost of failing to plan ahead in settings where adaptations could realistically be pre-specified at the design stage. We find the cost to be meaningful for all designs and treatment effects considered, and to be substantial for designs frequently proposed in the literature. © 2014, The International Biometric Society.
A feasibility study in adapting Shamos Bickel and Hodges Lehman estimator into T-Method for normalization

NASA Astrophysics Data System (ADS)

Harudin, N.; Jamaludin, K. R.; Muhtazaruddin, M. Nabil; Ramlie, F.; Muhamad, Wan Zuki Azman Wan

2018-03-01

The T-Method is one of the techniques within the Mahalanobis Taguchi System and was developed specifically for multivariate data prediction. Prediction using the T-Method is always possible, even with a very limited sample size. Users of the T-Method need a clear understanding of the trend in the population data, since the method does not account for the effect of outliers. Outliers may cause apparent non-normality and the breakdown of classical methods. Robust parameter estimates exist that provide satisfactory results when the data contain outliers, as well as when the data are free of them. Among them are the Hodges Lehman (HL) estimate of location and the Shamos Bickel (SB) estimate of scale, which serve as counterparts to the classical mean and standard deviation. Embedding these into the normalization stage of the T-Method could help enhance its accuracy and allow the robustness of the T-Method itself to be analysed. However, the results of the larger-sample case study show that the T-Method has the lowest average error percentage (3.09%) on data with extreme outliers. HL and SB have the lowest error percentage (4.67%) for data without extreme outliers, with only a minimal difference in error compared to the T-Method. The trend in prediction error percentages is reversed for the smaller-sample case study. The results show that with a minimal sample size, where the risk of outliers is low, the T-Method performs much better, and that with a larger sample size containing extreme outliers the T-Method also predicts better than the alternatives. For the case studies conducted in this research, the normalization of the T-Method gives satisfactory results, and it is not feasible to adapt HL and SB, or the ordinary mean and standard deviation, into it, since they have only a minimal effect on the error percentages. Normalization using the T-Method is still considered to carry a lower risk from the effect of outliers.
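The two robust estimators named in this abstract can be sketched in a few lines. The definitions below are the common textbook ones and are assumptions here, since the abstract does not state which variants the authors used: the Hodges-Lehmann location estimate as the median of pairwise means over distinct pairs, and the Shamos scale estimate as the median of pairwise absolute differences multiplied by about 1.0484 for consistency with the standard deviation under normality.

```python
from itertools import combinations
from statistics import median


def hodges_lehmann(values):
    """Robust location: median of the pairwise means over all pairs i < j."""
    return median((a + b) / 2.0 for a, b in combinations(values, 2))


def shamos(values, scale=1.048358):
    """Robust scale: scaled median of pairwise absolute differences.

    The default scale makes the estimate comparable to a standard
    deviation for normally distributed data.
    """
    return scale * median(abs(a - b) for a, b in combinations(values, 2))


# Hypothetical data with one extreme outlier.
data = [1, 2, 3, 4, 100]
print(hodges_lehmann(data))      # compare with the ordinary mean, 22.0
print(shamos(data, scale=1.0))   # unscaled median pairwise difference
```

Both estimators enumerate all pairs, so the cost grows quadratically with sample size, which is rarely a concern at the small sample sizes the abstract discusses.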
Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data

ERIC Educational Resources Information Center

Savalei, Victoria

2010-01-01

Maximum likelihood is the most common estimation method in structural equation modeling. Standard errors for maximum likelihood estimates are obtained from the associated information matrix, which can be estimated from the sample using either expected or observed information. It is known that, with complete data, estimates based on observed or…
Large Sample Confidence Intervals for Item Response Theory Reliability Coefficients

ERIC Educational Resources Information Center

Andersson, Björn; Xin, Tao

2018-01-01

In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…
A method of bias correction for maximal reliability with dichotomous measures.

PubMed

Penev, Spiridon; Raykov, Tenko

2010-02-01

This paper is concerned with the reliability of weighted combinations of a given set of dichotomous measures. Maximal reliability for such measures has been discussed in the past, but the pertinent estimator exhibits a considerable bias and mean squared error for moderate sample sizes. We examine this bias, propose a procedure for bias correction, and develop a more accurate asymptotic confidence interval for the resulting estimator. In most empirically relevant cases, the bias correction and mean squared error correction can be performed simultaneously. We propose an approximate (asymptotic) confidence interval for the maximal reliability coefficient, discuss the implementation of this estimator, and investigate the mean squared error of the associated asymptotic approximation. We illustrate the proposed methods using a numerical example.
Divergent estimation error in portfolio optimization and in linear regression

NASA Astrophysics Data System (ADS)

Kondor, I.; Varga-Haszonits, I.

2008-08-01

The problem of estimation error in portfolio optimization is discussed, in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges for a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition, it is accompanied by a number of critical phenomena, and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance, and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.
Seasonal variation in size-dependent survival of juvenile Atlantic salmon (Salmo salar): Performance of multistate capture-mark-recapture models

USGS Publications Warehouse

Letcher, B.H.; Horton, G.E.

2008-01-01

We estimated the magnitude and shape of size-dependent survival (SDS) across multiple sampling intervals for two cohorts of stream-dwelling Atlantic salmon (Salmo salar) juveniles using multistate capture-mark-recapture (CMR) models. Simulations designed to test the effectiveness of multistate models for detecting SDS in our system indicated that error in SDS estimates was low and that both time-invariant and time-varying SDS could be detected with sample sizes of >250, average survival of >0.6, and average probability of capture of >0.6, except for cases of very strong SDS. In the field (N ≈ 750, survival 0.6-0.8 among sampling intervals, probability of capture 0.6-0.8 among sampling occasions), about one-third of the sampling intervals showed evidence of SDS, with poorer survival of larger fish during the age-2+ autumn and quadratic survival (opposite direction between cohorts) during age-1+ spring. The varying magnitude and shape of SDS among sampling intervals suggest a potential mechanism for the maintenance of the very wide observed size distributions. Estimating SDS using multistate CMR models appears complementary to established approaches, can provide estimates with low error, and can be used to detect intermittent SDS. © 2008 NRC Canada.
Data Combination and Instrumental Variables in Linear Models

ERIC Educational Resources Information Center

Khawand, Christopher

2012-01-01

Instrumental variables (IV) methods allow for consistent estimation of causal effects, but suffer from poor finite-sample properties and data availability constraints. IV estimates also tend to have relatively large standard errors, often inhibiting the interpretability of differences between IV and non-IV point estimates. Lastly, instrumental…
Understanding the dynamics of correct and error responses in free recall: evidence from externalized free recall.

PubMed

Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J

2010-06-01

The dynamics of correct and error responses in a variant of delayed free recall were examined in the present study. In the externalized free recall paradigm, participants were presented with lists of words and were instructed to subsequently recall not only the words that they could remember from the most recently presented list, but also any other words that came to mind during the recall period. Externalized free recall is useful for elucidating both sampling and postretrieval editing processes, thereby yielding more accurate estimates of the total number of error responses, which are typically sampled and subsequently edited during free recall. The results indicated that the participants generally sampled correct items early in the recall period and then transitioned to sampling more erroneous responses. Furthermore, the participants generally terminated their search after sampling too many errors. An examination of editing processes suggested that the participants were quite good at identifying errors, but this varied systematically on the basis of a number of factors. The results from the present study are framed in terms of generate-edit models of free recall.

Unbiased estimation of oceanic mean rainfall from satellite borne radiometer measurements

NASA Technical Reports Server (NTRS)

Mittal, M. C.

1981-01-01

The statistical properties of the radar-derived rainfall obtained during the GARP Atlantic Tropical Experiment (GATE) are used to derive quantitative estimates of the spatial and temporal sampling errors associated with estimating rainfall from brightness temperature measurements such as would be obtained from a satelliteborne microwave radiometer employing a practical size antenna aperture. A basis for a method of correcting the so-called beam filling problem, i.e., the effect of nonuniformity of rainfall over the radiometer beamwidth, is provided. The method presented employs the statistical properties of the observations themselves without need for physical assumptions beyond those associated with the radiative transfer model. The simulation results presented offer a validation of the estimated accuracy that can be achieved, and the graphs included permit evaluation of the effect of the antenna resolution on both the temporal and spatial sampling errors.
Parallel Processing of Broad-Band PPM Signals

NASA Technical Reports Server (NTRS)

Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

2010-01-01

A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for timeslot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively-low speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).
Evaluation of errors in quantitative determination of asbestos in rock

NASA Astrophysics Data System (ADS)

Baietto, Oliviero; Marini, Paola; Vitaliti, Martina

2016-04-01

The quantitative determination of the asbestos content in rock matrices is a complex operation that is susceptible to significant errors. The principal methodologies for the analysis are Scanning Electron Microscopy (SEM) and Phase Contrast Optical Microscopy (PCOM). Although the resolution of PCOM is inferior to that of SEM, PCOM analysis has several advantages, including greater representativeness of the analyzed sample, more effective recognition of chrysotile, and a lower cost. The DIATI LAA internal methodology for PCOM analysis is based on mild grinding of a rock sample, its subdivision into 5-6 grain size classes smaller than 2 mm, and a subsequent microscopic analysis of a portion of each class. PCOM is based on the optical properties of asbestos and of the liquids with known refractive index in which the particles under analysis are immersed. The error evaluation in the analysis of rock samples, contrary to the analysis of airborne filters, cannot be based on a statistical distribution. For airborne filters, a binomial distribution (Poisson), which theoretically defines the variation in the count of fibers resulting from the observation of analysis fields chosen randomly on the filter, can be applied. The analysis in rock matrices, by contrast, cannot rely on any statistical distribution, because the most important object of the analysis is the size of the asbestiform fibers and bundles of fibers observed and the resulting ratio between the weight of the fibrous component and that of the granular component. The error evaluation generally provided by public and private institutions varies between 50 and 150 percent, but there are no specific studies that discuss the origin of the error or that link it to the asbestos content. Our work aims to provide a reliable estimate of the error in relation to the applied methodologies and to the total asbestos content, especially for values close to the legal limits. 
The error assessments must be made by repeating the same analysis on the same sample in order to estimate the error in the representativeness of the sample and the error related to the sensitivity of the operator, and thus to provide a sufficiently reliable uncertainty for the method. We used about 30 natural rock samples with different asbestos content, performing 3 analyses on each sample to obtain a trend sufficiently representative of the percentage. Furthermore, we performed 10 repetitions of the analysis on one chosen sample to define the error of the methodology more specifically.
Evaluation of process errors in bed load sampling using a Dune Model

USGS Publications Warehouse

Gomez, Basil; Troutman, Brent M.

1997-01-01

Reliable estimates of the streamwide bed load discharge obtained using sampling devices are dependent upon good at-a-point knowledge across the full width of the channel. Using field data and information derived from a model that describes the geometric features of a dune train in terms of a spatial process observed at a fixed point in time, we show that sampling errors decrease as the number of samples collected increases, and the number of traverses of the channel over which the samples are collected increases. It also is preferable that bed load sampling be conducted at a pace which allows a number of bed forms to pass through the sampling cross section. The situations we analyze and simulate pertain to moderate transport conditions in small rivers. In such circumstances, bed load sampling schemes typically should involve four or five traverses of a river, and the collection of 20–40 samples at a rate of five or six samples per hour. By ensuring that spatial and temporal variability in the transport process is accounted for, such a sampling design reduces both random and systematic errors and hence minimizes the total error involved in the sampling process.
Random errors of oceanic monthly rainfall derived from SSM/I using probability distribution functions

NASA Technical Reports Server (NTRS)

Chang, Alfred T. C.; Chiu, Long S.; Wilheit, Thomas T.

1993-01-01

Global averages and random errors associated with the monthly oceanic rain rates derived from the Special Sensor Microwave/Imager (SSM/I) data using the technique developed by Wilheit et al. (1991) are computed. Accounting for the beam-filling bias, a global annual average rain rate of 1.26 m is computed. The error estimation scheme is based on the existence of independent (morning and afternoon) estimates of the monthly mean. Calculations show overall random errors of about 50-60 percent for each 5 deg x 5 deg box. The results are insensitive to different sampling strategy (odd and even days of the month). Comparison of the SSM/I estimates with raingage data collected at the Pacific atoll stations showed a low bias of about 8 percent, a correlation of 0.7, and an rms difference of 55 percent.
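The idea of using two independent estimates of the same monthly mean to gauge random error can be sketched as follows. This is only a minimal illustration, not the scheme of the cited study: it assumes the morning and afternoon estimates carry independent, equal-variance errors and no systematic difference, so that the variance of their difference is twice the single-estimate error variance. The function name and the input values are hypothetical.

```python
import math

def relative_random_error(am, pm):
    """Relative random error of the combined (am + pm) / 2 monthly mean.

    Assumes independent, equal-variance, unbiased errors in the two
    estimates, so Var(am - pm) = 2 * sigma^2 and the error variance of
    their average is sigma^2 / 2 = mean((am - pm)^2) / 4.
    """
    if len(am) != len(pm) or not am:
        raise ValueError("am and pm must be non-empty and of equal length")
    n = len(am)
    mean_sq_diff = sum((a - p) ** 2 for a, p in zip(am, pm)) / n
    sigma_combined = math.sqrt(mean_sq_diff / 4.0)
    mean_rain = sum((a + p) / 2.0 for a, p in zip(am, pm)) / n
    if mean_rain == 0:
        raise ValueError("mean rain rate is zero; relative error undefined")
    return sigma_combined / mean_rain

# Hypothetical monthly rain estimates for two grid boxes (arbitrary units).
print(relative_random_error([4.0, 6.0], [6.0, 4.0]))
```

Any real diurnal cycle in rainfall would violate the no-systematic-difference assumption and inflate this estimate.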
Estimating genotype error rates from high-coverage next-generation sequence data.

PubMed

Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil

2014-11-01

Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
Multistage estimation of received carrier signal parameters under very high dynamic conditions of the receiver

NASA Technical Reports Server (NTRS)

Kumar, Rajendra (Inventor)

1991-01-01

A multistage estimator is provided for the parameters of a received carrier signal possibly phase-modulated by unknown data and experiencing very high Doppler, Doppler rate, etc., as may arise, for example, in the case of Global Positioning Systems (GPS) where the signal parameters are directly related to the position, velocity and jerk of the GPS ground-based receiver. In a two-stage embodiment of the more general multistage scheme, the first stage, selected to be a modified least squares algorithm referred to as differential least squares (DLS), operates as a coarse estimator, resulting in higher rms estimation errors but with a relatively small probability of the frequency estimation error exceeding one-half of the sampling frequency, and provides relatively coarse estimates of the frequency and its derivatives. The second stage of the estimator, an extended Kalman filter (EKF), operates on the error signal available from the first stage, refining the overall estimates of the phase along with a more refined estimate of frequency, and in the process also reduces the number of cycle slips.
Multistage estimation of received carrier signal parameters under very high dynamic conditions of the receiver

NASA Technical Reports Server (NTRS)

Kumar, Rajendra (Inventor)

1990-01-01

A multistage estimator is provided for the parameters of a received carrier signal possibly phase-modulated by unknown data and experiencing very high Doppler, Doppler rate, etc., as may arise, for example, in the case of Global Positioning Systems (GPS) where the signal parameters are directly related to the position, velocity and jerk of the GPS ground-based receiver. In a two-stage embodiment of the more general multistage scheme, the first stage, selected to be a modified least squares algorithm referred to as differential least squares (DLS), operates as a coarse estimator, resulting in higher rms estimation errors but with a relatively small probability of the frequency estimation error exceeding one-half of the sampling frequency, and provides relatively coarse estimates of the frequency and its derivatives. The second stage of the estimator, an extended Kalman filter (EKF), operates on the error signal available from the first stage, refining the overall estimates of the phase along with a more refined estimate of frequency, and in the process also reduces the number of cycle slips.
Static Scene Statistical Non-Uniformity Correction

DTIC Science & Technology

2015-03-01

Error NUC Non-Uniformity Correction RMSE Root Mean Squared Error RSD Relative Standard Deviation S3NUC Static Scene Statistical Non-Uniformity...Deviation (RSD) which normalizes the standard deviation, σ, to the mean estimated value, µ, using the equation RSD = (σ / µ) × 100. The RSD plot of the gain...estimates is shown in Figure 4.1(b). The RSD plot shows that after a sample size of approximately 10, the different photocount values and the inclusion
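The relative standard deviation quoted in this excerpt, the standard deviation divided by the mean and multiplied by 100, can be computed as in the short sketch below. The excerpt does not say whether a sample or population standard deviation is intended; the sample form is assumed here, and the function name and input values are hypothetical.

```python
import statistics

def relative_standard_deviation(values):
    """RSD in percent: (sigma / mu) * 100, using the sample standard deviation."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("mean is zero; RSD is undefined")
    sigma = statistics.stdev(values)  # assumption: sample (n - 1) form
    return sigma / abs(mu) * 100.0

# Hypothetical gain estimates from repeated trials.
print(relative_standard_deviation([9.0, 10.0, 11.0]))
```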
Simplified Estimation and Testing in Unbalanced Repeated Measures Designs.

PubMed

Spiess, Martin; Jordan, Pascal; Wendt, Mike

2018-05-07

In this paper we propose a simple estimator for unbalanced repeated measures design models where each unit is observed at least once in each cell of the experimental design. The estimator does not require a model of the error covariance structure. Thus, circularity of the error covariance matrix and estimation of correlation parameters and variances are not necessary. Together with a weak assumption about the reason for the varying number of observations, the proposed estimator and its variance estimator are unbiased. As an alternative to confidence intervals based on the normality assumption, a bias-corrected and accelerated bootstrap technique is considered. We also propose the naive percentile bootstrap for Wald-type tests where the standard Wald test may break down when the number of observations is small relative to the number of parameters to be estimated. In a simulation study we illustrate the properties of the estimator and the bootstrap techniques to calculate confidence intervals and conduct hypothesis tests in small and large samples under normality and non-normality of the errors. The results imply that the simple estimator is only slightly less efficient than an estimator that correctly assumes a block structure of the error correlation matrix, a special case of which is an equi-correlation matrix. Application of the estimator and the bootstrap technique is illustrated using data from a task switch experiment based on an experimental within design with 32 cells and 33 participants.
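For readers unfamiliar with the percentile bootstrap mentioned above, the sketch below shows its simplest form: a confidence interval for a single statistic. It is a generic illustration and not the authors' procedure for repeated measures designs or Wald-type tests. It omits the bias correction and acceleration of the BCa variant, and the function name, defaults, and data are hypothetical.

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat=statistics.fmean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Naive percentile bootstrap interval for stat(data).

    Resamples the data with replacement n_boot times, evaluates the
    statistic on each resample, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of the bootstrap distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return boot[lo_idx], boot[hi_idx]

# Hypothetical response times from one cell of a design.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(percentile_bootstrap_ci(sample))
```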
Evaluation of the predicted error of the soil moisture retrieval from C-band SAR by comparison against modelled soil moisture estimates over Australia

PubMed Central

Doubková, Marcela; Van Dijk, Albert I.J.M.; Sabel, Daniel; Wagner, Wolfgang; Blöschl, Günter

2012-01-01

The Sentinel-1 will carry onboard a C-band radar instrument that will map the European continent once every four days and the global land surface at least once every twelve days with finest 5 × 20 m spatial resolution. The high temporal sampling rate and operational configuration make Sentinel-1 of interest for operational soil moisture monitoring. Currently, updated soil moisture data are made available at 1 km spatial resolution as a demonstration service using Global Mode (GM) measurements from the Advanced Synthetic Aperture Radar (ASAR) onboard ENVISAT. The service demonstrates the potential of the C-band observations to monitor variations in soil moisture. Importantly, a retrieval error estimate is also available; these are needed to assimilate observations into models. The retrieval error is estimated by propagating sensor errors through the retrieval model. In this work, the existing ASAR GM retrieval error product is evaluated using independent top soil moisture estimates produced by the grid-based landscape hydrological model (AWRA-L) developed within the Australian Water Resources Assessment system (AWRA). The ASAR GM retrieval error estimate, an assumed prior AWRA-L error estimate and the variance in the respective datasets were used to spatially predict the root mean square error (RMSE) and the Pearson's correlation coefficient R between the two datasets. These were compared with the RMSE calculated directly from the two datasets. The predicted and computed RMSE showed a very high level of agreement in spatial patterns as well as good quantitative agreement; the RMSE was predicted within accuracy of 4% of saturated soil moisture over 89% of the Australian land mass. Predicted and calculated R maps corresponded within accuracy of 10% over 61% of the continent. The strong correspondence between the predicted and calculated RMSE and R builds confidence in the retrieval error model and derived ASAR GM error estimates. 
The ASAR GM and Sentinel-1 have the same basic physical measurement characteristics, and therefore a very similar retrieval error estimation method can be applied. Because of the expected improvements in radiometric resolution of the Sentinel-1 backscatter measurements, soil moisture estimation errors can be expected to be an order of magnitude less than those for ASAR GM. This opens the possibility for operationally available medium resolution soil moisture estimates with very well-specified errors that can be assimilated into hydrological or crop yield models, with potentially large benefits for land-atmosphere fluxes, crop growth, and water balance monitoring and modelling. PMID:23483015
Linear models for airborne-laser-scanning-based operational forest inventory with small field sample size and highly correlated LiDAR data

USGS Publications Warehouse

Junttila, Virpi; Kauranne, Tuomo; Finley, Andrew O.; Bradford, John B.

2015-01-01

Modern operational forest inventory often uses remotely sensed data that cover the whole inventory area to produce spatially explicit estimates of forest properties through statistical models. The data obtained by airborne light detection and ranging (LiDAR) correlate well with many forest inventory variables, such as the tree height, the timber volume, and the biomass. To construct an accurate model over thousands of hectares, LiDAR data must be supplemented with several hundred field sample measurements of forest inventory variables. This can be costly and time consuming. Different LiDAR-data-based and spatial-data-based sampling designs can reduce the number of field sample plots needed. However, problems arising from the features of the LiDAR data, such as a large number of predictors compared with the sample size (overfitting) or a strong correlation among predictors (multicollinearity), may decrease the accuracy and precision of the estimates and predictions. To overcome these problems, a Bayesian linear model with the singular value decomposition of predictors, combined with regularization, is proposed. The model performance in predicting different forest inventory variables is verified in ten inventory areas from two continents, where the number of field sample plots is reduced using different sampling designs. The results show that, with an appropriate field plot selection strategy and the proposed linear model, the total relative error of the predicted forest inventory variables is only 5%–15% larger using 50 field sample plots than the error of a linear model estimated with several hundred field sample plots when we sum up the error due to both the model noise variance and the model’s lack of fit.
Hazard Function Estimation with Cause-of-Death Data Missing at Random.

PubMed

Wang, Qihua; Dinse, Gregg E; Liu, Chunling

2012-04-01

Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.
Effect of Sampling Depth on Air-Sea CO2 Flux Estimates in River-Stratified Arctic Coastal Waters

NASA Astrophysics Data System (ADS)

Miller, L. A.; Papakyriakou, T. N.

2015-12-01

In summer-time Arctic coastal waters that are strongly influenced by river run-off, extreme stratification severely limits wind mixing, making it difficult to effectively sample the surface 'mixed layer', which can be as shallow as 1 m, from a ship. During two expeditions in southwestern Hudson Bay, off the Nelson, Hayes, and Churchill River estuaries, we confirmed that sampling depth has a strong impact on estimates of 'surface' pCO2 and calculated air-sea CO2 fluxes. We determined pCO2 in samples collected from 5 m, using a typical underway system on the ship's seawater supply; from the 'surface' rosette bottle, which was generally between 1 and 3 m; and using a niskin bottle deployed at 1 m and just below the surface from a small boat away from the ship. Our samples confirmed that the error in pCO2 derived from typical ship-board versus small-boat sampling at a single station could be nearly 90 μatm, leading to errors in the calculated air-sea CO2 flux of more than 0.1 mmol/(m2s). Attempting to extrapolate such fluxes over the 6,000,000 km2 area of the Arctic shelves would generate an error approaching a gigamol CO2/s. Averaging the station data over a cruise still resulted in an error of nearly 50% in the total flux estimate. Our results have implications not only for the design and execution of expedition-based sampling, but also for placement of in-situ sensors. Particularly in polar waters, sensors are usually deployed on moorings, well below the surface, to avoid damage and destruction from drifting ice. However, to obtain accurate information on air-sea fluxes in these areas, it is necessary to deploy sensors on ice-capable buoys that can position the sensors in true 'surface' waters.
A Comparison of Normal and Elliptical Estimation Methods in Structural Equation Models.

ERIC Educational Resources Information Center

Schumacker, Randall E.; Cheevatanarak, Suchittra

Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean square error of approximation values using normal and elliptical estimation methods. Three research conditions were imposed on the simulated data: sample size, population contamination percent, and kurtosis. A Bentler-Weeks structural model established the…
The impact of multiple endpoint dependency on Q and I(2) in meta-analysis.

PubMed

Thompson, Christopher Glen; Becker, Betsy Jane

2014-09-01

A common assumption in meta-analysis is that effect sizes are independent. When correlated effect sizes are analyzed using traditional univariate techniques, this assumption is violated. This research assesses the impact of dependence arising from treatment-control studies with multiple endpoints on homogeneity measures Q and I(2) in scenarios using the unbiased standardized-mean-difference effect size. Univariate and multivariate meta-analysis methods are examined. Conditions included different overall outcome effects, study sample sizes, numbers of studies, between-outcomes correlations, dependency structures, and ways of computing the correlation. The univariate approach used typical fixed-effects analyses whereas the multivariate approach used generalized least-squares (GLS) estimates of a fixed-effects model, weighted by the inverse variance-covariance matrix. Increased dependence among effect sizes led to increased Type I error rates from univariate models. When effect sizes were strongly dependent, error rates were drastically higher than nominal levels regardless of study sample size and number of studies. In contrast, using GLS estimation to account for multiple-endpoint dependency maintained error rates within nominal levels. Conversely, mean I(2) values were not greatly affected by increased amounts of dependency. Last, we point out that the between-outcomes correlation should be estimated as a pooled within-groups correlation rather than using a full-sample estimator that does not consider treatment/control group membership. Copyright © 2014 John Wiley & Sons, Ltd.
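The homogeneity measures discussed above have a standard univariate fixed-effects form, sketched below: Q is the inverse-variance-weighted sum of squared deviations from the pooled effect, and I² = max(0, (Q − df)/Q) × 100 with df = k − 1. This version treats the effect sizes as independent, which is exactly the assumption the abstract reports as problematic for multiple-endpoint data. The function name and inputs are hypothetical.

```python
def q_and_i_squared(effects, variances):
    """Cochran's Q and I^2 (percent) for a univariate fixed-effects analysis."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Two hypothetical standardized mean differences with equal sampling variance.
print(q_and_i_squared([0.0, 2.0], [0.5, 0.5]))
```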
Sample Size Limits for Estimating Upper Level Mediation Models Using Multilevel SEM

ERIC Educational Resources Information Center

Li, Xin; Beretvas, S. Natasha

2013-01-01

This simulation study investigated use of the multilevel structural equation model (MLSEM) for handling measurement error in both mediator and outcome variables ("M" and "Y") in an upper level multilevel mediation model. Mediation and outcome variable indicators were generated with measurement error. Parameter and standard…
Background Error Covariance Estimation using Information from a Single Model Trajectory with Application to Ocean Data Assimilation into the GEOS-5 Coupled Model

NASA Technical Reports Server (NTRS)

Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume; Koster, Randal D. (Editor)

2014-01-01

An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.
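The general idea behind building an ensemble from a moving window along one trajectory can be illustrated with a plain sample covariance over the most recent model states. This is a rough sketch under simplifying assumptions, not the FAST algorithm itself: the abstract does not specify how window members are selected, weighted, or localized, and the function name and toy trajectory below are hypothetical.

```python
def window_covariance(trajectory, end, window):
    """Sample covariance of the `window` states ending at time index `end`.

    trajectory is a list of state vectors (lists of floats), one per time
    step; the states inside the window are treated as ensemble members.
    """
    start = end - window + 1
    if window < 2 or start < 0 or end >= len(trajectory):
        raise ValueError("window must have >= 2 members inside the trajectory")
    members = trajectory[start:end + 1]
    m = len(members)
    dim = len(members[0])
    mean = [sum(s[i] for s in members) / m for i in range(dim)]
    # Anomalies: deviations of each member from the window mean.
    anomalies = [[s[i] - mean[i] for i in range(dim)] for s in members]
    return [[sum(a[i] * a[j] for a in anomalies) / (m - 1)
             for j in range(dim)] for i in range(dim)]

# Toy two-variable trajectory in which the second variable is twice the first.
traj = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(window_covariance(traj, end=3, window=3))
```

The off-diagonal entries are what allow an observed variable to update an unobserved one in an assimilation step.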
Background Error Covariance Estimation Using Information from a Single Model Trajectory with Application to Ocean Data Assimilation

NASA Technical Reports Server (NTRS)

Keppenne, Christian L.; Rienecker, Michele; Kovach, Robin M.; Vernieres, Guillaume

2014-01-01

An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.
Can a sample of Landsat sensor scenes reliably estimate the global extent of tropical deforestation?

Treesearch

R. L. Czaplewski

2003-01-01

Tucker and Townshend (2000) conclude that 'wall-to-wall coverage is needed to avoid gross errors in estimations of deforestation rates' because tropical deforestation is concentrated along roads and rivers. They specifically question the reliability of the 10% sample of Landsat sensor scenes used in the global remote sensing survey conducted by the Food and...

A three stage sampling model for remote sensing applications

NASA Technical Reports Server (NTRS)

Eisgruber, L. M.

1972-01-01

A conceptual model and an empirical application of the relationship between the manner of selecting observations and its effect on the precision of estimates from remote sensing are reported. This three stage sampling scheme considers flightlines, segments within flightlines, and units within these segments. The error of estimate is dependent on the number of observations in each of the stages.
Judging Statistical Models of Individual Decision Making under Risk Using In- and Out-of-Sample Criteria

PubMed Central

Drichoutis, Andreas C.; Lusk, Jayson L.

2014-01-01

Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample. PMID:25029467
Judging statistical models of individual decision making under risk using in- and out-of-sample criteria.

PubMed

Drichoutis, Andreas C; Lusk, Jayson L

2014-01-01

Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.
Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

USGS Publications Warehouse

Aulenbach, Brent T.

2013-01-01

A regression-model-based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads: the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals caused observed AMLE precision to be significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models than increasing calibration period length.
Data processing 1: Advancements in machine analysis of multispectral data

NASA Technical Reports Server (NTRS)

Swain, P. H.

1972-01-01

Multispectral data processing procedures are outlined beginning with the data display process used to accomplish data editing and proceeding through clustering, feature selection criterion for error probability estimation, and sample clustering and sample classification. The effective utilization of large quantities of remote sensing data by formulating a three stage sampling model for evaluation of crop acreage estimates represents an improvement in determining the cost benefit relationship associated with remote sensing technology.
Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

PubMed

Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

2012-01-01

Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.
Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

PubMed

Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

2012-01-01

The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. 
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks, were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
Estimation After a Group Sequential Trial.

PubMed

Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Kenward, Michael G; Tsiatis, Anastasios A; Davidian, Marie; Verbeke, Geert

2015-10-01

Group sequential trials are one important instance of studies for which the sample size is not fixed a priori but rather takes one of a finite set of pre-specified values, dependent on the observed data. Much work has been devoted to the inferential consequences of this design feature. Molenberghs et al (2012) and Milanzi et al (2012) reviewed and extended the existing literature, focusing on a collection of seemingly disparate, but related, settings, namely completely random sample sizes, group sequential studies with deterministic and random stopping rules, incomplete data, and random cluster sizes. They showed that the ordinary sample average is a viable option for estimation following a group sequential trial, for a wide class of stopping rules and for random outcomes with a distribution in the exponential family. Their results are somewhat surprising in the sense that the sample average is not optimal, and further, there does not exist an optimal, or even unbiased, linear estimator. However, the sample average is asymptotically unbiased, both conditionally upon the observed sample size as well as marginalized over it. By exploiting ignorability they showed that the sample average is the conventional maximum likelihood estimator. They also showed that a conditional maximum likelihood estimator is finite sample unbiased, but is less efficient than the sample average and has the larger mean squared error. Asymptotically, the sample average and the conditional maximum likelihood estimator are equivalent. This previous work is restricted, however, to the situation in which the random sample size can take only two values, N = n or N = 2n. In this paper, we consider the more practically useful setting of sample sizes in the finite set {n_1, n_2, …, n_L}. It is shown that the sample average is then a justifiable estimator, in the sense that it follows from joint likelihood estimation, and it is consistent and asymptotically unbiased. 
We also show why simulations can give the false impression of bias in the sample average when considered conditional upon the sample size. The consequence is that no corrections need to be made to estimators following sequential trials. When small-sample bias is of concern, the conditional likelihood estimator provides a relatively straightforward modification to the sample average. Finally, it is shown that classical likelihood-based standard errors and confidence intervals can be applied, obviating the need for technical corrections.
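The difference between looking at the sample average conditionally on the sample size and marginally over it can be sketched with a small simulation. This is only an illustration under assumptions that are not taken from the paper: standard normal outcomes, the two-value case N = n or N = 2n, and a hypothetical deterministic rule that stops after the first stage when the first-stage mean exceeds a threshold. The function name `simulate_trial` and all parameter values are invented for the example.

```python
import random
from statistics import fmean

def simulate_trial(n, mu, threshold, rng):
    """Return (sample_size, sample_average) for one two-stage trial."""
    first = [rng.gauss(mu, 1.0) for _ in range(n)]
    if fmean(first) > threshold:
        # Hypothetical stopping rule: stop early, so N = n.
        return n, fmean(first)
    # Otherwise collect a second stage of n observations, so N = 2n.
    second = [rng.gauss(mu, 1.0) for _ in range(n)]
    return 2 * n, fmean(first + second)

rng = random.Random(2024)
n, mu = 20, 0.0
results = [simulate_trial(n, mu, threshold=0.0, rng=rng) for _ in range(20000)]

# Deviation of the sample average from the true mean, overall and by sample size.
marginal_bias = fmean(avg for _, avg in results) - mu
bias_given_stop = fmean(avg for size, avg in results if size == n) - mu
bias_given_continue = fmean(avg for size, avg in results if size == 2 * n) - mu
print(marginal_bias, bias_given_stop, bias_given_continue)
```

In this toy setup the averages computed separately for each sample size are pulled in opposite directions, while the average over all trials lies much closer to the true mean. The sketch says nothing about the paper's likelihood arguments; it only shows why conditioning on the realized sample size changes what a simulation appears to show.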
Estimating means and variances: The comparative efficiency of composite and grab samples.

PubMed

Brumelle, S; Nemetz, P; Casey, D

1984-03-01

This paper compares the efficiencies of two sampling techniques for estimating a population mean and variance. One procedure, called grab sampling, consists of collecting and analyzing one sample per period. The second procedure, called composite sampling, collects n samples per period which are then pooled and analyzed as a single sample. We review the well known fact that composite sampling provides a superior estimate of the mean. However, it is somewhat surprising that composite sampling does not always generate a more efficient estimate of the variance. For populations with platykurtic distributions, grab sampling gives a more efficient estimate of the variance, whereas composite sampling is better for leptokurtic distributions. These conditions on kurtosis can be related to peakedness and skewness. For example, a necessary condition for composite sampling to provide a more efficient estimate of the variance is that the population density function evaluated at the mean (i.e., f(μ)) be greater than [Formula: see text]. If [Formula: see text], then a grab sample is more efficient. In spite of this result, however, composite sampling does provide a smaller estimate of standard error than does grab sampling in the context of estimating population means.
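The well-known result for the mean can be checked with a short simulation. The sketch below is not the authors' analysis: it assumes a normal population, idealises the pooled composite measurement as the arithmetic mean of the n pooled samples, and uses made-up values (mean 10, standard deviation 2, n = 5, 12 periods). The function name `simulate_period_means` is hypothetical.

```python
import random
import statistics

def simulate_period_means(n_per_composite, periods, trials, seed=0):
    """Return (var_grab, var_composite): the variance, across simulated
    studies, of the estimated population mean under each scheme."""
    rng = random.Random(seed)
    grab_means, composite_means = [], []
    for _ in range(trials):
        grab, composite = [], []
        for _ in range(periods):
            # Grab sampling: one sample collected and analysed per period.
            grab.append(rng.gauss(10.0, 2.0))
            # Composite sampling: n samples pooled and analysed as one value
            # (assumed here to equal the mean of the pooled samples).
            pooled = [rng.gauss(10.0, 2.0) for _ in range(n_per_composite)]
            composite.append(statistics.fmean(pooled))
        grab_means.append(statistics.fmean(grab))
        composite_means.append(statistics.fmean(composite))
    return statistics.variance(grab_means), statistics.variance(composite_means)

var_grab, var_composite = simulate_period_means(
    n_per_composite=5, periods=12, trials=2000
)
print(var_grab, var_composite)
```

Under these assumptions the variance of the estimated mean shrinks by roughly a factor of n when compositing, which is the standard σ²/n behaviour. The sketch does not address the variance-estimation result, which depends on the kurtosis conditions described in the abstract.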
A comparison of abundance estimates from extended batch-marking and Jolly–Seber-type experiments

PubMed Central

Cowen, Laura L E; Besbeas, Panagiotis; Morgan, Byron J T; Schwarz, Carl J

2014-01-01

Little attention has been paid to the use of multi-sample batch-marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, recently, Huggins et al. (2010) present a pseudo-likelihood for a multi-sample batch-marking study where they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson-type estimator. We have developed and maximized the likelihood for batch-marking studies. We use data simulated from a Jolly–Seber-type study and convert this to what would have been obtained from an extended batch-marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch-marking model to determine the efficiency of collecting and analyzing batch-marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains are made when using unique identifiers and employing the CMAS model in terms of precision; however, the likelihood typically had lower mean square error than the pseudo-likelihood method of Huggins et al. (2010). When faced with designing a batch-marking study, researchers can be confident in obtaining unbiased abundance estimators. Furthermore, they can design studies in order to reduce mean square error by manipulating capture probabilities and sample size. PMID:24558576
Modeling misidentification errors that result from use of genetic tags in capture-recapture studies

USGS Publications Warehouse

Yoshizaki, J.; Brownie, C.; Pollock, K.H.; Link, W.A.

2011-01-01

Misidentification of animals is potentially important when naturally existing features (natural tags) such as DNA fingerprints (genetic tags) are used to identify individual animals. For example, when misidentification leads to multiple identities being assigned to an animal, traditional estimators tend to overestimate population size. Accounting for misidentification in capture-recapture models requires detailed understanding of the mechanism. Using genetic tags as an example, we outline a framework for modeling the effect of misidentification in closed population studies when individual identification is based on natural tags that are consistent over time (non-evolving natural tags). We first assume a single sample is obtained per animal for each capture event, and then generalize to the case where multiple samples (such as hair or scat samples) are collected per animal per capture occasion. We introduce methods for estimating population size and, using a simulation study, we show that our new estimators perform well for cases with moderately high capture probabilities or high misidentification rates. In contrast, conventional estimators can seriously overestimate population size when errors due to misidentification are ignored. © 2009 Springer Science+Business Media, LLC.
Characterizing Air Pollution Exposure Misclassification Errors Using Detailed Cell Phone Location Data

NASA Astrophysics Data System (ADS)

Yu, H.; Russell, A. G.; Mulholland, J. A.

2017-12-01

In air pollution epidemiologic studies with spatially resolved air pollution data, exposures are often estimated using the home locations of individual subjects. Due primarily to lack of data or logistic difficulties, the spatiotemporal mobility of subjects is mostly neglected, which is expected to result in exposure misclassification errors. In this study, we applied detailed cell phone location data to characterize potential exposure misclassification errors associated with home-based exposure estimation of air pollution. The cell phone data sample consists of 9,886 unique simcard IDs collected on one mid-week day in October 2013 from Shenzhen, China. The Community Multi-scale Air Quality model was used to simulate hourly ambient concentrations of six chosen pollutants at 3 km spatial resolution, which were then fused with observational data to correct for potential modeling biases and errors. Air pollution exposure for each simcard ID was estimated by matching hourly pollutant concentrations with detailed location data for corresponding IDs. Finally, the results were compared with exposure estimates obtained using the home location method to assess potential exposure misclassification errors. Our results show that the home-based method is likely to have substantial exposure misclassification errors, over-estimating exposures for subjects with higher exposure levels and under-estimating exposures for those with lower exposure levels. This has the potential to lead to a bias-to-the-null in the health effect estimates. Our findings suggest that the use of cell phone data has the potential for improving the characterization of exposure and exposure misclassification in air pollution epidemiology studies.
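The matching step described above, pairing hourly concentrations with the location occupied in each hour, can be illustrated with a toy calculation. Everything in the sketch is hypothetical: the two grid cells, the concentration values, the daily track and the function name `mean_exposure` are invented for illustration and are not taken from the study.

```python
def mean_exposure(hourly_conc, cell_by_hour):
    """Average concentration over the day, using the cell occupied each hour."""
    total = sum(hourly_conc[hour][cell] for hour, cell in enumerate(cell_by_hour))
    return total / len(cell_by_hour)

# Hypothetical hourly concentrations (arbitrary units) for two grid cells.
hourly_conc = [{"home": 40.0, "work": 20.0} for _ in range(24)]

# Hypothetical track: at work during hours 8-17, at home otherwise.
track = ["work" if 8 <= hour < 18 else "home" for hour in range(24)]

mobility_based = mean_exposure(hourly_conc, track)
home_based = mean_exposure(hourly_conc, ["home"] * 24)
print(mobility_based, home_based)
```

For this made-up subject, whose home cell is the more polluted one, the home-based figure is higher than the mobility-based one. That is the same direction of error the abstract reports for subjects with higher exposure levels, although a single toy track cannot show how large the effect is in real data.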
Mutual information estimation for irregularly sampled time series

NASA Astrophysics Data System (ADS)

Rehfeld, K.; Marwan, N.; Heitzig, J.; Kurths, J.

2012-04-01

For the automated, objective and joint analysis of time series, similarity measures are crucial. Used in the analysis of climate records, they allow for a complementary, unbiased view onto sparse datasets. The irregular sampling of many of these time series, however, makes it necessary to either perform signal reconstruction (e.g. interpolation) or to develop and use adapted measures. Standard linear interpolation comes with an inevitable loss of information and bias effects. We have recently developed a Gaussian kernel-based correlation algorithm with which the interpolation error can be substantially lowered, but this would not work should the functional relationship in a bivariate setting be non-linear. We therefore propose an algorithm to estimate lagged auto and cross mutual information from irregularly sampled time series. We have extended the standard and adaptive binning histogram estimators and use Gaussian distributed weights in the estimation of the (joint) probabilities. To test our method we have simulated linear and nonlinear auto-regressive processes with Gamma-distributed inter-sampling intervals. We have then performed a sensitivity analysis for the estimation of actual coupling length, the lag of coupling and the decorrelation time in the synthetic time series and contrasted our results with the performance of a signal reconstruction scheme. Finally we applied our estimator to speleothem records. We compare the estimated memory (or decorrelation time) to that from a least-squares estimator based on fitting an auto-regressive process of order 1. The calculated (cross) mutual information results are compared for the different estimators (standard or adaptive binning) and contrasted with results from signal reconstruction. We find that the kernel-based estimator has a significantly lower root mean square error and less systematic sampling bias than the interpolation-based method. 
It is possible that these encouraging results could be further improved by using non-histogram mutual information estimators, like k-Nearest Neighbor or Kernel-Density estimators, but for short (<1000 points) and irregularly sampled datasets the proposed algorithm is already a great improvement.
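The Gaussian kernel-based correlation mentioned in this abstract as the authors' earlier approach can be sketched as a weighted product sum over all pairs of observations, with weights that depend on how close each pair's time difference is to the requested lag. The code below is a generic sketch of that idea, not the authors' implementation or their mutual information estimator. The kernel width of 0.25 mean sampling intervals, the sinusoidal test signal, and the names `kernel_correlation` and `irregular_times` are assumptions made for illustration.

```python
import math
import random
from statistics import fmean, pstdev

def standardize(values):
    """Rescale a series to zero mean and unit standard deviation."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

def kernel_correlation(tx, x, ty, y, lag, width):
    """Gaussian-kernel-weighted correlation between x(t) and y(t + lag)."""
    xs, ys = standardize(x), standardize(y)
    num = den = 0.0
    for ti, xi in zip(tx, xs):
        for tj, yj in zip(ty, ys):
            d = (tj - ti) - lag                      # mismatch from the lag
            w = math.exp(-0.5 * (d / width) ** 2)    # Gaussian weight
            num += w * xi * yj
            den += w
    return num / den

def irregular_times(n, rng):
    """Sampling times with Gamma-distributed intervals (mean interval 1)."""
    t, out = 0.0, []
    for _ in range(n):
        t += rng.gammavariate(2.0, 0.5)
        out.append(t)
    return out

rng = random.Random(42)
tx = irregular_times(200, rng)
ty = irregular_times(200, rng)
x = [math.sin(2 * math.pi * t / 20.0) for t in tx]
y = [math.sin(2 * math.pi * (t - 5.0) / 20.0) for t in ty]  # y lags x by 5

c0 = kernel_correlation(tx, x, ty, y, lag=0.0, width=0.25)
c5 = kernel_correlation(tx, x, ty, y, lag=5.0, width=0.25)
print(c0, c5)
```

No interpolation onto a common grid is needed here: the two series are observed at different, irregular times, and the kernel simply down-weights pairs whose time difference is far from the lag. As the abstract notes, a correlation measure of this kind captures only linear dependence, which is the motivation for the mutual information estimator the authors propose.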
Entropy-Based TOA Estimation and SVM-Based Ranging Error Mitigation in UWB Ranging Systems

PubMed Central

Yin, Zhendong; Cui, Kai; Wu, Zhilu; Yin, Liang

2015-01-01

The major challenges for Ultra-wide Band (UWB) indoor ranging systems are the dense multipath and non-line-of-sight (NLOS) problems of the indoor environment. To precisely estimate the time of arrival (TOA) of the first path (FP) in such a poor environment, a novel approach of entropy-based TOA estimation and support vector machine (SVM) regression-based ranging error mitigation is proposed in this paper. The proposed method can estimate the TOA precisely by measuring the randomness of the received signals and mitigate the ranging error without the recognition of the channel conditions. The entropy is used to measure the randomness of the received signals and the FP can be determined by the decision of the sample which is followed by a great entropy decrease. The SVM regression is employed to perform the ranging-error mitigation by the modeling of the regressor between the characteristics of received signals and the ranging error. The presented numerical simulation results show that the proposed approach achieves significant performance improvements in the CM1 to CM4 channels of the IEEE 802.15.4a standard, as compared to conventional approaches. PMID:26007726
An Empirical Study of Re-sampling Techniques as a Method for Improving Error Estimates in Split-plot Designs

DTIC Science & Technology

2010-03-01

sufficient replications often lead to models that lack precision in error estimation and thus imprecision in corresponding conclusions. This work develops...v Preface This work is dedicated to all who gave and continue to give in order for me to achieve some semblance of success. Benjamin M. Lee vi...develop, examine and test methodologies for analyzing test results from split-plot designs. In particular, this work determines the applicability
Error simulation of paired-comparison-based scaling methods

NASA Astrophysics Data System (ADS)

Cui, Chengwu

2000-12-01

Subjective image quality measurement usually resorts to psychophysical scaling. However, it is difficult to evaluate the inherent precision of these scaling methods. Without knowing the potential errors of the measurement, subsequent use of the data can be misleading. In this paper, the errors on scaled values derived from paired-comparison-based scaling methods are simulated with a randomly introduced proportion of choice errors that follow the binomial distribution. Simulation results are given for various combinations of the number of stimuli and the sampling size. The errors are presented in the form of the average standard deviation of the scaled values and can be fitted reasonably well with an empirical equation that can be used for scaling error estimation and measurement design. The simulation shows that paired-comparison-based scaling methods can have large errors on the derived scaled values when the sampling size and the number of stimuli are small. Examples are also given to show the potential errors on actually scaled values of color image prints as measured by the method of paired comparison.
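A simulation of this general kind can be sketched as follows. The abstract does not say which scaling model was used, so the sketch assumes Thurstone's Case V solution (scale value = row mean of the z-transformed choice proportions) and draws each pair's choice count from a binomial distribution. The true scale values, the clamping of proportions away from 0 and 1, and the function names are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()

def scale_once(true_values, n_observers, rng):
    """Simulate one paired-comparison experiment and return scale values."""
    k = len(true_values)
    z = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            # Assumed Case V choice probability that stimulus i beats j.
            p_true = STD_NORMAL.cdf(true_values[i] - true_values[j])
            # Binomial number of observers choosing i over j.
            wins = sum(rng.random() < p_true for _ in range(n_observers))
            # Clamp so the inverse normal stays finite for 0% or 100% wins.
            low = 0.5 / n_observers
            p_obs = min(max(wins / n_observers, low), 1.0 - low)
            z[i][j] = STD_NORMAL.inv_cdf(p_obs)
            z[j][i] = -z[i][j]
    return [fmean(row) for row in z]

def average_scale_sd(true_values, n_observers, trials=300, seed=1):
    """Average, over stimuli, of the SD of the scale values across trials."""
    rng = random.Random(seed)
    runs = [scale_once(true_values, n_observers, rng) for _ in range(trials)]
    return fmean(pstdev(column) for column in zip(*runs))

true_values = [0.0, 0.3, 0.6, 0.9, 1.2]
sd_small = average_scale_sd(true_values, n_observers=10)
sd_large = average_scale_sd(true_values, n_observers=100)
print(sd_small, sd_large)
```

Comparing the two printed values shows the qualitative behaviour the abstract describes: with few observers per pair the derived scale values scatter widely, and the scatter shrinks as the sampling size grows. The same function could be called with different numbers of stimuli to explore the other factor the paper varies.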
Network Model-Assisted Inference from Respondent-Driven Sampling Data

PubMed Central

Gile, Krista J.; Handcock, Mark S.

2015-01-01

Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328
Network Model-Assisted Inference from Respondent-Driven Sampling Data.

PubMed

Gile, Krista J; Handcock, Mark S

2015-06-01

Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population.
Using cell phone location to assess misclassification errors in air pollution exposure estimation.

PubMed

Yu, Haofei; Russell, Armistead; Mulholland, James; Huang, Zhijiong

2018-02-01

Air pollution epidemiologic and health impact studies often rely on home addresses to estimate individual subjects' pollution exposure. In this study, we used detailed cell phone location data, the call detail record (CDR), to account for the impact of spatiotemporal subject mobility on estimates of ambient air pollutant exposure. This approach was applied to a sample with 9886 unique simcard IDs in Shenzhen, China, on one mid-week day in October 2013. Hourly ambient concentrations of six chosen pollutants were simulated by the Community Multi-scale Air Quality model fused with observational data, and matched with detailed location data for these IDs. The results were compared with exposure estimates using home addresses to assess potential exposure misclassification errors. We found that the misclassification errors are likely to be substantial when home location alone is applied. The CDR-based approach indicates that the home-based approach tends to over-estimate exposures for subjects with higher exposure levels and under-estimate exposures for those with lower exposure levels. Our results show that the cell phone location based approach can be used to assess exposure misclassification error and has the potential for improving exposure estimates in air pollution epidemiology studies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Field comparison of body composition techniques: hydrostatic weighing, skinfold thickness, and bioelectric impedance.

PubMed

Kirkendall, D T; Grogan, J W; Bowers, R G

1991-01-01

Body composition and appropriate playing weight are frequently requested by coaches. Numerous methods for estimating these figures are available, and each has its own limitation, be it technical or biological. A comparison of three common methods was made: underwater weighing (H2O, the criterion), skinfold thicknesses (SF), and commercial bioelectrical impedance analysis (BIA). Subjects were 29 professional football players measured by each of the three methods after an overnight fast. Data was collected 10 weeks preceding the players' formal training camp. There was no difference for percentage of weight as fat between SF (15.8%) and H2O (14.2%). Bioelectrical impedance analysis significantly (p < .05) overestimated percent fat (19.2%) compared to H2O. Error rates when regressing SF on H2O were favorable, whether expressed for the whole sample (3.04%) or by race (1.78% or 3.56% for whites and blacks, respectively). Regression of BIA on H2O showed an elevated overall error rate (14.12%) and elevated error rates for whites (11.57%) and blacks (13.81%). Of the two estimates of body composition on a racially mixed sample of males, SF provided the best estimate with the least amount of error. J Orthop Sports Phys Ther 1991;13(5):235-239.

Precipitation and Latent Heating Distributions from Satellite Passive Microwave Radiometry. Part II: Evaluation of Estimates Using Independent Data

NASA Technical Reports Server (NTRS)

Yang, Song; Olson, William S.; Wang, Jian-Jian; Bell, Thomas L.; Smith, Eric A.; Kummerow, Christian D.

2006-01-01

Rainfall rate estimates from spaceborne microwave radiometers are generally accepted as reliable by a majority of the atmospheric science community. One of the Tropical Rainfall Measuring Mission (TRMM) facility rain-rate algorithms is based upon passive microwave observations from the TRMM Microwave Imager (TMI). In Part I of this series, improvements of the TMI algorithm that are required to introduce latent heating as an additional algorithm product are described. Here, estimates of surface rain rate, convective proportion, and latent heating are evaluated using independent ground-based estimates and satellite products. Instantaneous, 0.5 deg.-resolution estimates of surface rain rate over ocean from the improved TMI algorithm are well correlated with independent radar estimates (r approx. 0.88 over the Tropics), but bias reduction is the most significant improvement over earlier algorithms. The bias reduction is attributed to the greater breadth of cloud-resolving model simulations that support the improved algorithm and the more consistent and specific convective/stratiform rain separation method utilized. The bias of monthly 2.5 deg.-resolution estimates is similarly reduced, with comparable correlations to radar estimates. Although the amount of independent latent heating data is limited, TMI-estimated latent heating profiles compare favorably with instantaneous estimates based upon dual-Doppler radar observations, and time series of surface rain-rate and heating profiles are generally consistent with those derived from rawinsonde analyses. Still, some biases in profile shape are evident, and these may be resolved with (a) additional contextual information brought to the estimation problem and/or (b) physically consistent and representative databases supporting the algorithm. A model of the random error in instantaneous 0.5 deg.-resolution rain-rate estimates appears to be consistent with the levels of error determined from TMI comparisons with collocated radar. Error model modifications for nonraining situations will be required, however. Sampling error represents only a portion of the total error in monthly 2.5 deg.-resolution TMI estimates; the remaining error is attributed to random and systematic algorithm errors arising from the physical inconsistency and/or nonrepresentativeness of cloud-resolving-model-simulated profiles that support the algorithm.
Random and systematic sampling error when hooking fish to monitor skin fluke (Benedenia seriolae) and gill fluke (Zeuxapta seriolae) burden in Australian farmed yellowtail kingfish (Seriola lalandi).

PubMed

Fensham, J R; Bubner, E; D'Antignana, T; Landos, M; Caraguel, C G B

2018-05-01

The Australian farmed yellowtail kingfish (Seriola lalandi, YTK) industry monitors skin fluke (Benedenia seriolae) and gill fluke (Zeuxapta seriolae) burden by pooling the fluke count of 10 hooked YTK. The random and systematic error of this sampling strategy was evaluated to assess potential impact on treatment decisions. Fluke abundance (fluke count per fish) in a study cage (estimated 30,502 fish) was assessed five times using the current sampling protocol and its repeatability was estimated using the repeatability coefficient (CR) and the coefficient of variation (CV). Individual body weight, fork length, fluke abundance, prevalence, intensity (fluke count per infested fish) and density (fluke count per kg of fish) were compared between 100 hooked and 100 seined YTK (assumed representative of the entire population) to estimate potential selection bias. Depending on the fluke species and age category, CR (expected difference in parasite count between 2 sampling iterations) ranged from 0.78 to 114 flukes per fish. Capturing YTK by hooking increased the selection of fish of a weight and length in the lowest 5th percentile of the cage (RR = 5.75, 95% CI: 2.06-16.03, P-value = 0.0001). These lower end YTK had on average an extra 31 juvenile and 6 adult Z. seriolae per kg of fish and an extra 3 juvenile and 0.4 adult B. seriolae per kg of fish, compared to the rest of the cage population (P-value < 0.05). Hooking YTK on the edge of the study cage biases sampling towards the smallest and most heavily infested fish in the population, resulting in poor repeatability (more variability amongst sampled fish) and an overestimation of parasite burden in the population. In this particular commercial situation these findings supported that health management program, where the finding of an underestimation of parasite burden could provide a production impact on the study population. 
In instances where fish populations and parasite burdens are more homogenous, sampling error may be less severe. Sampling error when capturing fish from sea cage is difficult to predict. The amplitude and direction of this error should be investigated for a given cultured fish species across a range of parasite burden and fish profile scenarios. Copyright © 2018 Elsevier B.V. All rights reserved.
Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

NASA Astrophysics Data System (ADS)

Caimmi, R.

2011-08-01

Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. 
For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both heteroscedastic and homoscedastic data. Conversely, samples related to different methods produce discrepant results, due to the presence of (still undetected) systematic errors, which implies no definitive statement can be made at present. A comparison is also made between different expressions of regression line slope and intercept variance estimators, where fractional discrepancies are found not to exceed a few percent, growing to about 20% in the presence of large dispersion data. An extension of the formalism to structural models is left to a forthcoming paper.
Sample Size for Estimation of G and Phi Coefficients in Generalizability Theory

ERIC Educational Resources Information Center

Atilgan, Hakan

2013-01-01

Problem Statement: Reliability, which refers to the degree to which measurement results are free from measurement errors, as well as its estimation, is an important issue in psychometrics. Several methods for estimating reliability have been suggested by various theories in the field of psychometrics. One of these theories is the generalizability…
The role of misclassification in estimating proportions and an estimator of misclassification probability

Treesearch

Patrick L. Zimmerman; Greg C. Liknes

2010-01-01

Dot grids are often used to estimate the proportion of land cover belonging to some class in an aerial photograph. Interpreter misclassification is an often-ignored source of error in dot-grid sampling that has the potential to significantly bias proportion estimates. For the case when the true class of items is unknown, we present a maximum-likelihood estimator of...
Sample selection in foreign similarity regions for multicrop experiments

NASA Technical Reports Server (NTRS)

Malin, J. T. (Principal Investigator)

1981-01-01

The selection of sample segments in the U.S. foreign similarity regions for development of proportion estimation procedures and error modeling for Argentina, Australia, Brazil, and USSR in AgRISTARS is described. Each sample was chosen to be similar in crop mix to the corresponding indicator region sample. Data sets, methods of selection, and resulting samples are discussed.
Hazard Function Estimation with Cause-of-Death Data Missing at Random

PubMed Central

Wang, Qihua; Dinse, Gregg E.; Liu, Chunling

2010-01-01

Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data. PMID:22267874
Observer Error when Measuring Safety-Related Behavior: Momentary Time Sampling versus Whole-Interval Recording

ERIC Educational Resources Information Center

Taylor, Matthew A.; Skourides, Andreas; Alvero, Alicia M.

2012-01-01

Interval recording procedures are used by persons who collect data through observation to estimate the cumulative occurrence and nonoccurrence of behavior/events. Although interval recording procedures can increase the efficiency of observational data collection, they can also induce error from the observer. In the present study, 50 observers were…
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rao, N.S.V.

The classical Nadaraya-Watson estimator is shown to solve a generic sensor fusion problem where the underlying sensor error densities are not known but a sample is available. By employing Haar kernels this estimator is shown to yield finite sample guarantees and also to be efficiently computable. Two simulation examples, and a robotics example involving the detection of a door using arrays of ultrasonic and infrared sensors, are presented to illustrate the performance.
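As a rough illustration of the estimator named in this abstract, the sketch below computes a Nadaraya-Watson estimate with a box (indicator) window, which is the simplest stand-in for a Haar-type kernel. The scalar inputs, the function name `nadaraya_watson_box`, and the half-width `h` are assumptions made for illustration; the abstract does not give the paper's actual kernel construction or its multi-sensor setup.

```python
from typing import Sequence


def nadaraya_watson_box(x_query: float, xs: Sequence[float],
                        ys: Sequence[float], h: float) -> float:
    """Nadaraya-Watson estimate at x_query using a box kernel of half-width h."""
    # Box kernel: weight 1 for samples within h of the query, 0 otherwise,
    # so the weighted average reduces to a plain mean over the window.
    inside = [y for x, y in zip(xs, ys) if abs(x - x_query) <= h]
    if not inside:
        raise ValueError("no training samples fall inside the kernel window")
    return sum(inside) / len(inside)


# Hypothetical sample: sensor readings xs paired with known true values ys.
xs = [0.0, 0.1, 0.2, 0.9, 1.0]
ys = [1.0, 1.2, 0.8, 3.0, 3.2]
estimate = nadaraya_watson_box(0.1, xs, ys, h=0.15)
```

With an indicator window the estimate is just the mean of the responses whose inputs fall near the query, which is part of why such kernels are cheap to compute.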
Selection of Common Items as an Unrecognized Source of Variability in Test Equating: A Bootstrap Approximation Assuming Random Sampling of Common Items

ERIC Educational Resources Information Center

Michaelides, Michalis P.; Haertel, Edward H.

2014-01-01

The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Weighting by Inverse Variance or by Sample Size in Random-Effects Meta-Analysis

ERIC Educational Resources Information Center

Marin-Martinez, Fulgencio; Sanchez-Meca, Julio

2010-01-01

Most of the statistical procedures in meta-analysis are based on the estimation of average effect sizes from a set of primary studies. The optimal weight for averaging a set of independent effect sizes is the inverse variance of each effect size, but in practice these weights have to be estimated, being affected by sampling error. When assuming a…
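The two weighting schemes named in the title can be compared with a short sketch. The effect sizes, variances, and sample sizes below are made-up numbers, and the optional between-study variance `tau2` is an assumption about how a random-effects weight would be formed; the truncated abstract does not specify the estimators the authors studied.

```python
from typing import Sequence


def weighted_mean(effects: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of effect sizes."""
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def inverse_variance_weights(variances: Sequence[float], tau2: float = 0.0) -> list:
    """Weights 1 / (v_i + tau2); tau2 = 0 gives the fixed-effect case."""
    return [1.0 / (v + tau2) for v in variances]


# Hypothetical primary studies (illustrative values only).
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.01, 0.02]
sample_sizes = [50, 200, 80]

pooled_iv = weighted_mean(effects, inverse_variance_weights(variances))
pooled_n = weighted_mean(effects, sample_sizes)
```

In practice the variances are themselves estimated from the data, which is the source of the sampling error in the weights that the abstract mentions.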
The effects of sampling frequency on the climate statistics of the European Centre for Medium-Range Weather Forecasts

NASA Astrophysics Data System (ADS)

Phillips, Thomas J.; Gates, W. Lawrence; Arpe, Klaus

1992-12-01

The effects of sampling frequency on the first- and second-moment statistics of selected European Centre for Medium-Range Weather Forecasts (ECMWF) model variables are investigated in a simulation of "perpetual July" with a diurnal cycle included and with surface and atmospheric fields saved at hourly intervals. The shortest characteristic time scales (as determined by the e-folding time of lagged autocorrelation functions) are those of ground heat fluxes and temperatures, precipitation and runoff, convective processes, cloud properties, and atmospheric vertical motion, while the longest time scales are exhibited by soil temperature and moisture, surface pressure, and atmospheric specific humidity, temperature, and wind. The time scales of surface heat and momentum fluxes and of convective processes are substantially shorter over land than over oceans. An appropriate sampling frequency for each model variable is obtained by comparing the estimates of first- and second-moment statistics determined at intervals ranging from 2 to 24 hours with the "best" estimates obtained from hourly sampling. Relatively accurate estimation of first- and second-moment climate statistics (10% errors in means, 20% errors in variances) can be achieved by sampling a model variable at intervals that usually are longer than the bandwidth of its time series but that often are shorter than its characteristic time scale. For the surface variables, sampling at intervals that are nonintegral divisors of a 24-hour day yields relatively more accurate time-mean statistics because of a reduction in errors associated with aliasing of the diurnal cycle and higher-frequency harmonics. The superior estimates of first-moment statistics are accompanied by inferior estimates of the variance of the daily means due to the presence of systematic biases, but these probably can be avoided by defining a different measure of low-frequency variability. 
Estimates of the intradiurnal variance of accumulated precipitation and surface runoff also are strongly impacted by the length of the storage interval. In light of these results, several alternative strategies for storage of the ECMWF model variables are recommended.
Verification of unfold error estimates in the unfold operator code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fehl, D.L.; Biggs, F.

Spectral unfolding is an inverse mathematical operation that attempts to obtain spectral source information from a set of response functions and data measurements. Several unfold algorithms have appeared over the past 30 years; among them is the unfold operator (UFO) code written at Sandia National Laboratories. In addition to an unfolded spectrum, the UFO code also estimates the unfold uncertainty (error) induced by estimated random uncertainties in the data. In UFO the unfold uncertainty is obtained from the error matrix. This built-in estimate has now been compared to error estimates obtained by running the code in a Monte Carlo fashion with prescribed data distributions (Gaussian deviates). In the test problem studied, data were simulated from an arbitrarily chosen blackbody spectrum (10 keV) and a set of overlapping response functions. The data were assumed to have an imprecision of 5% (standard deviation). One hundred random data sets were generated. The built-in estimate of unfold uncertainty agreed with the Monte Carlo estimate to within the statistical resolution of this relatively small sample size (95% confidence level). A possible 10% bias between the two methods was unresolved. The Monte Carlo technique is also useful in underdetermined problems, for which the error matrix method does not apply. UFO has been applied to the diagnosis of low energy x rays emitted by Z-pinch and ion-beam driven hohlraums. © 1997 American Institute of Physics.
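The comparison described in this abstract, an error-matrix estimate checked against Monte Carlo resampling of the data, can be sketched on a toy linear problem. Everything here is an assumption for illustration: the 2 × 2 response matrix `R`, the two-bin spectrum, and the helper names are hypothetical, and the unfold is a plain matrix inversion rather than the UFO algorithm. The sketch uses 1000 trials instead of the abstract's 100 data sets so that the two estimates agree more tightly.

```python
import math
import random
import statistics

# Hypothetical overlapping response functions (rows: detectors, columns: spectral bins).
R = [[1.0, 0.5],
     [0.3, 1.0]]


def invert_2x2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def apply(m, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def error_matrix_sd(r_inv, data_sd):
    """Linear error propagation: sd of each unfolded bin for independent data errors."""
    return [math.sqrt(sum((r_inv[i][j] * data_sd[j]) ** 2 for j in range(2)))
            for i in range(2)]


def monte_carlo_sd(r_inv, data, data_sd, trials=1000, seed=1):
    """Unfold many Gaussian-perturbed data sets and take the sample sd per bin."""
    rng = random.Random(seed)
    unfolded = [apply(r_inv, [rng.gauss(d, s) for d, s in zip(data, data_sd)])
                for _ in range(trials)]
    return [statistics.stdev(u[i] for u in unfolded) for i in range(2)]


true_spectrum = [10.0, 20.0]
data = apply(R, true_spectrum)          # noiseless simulated measurements
data_sd = [0.05 * d for d in data]      # 5% imprecision, as in the abstract
r_inv = invert_2x2(R)

analytic = error_matrix_sd(r_inv, data_sd)
sampled = monte_carlo_sd(r_inv, data, data_sd)
```

For a square, well-conditioned linear problem the two estimates should agree up to Monte Carlo noise; the sampling approach still works when no unique inverse exists, which is the underdetermined case the abstract mentions.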
A 2 × 2 taxonomy of multilevel latent contextual models: accuracy-bias trade-offs in full and partial error correction models.

PubMed

Lüdtke, Oliver; Marsh, Herbert W; Robitzsch, Alexander; Trautwein, Ulrich

2011-12-01

In multilevel modeling, group-level variables (L2) for assessing contextual effects are frequently generated by aggregating variables from a lower level (L1). A major problem of contextual analyses in the social sciences is that there is no error-free measurement of constructs. In the present article, 2 types of error occurring in multilevel data when estimating contextual effects are distinguished: unreliability that is due to measurement error and unreliability that is due to sampling error. The fact that studies may or may not correct for these 2 types of error can be translated into a 2 × 2 taxonomy of multilevel latent contextual models comprising 4 approaches: an uncorrected approach, partial correction approaches correcting for either measurement or sampling error (but not both), and a full correction approach that adjusts for both sources of error. It is shown mathematically and with simulated data that the uncorrected and partial correction approaches can result in substantially biased estimates of contextual effects, depending on the number of L1 individuals per group, the number of groups, the intraclass correlation, the number of indicators, and the size of the factor loadings. However, the simulation study also shows that partial correction approaches can outperform full correction approaches when the data provide only limited information in terms of the L2 construct (i.e., small number of groups, low intraclass correlation). A real-data application from educational psychology is used to illustrate the different approaches.
Stochastic Residual-Error Analysis For Estimating Hydrologic Model Predictive Uncertainty

EPA Science Inventory

A hybrid time series-nonparametric sampling approach, referred to herein as semiparametric, is presented for the estimation of model predictive uncertainty. The methodology is a two-step procedure whereby a distributed hydrologic model is first calibrated, then followed by brute ...
Comparison of optimal design methods in inverse problems

NASA Astrophysics Data System (ADS)

Banks, H. T.; Holm, K.; Kappel, F.

2011-07-01

Typical optimal design methods for inverse or parameter estimation problems are designed to choose optimal sampling distributions through minimization of a specific cost function related to the resulting error in parameter estimates. It is hoped that the inverse problem will produce parameter estimates with increased accuracy using data collected according to the optimal sampling distribution. Here we formulate the classical optimal design problem in the context of general optimization problems over distributions of sampling times. We present a new Prohorov metric-based theoretical framework that permits one to treat succinctly and rigorously any optimal design criteria based on the Fisher information matrix. A fundamental approximation theory is also included in this framework. A new optimal design, SE-optimal design (standard error optimal design), is then introduced in the context of this framework. We compare this new design criterion with the more traditional D-optimal and E-optimal designs. The optimal sampling distributions from each design are used to compute and compare standard errors; the standard errors for parameters are computed using asymptotic theory or bootstrapping and the optimal mesh. We use three examples to illustrate ideas: the Verhulst-Pearl logistic population model (Banks H T and Tran H T 2009 Mathematical and Experimental Modeling of Physical and Biological Processes (Boca Raton, FL: Chapman and Hall/CRC)), the standard harmonic oscillator model (Banks H T and Tran H T 2009) and a popular glucose regulation model (Bergman R N, Ider Y Z, Bowden C R and Cobelli C 1979 Am. J. Physiol. 236 E667-77 De Gaetano A and Arino O 2000 J. Math. Biol. 40 136-68 Toffolo G, Bergman R N, Finegood D T, Bowden C R and Cobelli C 1980 Diabetes 29 979-90).
Sampling design for groundwater solute transport: Tests of methods and analysis of Cape Cod tracer test data

USGS Publications Warehouse

Knopman, Debra S.; Voss, Clifford I.; Garabedian, Stephen P.

1991-01-01

Tests of a one-dimensional sampling design methodology on measurements of bromide concentration collected during the natural gradient tracer test conducted by the U.S. Geological Survey on Cape Cod, Massachusetts, demonstrate its efficacy for field studies of solute transport in groundwater and the utility of one-dimensional analysis. The methodology was applied to design of sparse two-dimensional networks of fully screened wells typical of those often used in engineering practice. In one-dimensional analysis, designs consist of the downstream distances to rows of wells oriented perpendicular to the groundwater flow direction and the timing of sampling to be carried out on each row. The power of a sampling design is measured by its effectiveness in simultaneously meeting objectives of model discrimination, parameter estimation, and cost minimization. One-dimensional models of solute transport, differing in processes affecting the solute and assumptions about the structure of the flow field, were considered for description of tracer cloud migration. When fitting each model using nonlinear regression, additive and multiplicative error forms were allowed for the residuals which consist of both random and model errors. The one-dimensional single-layer model of a nonreactive solute with multiplicative error was judged to be the best of those tested. Results show the efficacy of the methodology in designing sparse but powerful sampling networks. Designs that sample five rows of wells at five or fewer times in any given row performed as well for model discrimination as the full set of samples taken up to eight times in a given row from as many as 89 rows. Also, designs for parameter estimation judged to be good by the methodology were as effective in reducing the variance of parameter estimates as arbitrary designs with many more samples. 
Results further showed that estimates of velocity and longitudinal dispersivity in one-dimensional models based on data from only five rows of fully screened wells each sampled five or fewer times were practically equivalent to values determined from moments analysis of the complete three-dimensional set of 29,285 samples taken during 16 sampling times.
Comparison of estimators of standard deviation for hydrologic time series

USGS Publications Warehouse

Tasker, Gary D.; Gilroy, Edward J.

1982-01-01

Unbiasing factors as a function of serial correlation, ρ, and sample size, n, for the sample standard deviation of a lag one autoregressive model were generated by random number simulation. Monte Carlo experiments were used to compare the performance of several alternative methods for estimating the standard deviation σ of a lag one autoregressive model in terms of bias, root mean square error, probability of underestimation, and expected opportunity design loss. Three methods provided estimates of σ which were much less biased but had greater mean square errors than the usual estimate of σ: $s = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2}$. The three methods may be briefly characterized as (1) a method using a maximum likelihood estimate of the unbiasing factor, (2) a method using an empirical Bayes estimate of the unbiasing factor, and (3) a robust nonparametric estimate of σ suggested by Quenouille. Because s tends to underestimate σ, its use as an estimate of a model parameter results in a tendency to underdesign. If underdesign losses are considered more serious than overdesign losses, then the choice of one of the less biased methods may be wise.
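The tendency of s to underestimate σ under serial correlation can be reproduced with a small random-number simulation in the spirit of this abstract. The sketch below is only an illustration: the function names, the choice n = 20 and ρ = 0.7, the Gaussian innovations, and the number of trials are assumptions, not the authors' experimental design.

```python
import math
import random
import statistics


def simulate_ar1(n, rho, sigma, rng):
    """One stationary lag-one autoregressive series with marginal sd sigma."""
    x = rng.gauss(0.0, sigma)                        # draw the start from the stationary law
    series = [x]
    innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0.0, innovation_sd)
        series.append(x)
    return series


def mean_sample_sd(n, rho, sigma=1.0, trials=2000, seed=0):
    """Average of the usual sample sd s over many simulated series."""
    rng = random.Random(seed)
    return statistics.fmean(
        statistics.stdev(simulate_ar1(n, rho, sigma, rng)) for _ in range(trials)
    )


# With sigma = 1 these averages estimate E[s] / sigma directly.
ratio_correlated = mean_sample_sd(n=20, rho=0.7)
ratio_independent = mean_sample_sd(n=20, rho=0.0)
```

The reciprocal of such a ratio is one way to think of an unbiasing factor for a given (ρ, n) pair, although the abstract does not say exactly how the authors tabulated theirs.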
Amelogenin test: From forensics to quality control in clinical and biochemical genomics.

PubMed

Francès, F; Portolés, O; González, J I; Coltell, O; Verdú, F; Castelló, A; Corella, D

2007-01-01

The increasing number of samples in biomedical genetic studies, and of centers participating in them, brings an increasing risk of mistakes at the different sample handling stages. We have evaluated the usefulness of the amelogenin test for quality control in sample identification. The amelogenin test (frequently used in forensics) was undertaken on 1224 individuals participating in a biomedical study. Concordance between the sex recorded in the database and the amelogenin test was estimated. Additional genetic systems for detecting sex errors were developed. The overall concordance rate was 99.84% (1222/1224). Two samples showed a female amelogenin test outcome while being coded as male in the database. The first, after checking sex-specific biochemical and clinical profile data, was found to be due to a coding error in the database. In the second, after checking the database, no apparent error was discovered because a correct male profile was found. False negatives in amelogenin male sex determination were ruled out by additional tests, and female sex was confirmed. A sample labeling error was revealed after a new DNA extraction. The amelogenin test is a useful quality control tool for detecting sex-identification errors in large genomic studies, and can help increase their validity.
Adaptive framework to better characterize errors of apriori fluxes and observational residuals in a Bayesian setup for the urban flux inversions.

NASA Astrophysics Data System (ADS)

Ghosh, S.; Lopez-Coto, I.; Prasad, K.; Karion, A.; Mueller, K.; Gourdji, S.; Martin, C.; Whetstone, J. R.

2017-12-01

The National Institute of Standards and Technology (NIST) supports the North-East Corridor Baltimore Washington (NEC-B/W) project and Indianapolis Flux Experiment (INFLUX) aiming to quantify sources of Greenhouse Gas (GHG) emissions as well as their uncertainties. These projects employ different flux estimation methods including top-down inversion approaches. The traditional Bayesian inversion method estimates emission distributions by updating prior information using atmospheric observations of GHGs coupled to an atmospheric transport and dispersion model. The magnitude of the update is dependent upon the observed enhancement along with the assumed errors such as those associated with prior information and the atmospheric transport and dispersion model. These errors are specified within the inversion covariance matrices. The assumed structure and magnitude of the specified errors can have a large impact on the emission estimates from the inversion. The main objective of this work is to build a data-adaptive model for these covariance matrices. We construct a synthetic data experiment using a Kalman Filter inversion framework (Lopez et al., 2017) employing different configurations of transport and dispersion model and an assumed prior. Unlike previous traditional Bayesian approaches, we estimate posterior emissions using regularized sample covariance matrices associated with prior errors to investigate whether the structure of the matrices helps to better recover our hypothetical true emissions. To incorporate transport model error, we use an ensemble of transport models combined with a space-time analytical covariance to construct a covariance that accounts for errors in space and time. A Kalman Filter is then run using these covariances along with Maximum Likelihood Estimates (MLE) of the involved parameters. Preliminary results indicate that specifying spatio-temporally varying errors in the error covariances can improve the flux estimates and uncertainties. 
We also demonstrate that differences between the modeled and observed meteorology can be used to predict uncertainties associated with atmospheric transport and dispersion modeling which can help improve the skill of an inversion at urban scales.

Region of influence regression for estimating the 50-year flood at ungaged sites

USGS Publications Warehouse

Tasker, Gary D.; Hodge, S.A.; Barks, C.S.

1996-01-01

Five methods of developing regional regression models to estimate flood characteristics at ungaged sites in Arkansas are examined. The methods differ in the manner in which the State is divided into subregions. Each successive method (A to E) is computationally more complex than the previous method. Method A makes no subdivision. Methods B and C define two and four geographic subregions, respectively. Method D uses cluster/discriminant analysis to define subregions on the basis of similarities in watershed characteristics. Method E, the new region of influence method, defines a unique subregion for each ungaged site. Split-sample results indicate that, in terms of root-mean-square error, method E (38 percent error) is best. Methods C and D (42 and 41 percent error) were in a virtual tie for second, and methods B (44 percent error) and A (49 percent error) were fourth and fifth best.
Estimating Rain Rates from Tipping-Bucket Rain Gauge Measurements

NASA Technical Reports Server (NTRS)

Wang, Jianxin; Fisher, Brad L.; Wolff, David B.

2007-01-01

This paper describes the cubic spline based operational system for the generation of the TRMM one-minute rain rate product 2A-56 from Tipping Bucket (TB) gauge measurements. Methodological issues associated with applying the cubic spline to the TB gauge rain rate estimation are closely examined. A simulated TB gauge from a Joss-Waldvogel (JW) disdrometer is employed to evaluate effects of time scales and rain event definitions on errors of the rain rate estimation. The comparison between rain rates measured from the JW disdrometer and those estimated from the simulated TB gauge shows good overall agreement; however, the TB gauge suffers sampling problems, resulting in errors in the rain rate estimation. These errors are very sensitive to the time scale of rain rates. One-minute rain rates suffer substantial errors, especially at low rain rates. When one minute rain rates are averaged to 4-7 minute or longer time scales, the errors dramatically reduce. The rain event duration is very sensitive to the event definition but the event rain total is rather insensitive, provided that the events with less than 1 millimeter rain totals are excluded. Estimated lower rain rates are sensitive to the event definition whereas the higher rates are not. The median relative absolute errors are about 22% and 32% for 1-minute TB rain rates higher and lower than 3 mm per hour, respectively. These errors decrease to 5% and 14% when TB rain rates are used at 7-minute scale. The radar reflectivity-rainrate (Ze-R) distributions drawn from large amount of 7-minute TB rain rates and radar reflectivity data are mostly insensitive to the event definition.
Simulating and assessing boson sampling experiments with phase-space representations

NASA Astrophysics Data System (ADS)

Opanchuk, Bogdan; Rosales-Zárate, Laura; Reid, Margaret D.; Drummond, Peter D.

2018-04-01

The search for new, application-specific quantum computers designed to outperform any classical computer is driven by the ending of Moore's law and the quantum advantages potentially obtainable. Photonic networks are promising examples, with experimental demonstrations and potential for obtaining a quantum computer to solve problems believed classically impossible. This introduces a challenge: how does one design or understand such photonic networks? One must be able to calculate observables using general methods capable of treating arbitrary inputs, dissipation, and noise. We develop complex phase-space software for simulating these photonic networks, and apply this to boson sampling experiments. Our techniques give sampling errors orders of magnitude lower than experimental correlation measurements for the same number of samples. We show that these techniques remove systematic errors in previous algorithms for estimating correlations, with large improvements in errors in some cases. In addition, we obtain a scalable channel-combination strategy for assessment of boson sampling devices.
How Large Should a Statistical Sample Be?

ERIC Educational Resources Information Center

Menil, Violeta C.; Ye, Ruili

2012-01-01

This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating population proportion. Tables on sample sizes were generated using a C++ program, which depends on population size, degree of precision or error level, and confidence…
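A sample-size table of the kind described here could be generated along the following lines. The sketch uses the common textbook formula n0 = z²p(1 − p)/e² with a finite population correction, which is an assumption; the truncated abstract does not state which formula the authors' program implements, and the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size_for_proportion(margin: float, confidence: float = 0.95,
                               p: float = 0.5,
                               population: Optional[int] = None) -> int:
    """Sample size needed to estimate a proportion to within +/- margin."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n0 = z * z * p * (1.0 - p) / (margin * margin)
    if population is not None:
        # Finite population correction shrinks n for small populations.
        n0 = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n0)


n_infinite = sample_size_for_proportion(margin=0.05)
n_finite = sample_size_for_proportion(margin=0.05, population=1000)
```

Setting p = 0.5 gives the most conservative (largest) sample size, which is the usual choice when no prior estimate of the proportion is available.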
Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System.

PubMed

Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

2016-01-01

Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP).
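The three evaluation criteria named in this abstract can be written out as a short sketch. The definitions below are the conventional ones (MARE taken as the mean of absolute errors divided by the actual values), which is an assumption since the abstract does not spell them out; the travel times are made-up numbers.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mare(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute relative error; requires nonzero actual values."""
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical link travel times in seconds.
actual_times = [100.0, 200.0, 400.0]
estimated_times = [110.0, 190.0, 400.0]

scores = {
    "MAE": mae(actual_times, estimated_times),
    "RMSE": rmse(actual_times, estimated_times),
    "MARE": mare(actual_times, estimated_times),
}
```

MAE and RMSE are in the units of the travel time, while MARE is dimensionless, so it is the easier one to compare across links of different length.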
Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System

PubMed Central

Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

2016-01-01

Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP). PMID:26829639
Reconstruction of regional mean temperature for East Asia since 1900s and its uncertainties

NASA Astrophysics Data System (ADS)

Hua, W.

2017-12-01

Regional average surface air temperature (SAT) is one of the key variables often used to investigate climate change. Unfortunately, because of the limited observations over East Asia, there are gaps in the observational data sampling for regional mean SAT analysis, which is important for estimating past climate change. In this study, the regional average temperature of East Asia since the 1900s is calculated by the Empirical Orthogonal Function (EOF)-based optimal interpolation (OA) method, taking the data errors into account. The results show that our estimate is more precise and robust than the results from a simple average, which provides a better way for past climate reconstruction. In addition to the reconstructed regional average SAT anomaly time series, we also estimated uncertainties of the reconstruction. The root mean square error (RMSE) results show that the error decreases with time and is not sufficiently large to alter the conclusions on the persistent warming in East Asia during the twenty-first century. Moreover, the test of the influence of data error on the reconstruction clearly shows the sensitivity of the reconstruction to the size of the data error.
A method to estimate the effect of deformable image registration uncertainties on daily dose mapping

PubMed Central

Murphy, Martin J.; Salguero, Francisco J.; Siebers, Jeffrey V.; Staub, David; Vaman, Constantin

2012-01-01

Purpose: To develop a statistical sampling procedure for spatially-correlated uncertainties in deformable image registration and then use it to demonstrate their effect on daily dose mapping. Methods: Sequential daily CT studies are acquired to map anatomical variations prior to fractionated external beam radiotherapy. The CTs are deformably registered to the planning CT to obtain displacement vector fields (DVFs). The DVFs are used to accumulate the dose delivered each day onto the planning CT. Each DVF has spatially-correlated uncertainties associated with it. Principal components analysis (PCA) is applied to measured DVF error maps to produce decorrelated principal component modes of the errors. The modes are sampled independently and reconstructed to produce synthetic registration error maps. The synthetic error maps are convolved with dose mapped via deformable registration to model the resulting uncertainty in the dose mapping. The results are compared to the dose mapping uncertainty that would result from uncorrelated DVF errors that vary randomly from voxel to voxel. Results: The error sampling method is shown to produce synthetic DVF error maps that are statistically indistinguishable from the observed error maps. Spatially-correlated DVF uncertainties modeled by our procedure produce patterns of dose mapping error that are different from that due to randomly distributed uncertainties. Conclusions: Deformable image registration uncertainties have complex spatial distributions. The authors have developed and tested a method to decorrelate the spatial uncertainties and make statistical samples of highly correlated error maps. The sample error maps can be used to investigate the effect of DVF uncertainties on daily dose mapping via deformable image registration. An initial demonstration of this methodology shows that dose mapping uncertainties can be sensitive to spatial patterns in the DVF uncertainties. PMID:22320766
Trans-dimensional matched-field geoacoustic inversion with hierarchical error models and interacting Markov chains.

PubMed

Dettmer, Jan; Dosso, Stan E

2012-10-01

This paper develops a trans-dimensional approach to matched-field geoacoustic inversion, including interacting Markov chains to improve efficiency and an autoregressive model to account for correlated errors. The trans-dimensional approach and hierarchical seabed model allow inversion without assuming any particular parametrization by relaxing model specification to a range of plausible seabed models (e.g., in this case, the number of sediment layers is an unknown parameter). Data errors are addressed by sampling statistical error-distribution parameters, including correlated errors (covariance), by applying a hierarchical autoregressive error model. The well-known difficulty of low acceptance rates for trans-dimensional jumps is addressed with interacting Markov chains, resulting in a substantial increase in efficiency. The trans-dimensional seabed model and the hierarchical error model relax the degree of prior assumptions required in the inversion, resulting in substantially improved (more realistic) uncertainty estimates and a more automated algorithm. In particular, the approach gives seabed parameter uncertainty estimates that account for uncertainty due to prior model choice (layering and data error statistics). The approach is applied to data measured on a vertical array in the Mediterranean Sea.
Sample Errors Call Into Question Conclusions Regarding Same-Sex Married Parents: A Comment on "Family Structure and Child Health: Does the Sex Composition of Parents Matter?"

PubMed

Paul Sullins, D

2017-12-01

Because of classification errors reported by the National Center for Health Statistics, an estimated 42% of the same-sex married partners in the sample for this study are misclassified different-sex married partners, thus calling into question findings regarding same-sex married parents. Including biological parentage as a control variable suppresses same-sex/different-sex differences, thus obscuring the data error. Parentage is not appropriate as a control because it correlates nearly perfectly (+.97, gamma) with the same-sex/different-sex distinction and is invariant for the category of joint biological parents.
Pharmacokinetic design optimization in children and estimation of maturation parameters: example of cytochrome P450 3A4.

PubMed

Bouillon-Pichault, Marion; Jullien, Vincent; Bazzoli, Caroline; Pons, Gérard; Tod, Michel

2011-02-01

The aim of this work was to determine whether optimizing the study design in terms of ages and sampling times for a drug eliminated solely via cytochrome P450 3A4 (CYP3A4) would allow us to accurately estimate the pharmacokinetic parameters throughout the entire childhood timespan, while taking into account age- and weight-related changes. A linear monocompartmental model with first-order absorption was used successively with three different residual error models and previously published pharmacokinetic parameters ("true values"). The optimal ages were established by D-optimization using the CYP3A4 maturation function to create "optimized demographic databases." The post-dose times for each previously selected age were determined by D-optimization using the pharmacokinetic model to create "optimized sparse sampling databases." We simulated concentrations by applying the population pharmacokinetic model to the optimized sparse sampling databases to create optimized concentration databases. The latter were modeled to estimate population pharmacokinetic parameters. We then compared true and estimated parameter values. The established optimal design comprised four age ranges: 0.008 years old (i.e., around 3 days), 0.192 years old (i.e., around 2 months), 1.325 years old, and adults, with the same number of subjects per group and three or four samples per subject, in accordance with the error model. The population pharmacokinetic parameters that we estimated with this design were precise and unbiased (root mean square error [RMSE] and mean prediction error [MPE] less than 11% for clearance and distribution volume and less than 18% for k(a)), whereas the maturation parameters were unbiased but less precise (MPE < 6% and RMSE < 37%). Based on our results, taking growth and maturation into account a priori in a pediatric pharmacokinetic study is theoretically feasible. However, it requires that very early ages be included in studies, which may present an obstacle to the use of this approach. First-pass effects, alternative elimination routes, and combined elimination pathways should also be investigated.
Directional variance adjustment: bias reduction in covariance matrices based on factor analysis with an application to portfolio optimization.

PubMed

Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W; Müller, Klaus-Robert; Lemm, Steven

2013-01-01

Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation.
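The "well-known systematic error of the spectrum of the sample covariance matrix" mentioned above can be seen in a small simulation. This is only a sketch of that spectral bias, not of the DVA algorithm itself; the dimensions `p` and `n` and the identity true covariance are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 100, 200  # hypothetical number of assets and of observations

# Simulated data whose true covariance is the identity,
# so every true eigenvalue equals 1.
returns = rng.standard_normal((n, p))
sample_cov = np.cov(returns, rowvar=False)

# Eigenvalues of the sample covariance, in ascending order.
eigvals = np.linalg.eigvalsh(sample_cov)
print("smallest:", eigvals.min(), "largest:", eigvals.max(), "mean:", eigvals.mean())
```

Although the average eigenvalue stays close to 1, the largest sample eigenvalues are overestimated and the smallest underestimated when `n` is not much larger than `p`, which is the kind of systematic error a portfolio optimizer would otherwise exploit.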
Directional Variance Adjustment: Bias Reduction in Covariance Matrices Based on Factor Analysis with an Application to Portfolio Optimization

PubMed Central

Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W.; Müller, Klaus-Robert; Lemm, Steven

2013-01-01

Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation. PMID:23844016
Estimating pore and cement volumes in thin section

USGS Publications Warehouse

Halley, R.B.

1978-01-01

Point count estimates of pore, grain and cement volumes from thin sections are inaccurate, often by more than 100 percent, even though they may be surprisingly precise (reproducibility ±3 percent). Errors are produced by: 1) inclusion of submicroscopic pore space within solid volume and 2) edge effects caused by grain curvature within a 30-micron thick thin section. Submicroscopic porosity may be measured by various physical tests or may be visually estimated from scanning electron micrographs. Edge error takes the form of an envelope around grains and increases with decreasing grain size and sorting, increasing grain irregularity and tighter grain packing. Cements are greatly involved in edge error because of their position at grain peripheries and their generally small grain size. Edge error is minimized by methods which reduce the thickness of the sample viewed during point counting. Methods which effectively reduce thickness include use of ultra-thin thin sections or acetate peels, point counting in reflected light, or carefully focusing and counting on the upper surface of the thin section.
Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

NASA Astrophysics Data System (ADS)

Reis, D. S.; Stedinger, J. R.; Martins, E. S.

2005-10-01

This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
Systematic evaluation of NASA precipitation radar estimates using NOAA/NSSL National Mosaic QPE products

NASA Astrophysics Data System (ADS)

Kirstetter, P.; Hong, Y.; Gourley, J. J.; Chen, S.; Flamig, Z.; Zhang, J.; Howard, K.; Petersen, W. A.

2011-12-01

Proper characterization of the error structure of TRMM Precipitation Radar (PR) quantitative precipitation estimation (QPE) is needed for their use in TRMM combined products, water budget studies and hydrological modeling applications. Due to the variety of sources of error in spaceborne radar QPE (attenuation of the radar signal, influence of land surface, impact of off-nadir viewing angle, etc.) and the impact of correction algorithms, the problem is addressed by comparison of PR QPEs with reference values derived from ground-based measurements (GV) using NOAA/NSSL's National Mosaic QPE (NMQ) system. An investigation of this subject has been carried out at the PR estimation scale (instantaneous and 5 km) on the basis of a 3-month-long data sample. A significant effort has been carried out to derive a bias-corrected, robust reference rainfall source from NMQ. The GV processing details will be presented along with preliminary results of PR's error characteristics using contingency table statistics, probability distribution comparisons, scatter plots, semi-variograms, and systematic biases and random errors.
Quantification of sewer system infiltration using δ¹⁸O hydrograph separation.

PubMed

Prigiobbe, V; Giulianelli, M

2009-01-01

The infiltration of parasitical water into two sewer systems in Rome (Italy) was quantified during a dry weather period. Infiltration was estimated using the hydrograph separation method with two water components and δ¹⁸O as a conservative tracer. The two water components were groundwater, the possible source of parasitical water within the sewer, and drinking water discharged into the sewer system. This method was applied at an urban catchment scale in order to test the effective water-tightness of two different sewer networks. The sampling strategy was based on an uncertainty analysis and the errors have been propagated using Monte Carlo random sampling. Our field applications showed that the method can be applied easily and quickly, but the error in the estimated infiltration rate can be up to 20%. The estimated infiltration into the recent sewer in Torraccia is 14% and can be considered negligible given the precision of the method, while the old sewer in Infernetto has an estimated infiltration of 50%.
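A two-component hydrograph separation with Monte Carlo error propagation, as outlined in this abstract, might look like the following sketch. The mixing equation is the standard two-end-member mass balance; the isotope values, their uncertainties, and the function name `infiltration_fraction` are hypothetical and are not taken from the study.

```python
import numpy as np

def infiltration_fraction(d_sewer, d_drinking, d_ground):
    """Groundwater fraction from a two-component tracer mass balance:
    d_sewer = f * d_ground + (1 - f) * d_drinking."""
    return (d_sewer - d_drinking) / (d_ground - d_drinking)

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical tracer values (per mil) with assumed Gaussian measurement errors.
d_sewer = rng.normal(-7.0, 0.1, n)
d_drinking = rng.normal(-8.0, 0.1, n)
d_ground = rng.normal(-6.0, 0.1, n)

# Propagate the measurement errors through the mixing equation.
f = infiltration_fraction(d_sewer, d_drinking, d_ground)
f_mean, f_std = f.mean(), f.std()
print(f"infiltration fraction: {f_mean:.3f} +/- {f_std:.3f}")
```

The spread of `f` grows quickly as the two end-member signatures approach each other, which is one reason an uncertainty analysis is useful before choosing a sampling strategy.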
Inverse sequential detection of parameter changes in developing time series

NASA Technical Reports Server (NTRS)

Radok, Uwe; Brown, Timothy J.

1992-01-01

Progressive values of two probabilities are obtained for parameter estimates derived from an existing set of values and from the same set enlarged by one or more new values, respectively. One probability is that of erroneously preferring the second of these estimates for the existing data ('type 1 error'), while the second probability is that of erroneously accepting their estimates for the enlarged test ('type 2 error'). A more stable combined 'no change' probability which always falls between 0.5 and 0 is derived from the (logarithmic) width of the uncertainty region of an equivalent 'inverted' sequential probability ratio test (SPRT, Wald 1945) in which the error probabilities are calculated rather than prescribed. A parameter change is indicated when the compound probability undergoes a progressive decrease. The test is explicitly formulated and exemplified for Gaussian samples.
Orbit/attitude estimation with LANDSAT Landmark data

NASA Technical Reports Server (NTRS)

Hall, D. L.; Waligora, S.

1979-01-01

The use of LANDSAT landmark data for orbit/attitude and camera bias estimation was studied. The preliminary results of these investigations are presented. The Goddard Trajectory Determination System (GTDS) error analysis capability was used to perform error analysis studies. A number of questions were addressed including parameter observability and sensitivity, effects on the solve-for parameter errors of data span, density, and distribution and a priori covariance weighting. The use of the GTDS differential correction capability with actual landmark data was examined. The rms line and element observation residuals were studied as a function of the solve-for parameter set, a priori covariance weighting, force model, attitude model and data characteristics. Sample results are presented. Finally, verification and preliminary system evaluation of the LANDSAT NAVPAK system for sequential (extended Kalman Filter) estimation of orbit, and camera bias parameters are given.
Comparing and Combining Data across Multiple Sources via Integration of Paired-sample Data to Correct for Measurement Error

PubMed Central

Huang, Yunda; Huang, Ying; Moodie, Zoe; Li, Sue; Self, Steve

2014-01-01

Summary: In biomedical research such as the development of vaccines for infectious diseases or cancer, measures from the same assay are often collected from multiple sources or laboratories. Measurement error that may vary between laboratories needs to be adjusted for when combining samples across laboratories. We incorporate such adjustment in comparing and combining independent samples from different labs via integration of external data, collected on paired samples from the same two laboratories. We propose: 1) normalization of individual level data from two laboratories to the same scale via the expectation of true measurements conditioning on the observed; 2) comparison of mean assay values between two independent samples in the Main study accounting for inter-source measurement error; and 3) sample size calculations of the paired-sample study so that hypothesis testing error rates are appropriately controlled in the Main study comparison. Because the goal is not to estimate the true underlying measurements but to combine data on the same scale, our proposed methods do not require that the true values for the error-prone measurements are known in the external data. Simulation results under a variety of scenarios demonstrate satisfactory finite sample performance of our proposed methods when measurement errors vary. We illustrate our methods using real ELISpot assay data generated by two HIV vaccine laboratories. PMID:22764070

Grizzly Bear Noninvasive Genetic Tagging Surveys: Estimating the Magnitude of Missed Detections.

PubMed

Fisher, Jason T; Heim, Nicole; Code, Sandra; Paczkowski, John

2016-01-01

Sound wildlife conservation decisions require sound information, and scientists increasingly rely on remotely collected data over large spatial scales, such as noninvasive genetic tagging (NGT). Grizzly bears (Ursus arctos), for example, are difficult to study at population scales except with noninvasive data, and NGT via hair trapping informs management over much of grizzly bears' range. Considerable statistical effort has gone into estimating sources of heterogeneity, but detection error, arising when a visiting bear fails to leave a hair sample, has not been independently estimated. We used camera traps to survey grizzly bear occurrence at fixed hair traps and multi-method hierarchical occupancy models to estimate the probability that a visiting bear actually leaves a hair sample with viable DNA. We surveyed grizzly bears via hair trapping and camera trapping for 8 monthly surveys at 50 (2012) and 76 (2013) sites in the Rocky Mountains of Alberta, Canada. We used multi-method occupancy models to estimate site occupancy, probability of detection, and conditional occupancy at a hair trap. We tested the prediction that detection error in NGT studies could be induced by temporal variability within season, leading to underestimation of occupancy. NGT via hair trapping consistently underestimated grizzly bear occupancy at a site when compared to camera trapping. At best occupancy was underestimated by 50%; at worst, by 95%. Probability of false absence was reduced through successive surveys, but this mainly accounts for error imparted by movement among repeated surveys, not necessarily missed detections by extant bears. The implications of missed detections and biased occupancy estimates for density estimation (which form the crux of management plans) require consideration. We suggest hair-trap NGT studies should estimate and correct detection error using independent survey methods such as cameras, to ensure the reliability of the data upon which species management and conservation actions are based.
Single point estimation of phenytoin dosing: a reappraisal.

PubMed

Koup, J R; Gibaldi, M; Godolphin, W

1981-11-01

A previously proposed method for estimation of phenytoin dosing requirement using a single serum sample obtained 24 hours after intravenous loading dose (18 mg/kg) has been re-evaluated. Using more realistic values for the volume of distribution of phenytoin (0.4 to 1.2 L/kg), simulations indicate that the proposed method will fail to consistently predict dosage requirements. Additional simulations indicate that two samples obtained during the 24 hour interval following the iv loading dose could be used to more reliably predict phenytoin dose requirement. Because of the nonlinear relationship which exists between phenytoin dose administration rate (RO) and the mean steady state serum concentration (CSS), small errors in prediction of the required RO result in much larger errors in CSS.
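The amplification of dosing-rate errors into concentration errors can be illustrated numerically. The sketch below assumes the nonlinear relationship takes the Michaelis-Menten form commonly used for phenytoin, CSS = Km * RO / (Vmax - RO); the abstract itself does not state the equation, and the `vmax` and `km` values are hypothetical.

```python
def steady_state_concentration(rate, vmax, km):
    """Mean steady-state concentration (mg/L) for dosing rate `rate` (mg/day),
    assuming Michaelis-Menten elimination with rate < vmax."""
    if rate >= vmax:
        raise ValueError("dosing rate must be below Vmax for a steady state to exist")
    return km * rate / (vmax - rate)

# Hypothetical patient parameters, for illustration only.
vmax = 500.0  # mg/day
km = 4.0      # mg/L

css_target = steady_state_concentration(400.0, vmax, km)   # intended dosing rate
css_overdose = steady_state_concentration(440.0, vmax, km)  # rate overestimated by 10%
relative_change = css_overdose / css_target - 1.0
print(css_target, css_overdose, relative_change)
```

With these assumed parameters, a 10% error in the dosing rate changes the steady-state concentration by roughly 80%, because the denominator `vmax - rate` shrinks as the rate approaches `vmax`.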
Evaluation and optimization of sampling errors for the Monte Carlo Independent Column Approximation

NASA Astrophysics Data System (ADS)

Räisänen, Petri; Barker, W. Howard

2004-07-01

The Monte Carlo Independent Column Approximation (McICA) method for computing domain-average broadband radiative fluxes is unbiased with respect to the full ICA, but its flux estimates contain conditional random noise. McICA's sampling errors are evaluated here using a global climate model (GCM) dataset and a correlated-k distribution (CKD) radiation scheme. Two approaches to reduce McICA's sampling variance are discussed. The first is to simply restrict all of McICA's samples to cloudy regions. This avoids wasting precious few samples on essentially homogeneous clear skies. Clear-sky fluxes need to be computed separately for this approach, but this is usually done in GCMs for diagnostic purposes anyway. Second, accuracy can be improved by repeated sampling, and averaging those CKD terms with large cloud radiative effects. Although this naturally increases computational costs over the standard CKD model, random errors for fluxes and heating rates are reduced by typically 50% to 60%, for the present radiation code, when the total number of samples is increased by 50%. When both variance reduction techniques are applied simultaneously, globally averaged flux and heating rate random errors are reduced by a factor of approximately 3.
Rank score and permutation testing alternatives for regression quantile estimates

USGS Publications Warehouse

Cade, B.S.; Richards, J.D.; Mielke, P.W.

2006-01-01

Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) was evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and heterogeneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic distributed as a χ² random variable with q degrees of freedom (where q parameters are constrained by H0) and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvements in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles. Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application relating trout densities to stream channel width:depth.
Apparent polyploidization after gamma irradiation: pitfalls in the use of quantitative polymerase chain reaction (qPCR) for the estimation of mitochondrial and nuclear DNA gene copy numbers.

PubMed

Kam, Winnie W Y; Lake, Vanessa; Banos, Connie; Davies, Justin; Banati, Richard

2013-05-30

Quantitative polymerase chain reaction (qPCR) has been widely used to quantify changes in gene copy numbers after radiation exposure. Here, we show that gamma irradiation ranging from 10 to 100 Gy of cells and cell-free DNA samples significantly affects the measured qPCR yield, due to radiation-induced fragmentation of the DNA template and, therefore, introduces errors into the estimation of gene copy numbers. The radiation-induced DNA fragmentation and, thus, measured qPCR yield varies with temperature not only in living cells, but also in isolated DNA irradiated under cell-free conditions. In summary, the variability in measured qPCR yield from irradiated samples introduces a significant error into the estimation of both mitochondrial and nuclear gene copy numbers and may give spurious evidence for polyploidization.
Irrigated lands assessment for water management: Technique test. [California]

NASA Technical Reports Server (NTRS)

Wall, S. L.; Brown, C. E.; Eriksson, M.; Grigg, C. A.; Thomas, R. W.; Colwell, R. N.; Estes, J. E.; Tinney, L. R.; Baggett, J. O.; Sawyer, G.

1981-01-01

A procedure for estimating irrigated land using full frame LANDSAT imagery was demonstrated. Relatively inexpensive interpretation of multidate LANDSAT photographic enlargements was used to produce a map of irrigated land in California. The LANDSAT and ground maps were then linked by regression equations to enable precise estimation of irrigated land area by county, basin, and statewide. Land irrigated at least once in California in 1979 was estimated to be 9.86 million acres, with an expected error of less than 1.75% at the 99% level of confidence. To achieve the same level of error with a ground-only sample would have required 3 to 5 times as many ground sample units statewide. A procedure for relatively inexpensive computer classification of LANDSAT digital data to irrigated land categories was also developed. This procedure is based on ratios of MSS band 7 and 5, and gave good results for several counties in the Central Valley.
Regression dilution in the proportional hazards model.

PubMed

Hughes, M D

1993-12-01

The problem of regression dilution arising from covariate measurement error is investigated for survival data using the proportional hazards model. The naive approach to parameter estimation is considered whereby observed covariate values are used, inappropriately, in the usual analysis instead of the underlying covariate values. A relationship between the estimated parameter in large samples and the true parameter is obtained showing that the bias does not depend on the form of the baseline hazard function when the errors are normally distributed. With high censorship, adjustment of the naive estimate by the factor 1 + lambda, where lambda is the ratio of within-person variability about an underlying mean level to the variability of these levels in the population sampled, removes the bias. As censorship decreases, the adjustment required increases and when there is no censorship is markedly higher than 1 + lambda and depends also on the true risk relationship.
Detection of the pairwise kinematic Sunyaev-Zel'dovich effect with BOSS DR11 and the Atacama Cosmology Telescope

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernardis, F. De; Aiola, S.; Vavagiakis, E. M.

Here, we present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
Detection of the pairwise kinematic Sunyaev-Zel'dovich effect with BOSS DR11 and the Atacama Cosmology Telescope

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernardis, F. De; Vavagiakis, E.M.; Niemack, M.D.

We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
Detection of the Pairwise Kinematic Sunyaev-Zel'dovich Effect with BOSS DR11 and the Atacama Cosmology Telescope

NASA Technical Reports Server (NTRS)

De Bernardis, F.; Aiola, S.; Vavagiakis, E. M.; Battaglia, N.; Niemack, M. D.; Beall, J.; Becker, D. T.; Bond, J. R.; Calabrese, E.; Cho, H.;

2017-01-01

We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.

Detection of the pairwise kinematic Sunyaev-Zel'dovich effect with BOSS DR11 and the Atacama Cosmology Telescope

NASA Astrophysics Data System (ADS)

De Bernardis, F.; Aiola, S.; Vavagiakis, E. M.; Battaglia, N.; Niemack, M. D.; Beall, J.; Becker, D. T.; Bond, J. R.; Calabrese, E.; Cho, H.; Coughlin, K.; Datta, R.; Devlin, M.; Dunkley, J.; Dunner, R.; Ferraro, S.; Fox, A.; Gallardo, P. A.; Halpern, M.; Hand, N.; Hasselfield, M.; Henderson, S. W.; Hill, J. C.; Hilton, G. C.; Hilton, M.; Hincks, A. D.; Hlozek, R.; Hubmayr, J.; Huffenberger, K.; Hughes, J. P.; Irwin, K. D.; Koopman, B. J.; Kosowsky, A.; Li, D.; Louis, T.; Lungu, M.; Madhavacheril, M. S.; Maurin, L.; McMahon, J.; Moodley, K.; Naess, S.; Nati, F.; Newburgh, L.; Nibarger, J. P.; Page, L. A.; Partridge, B.; Schaan, E.; Schmitt, B. L.; Sehgal, N.; Sievers, J.; Simon, S. M.; Spergel, D. N.; Staggs, S. T.; Stevens, J. R.; Thornton, R. J.; van Engelen, A.; Van Lanen, J.; Wollack, E. J.

2017-03-01

We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
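One of the covariance estimates named in this abstract, the jackknife, can be sketched as follows. This is a generic delete-one jackknife for an equal-weight mean over sky patches, not the collaboration's pipeline; the per-patch array layout, the toy data, and the function name `jackknife_covariance` are assumptions made for illustration.

```python
import numpy as np

def jackknife_covariance(patch_values):
    """Delete-one jackknife covariance of the mean over patches.

    patch_values: (n_patches, n_bins) array holding, for each sky patch,
    a statistic binned in galaxy separation (hypothetical layout).
    """
    n = patch_values.shape[0]
    total = patch_values.sum(axis=0)
    # Estimate of the mean with each patch left out in turn.
    leave_one_out = (total - patch_values) / (n - 1)
    diff = leave_one_out - leave_one_out.mean(axis=0)
    # Standard jackknife normalization (n - 1) / n.
    return (n - 1) / n * diff.T @ diff

rng = np.random.default_rng(3)
# Toy data: 60 patches, 8 separation bins, independent unit-variance noise.
patches = rng.standard_normal((60, 8))
cov = jackknife_covariance(patches)
```

For a simple mean, this estimate coincides with the sample covariance of the patches divided by the number of patches; for a nonlinear statistic, the leave-one-out estimates would have to be recomputed from the data with each patch removed.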
Age estimation from dental cementum incremental lines and periodontal disease.

PubMed

Dias, P E M; Beaini, T L; Melani, R F H

2010-12-01

Age estimation by counting incremental lines in cementum added to the average age of tooth eruption is considered an accurate method by some authors, while others reject it stating weak correlation between estimated and actual age. The aim of this study was to evaluate this technique and check the influence of periodontal disease on age estimates by analyzing both the number of cementum lines and the correlation between cementum thickness and actual age on freshly extracted teeth. Thirty-one undecalcified ground cross sections of approximately 30 µm, from 25 teeth, were prepared, observed, photographed and measured. Images were enhanced by software and counts were made by one observer, and the results compared with two control-observers. There was moderate correlation (r = 0.58) for the entire sample, with mean error of 9.7 years. For teeth with periodontal pathologies, correlation was 0.03 with a mean error of 22.6 years. For teeth without periodontal pathologies, correlation was 0.74 with mean error of 1.6 years. There was correlation of 0.69 between cementum thickness and known age for the entire sample, 0.25 for teeth with periodontal problems and 0.75 for teeth without periodontal pathologies. The technique was reliable for periodontally sound teeth, but not for periodontally diseased teeth.
Detection of the pairwise kinematic Sunyaev-Zel'dovich effect with BOSS DR11 and the Atacama Cosmology Telescope

DOE PAGES

Bernardis, F. De; Aiola, S.; Vavagiakis, E. M.; ...

2017-03-07

Here, we present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
Estimating true human and animal host source contribution in quantitative microbial source tracking using the Monte Carlo method.

PubMed

Wang, Dan; Silkie, Sarah S; Nelson, Kara L; Wuertz, Stefan

2010-09-01

Cultivation- and library-independent, quantitative PCR-based methods have become the method of choice in microbial source tracking. However, these qPCR assays are not 100% specific and sensitive for the target sequence in their respective hosts' genome. The factors that can lead to false positive and false negative information in qPCR results are well defined. It is highly desirable to have a way of removing such false information to estimate the true concentration of host-specific genetic markers and help guide the interpretation of environmental monitoring studies. Here we propose a statistical model based on the Law of Total Probability to predict the true concentration of these markers. The distributions of the probabilities of obtaining false information are estimated from representative fecal samples of known origin. Measurement error is derived from the sample precision error of replicated qPCR reactions. Then, the Monte Carlo method is applied to sample from these distributions of probabilities and measurement error. The set of equations given by the Law of Total Probability allows one to calculate the distribution of true concentrations, from which their expected value, confidence interval and other statistical characteristics can be easily evaluated. The output distributions of predicted true concentrations can then be used as input to watershed-wide total maximum daily load determinations, quantitative microbial risk assessment and other environmental models. This model was validated by both statistical simulations and real world samples. It was able to correct the intrinsic false information associated with qPCR assays and output the distribution of true concentrations of Bacteroidales for each animal host group. Model performance was strongly affected by the precision error. It could perform reliably and precisely when the standard deviation of the precision error was small (≤ 0.1). 
Further improvement on the precision of sample processing and qPCR reaction would greatly improve the performance of the model. This methodology, built upon Bacteroidales assays, is readily transferable to any other microbial source indicator where a universal assay for fecal sources of that indicator exists. Copyright © 2010 Elsevier Ltd. All rights reserved.
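The correction idea in this abstract can be illustrated with a much simpler case than the authors' model. The sketch below is an assumption-laden toy, not the paper's system of equations: it treats a single presence/absence assay, writes the Law of Total Probability as p_obs = sens * p_true + fp * (1 - p_true), and uses Monte Carlo draws of the sensitivity and false-positive rate to get a distribution for p_true. The function name, the Beta distributions, and every numeric value are hypothetical.

```python
import numpy as np

def true_prevalence_mc(p_obs, sens_ab, fp_ab, n_draws=10_000, seed=0):
    """Monte Carlo distribution of the true positive fraction.

    Law of Total Probability: p_obs = sens * p_true + fp * (1 - p_true),
    so p_true = (p_obs - fp) / (sens - fp).
    sens_ab and fp_ab are hypothetical Beta(a, b) parameters describing
    uncertainty in assay sensitivity and false-positive rate.
    """
    rng = np.random.default_rng(seed)
    sens = rng.beta(*sens_ab, size=n_draws)
    fp = rng.beta(*fp_ab, size=n_draws)
    ok = sens > fp  # the correction is only defined when sens > fp
    p_true = (p_obs - fp[ok]) / (sens[ok] - fp[ok])
    return np.clip(p_true, 0.0, 1.0)  # keep results inside [0, 1]

# Hypothetical inputs: 40% observed positives, sensitivity near 0.9,
# false-positive rate near 0.05.
draws = true_prevalence_mc(p_obs=0.40, sens_ab=(90, 10), fp_ab=(5, 95))
mean_p = draws.mean()
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"mean = {mean_p:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The output is a distribution rather than a point value, which is what allows an expected value and an interval to be read off, as the abstract describes for marker concentrations.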
A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies.

PubMed

Khondoker, Mizanur; Dobson, Richard; Skirrow, Caroline; Simmons, Andrew; Stahl, Daniel

2016-10-01

Recent literature on the comparison of machine learning methods has raised questions about the neutrality, unbiasedness and utility of many comparative studies. Reporting of results on favourable datasets and sampling error in the estimated performance measures based on single samples are thought to be the major sources of bias in such comparisons. Better performance in one or a few instances does not necessarily imply so on an average or on a population level and simulation studies may be a better alternative for objectively comparing the performances of machine learning algorithms. We compare the classification performance of a number of important and widely used machine learning algorithms, namely the Random Forests (RF), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA) and k-Nearest Neighbour (kNN). Using massively parallel processing on high-performance supercomputers, we compare the generalisation errors at various combinations of levels of several factors: number of features, training sample size, biological variation, experimental variation, effect size, replication and correlation between features. For a smaller number of correlated features, with the number of features not exceeding approximately half the sample size, LDA was found to be the method of choice in terms of average generalisation errors as well as stability (precision) of error estimates. SVM (with RBF kernel) outperforms LDA as well as RF and kNN by a clear margin as the feature set gets larger provided the sample size is not too small (at least 20). The performance of kNN also improves as the number of features grows and outplays that of LDA and RF unless the data variability is too high and/or effect sizes are too small. RF was found to outperform only kNN in some instances where the data are more variable and have smaller effect sizes, in which cases it also provides more stable error estimates than kNN and LDA.
Applications to a number of real datasets supported the findings from the simulation study. © The Author(s) 2013.
A Note on Sample Size and Solution Propriety for Confirmatory Factor Analytic Models

ERIC Educational Resources Information Center

Jackson, Dennis L.; Voth, Jennifer; Frey, Marc P.

2013-01-01

Determining an appropriate sample size for use in latent variable modeling techniques has presented ongoing challenges to researchers. In particular, small sample sizes are known to present concerns over sampling error for the variances and covariances on which model estimation is based, as well as for fit indexes and convergence failures. The…
Body mass and stature estimation based on the first metatarsal in humans.

PubMed

De Groote, Isabelle; Humphrey, Louise T

2011-04-01

Archaeological assemblages often lack the complete long bones needed to estimate stature and body mass. The most accurate estimates of body mass and stature are produced using femoral head diameter and femur length. Foot bones including the first metatarsal preserve relatively well in a range of archaeological contexts. In this article we present regression equations using the first metatarsal to estimate femoral head diameter, femoral length, and body mass in a diverse human sample. The skeletal sample comprised 87 individuals (Andamanese, Australasians, Africans, Native Americans, and British). Results show that all first metatarsal measurements correlate moderately to highly (r = 0.62-0.91) with femoral head diameter and length. The proximal articular dorsoplantar diameter is the best single measurement to predict both femoral dimensions. Percent standard errors of the estimate are below 5%. Equations using two metatarsal measurements show a small increase in accuracy. Direct estimations of body mass (calculated from measured femoral head diameter using previously published equations) have an error of just over 7%. No direct stature estimation equations were derived due to the varied linear body proportions represented in the sample. The equations were tested on a sample of 35 individuals from Christ Church Spitalfields. Percentage differences in estimated and measured femoral head diameter and length were less than 1%. This study demonstrates that it is feasible to use the first metatarsal in the estimation of body mass and stature. The equations presented here are particularly useful for assemblages where the long bones are either missing or fragmented, and enable estimation of these fundamental population parameters in poorly preserved assemblages. Copyright © 2011 Wiley-Liss, Inc.
Effect of sample inhomogeneity in K-Ar dating

USGS Publications Warehouse

Engels, J.C.; Ingamells, C.O.

1970-01-01

Error in K-Ar ages is often due more to deficiencies in the splitting process, whereby portions of the sample are taken for potassium and for argon determination, than to imprecision in the analytical methods. The effect of the grain size of a sample and of the composition of a contaminating mineral can be evaluated, and this provides a useful guide in attempts to minimize error. Rocks and minerals should be prepared for age determination with the effects of contaminants and grain size in mind. The magnitude of such effects can be much larger than intuitive estimates might indicate. © 1970.
New features added to EVALIDator: ratio estimation and county choropleth maps

Treesearch

Patrick D. Miles; Mark H. Hansen

2012-01-01

The EVALIDator Web application, developed in 2007, provides estimates and sampling errors for many user selected forest statistics from the Forest Inventory and Analysis Database (FIADB). Among the statistics estimated are forest area, number of trees, biomass, volume, growth, removals, and mortality. A new release of EVALIDator, developed in 2012, has an option to...
Estimation and correction of visibility bias in aerial surveys of wintering ducks

USGS Publications Warehouse

Pearse, A.T.; Gerard, P.D.; Dinsmore, S.J.; Kaminski, R.M.; Reinecke, K.J.

2008-01-01

Incomplete detection of all individuals leading to negative bias in abundance estimates is a pervasive source of error in aerial surveys of wildlife, and correcting that bias is a critical step in improving surveys. We conducted experiments using duck decoys as surrogates for live ducks to estimate bias associated with surveys of wintering ducks in Mississippi, USA. We found detection of decoy groups was related to wetland cover type (open vs. forested), group size (1–100 decoys), and interaction of these variables. Observers who detected decoy groups reported counts that averaged 78% of the decoys actually present, and this counting bias was not influenced by either covariate cited above. We integrated this sightability model into estimation procedures for our sample surveys with weight adjustments derived from probabilities of group detection (estimated by logistic regression) and count bias. To estimate variances of abundance estimates, we used bootstrap resampling of transects included in aerial surveys and data from the bias-correction experiment. When we implemented bias correction procedures on data from a field survey conducted in January 2004, we found bias-corrected estimates of abundance increased 36–42%, and associated standard errors increased 38–55%, depending on species or group estimated. We deemed our method successful for integrating correction of visibility bias in an existing sample survey design for wintering ducks in Mississippi, and we believe this procedure could be implemented in a variety of sampling problems for other locations and species.
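A minimal sketch of the two steps this abstract describes, weighting counts by detection probability and count bias and then bootstrapping transects for a standard error, might look as follows. It is a simplification under stated assumptions: each transect is treated as a single group, only transects are resampled (the study also resampled the bias-correction experiment data), and the counts, detection probabilities, expansion factor, and function names are all hypothetical. Only the 0.78 count bias comes from the abstract.

```python
import numpy as np

def corrected_abundance(counts, p_detect, count_bias, expansion):
    """Bias-corrected abundance: each count is divided by its detection
    probability and by the counting bias, then expanded from the sampled
    transects to the whole survey area."""
    return expansion * np.sum(counts / (p_detect * count_bias))

def bootstrap_se(counts, p_detect, count_bias, expansion, n_boot=2000, seed=0):
    """Standard error from resampling transects with replacement."""
    rng = np.random.default_rng(seed)
    n = len(counts)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of transects
        estimates[b] = corrected_abundance(
            counts[idx], p_detect[idx], count_bias, expansion
        )
    return estimates.std(ddof=1)

# Hypothetical transect counts and per-transect detection probabilities.
counts = np.array([12.0, 0.0, 45.0, 7.0, 30.0, 3.0, 18.0, 60.0])
p_detect = np.array([0.90, 0.80, 0.95, 0.60, 0.90, 0.50, 0.85, 0.97])
estimate = corrected_abundance(counts, p_detect, count_bias=0.78, expansion=10.0)
se = bootstrap_se(counts, p_detect, count_bias=0.78, expansion=10.0)
print(f"abundance = {estimate:.0f}, bootstrap SE = {se:.0f}")
```

Because every weight is at least 1, the corrected estimate is always at least as large as the uncorrected expansion, which matches the direction of the increases reported in the abstract.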

Survival analysis with error-prone time-varying covariates: a risk set calibration approach

PubMed Central

Liao, Xiaomei; Zucker, David M.; Li, Yi; Spiegelman, Donna

2010-01-01

Occupational, environmental, and nutritional epidemiologists are often interested in estimating the prospective effect of time-varying exposure variables such as cumulative exposure or cumulative updated average exposure, in relation to chronic disease endpoints such as cancer incidence and mortality. From exposure validation studies, it is apparent that many of the variables of interest are measured with moderate to substantial error. Although the ordinary regression calibration approach is approximately valid and efficient for measurement error correction of relative risk estimates from the Cox model with time-independent point exposures when the disease is rare, it is not adaptable for use with time-varying exposures. By re-calibrating the measurement error model within each risk set, a risk set regression calibration method is proposed for this setting. An algorithm for a bias-corrected point estimate of the relative risk using an RRC approach is presented, followed by the derivation of an estimate of its variance, resulting in a sandwich estimator. Emphasis is on methods applicable to the main study/external validation study design, which arises in important applications. Simulation studies under several assumptions about the error model were carried out, which demonstrated the validity and efficiency of the method in finite samples. The method was applied to a study of diet and cancer from Harvard’s Health Professionals Follow-up Study (HPFS). PMID:20486928
Effects of categorization method, regression type, and variable distribution on the inflation of Type-I error rate when categorizing a confounding variable.

PubMed

Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A

2015-03-15

The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
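The kind of Monte Carlo experiment described in this abstract can be sketched for one scenario. The example below is a hedged illustration, not the authors' simulation design: it assumes linear regression, a standard normal confounder, a median split as the categorization, and a normal-approximation critical value of 1.96. The sample size, number of simulations, and function names are hypothetical choices.

```python
import numpy as np

def exposure_t_stat(y, x, conf):
    """t statistic for the exposure coefficient in OLS of y on [1, x, conf]."""
    X = np.column_stack([np.ones_like(x), x, conf])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def type1_rates(n=500, n_sim=200, seed=0):
    """Rejection rates for a true null exposure effect when adjusting for
    the continuous confounder versus its median-split version."""
    rng = np.random.default_rng(seed)
    reject_cont = reject_cat = 0
    for _ in range(n_sim):
        z = rng.normal(size=n)        # continuous confounder
        x = z + rng.normal(size=n)    # exposure depends on z
        y = z + rng.normal(size=n)    # outcome depends on z only (null is true)
        z_cat = (z > np.median(z)).astype(float)  # dichotomized confounder
        reject_cont += abs(exposure_t_stat(y, x, z)) > 1.96
        reject_cat += abs(exposure_t_stat(y, x, z_cat)) > 1.96
    return reject_cont / n_sim, reject_cat / n_sim

rate_cont, rate_cat = type1_rates()
print(f"continuous adjustment: {rate_cont:.2f}, median split: {rate_cat:.2f}")
```

Adjusting for the dichotomized variable leaves residual confounding, so the exposure appears associated with the outcome even though its true effect is zero; raising `n` makes the spurious association easier to detect, which is consistent with the abstract's remark about larger sample sizes.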
Galaxy–galaxy lensing estimators and their covariance properties

DOE PAGES

Singh, Sukhdeep; Mandelbaum, Rachel; Seljak, Uros; ...

2017-07-21

Here, we study the covariance properties of real space correlation function estimators – primarily galaxy–shear correlations, or galaxy–galaxy lensing – using SDSS data for both shear catalogues and lenses (specifically the BOSS LOWZ sample). Using mock catalogues of lenses and sources, we disentangle the various contributions to the covariance matrix and compare them with a simple analytical model. We show that not subtracting the lensing measurement around random points from the measurement around the lens sample is equivalent to performing the measurement using the lens density field instead of the lens overdensity field. While the measurement using the lens density field is unbiased (in the absence of systematics), its error is significantly larger due to an additional term in the covariance. Therefore, this subtraction should be performed regardless of its beneficial effects on systematics. Comparing the error estimates from data and mocks for estimators that involve the overdensity, we find that the errors are dominated by the shape noise and lens clustering, that empirically estimated covariances (jackknife and standard deviation across mocks) are consistent with theoretical estimates, and that both the connected parts of the four-point function and the supersample covariance can be neglected for the current levels of noise. While the trade-off between different terms in the covariance depends on the survey configuration (area, source number density), the diagnostics that we use in this work should be useful for future works to test their empirically determined covariances.
Galaxy–galaxy lensing estimators and their covariance properties

DOE Office of Scientific and Technical Information (OSTI.GOV)

Singh, Sukhdeep; Mandelbaum, Rachel; Seljak, Uros

Here, we study the covariance properties of real space correlation function estimators – primarily galaxy–shear correlations, or galaxy–galaxy lensing – using SDSS data for both shear catalogues and lenses (specifically the BOSS LOWZ sample). Using mock catalogues of lenses and sources, we disentangle the various contributions to the covariance matrix and compare them with a simple analytical model. We show that not subtracting the lensing measurement around random points from the measurement around the lens sample is equivalent to performing the measurement using the lens density field instead of the lens overdensity field. While the measurement using the lens density field is unbiased (in the absence of systematics), its error is significantly larger due to an additional term in the covariance. Therefore, this subtraction should be performed regardless of its beneficial effects on systematics. Comparing the error estimates from data and mocks for estimators that involve the overdensity, we find that the errors are dominated by the shape noise and lens clustering, that empirically estimated covariances (jackknife and standard deviation across mocks) are consistent with theoretical estimates, and that both the connected parts of the four-point function and the supersample covariance can be neglected for the current levels of noise. While the trade-off between different terms in the covariance depends on the survey configuration (area, source number density), the diagnostics that we use in this work should be useful for future works to test their empirically determined covariances.
Galaxy-galaxy lensing estimators and their covariance properties

NASA Astrophysics Data System (ADS)

Singh, Sukhdeep; Mandelbaum, Rachel; Seljak, Uroš; Slosar, Anže; Vazquez Gonzalez, Jose

2017-11-01

We study the covariance properties of real space correlation function estimators - primarily galaxy-shear correlations, or galaxy-galaxy lensing - using SDSS data for both shear catalogues and lenses (specifically the BOSS LOWZ sample). Using mock catalogues of lenses and sources, we disentangle the various contributions to the covariance matrix and compare them with a simple analytical model. We show that not subtracting the lensing measurement around random points from the measurement around the lens sample is equivalent to performing the measurement using the lens density field instead of the lens overdensity field. While the measurement using the lens density field is unbiased (in the absence of systematics), its error is significantly larger due to an additional term in the covariance. Therefore, this subtraction should be performed regardless of its beneficial effects on systematics. Comparing the error estimates from data and mocks for estimators that involve the overdensity, we find that the errors are dominated by the shape noise and lens clustering, that empirically estimated covariances (jackknife and standard deviation across mocks) are consistent with theoretical estimates, and that both the connected parts of the four-point function and the supersample covariance can be neglected for the current levels of noise. While the trade-off between different terms in the covariance depends on the survey configuration (area, source number density), the diagnostics that we use in this work should be useful for future works to test their empirically determined covariances.
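For readers unfamiliar with the jackknife covariance mentioned in this abstract, the following sketch shows the standard delete-one form for the simplest case, where the estimator is the mean of per-region measurements. The input layout (one row per sky region, one column per separation bin) and the function name are assumptions made for illustration; a real lensing analysis would recompute the full estimator with each region removed rather than average per-region values.

```python
import numpy as np

def jackknife_covariance(region_values):
    """Delete-one jackknife covariance of the mean data vector.

    region_values: array of shape (n_regions, n_bins), one measurement
    per sky region (hypothetical input layout).
    """
    region_values = np.asarray(region_values, dtype=float)
    n = region_values.shape[0]
    total = region_values.sum(axis=0)
    # Leave-one-out means: drop each region in turn and re-average the rest.
    loo = (total - region_values) / (n - 1)
    diff = loo - loo.mean(axis=0)
    # The (n - 1) / n factor accounts for the strong overlap between
    # leave-one-out samples.
    return (n - 1) / n * diff.T @ diff

# Hypothetical data: 50 regions, 4 separation bins.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
cov_jk = jackknife_covariance(data)
print(cov_jk.shape)
```

For the plain mean, this expression reduces algebraically to the sample covariance divided by the number of regions, which gives a convenient sanity check on any implementation.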
Improved estimates of ocean heat content from 1960 to 2015.

PubMed

Cheng, Lijing; Trenberth, Kevin E; Fasullo, John; Boyer, Tim; Abraham, John; Zhu, Jiang

2017-03-01

Earth's energy imbalance (EEI) drives the ongoing global warming and can best be assessed across the historical record (that is, since 1960) from ocean heat content (OHC) changes. An accurate assessment of OHC is a challenge, mainly because of insufficient and irregular data coverage. We provide updated OHC estimates with the goal of minimizing associated sampling error. We performed a subsample test, in which subsets of data during the data-rich Argo era are colocated with locations of earlier ocean observations, to quantify this error. Our results provide a new OHC estimate with an unbiased mean sampling error and with variability on decadal and multidecadal time scales (signal) that can be reliably distinguished from sampling error (noise) with signal-to-noise ratios higher than 3. The inferred integrated EEI is greater than that reported in previous assessments and is consistent with a reconstruction of the radiative imbalance at the top of atmosphere starting in 1985. We found that changes in OHC are relatively small before about 1980; since then, OHC has increased fairly steadily and, since 1990, has increasingly involved deeper layers of the ocean. In addition, OHC changes in six major oceans are reliable on decadal time scales. All ocean basins examined have experienced significant warming since 1998, with the greatest warming in the southern oceans, the tropical/subtropical Pacific Ocean, and the tropical/subtropical Atlantic Ocean. This new look at OHC and EEI changes over time provides greater confidence than previously possible, and the data sets produced are a valuable resource for further study.
Improved estimates of ocean heat content from 1960 to 2015

PubMed Central

Cheng, Lijing; Trenberth, Kevin E.; Fasullo, John; Boyer, Tim; Abraham, John; Zhu, Jiang

2017-01-01

Earth’s energy imbalance (EEI) drives the ongoing global warming and can best be assessed across the historical record (that is, since 1960) from ocean heat content (OHC) changes. An accurate assessment of OHC is a challenge, mainly because of insufficient and irregular data coverage. We provide updated OHC estimates with the goal of minimizing associated sampling error. We performed a subsample test, in which subsets of data during the data-rich Argo era are colocated with locations of earlier ocean observations, to quantify this error. Our results provide a new OHC estimate with an unbiased mean sampling error and with variability on decadal and multidecadal time scales (signal) that can be reliably distinguished from sampling error (noise) with signal-to-noise ratios higher than 3. The inferred integrated EEI is greater than that reported in previous assessments and is consistent with a reconstruction of the radiative imbalance at the top of atmosphere starting in 1985. We found that changes in OHC are relatively small before about 1980; since then, OHC has increased fairly steadily and, since 1990, has increasingly involved deeper layers of the ocean. In addition, OHC changes in six major oceans are reliable on decadal time scales. All ocean basins examined have experienced significant warming since 1998, with the greatest warming in the southern oceans, the tropical/subtropical Pacific Ocean, and the tropical/subtropical Atlantic Ocean. This new look at OHC and EEI changes over time provides greater confidence than previously possible, and the data sets produced are a valuable resource for further study. PMID:28345033
Comparison of bias-corrected covariance estimators for MMRM analysis in longitudinal data with dropouts.

PubMed

Gosho, Masahiko; Hirakawa, Akihiro; Noma, Hisashi; Maruo, Kazushi; Sato, Yasunori

2017-10-01

In longitudinal clinical trials, some subjects will drop out before completing the trial, so their measurements towards the end of the trial are not obtained. Mixed-effects models for repeated measures (MMRM) analysis with "unstructured" (UN) covariance structure are increasingly common as a primary analysis for group comparisons in these trials. Furthermore, model-based covariance estimators have been routinely used for testing the group difference and estimating confidence intervals of the difference in the MMRM analysis using the UN covariance. However, using the MMRM analysis with the UN covariance could lead to convergence problems for numerical optimization, especially in trials with a small sample size. Although the so-called sandwich covariance estimator is robust to misspecification of the covariance structure, its performance deteriorates in settings with small sample size. We investigated the performance of the sandwich covariance estimator and covariance estimators adjusted for small-sample bias proposed by Kauermann and Carroll (J Am Stat Assoc 2001; 96: 1387-1396) and Mancl and DeRouen (Biometrics 2001; 57: 126-134) fitting simpler covariance structures through a simulation study. In terms of the type 1 error rate and coverage probability of confidence intervals, Mancl and DeRouen's covariance estimator with compound symmetry, first-order autoregressive (AR(1)), heterogeneous AR(1), and antedependence structures performed better than the original sandwich estimator and Kauermann and Carroll's estimator with these structures in the scenarios where the variance increased across visits. The performance based on Mancl and DeRouen's estimator with these structures was nearly equivalent to that based on the Kenward-Roger method for adjusting the standard errors and degrees of freedom with the UN structure.
The model-based covariance estimator with the UN structure without adjustment of the degrees of freedom, which is frequently used in applications, resulted in substantial inflation of the type 1 error rate. We recommend the use of Mancl and DeRouen's estimator in MMRM analysis if the number of subjects completing is (n + 5) or less, where n is the number of planned visits. Otherwise, Kenward and Roger's method with the UN structure should be the best choice.
Accuracy of travel time distribution (TTD) models as affected by TTD complexity, observation errors, and model and tracer selection

USGS Publications Warehouse

Green, Christopher T.; Zhang, Yong; Jurgens, Bryant C.; Starn, J. Jeffrey; Landon, Matthew K.

2014-01-01

Analytical models of the travel time distribution (TTD) from a source area to a sample location are often used to estimate groundwater ages and solute concentration trends. The accuracies of these models are not well known for geologically complex aquifers. In this study, synthetic datasets were used to quantify the accuracy of four analytical TTD models as affected by TTD complexity, observation errors, model selection, and tracer selection. Synthetic TTDs and tracer data were generated from existing numerical models with complex hydrofacies distributions for one public-supply well and 14 monitoring wells in the Central Valley, California. Analytical TTD models were calibrated to synthetic tracer data, and prediction errors were determined for estimates of TTDs and conservative tracer (NO3−) concentrations. Analytical models included a new, scale-dependent dispersivity model (SDM) for two-dimensional transport from the watertable to a well, and three other established analytical models. The relative influence of the error sources (TTD complexity, observation error, model selection, and tracer selection) depended on the type of prediction. Geological complexity gave rise to complex TTDs in monitoring wells that strongly affected errors of the estimated TTDs. However, prediction errors for NO3− and median age depended more on tracer concentration errors. The SDM tended to give the most accurate estimates of the vertical velocity and other predictions, although TTD model selection had minor effects overall. Adding tracers improved predictions if the new tracers had different input histories. Studies using TTD models should focus on the factors that most strongly affect the desired predictions.
Honest Importance Sampling with Multiple Markov Chains

PubMed Central

Tan, Aixin; Doss, Hani; Hobert, James P.

2017-01-01

Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. 
The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
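The iid case described at the start of the abstract above can be sketched in a few lines. This is only an illustration under assumed densities (π1 = N(0, 2²) as the sampling density, π = N(0, 1) as the target, f(x) = x²); it does not implement the regenerative, multiple-chain MCMC estimator that the paper develops. The function names are hypothetical.

```python
import math
import random


def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def importance_sampling(f, n, rng):
    """Estimate E_pi[f(X)] from iid draws of pi1, with a CLT-based standard error."""
    terms = []
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)                       # draw from pi1 = N(0, 2^2) (assumed)
        w = normal_pdf(x, 1.0) / normal_pdf(x, 2.0)   # importance weight pi(x) / pi1(x)
        terms.append(w * f(x))
    est = sum(terms) / n
    # Sample variance of the weighted terms estimates the asymptotic variance.
    var = sum((t - est) ** 2 for t in terms) / (n - 1)
    return est, math.sqrt(var / n)


# The exact value of E_pi[X^2] under pi = N(0, 1) is 1.
estimate, std_err = importance_sampling(lambda x: x * x, 20000, random.Random(0))
print(f"estimate = {estimate:.3f}, standard error = {std_err:.3f}")
```

With a Markov chain in place of the iid draws, the point estimate is computed the same way, but the simple standard error above would no longer be valid, which is the problem the abstract addresses.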
Honest Importance Sampling with Multiple Markov Chains.

PubMed

Tan, Aixin; Doss, Hani; Hobert, James P

2015-01-01

Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. 
The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection.
Estimating accuracy of land-cover composition from two-stage cluster sampling

USGS Publications Warehouse

Stehman, S.V.; Wickham, J.D.; Fattorini, L.; Wade, T.D.; Baffetta, F.; Smith, J.H.

2009-01-01

Land-cover maps are often used to compute land-cover composition (i.e., the proportion or percent of area covered by each class), for each unit in a spatial partition of the region mapped. We derive design-based estimators of mean deviation (MD), mean absolute deviation (MAD), root mean square error (RMSE), and correlation (CORR) to quantify accuracy of land-cover composition for a general two-stage cluster sampling design, and for the special case of simple random sampling without replacement (SRSWOR) at each stage. The bias of the estimators for the two-stage SRSWOR design is evaluated via a simulation study. The estimators of RMSE and CORR have small bias except when sample size is small and the land-cover class is rare. The estimator of MAD is biased for both rare and common land-cover classes except when sample size is large. A general recommendation is that rare land-cover classes require large sample sizes to ensure that the accuracy estimators have small bias. © 2009 Elsevier Inc.
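The four accuracy measures named above can be written down for the simplest situation, in which the map and reference compositions are known for every unit. The sketch below shows only these full-population definitions, with the sign convention "map minus reference" taken as an assumption. It is not the design-based two-stage estimator derived in the paper, and the numbers are made up for illustration.

```python
import math


def composition_accuracy(map_values, ref_values):
    """Return (MD, MAD, RMSE, CORR) between map and reference proportions."""
    n = len(map_values)
    diffs = [m - r for m, r in zip(map_values, ref_values)]  # map minus reference (assumed)
    md = sum(diffs) / n                                # mean deviation
    mad = sum(abs(d) for d in diffs) / n               # mean absolute deviation
    rmse = math.sqrt(sum(d * d for d in diffs) / n)    # root mean square error
    mean_m = sum(map_values) / n
    mean_r = sum(ref_values) / n
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(map_values, ref_values))
    ss_m = sum((m - mean_m) ** 2 for m in map_values)
    ss_r = sum((r - mean_r) ** 2 for r in ref_values)
    corr = cov / math.sqrt(ss_m * ss_r)                # Pearson correlation
    return md, mad, rmse, corr


# Hypothetical proportions of one land-cover class in four spatial units.
md, mad, rmse, corr = composition_accuracy([0.2, 0.5, 0.7, 0.1], [0.25, 0.4, 0.7, 0.2])
print(md, mad, rmse, corr)
```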
Measurement uncertainty and feasibility study of a flush airdata system for a hypersonic flight experiment

NASA Technical Reports Server (NTRS)

Whitmore, Stephen A.; Moes, Timothy R.

1994-01-01

Presented is a feasibility and error analysis for a hypersonic flush airdata system on a hypersonic flight experiment (HYFLITE). HYFLITE heating loads make intrusive airdata measurement impractical. Although this analysis is specifically for the HYFLITE vehicle and trajectory, the problems analyzed are generally applicable to hypersonic vehicles. A layout of the flush-port matrix is shown. Surface pressures are related to airdata parameters using a simple aerodynamic model. The model is linearized using small perturbations and inverted using nonlinear least-squares. Effects of various error sources on the overall uncertainty are evaluated using an error simulation. Error sources modeled include boundary-layer/viscous interactions, pneumatic lag, thermal transpiration in the sensor pressure tubing, misalignment in the matrix layout, thermal warping of the vehicle nose, sampling resolution, and transducer error. Using simulated pressure data for input to the estimation algorithm, effects caused by various error sources are analyzed by comparing estimator outputs with the original trajectory. To obtain ensemble averages the simulation is run repeatedly and output statistics are compiled. Output errors resulting from the various error sources are presented as a function of Mach number. Final uncertainties with all modeled error sources included are presented as a function of Mach number.
LACIE performance predictor final operational capability program description, volume 3

NASA Technical Reports Server (NTRS)

1976-01-01

The requirements and processing logic for the LACIE Error Model program (LEM) are described. This program is an integral part of the Large Area Crop Inventory Experiment (LACIE) system. LEM is that portion of the LPP (LACIE Performance Predictor) which simulates the sample segment classification, strata yield estimation, and production aggregation. LEM controls repetitive Monte Carlo trials based on input error distributions to obtain statistical estimates of the wheat area, yield, and production at different levels of aggregation. LEM interfaces with the rest of the LPP through a set of data files.
Sampling procedures for throughfall monitoring: A simulation study

NASA Astrophysics Data System (ADS)

Zimmermann, Beate; Zimmermann, Alexander; Lark, Richard Murray; Elsenbeer, Helmut

2010-01-01

What is the most appropriate sampling scheme to estimate event-based average throughfall? A satisfactory answer to this seemingly simple question has yet to be found, a failure which we attribute to previous efforts' dependence on empirical studies. Here we try to answer this question by simulating stochastic throughfall fields based on parameters for statistical models of large monitoring data sets. We subsequently sampled these fields with different sampling designs and variable sample supports. We evaluated the performance of a particular sampling scheme with respect to the uncertainty of possible estimated means of throughfall volumes. Even for a relative error limit of 20%, an impractically large number of small, funnel-type collectors would be required to estimate mean throughfall, particularly for small events. While stratification of the target area is not superior to simple random sampling, cluster random sampling involves the risk of being less efficient. A larger sample support, e.g., the use of trough-type collectors, considerably reduces the necessary sample sizes and eliminates the sensitivity of the mean to outliers. Since the gain in time associated with the manual handling of troughs versus funnels depends on the local precipitation regime, the employment of automatically recording clusters of long troughs emerges as the most promising sampling scheme. Even so, a relative error of less than 5% appears out of reach for throughfall under heterogeneous canopies. We therefore suspect a considerable uncertainty of input parameters for interception models derived from measured throughfall, in particular, for those requiring data of small throughfall events.
The Surface Water and Ocean Topography Satellite Mission - An Assessment of Swath Altimetry Measurements of River Hydrodynamics

NASA Technical Reports Server (NTRS)

Wilson, Matthew D.; Durand, Michael; Alsdorf, Douglas; Chul-Jung, Hahn; Andreadis, Konstantinos M.; Lee, Hyongki

2012-01-01

The Surface Water and Ocean Topography (SWOT) satellite mission, scheduled for launch in 2020 with development commencing in 2015, will provide a step-change improvement in the measurement of terrestrial surface water storage and dynamics. In particular, it will provide the first, routine two-dimensional measurements of water surface elevations, which will allow for the estimation of river and floodplain flows via the water surface slope. In this paper, we characterize the measurements which may be obtained from SWOT and illustrate how they may be used to derive estimates of river discharge. In particular, we show (i) the spatio-temporal sampling scheme of SWOT, (ii) the errors which may be expected in swath altimetry measurements of the terrestrial surface water, and (iii) the impacts such errors may have on estimates of water surface slope and river discharge. We illustrate this through a "virtual mission" study for an approximately 300 km reach of the central Amazon river, using a hydraulic model to provide water surface elevations according to the SWOT spatio-temporal sampling scheme (orbit with 78 degree inclination, 22 day repeat and 140 km swath width) to which errors were added based on a two-dimensional height error spectrum derived from the SWOT design requirements. Water surface elevation measurements for the Amazon mainstem as may be observed by SWOT were thereby obtained. Using these measurements, estimates of river slope and discharge were derived and compared to those which may be obtained without error, and those obtained directly from the hydraulic model. It was found that discharge can be reproduced highly accurately from the water height, without knowledge of the detailed channel bathymetry using a modified Manning's equation, if friction, depth, width and slope are known. Increasing reach length was found to be an effective method to reduce systematic height error in SWOT measurements.
Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

PubMed

Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

2012-05-01

Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study aims to compare the reliability and accuracy of stature estimation and to demonstrate the variability between estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements (hand length, hand breadth, foot length and foot breadth), taken on the left side in each subject, were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formulae were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from the regression analysis method is less than that of the multiplication factor method, thus confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Replica approach to mean-variance portfolio optimization

NASA Astrophysics Data System (ADS)

Varga-Haszonits, Istvan; Caccioli, Fabio; Kondor, Imre

2016-12-01

We consider the problem of mean-variance portfolio optimization for a generic covariance matrix subject to the budget constraint and the constraint for the expected return, with the application of the replica method borrowed from the statistical physics of disordered systems. We find that the replica symmetry of the solution does not need to be assumed, but emerges as the unique solution of the optimization problem. We also check the stability of this solution and find that the eigenvalues of the Hessian are positive for r = N/T < 1, where N is the dimension of the portfolio and T the length of the time series used to estimate the covariance matrix. At the critical point r = 1 a phase transition is taking place. The out of sample estimation error blows up at this point as 1/(1 - r), independently of the covariance matrix or the expected return, displaying the universality not only of the critical exponent, but also the critical point. As a conspicuous illustration of the dangers of in-sample estimates, the optimal in-sample variance is found to vanish at the critical point inversely proportional to the divergent estimation error.
Calculating the free energy of transfer of small solutes into a model lipid membrane: Comparison between metadynamics and umbrella sampling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bochicchio, Davide; Panizon, Emanuele; Ferrando, Riccardo

2015-10-14

We compare the performance of two well-established computational algorithms for the calculation of free-energy landscapes of biomolecular systems, umbrella sampling and metadynamics. We look at benchmark systems composed of polyethylene and polypropylene oligomers interacting with lipid (phosphatidylcholine) membranes, aiming at the calculation of the oligomer water-membrane free energy of transfer. We model our test systems at two different levels of description, united-atom and coarse-grained. We provide optimized parameters for the two methods at both resolutions. We devote special attention to the analysis of statistical errors in the two different methods and propose a general procedure for the error estimation in metadynamics simulations. Metadynamics and umbrella sampling yield the same estimates for the water-membrane free energy profile, but metadynamics can be more efficient, providing lower statistical uncertainties within the same simulation time.
Exploiting the Modified Colombo-Nyquist Rule for Co-estimating Sub-monthly Gravity Field Solutions from a GRACE-like Mission

NASA Astrophysics Data System (ADS)

Devaraju, B.; Weigelt, M.; Mueller, J.

2017-12-01

In order to suppress the impact of aliasing errors on the standard monthly GRACE gravity-field solutions, co-estimating sub-monthly (daily/two-day) low-degree solutions has been suggested. The maximum degree of the low-degree solutions is chosen via the Colombo-Nyquist rule of thumb. However, it is now established that the sampling of satellites puts a restriction on the maximum estimable order and not the degree (the modified Colombo-Nyquist rule). Therefore, in this contribution, we co-estimate low-order sub-monthly solutions, and compare and contrast them with the low-degree sub-monthly solutions. We also investigate their efficacies in dealing with aliasing errors.

Estimating Prediction Uncertainty from Geographical Information System Raster Processing: A User's Manual for the Raster Error Propagation Tool (REPTool)

USGS Publications Warehouse

Gurdak, Jason J.; Qi, Sharon L.; Geisler, Michael L.

2009-01-01

The U.S. Geological Survey Raster Error Propagation Tool (REPTool) is a custom tool for use with the Environmental System Research Institute (ESRI) ArcGIS Desktop application to estimate error propagation and prediction uncertainty in raster processing operations and geospatial modeling. REPTool is designed to introduce concepts of error and uncertainty in geospatial data and modeling and provide users of ArcGIS Desktop a geoprocessing tool and methodology to consider how error affects geospatial model output. Similar to other geoprocessing tools available in ArcGIS Desktop, REPTool can be run from a dialog window, from the ArcMap command line, or from a Python script. REPTool consists of public-domain, Python-based packages that implement Latin Hypercube Sampling within a probabilistic framework to track error propagation in geospatial models and quantitatively estimate the uncertainty of the model output. Users may specify error for each input raster or model coefficient represented in the geospatial model. The error for the input rasters may be specified as either spatially invariant or spatially variable across the spatial domain. Users may specify model output as a distribution of uncertainty for each raster cell. REPTool uses the Relative Variance Contribution method to quantify the relative error contribution from the two primary components in the geospatial model - errors in the model input data and coefficients of the model variables. REPTool is appropriate for many types of geospatial processing operations, modeling applications, and related research questions, including applications that consider spatially invariant or spatially variable error in geospatial data.
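The idea of tracking error propagation with Latin Hypercube Sampling can be sketched generically. The code below is not REPTool code and uses none of its API. It is a minimal stdlib illustration for a single raster cell, assuming normally distributed input errors and a made-up linear model; all function names are hypothetical.

```python
import random
from statistics import NormalDist


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of n_samples equal-width strata, then shuffle.
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def propagate(model, input_dists, n_samples, rng):
    """Push Latin hypercube draws of the uncertain inputs through the model."""
    points = latin_hypercube(n_samples, len(input_dists), rng)
    return [model(*(d.inv_cdf(u) for d, u in zip(input_dists, p))) for p in points]


# Hypothetical single-cell model: output = 2 * a + b, with assumed input errors.
outputs = propagate(lambda a, b: 2.0 * a + b,
                    [NormalDist(10.0, 2.0), NormalDist(5.0, 1.0)],
                    200, random.Random(1))
mean_out = sum(outputs) / len(outputs)
print(mean_out, min(outputs), max(outputs))
```

The list of outputs plays the role of the per-cell distribution of uncertainty mentioned in the abstract; stratifying each input is what lets a modest number of runs cover the input range evenly.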
Precipitation and Latent Heating Distributions from Satellite Passive Microwave Radiometry. Part 2; Evaluation of Estimates Using Independent Data

NASA Technical Reports Server (NTRS)

Yang, Song; Olson, William S.; Wang, Jian-Jian; Bell, Thomas L.; Smith, Eric A.; Kummerow, Christian D.

2004-01-01

Rainfall rate estimates from space-borne instruments are generally accepted as reliable by a majority of the atmospheric science community. One of the Tropical Rainfall Measuring Mission (TRMM) facility rain rate algorithms is based upon passive microwave observations from the TRMM Microwave Imager (TMI). Part I of this study describes improvements in the TMI algorithm that are required to introduce cloud latent heating and drying as additional algorithm products. Here, estimates of surface rain rate, convective proportion, and latent heating are evaluated using independent ground-based estimates and satellite products. Instantaneous, 0.5 deg-resolution estimates of surface rain rate over ocean from the improved TMI algorithm are well correlated with independent radar estimates (r approx. 0.88 over the Tropics), but bias reduction is the most significant improvement over forerunning algorithms. The bias reduction is attributed to the greater breadth of cloud-resolving model simulations that support the improved algorithm, and the more consistent and specific convective/stratiform rain separation method utilized. The bias of monthly, 2.5 deg-resolution estimates is similarly reduced, with comparable correlations to radar estimates. Although the amount of independent latent heating data is limited, TMI estimated latent heating profiles compare favorably with instantaneous estimates based upon dual-Doppler radar observations, and time series of surface rain rate and heating profiles are generally consistent with those derived from rawinsonde analyses. Still, some biases in profile shape are evident, and these may be resolved with: (a) additional contextual information brought to the estimation problem, and/or; (b) physically-consistent and representative databases supporting the algorithm. A model of the random error in instantaneous, 0.5 deg-resolution rain rate estimates appears to be consistent with the levels of error determined from TMI comparisons to collocated radar. 
Error model modifications for non-raining situations will be required, however. Sampling error appears to represent only a fraction of the total error in monthly, 2.5 deg-resolution TMI estimates; the remaining error is attributed to physical inconsistency or non-representativeness of cloud-resolving model simulated profiles supporting the algorithm.
Mapping Error in Southern Ocean Transport Computed from Satellite Altimetry and Argo

NASA Astrophysics Data System (ADS)

Kosempa, M.; Chambers, D. P.

2016-02-01

Argo profiling floats have afforded basin-scale coverage of the Southern Ocean since 2005. When density estimates from Argo are combined with surface geostrophic currents derived from satellite altimetry, one can estimate integrated geostrophic transport above 2000 dbar [e.g., Kosempa and Chambers, JGR, 2014]. However, the interpolation techniques relied upon to generate mapped data from Argo and altimetry will impart a mapping error. We quantify this mapping error by sampling the high-resolution Southern Ocean State Estimate (SOSE) at the locations of Argo floats and Jason-1 and -2 altimeter ground tracks, then create gridded products using the same optimal interpolation algorithms used for the Argo/altimetry gridded products. We combine these surface and subsurface grids to compare the sampled-then-interpolated transport grids to those from the original SOSE data in an effort to quantify the uncertainty in volume transport integrated across the Antarctic Circumpolar Current (ACC). This uncertainty is then used to answer two fundamental questions: 1) What is the minimum linear trend that can be observed in ACC transport given the present length of the instrument record? 2) How long must the instrument record be to observe a trend with an accuracy of 0.1 Sv/year?
Design and simulation study of the immunization Data Quality Audit (DQA).

PubMed

Woodard, Stacy; Archer, Linda; Zell, Elizabeth; Ronveaux, Olivier; Birmingham, Maureen

2007-08-01

The goal of the Data Quality Audit (DQA) is to assess whether the Global Alliance for Vaccines and Immunization-funded countries are adequately reporting the number of diphtheria-tetanus-pertussis immunizations given, on which the "shares" are awarded. Given that this sampling design is a modified two-stage cluster sample (modified because a stratified, rather than a simple, random sample of health facilities is obtained from the selected clusters), the formula for the calculation of the standard error for the estimate is unknown. An approximated standard error has been proposed, and the first goal of this simulation is to assess the accuracy of the standard error. Results from the simulations based on hypothetical populations were found not to be representative of the actual DQAs that were conducted. Additional simulations were then conducted on the actual DQA data to better assess the precision of the DQ with both the original and the increased sample sizes.
Nearly two decades using the check-type to prevent ABO incompatible transfusions: one institution's experience.

PubMed

Figueroa, Priscila I; Ziman, Alyssa; Wheeler, Christine; Gornbein, Jeffrey; Monson, Michael; Calhoun, Loni

2006-09-01

To detect miscollected (wrong blood in tube [WBIT]) samples, our institution requires a second independently drawn sample (check-type [CT]) on previously untyped, non-group O patients who are likely to require transfusion. During the 17-year period addressed by this report, 94 WBIT errors were detected: 57% by comparison with a historic blood type, 7% by the CT, and 35% by other means. The CT averted 5 potential ABO-incompatible transfusions. Our corrected WBIT error rate is 1 in 3,713 for verified samples tested between 2000 and 2003, the period for which actual number of CTs performed was available. The estimated rate of WBIT for the 17-year period is 1 in 2,262 samples. ABO-incompatible transfusions due to WBIT-type errors are avoided by comparison of current blood type results with a historic type, and the CT is an effective way to create a historic type.
Estimating Power System Dynamic States Using Extended Kalman Filter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Zhenyu; Schneider, Kevin P.; Nieplocha, Jaroslaw

2014-10-31

The state estimation tools which are currently deployed in power system control rooms are based on a steady state assumption. As a result, the suite of operational tools that rely on state estimation results as inputs do not have dynamic information available and their accuracy is compromised. This paper investigates the application of Extended Kalman Filtering techniques for estimating dynamic states in the state estimation process. The newly formulated “dynamic state estimation” includes true system dynamics reflected in differential equations, unlike the previously proposed “dynamic state estimation”, which only considers time-variant snapshots based on steady state modeling. This new dynamic state estimation using Extended Kalman Filter has been successfully tested on a multi-machine system. Sensitivity studies with respect to noise levels, sampling rates, model errors, and parameter errors are presented as well to illustrate the robust performance of the developed dynamic state estimation process.
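For readers unfamiliar with the technique, one predict-and-update cycle of an Extended Kalman Filter can be sketched for a scalar state. This is the textbook recursion only, not the multi-machine power system model used in the paper; the function and argument names are hypothetical.

```python
def ekf_step(x, p, z, f, f_jac, h, h_jac, q, r):
    """One scalar EKF cycle: returns the updated state estimate and its variance.

    x, p   : prior state estimate and variance
    z      : new measurement
    f, h   : nonlinear state-transition and measurement functions
    f_jac  : derivative of f, evaluated at the prior estimate
    h_jac  : derivative of h, evaluated at the predicted state
    q, r   : process and measurement noise variances
    """
    # Predict: propagate the state through f and the variance through its linearization.
    x_pred = f(x)
    big_f = f_jac(x)
    p_pred = big_f * p * big_f + q
    # Update: weight the innovation by the Kalman gain.
    big_h = h_jac(x_pred)
    innovation = z - h(x_pred)
    s = big_h * p_pred * big_h + r
    gain = p_pred * big_h / s
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * big_h) * p_pred
    return x_new, p_new


# Linear sanity check: with f(x) = x and h(x) = x the EKF reduces to the ordinary Kalman filter.
x1, p1 = ekf_step(0.0, 1.0, 2.0, lambda x: x, lambda x: 1.0,
                  lambda x: x, lambda x: 1.0, 0.0, 1.0)
print(x1, p1)
```

In the linear check the prior and the measurement have equal variance, so the update lands halfway between them and the variance is halved.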
Optimally weighted least-squares steganalysis

NASA Astrophysics Data System (ADS)

Ker, Andrew D.

2007-02-01

Quantitative steganalysis aims to estimate the amount of payload in a stego object, and such estimators seem to arise naturally in steganalysis of Least Significant Bit (LSB) replacement in digital images. However, as with all steganalysis, the estimators are subject to errors, and their magnitude seems heavily dependent on properties of the cover. In very recent work we have given the first derivation of estimation error, for a certain method of steganalysis (the Least-Squares variant of Sample Pairs Analysis) of LSB replacement steganography in digital images. In this paper we make use of our theoretical results to find an improved estimator and detector. We also extend the theoretical analysis to another (more accurate) steganalysis estimator (Triples Analysis) and hence derive an improved version of that estimator too. Experimental results show that the new steganalyzers have improved accuracy, particularly in the difficult case of never-compressed covers.
Evaluating concentration estimation errors in ELISA microarray experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daly, Don S.; White, Amanda M.; Varnum, Susan M.

Enzyme-linked immunosorbent assay (ELISA) is a standard immunoassay to predict a protein concentration in a sample. Deploying ELISA in a microarray format permits simultaneous prediction of the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Evaluating prediction error is critical to interpreting biological significance and improving the ELISA microarray process. Evaluating prediction error must be automated to realize a reliable high-throughput ELISA microarray system. Methods: In this paper, we present a statistical method based on propagation of error to evaluate prediction errors in the ELISA microarray process. Although propagation of error is central to this method, it is effective only when comparable data are available. Therefore, we briefly discuss the roles of experimental design, data screening, normalization and statistical diagnostics when evaluating ELISA microarray prediction errors. We use an ELISA microarray investigation of breast cancer biomarkers to illustrate the evaluation of prediction errors. The illustration begins with a description of the design and resulting data, followed by a brief discussion of data screening and normalization. In our illustration, we fit a standard curve to the screened and normalized data, review the modeling diagnostics, and apply propagation of error.
Kernel Wiener filter and its application to pattern recognition.

PubMed

Yoshino, Hirokazu; Dong, Chen; Washizawa, Yoshikazu; Yamashita, Yukihiko

2010-11-01

The Wiener filter (WF) is widely used for inverse problems. From an observed signal, it provides the best estimated signal with respect to the squared error averaged over the original and the observed signals among linear operators. The kernel WF (KWF), extended directly from WF, has a problem that an additive noise has to be handled by samples. Since the computational complexity of kernel methods depends on the number of samples, a huge computational cost is necessary for the case. By using the first-order approximation of kernel functions, we realize KWF that can handle such a noise not by samples but as a random variable. We also propose the error estimation method for kernel filters by using the approximations. In order to show the advantages of the proposed methods, we conducted the experiments to denoise images and estimate errors. We also apply KWF to classification since KWF can provide an approximated result of the maximum a posteriori classifier that provides the best recognition accuracy. The noise term in the criterion can be used for the classification in the presence of noise or a new regularization to suppress changes in the input space, whereas the ordinary regularization for the kernel method suppresses changes in the feature space. In order to show the advantages of the proposed methods, we conducted experiments of binary and multiclass classifications and classification in the presence of noise.
Directional selection in temporally replicated studies is remarkably consistent.

PubMed

Morrissey, Michael B; Hadfield, Jarrod D

2012-02-01

Temporal variation in selection is a fundamental determinant of evolutionary outcomes. A recent paper presented a synthetic analysis of temporal variation in selection in natural populations. The authors concluded that there is substantial variation in the strength and direction of selection over time, but acknowledged that sampling error would result in estimates of selection that were more variable than the true values. We reanalyze their dataset using techniques that account for the necessary effect of sampling error to inflate apparent levels of variation and show that directional selection is remarkably constant over time, both in magnitude and direction. Thus we cannot claim that the available data support the existence of substantial temporal heterogeneity in selection. Nonetheless, we conjecture that temporal variation in selection could be important, but that there are good reasons why it may not appear in the available data. These new analyses highlight the importance of applying techniques that estimate parameters of the distribution of selection, rather than parameters of the distribution of estimated selection (which will reflect both sampling error and "real" variation in selection); indeed, despite availability of methods for the former, focus on the latter has been common in synthetic reviews of the aspects of selection in nature, and can lead to serious misinterpretations. © 2011 The Author(s). Evolution © 2011 The Society for the Study of Evolution.
Evaluating the design of an earth radiation budget instrument with system simulations. Part 2: Minimization of instantaneous sampling errors for CERES-I

NASA Technical Reports Server (NTRS)

Stowe, Larry; Hucek, Richard; Ardanuy, Philip; Joyce, Robert

1994-01-01

Much of the new record of broadband earth radiation budget satellite measurements to be obtained during the late 1990s and early twenty-first century will come from the dual-radiometer Clouds and Earth's Radiant Energy System Instrument (CERES-I) flown aboard sun-synchronous polar orbiters. Simulation studies conducted in this work for an early afternoon satellite orbit indicate that spatial root-mean-square (rms) sampling errors of instantaneous CERES-I shortwave flux estimates will range from about 8.5 to 14.0 W/sq m on a 2.5 deg latitude and longitude grid resolution. Rms errors in longwave flux estimates are only about 20% as large and range from 1.5 to 3.5 W/sq m. These results are based on an optimal cross-track scanner design that includes 50% footprint overlap to eliminate gaps in the top-of-the-atmosphere coverage, and a 'smallest' footprint size to increase the ratio in the number of observations lying within to the number of observations lying on grid area boundaries. Total instantaneous measurement error also depends on the variability of anisotropic reflectance and emission patterns and on retrieval methods used to generate target area fluxes. Three retrieval procedures from both CERES-I scanners (cross-track and rotating azimuth plane) are used. (1) The baseline Earth Radiation Budget Experiment (ERBE) procedure, which assumes that errors due to the use of mean angular dependence models (ADMs) in the radiance-to-flux inversion process nearly cancel when averaged over grid areas. (2) To estimate N, instantaneous ADMs are estimated from the multiangular, collocated observations of the two scanners. These observed models replace the mean models in computation of satellite flux estimates. (3) The scene flux approach conducts separate target-area retrievals for each ERBE scene category and combines their results using area weighting by scene type. 
The ERBE retrieval performs best when the simulated radiance field departs from the ERBE mean models by less than 10%. For larger perturbations, both the scene flux and collocation methods produce less error than the ERBE retrieval. The scene flux technique is preferable, however, because it involves fewer restrictive assumptions.
Comparative test on several forms of background error covariance in 3DVar

NASA Astrophysics Data System (ADS)

Shao, Aimei

2013-04-01

The background error covariance matrix (hereinafter referred to as the B matrix) plays an important role in the three-dimensional variational (3DVar) data assimilation method. However, it is difficult to obtain the B matrix accurately because the true atmospheric state is unknown. Therefore, several methods have been developed to estimate the B matrix (e.g. the NMC method, the innovation analysis method, recursive filters, and ensemble methods such as EnKF). Prior to further development and application of these methods, the behavior in 3DVar of the B matrices they estimate is worth studying and evaluating. For this reason, NCEP reanalysis data and forecast data are used to test the effectiveness of several B matrices with the VAF (Huang, 1999) method. Here the NCEP analysis is treated as the truth, so the forecast error is known. The data from 2006 to 2007 are used as the samples to estimate the B matrix, and the data from 2008 are used to verify the assimilation effects. The 48h and 24h forecasts valid at the same time are used to estimate the B matrix with the NMC method. The B matrix can be represented by a correlation part (a non-diagonal matrix) and a variance part (a diagonal matrix of variances). In numerous 3DVar systems, a Gaussian filter function is used as an approximation to represent the variation of correlation coefficients with distance. 
On the basis of this assumption, the following forms of the B matrix are designed and tested with VAF in the comparative experiments: (1) the error variance and the characteristic lengths are fixed and set to their mean values averaged over the analysis domain; (2) similar to (1), but the mean characteristic lengths are reduced to 50 percent of the original for the height and 60 percent for the temperature; (3) similar to (2), but the error variance, calculated directly from the historical data, is space-dependent; (4) the error variance and characteristic lengths are all calculated directly from the historical data; (5) the B matrix is estimated directly from the historical data; (6) similar to (5), but a localization process is performed; (7) the B matrix is estimated by the NMC method, but the error variance is reduced by a factor of 1.7 so that its value is close to that calculated from the true forecast error samples; (8) similar to (7), but with the localization used in (6). Experimental results with the different B matrices show that, for the Gaussian-type B matrix, the characteristic lengths calculated from the true error samples do not yield a good analysis. However, reduced characteristic lengths (about half of the original) can lead to a good analysis. If the B matrix estimated directly from the historical data is used in 3DVar, the assimilation effect is not the best. Better assimilation results are obtained by applying the reduced characteristic length and localization. Even so, this has no obvious advantage over the Gaussian-type B matrix with the optimal characteristic length. This implies that the Gaussian-type B matrix, widely used in operational 3DVar systems, can give a good analysis with appropriate characteristic lengths. The crucial problem is how to determine the appropriate characteristic lengths. 
(This work is supported by the National Natural Science Foundation of China (41275102, 40875063), and the Fundamental Research Funds for the Central Universities (lzujbky-2010-9).)
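The split of the B matrix into a variance part and a Gaussian correlation part, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the function name `gaussian_b_matrix`, the 1-D grid, the standard deviations, and the specific Gaussian form exp(-d^2 / (2 L^2)) are hypothetical choices that do not come from the record.

```python
import math

def gaussian_b_matrix(positions, std_devs, length_scale):
    """Build B = D^(1/2) C D^(1/2) on a 1-D grid.

    D holds the error variances (std_devs squared) on its diagonal and
    C is a Gaussian correlation matrix with characteristic length
    `length_scale`. The exp(-d^2 / (2 L^2)) form is an assumed convention.
    """
    n = len(positions)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = positions[i] - positions[j]
            corr = math.exp(-d * d / (2.0 * length_scale ** 2))
            b[i][j] = std_devs[i] * corr * std_devs[j]
    return b

# Hypothetical grid points (km) and background error standard deviations.
positions = [0.0, 100.0, 200.0]
std_devs = [1.0, 1.0, 2.0]

b_full = gaussian_b_matrix(positions, std_devs, length_scale=100.0)
b_half = gaussian_b_matrix(positions, std_devs, length_scale=50.0)
```

Halving `length_scale` leaves the diagonal (the variances) unchanged and shrinks the off-diagonal covariances, which is the kind of adjustment compared in experiments (1) and (2).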
77 FR 15376 - State Median Income Estimates for a Four-Person Household: Notice of the Federal Fiscal Year (FFY...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-03-15

... contact the Census Bureau's Social, Economic and Housing Statistics Division at (301) 763-3243. Under the... the use of probability sampling to create the sample. For additional information about the accuracy of... consists of the error that arises from the use of probability sampling to create the sample. These...
Regionalization of harmonic-mean streamflows in Kentucky

USGS Publications Warehouse

Martin, Gary R.; Ruhl, Kevin J.

1993-01-01

Harmonic-mean streamflow (Qh), defined as the reciprocal of the arithmetic mean of the reciprocal daily streamflow values, was determined for selected stream sites in Kentucky. Daily mean discharges for the available period of record through the 1989 water year at 230 continuous record streamflow-gaging stations located in and adjacent to Kentucky were used in the analysis. Periods of record affected by regulation were identified and analyzed separately from periods of record unaffected by regulation. Record-extension procedures were applied to short-term stations to reduce time-sampling error and, thus, improve estimates of the long-term Qh. Techniques to estimate the Qh at ungaged stream sites in Kentucky were developed. A regression model relating Qh to total drainage area and streamflow-variability index was presented with example applications. The regression model has a standard error of estimate of 76 percent and a standard error of prediction of 78 percent.
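The definition of Qh given in this abstract translates directly into a few lines of code. The sketch below is a minimal illustration: the function name `harmonic_mean_flow` and the sample discharges are hypothetical, and days with zero flow, which a real record may contain, are simply rejected here rather than handled.

```python
def harmonic_mean_flow(daily_flows):
    """Reciprocal of the arithmetic mean of the reciprocal daily flows."""
    if not daily_flows:
        raise ValueError("need at least one daily value")
    if any(q <= 0 for q in daily_flows):
        # Zero-flow days need a separate adjustment that is not shown here.
        raise ValueError("flows must be strictly positive in this sketch")
    mean_of_reciprocals = sum(1.0 / q for q in daily_flows) / len(daily_flows)
    return 1.0 / mean_of_reciprocals

# Hypothetical daily mean discharges (any consistent unit).
flows = [10.0, 20.0, 40.0]
qh = harmonic_mean_flow(flows)
arithmetic_mean = sum(flows) / len(flows)
```

For these values Qh is about 17.1, below the arithmetic mean of about 23.3, because the harmonic mean gives more weight to low flows.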
Methods for estimating aboveground biomass and its components for Douglas-fir and lodgepole pine trees

Treesearch

K.P. Poudel; H. Temesgen

2016-01-01

Estimating aboveground biomass and its components requires sound statistical formulation and evaluation. Using data collected from 55 destructively sampled trees in different parts of Oregon, we evaluated the performance of three groups of methods to estimate total aboveground biomass and (or) its components based on the bias and root mean squared error (RMSE) that...
A digital clock recovery algorithm based on chromatic dispersion and polarization mode dispersion feedback dual phase detection for coherent optical transmission systems

NASA Astrophysics Data System (ADS)

Liu, Bo; Xin, Xiangjun; Zhang, Lijia; Wang, Fu; Zhang, Qi

2018-02-01

A new feedback symbol timing recovery technique using timing estimation joint equalization is proposed for digital receivers with a sampling rate of two samples/symbol or higher. Unlike traditional methods, the clock recovery algorithm in this paper adopts an additional algorithm that distinguishes the phases of adjacent symbols, so that the timing offset can be estimated accurately from adjacent signals with the same phase. Adding a module that eliminates phase modulation interference before timing estimation further reduces the variance, resulting in a smoothed timing estimate. The Mean Square Error (MSE) and Bit Error Rate (BER) of the resulting timing estimate are simulated and indicate satisfactory estimation performance. The obtained clock tone performance is satisfactory for MQAM modulation formats and for a Roll-off Factor (ROF) close to 0. In the back-to-back system, when ROF = 0, the maximum MSE obtained with the proposed approach reaches 0.0125. After 100-km fiber transmission, BER decreases to 10^-3 with ROF = 0 and OSNR = 11 dB. As ROF increases, the MSE and BER performance improves.
Minimax Quantum Tomography: Estimators and Relative Entropy Bounds.

PubMed

Ferrie, Christopher; Blume-Kohout, Robin

2016-03-04

A minimax estimator has the minimum possible error ("risk") in the worst case. We construct the first minimax estimators for quantum state tomography with relative entropy risk. The minimax risk of nonadaptive tomography scales as O(1/sqrt[N]), in contrast to that of classical probability estimation, which is O(1/N), where N is the number of copies of the quantum state used. We trace this deficiency to sampling mismatch: future observations that determine risk may come from a different sample space than the past data that determine the estimate. This makes minimax estimators very biased, and we propose a computationally tractable alternative with similar behavior in the worst case, but superior accuracy on most states.
What to use to express the variability of data: Standard deviation or standard error of mean?

PubMed

Barde, Mohini P; Barde, Prajakt J

2012-07-01

Statistics plays a vital role in biomedical research. It helps present data precisely and draw meaningful conclusions. While presenting data, one should be aware of using adequate statistical measures. In biomedical journals, Standard Error of Mean (SEM) and Standard Deviation (SD) are used interchangeably to express variability, though they measure different parameters. SEM quantifies uncertainty in the estimate of the mean, whereas SD indicates dispersion of the data from the mean. As readers are generally interested in knowing the variability within the sample, descriptive data should be precisely summarized with SD. Use of SEM should be limited to computing the CI, which measures the precision of the population estimate. Journals can avoid such errors by requiring authors to adhere to their guidelines.
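The distinction drawn in this abstract can be made concrete with a small calculation. The sketch below is illustrative only: the data values and the helper name `sd_and_sem` are hypothetical, and the interval uses the normal-approximation multiplier 1.96, which is an assumption (for a sample this small a t-distribution multiplier would be more appropriate).

```python
import math
import statistics

def sd_and_sem(values):
    """Return (SD, SEM): SD describes spread, SEM = SD / sqrt(n)."""
    n = len(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)         # uncertainty of the estimated mean
    return sd, sem

# Hypothetical measurements.
values = [4.0, 6.0, 8.0, 10.0, 12.0]
mean = statistics.mean(values)
sd, sem = sd_and_sem(values)

# Approximate 95% CI for the mean (normal approximation, an assumption).
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
```

Collecting more observations shrinks the SEM but leaves the SD roughly unchanged, which is why the SD is the better summary of variability within the sample.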
Remote Estimation of Vegetation Fraction and Yield in Oilseed Rape with Unmanned Aerial Vehicle Data

NASA Astrophysics Data System (ADS)

Peng, Y.; Fang, S.; Liu, K.; Gong, Y.

2017-12-01

This study developed an approach for remote estimation of Vegetation Fraction (VF) and yield in oilseed rape, which is a crop species with conspicuous flowers during reproduction. Canopy reflectance in green, red, red edge and NIR bands was obtained by a camera system mounted on an unmanned aerial vehicle (UAV) when oilseed rape was in the vegetative growth and flowering stage. The relationship of several widely-used Vegetation Indices (VI) vs. VF was tested and found to be different in different phenology stages. At the same VF when oilseed rape was flowering, canopy reflectance increased in all bands, and the tested VI decreased. Therefore, two algorithms to estimate VF were calibrated respectively, one for samples during vegetative growth and the other for samples during flowering stage. During the flowering season, we also explored the potential of using canopy reflectance or VIs to estimate Flower Fraction (FF) in oilseed rape. Based on FF estimates, rape yield can be estimated using canopy reflectance data. Our model was validated in oilseed rape planted under different nitrogen fertilization applications and in different phenology stages. The results showed that it was able to predict VF and FF accurately in oilseed rape with estimation error below 6% and predict yield with estimation error below 20%.
Sample allocation balancing overall representativeness and stratum precision.

PubMed

Diaz-Quijano, Fredi Alexander

2018-05-07

In large-scale surveys, it is often necessary to distribute a preset sample size among a number of strata. Researchers must decide between prioritizing overall representativeness and prioritizing precision of stratum estimates. Hence, I evaluated different sample allocation strategies based on stratum size. The strategies evaluated herein included allocation proportional to stratum population; equal sample for all strata; and proportional to the natural logarithm, cubic root, and square root of the stratum population. This study considered the fact that, from a preset sample size, the dispersion index of stratum sampling fractions is correlated with the population estimator error and the dispersion index of stratum-specific sampling errors would measure the inequality in precision distribution. Identification of a balanced and efficient strategy was based on comparing both dispersion indices. Balance and efficiency of the strategies changed depending on overall sample size. As the sample to be distributed increased, the most efficient allocation strategies were, in order: equal sample for each stratum; proportional to the logarithm, to the cubic root, and to the square root of the stratum population; and proportional to the stratum population. Depending on sample size, each of the strategies evaluated could be considered in optimizing the sample to keep both overall representativeness and stratum-specific precision. Copyright © 2018 Elsevier Inc. All rights reserved.
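The five allocation rules named in this abstract differ only in the transform applied to each stratum population before the preset sample is shared out. The sketch below illustrates that idea under stated assumptions: the names `TRANSFORMS` and `allocate` and the stratum sizes are hypothetical, allocations are left fractional rather than rounded, and the paper's dispersion indices are not reproduced.

```python
import math

# Weight given to a stratum of population n under each strategy.
TRANSFORMS = {
    "proportional": lambda n: float(n),
    "square_root": lambda n: math.sqrt(n),
    "cubic_root": lambda n: n ** (1.0 / 3.0),
    "logarithm": lambda n: math.log(n),   # natural log; needs n > 1
    "equal": lambda n: 1.0,
}

def allocate(total_sample, stratum_sizes, strategy):
    """Split total_sample across strata in proportion to the chosen weights."""
    weights = [TRANSFORMS[strategy](n) for n in stratum_sizes]
    total_weight = sum(weights)
    return [total_sample * w / total_weight for w in weights]

# Hypothetical strata: one small and one large population.
sizes = [1000, 9000]
by_strategy = {name: allocate(100, sizes, name) for name in TRANSFORMS}
```

Moving from "proportional" toward "equal" shifts sample from the large stratum to the small one, trading overall representativeness for more even stratum-level precision.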

Design considerations for case series models with exposure onset measurement error.

PubMed

Mohammed, Sandra M; Dalrymple, Lorien S; Sentürk, Damla; Nguyen, Danh V

2013-02-28

The case series model allows for estimation of the relative incidence of events, such as cardiovascular events, within a pre-specified time window after an exposure, such as an infection. The method requires only cases (individuals with events) and controls for all fixed/time-invariant confounders. The measurement error case series model extends the original case series model to handle imperfect data, where the timing of an infection (exposure) is not known precisely. In this work, we propose a method for power/sample size determination for the measurement error case series model. Extensive simulation studies are used to assess the accuracy of the proposed sample size formulas. We also examine the magnitude of the relative loss of power due to exposure onset measurement error, compared with the ideal situation where the time of exposure is measured precisely. To facilitate the design of case series studies, we provide publicly available web-based tools for determining power/sample size for both the measurement error case series model as well as the standard case series model. Copyright © 2012 John Wiley & Sons, Ltd.
Sampling errors in the estimation of empirical orthogonal functions. [for climatology studies]

NASA Technical Reports Server (NTRS)

North, G. R.; Bell, T. L.; Cahalan, R. F.; Moeng, F. J.

1982-01-01

Empirical Orthogonal Functions (EOF's), eigenvectors of the spatial cross-covariance matrix of a meteorological field, are reviewed with special attention given to the necessary weighting factors for gridded data and the sampling errors incurred when too small a sample is available. The geographical shape of an EOF shows large intersample variability when its associated eigenvalue is 'close' to a neighboring one. A rule of thumb indicating when an EOF is likely to be subject to large sampling fluctuations is presented. An explicit example, based on the statistics of the 500 mb geopotential height field, displays large intersample variability in the EOF's for sample sizes of a few hundred independent realizations, a size seldom exceeded by meteorological data sets.
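The abstract does not state the rule of thumb itself. In the form in which it is commonly cited, the sampling error of an eigenvalue is approximately the eigenvalue times sqrt(2/N), where N is the number of independent realizations, and an EOF is suspect when that error is comparable to the gap to a neighboring eigenvalue. The sketch below assumes that form; the function names and the eigenvalues are hypothetical.

```python
import math

def eigenvalue_sampling_errors(eigenvalues, n_independent):
    """Approximate error of each eigenvalue: lambda * sqrt(2 / N)."""
    factor = math.sqrt(2.0 / n_independent)
    return [lam * factor for lam in eigenvalues]

def poorly_separated(eigenvalues, n_independent):
    """Flag eigenvalues whose error bar reaches the nearest neighbor."""
    errors = eigenvalue_sampling_errors(eigenvalues, n_independent)
    flags = []
    for i, (lam, err) in enumerate(zip(eigenvalues, errors)):
        gaps = [abs(lam - eigenvalues[j])
                for j in (i - 1, i + 1) if 0 <= j < len(eigenvalues)]
        flags.append(bool(gaps) and err >= min(gaps))
    return flags

# Hypothetical eigenvalue spectrum, sorted in descending order.
eigenvalues = [10.0, 9.0, 4.0]
errors = eigenvalue_sampling_errors(eigenvalues, n_independent=50)
flags = poorly_separated(eigenvalues, n_independent=50)
```

Here the first two eigenvalues are closer together than their error bars, so their EOF patterns would be expected to mix from sample to sample, while the third is well separated.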
Multistage classification of multispectral Earth observational data: The design approach

NASA Technical Reports Server (NTRS)

Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.

1981-01-01

An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
Robust Adaptive Beamforming with Sensor Position Errors Using Weighted Subspace Fitting-Based Covariance Matrix Reconstruction.

PubMed

Chen, Peng; Yang, Yixin; Wang, Yong; Ma, Yuanliang

2018-05-08

When sensor position errors exist, the performance of recently proposed interference-plus-noise covariance matrix (INCM)-based adaptive beamformers may be severely degraded. In this paper, we propose a weighted subspace fitting-based INCM reconstruction algorithm to overcome sensor displacement for linear arrays. By estimating the rough signal directions, we construct a novel possible mismatched steering vector (SV) set. We analyze the proximity of the signal subspace from the sample covariance matrix (SCM) and the space spanned by the possible mismatched SV set. After solving an iterative optimization problem, we reconstruct the INCM using the estimated sensor position errors. Then we estimate the SV of the desired signal by solving an optimization problem with the reconstructed INCM. The main advantage of the proposed algorithm is its robustness against SV mismatches dominated by unknown sensor position errors. Numerical examples show that even if the position errors are up to half of the assumed sensor spacing, the output signal-to-interference-plus-noise ratio is only reduced by 4 dB. Beam patterns plotted using experiment data show that the interference suppression capability of the proposed beamformer outperforms other tested beamformers.
Adjustment of regional regression models of urban-runoff quality using data for Chattanooga, Knoxville, and Nashville, Tennessee

USGS Publications Warehouse

Hoos, Anne B.; Patel, Anant R.

1996-01-01

Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.
Comparison of Precision of Biomass Estimates in Regional Field Sample Surveys and Airborne LiDAR-Assisted Surveys in Hedmark County, Norway

NASA Technical Reports Server (NTRS)

Naesset, Erik; Gobakken, Terje; Bollandsas, Ole Martin; Gregoire, Timothy G.; Nelson, Ross; Stahl, Goeran

2013-01-01

Airborne scanning LiDAR (Light Detection and Ranging) has emerged as a promising tool to provide auxiliary data for sample surveys aiming at estimation of above-ground tree biomass (AGB), with potential applications in REDD forest monitoring. For larger geographical regions such as counties, states or nations, it is not feasible to collect airborne LiDAR data continuously ("wall-to-wall") over the entire area of interest. Two-stage cluster survey designs have therefore been demonstrated by which LiDAR data are collected along selected individual flight-lines treated as clusters and with ground plots sampled along these LiDAR swaths. Recently, analytical AGB estimators and associated variance estimators that quantify the sampling variability have been proposed. Empirical studies employing these estimators have shown a seemingly equal or even larger uncertainty of the AGB estimates obtained with extensive use of LiDAR data to support the estimation as compared to pure field-based estimates employing estimators appropriate under simple random sampling (SRS). However, comparison of uncertainty estimates under SRS and sophisticated two-stage designs is complicated by large differences in the designs and assumptions. In this study, probability-based principles to estimation and inference were followed. We assumed designs of a field sample and a LiDAR-assisted survey of Hedmark County (HC) (27,390 km2), Norway, considered to be more comparable than those assumed in previous studies. The field sample consisted of 659 systematically distributed National Forest Inventory (NFI) plots and the airborne scanning LiDAR data were collected along 53 parallel flight-lines flown over the NFI plots. We compared AGB estimates based on the field survey only assuming SRS against corresponding estimates assuming two-phase (double) sampling with LiDAR and employing model-assisted estimators. 
We also compared AGB estimates based on the field survey only assuming two-stage sampling (the NFI plots being grouped in clusters) against corresponding estimates assuming two-stage sampling with the LiDAR and employing model-assisted estimators. For each of the two comparisons, the standard errors of the AGB estimates were consistently lower for the LiDAR-assisted designs. The overall reduction of the standard errors in the LiDAR-assisted estimation was around 40-60% compared to the pure field survey. We conclude that the previously proposed two-stage model-assisted estimators are inappropriate for surveys with unequal lengths of the LiDAR flight-lines and new estimators are needed. Some options for design of LiDAR-assisted sample surveys under REDD are also discussed, which capitalize on the flexibility offered when the field survey is designed as an integrated part of the overall survey design as opposed to previous LiDAR-assisted sample surveys in the boreal and temperate zones which have been restricted by the current design of an existing NFI.
Estimation of the discharges of the multiple water level stations by multi-objective optimization

NASA Astrophysics Data System (ADS)

Matsumoto, Kazuhiro; Miyamoto, Mamoru; Yamakage, Yuzuru; Tsuda, Morimasa; Yanami, Hitoshi; Anai, Hirokazu; Iwami, Yoichi

2016-04-01

This presentation shows two aspects of parameter identification for estimating the discharges at multiple water level stations by multi-objective optimization. One is how to adjust the parameters to estimate the discharges accurately. The other is which optimization algorithms are suitable for the parameter identification. Among previous studies, one minimizes the weighted error of the discharges at multiple water level stations by single-objective optimization, while others minimize multiple error assessment functions of the discharge at a single water level station by multi-objective optimization. This presentation simultaneously minimizes the errors of the discharges at multiple water level stations by multi-objective optimization. The Abe River basin in Japan is targeted. The basin area is 567.0 km^2. There are thirteen rainfall stations and three water level stations. Nine flood events are investigated. They occurred from 2005 to 2012, and their maximum discharges exceed 1,000 m^3/s. The discharges are calculated with the PWRI distributed hydrological model. The basin is partitioned into meshes of 500 m x 500 m. Two-layer tanks are placed on each mesh. Fourteen parameters are adjusted to estimate the discharges accurately. Twelve of them are hydrological parameters and two are parameters of the initial water levels of the tanks. The three objective functions are the mean squared errors between the observed and calculated discharges at the water level stations. Latin Hypercube sampling is a uniform sampling algorithm. The discharges are calculated for the parameter values sampled by a simplified version of Latin Hypercube sampling. The observed discharge is bracketed by the calculated discharges, which suggests that it might be possible to estimate the discharge accurately by adjusting the parameters. 
The discharge at a given water level station can indeed be estimated accurately by using the parameter values optimized for that station. However, in some cases the discharge calculated with the parameter values optimized for one water level station does not match the observed discharge at another water level station. It is important to estimate the discharges at all the water level stations with some degree of accuracy. It turns out to be possible to select parameter values from the Pareto optimal solutions under the condition that every error, normalized by the minimum error at the corresponding water level station, is under 3. The optimization performance of five implementations of the algorithms and a simplified version of Latin Hypercube sampling is compared. The five implementations are NSGA2 and PAES from the optimization software inspyred, and MCO_NSGA2R, MOPSOCD and NSGA2R_NSGA2R from the statistical software R. NSGA2, PAES and MOPSOCD are, respectively, a genetic algorithm, an evolution strategy and a particle swarm optimization algorithm. The number of evaluations of the objective functions is 10,000. The two implementations of NSGA2 in R outperform the others and appear promising for the parameter identification of the PWRI distributed hydrological model.
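The selection step described above (keep Pareto optimal solutions whose errors, normalized by the per-station minimum, are all under 3) can be sketched in a few lines. This is an illustration only: the function names and the error triples are hypothetical, the per-station minimum is taken over the Pareto front as an assumption, and neither the hydrological model nor the optimizers from inspyred or R are reproduced.

```python
def pareto_front(solutions):
    """Keep solutions not dominated by any other (all errors minimized)."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

def select_balanced(front, threshold=3.0):
    """Keep solutions whose every error is under threshold * station minimum."""
    n_stations = len(front[0])
    minima = [min(s[k] for s in front) for k in range(n_stations)]
    return [s for s in front
            if all(s[k] / minima[k] < threshold for k in range(n_stations))]

# Hypothetical mean squared errors at three water level stations.
candidates = [
    (1.0, 8.0, 9.0),    # best at station 1, poor elsewhere
    (2.0, 3.0, 4.0),    # compromise
    (6.0, 2.0, 3.5),    # best at stations 2 and 3, poor at station 1
    (7.0, 9.0, 10.0),   # dominated by the compromise
]
front = pareto_front(candidates)
balanced = select_balanced(front)
```

Only the compromise solution survives the threshold here, since each single-station optimum is more than three times worse than the best achievable error at some other station.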
The observed clustering of damaging extratropical cyclones in Europe

NASA Astrophysics Data System (ADS)

Cusack, Stephen

2016-04-01

The clustering of severe European windstorms on annual timescales has substantial impacts on the (re-)insurance industry. Our knowledge of the risk is limited by large uncertainties in estimates of clustering from typical historical storm data sets covering the past few decades. Eight storm data sets are gathered for analysis in this study in order to reduce these uncertainties. Six of the data sets contain more than 100 years of severe storm information to reduce sampling errors, and observational errors are reduced by the diversity of information sources and analysis methods between storm data sets. All storm severity measures used in this study reflect damage, to suit (re-)insurance applications. The shortest storm data set of 42 years provides indications of stronger clustering with severity, particularly for regions off the main storm track in central Europe and France. However, clustering estimates have very large sampling and observational errors, exemplified by large changes in estimates in central Europe upon removal of one stormy season, 1989/1990. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm data sets show increased clustering between more severe storms from return periods (RPs) of 0.5 years to the longest measured RPs of about 20 years. Further, they contain signs of stronger clustering off the main storm track, and weaker clustering for smaller-sized areas, though these signals are more uncertain as they are drawn from smaller data samples. These new ultra-long storm data sets provide new information on clustering to improve our management of this risk.
An Investigation of the Sample Performance of Two Nonnormality Corrections for RMSEA

ERIC Educational Resources Information Center

Brosseau-Liard, Patricia E.; Savalei, Victoria; Li, Libo

2012-01-01

The root mean square error of approximation (RMSEA) is a popular fit index in structural equation modeling (SEM). Typically, RMSEA is computed using the normal theory maximum likelihood (ML) fit function. Under nonnormality, the uncorrected sample estimate of the ML RMSEA tends to be inflated. Two robust corrections to the sample ML RMSEA have…
Change-in-ratio estimators for populations with more than two subclasses

USGS Publications Warehouse

Udevitz, Mark S.; Pollock, Kenneth H.

1991-01-01

Change-in-ratio methods have been developed to estimate the size of populations with two or three population subclasses. Most of these methods require the often unreasonable assumption of equal sampling probabilities for individuals in all subclasses. This paper presents new models based on the weaker assumption that ratios of sampling probabilities are constant over time for populations with three or more subclasses. Estimation under these models requires that a value be assumed for one of these ratios when there are two samples. Explicit expressions are given for the maximum likelihood estimators under models for two samples with three or more subclasses and for three samples with two subclasses. A numerical method using readily available statistical software is described for obtaining the estimators and their standard errors under all of the models. Likelihood ratio tests that can be used in model selection are discussed. Emphasis is on the two-sample, three-subclass models for which Monte-Carlo simulation results and an illustrative example are presented.
Uncertainty in sample estimates and the implicit loss function for soil information.

NASA Astrophysics Data System (ADS)

Lark, Murray

2015-04-01

One significant challenge in the communication of uncertain information is how to enable the sponsors of sampling exercises to make a rational choice of sample size. One way to do this is to compute the value of additional information given the loss function for errors. The loss function expresses the costs that result from decisions made using erroneous information. In certain circumstances, such as remediation of contaminated land prior to development, loss functions can be computed and used to guide rational decision making on the amount of resource to spend on sampling to collect soil information. In many circumstances the loss function cannot be obtained prior to decision making. This may be the case when multiple decisions may be based on the soil information and the costs of errors are hard to predict. The implicit loss function is proposed as a tool to aid decision making in these circumstances. Conditional on a logistical model which expresses costs of soil sampling as a function of effort, and statistical information from which the error of estimates can be modelled as a function of effort, the implicit loss function is the loss function which makes a particular decision on effort rational. In this presentation the loss function is defined and computed for a number of arbitrary decisions on sampling effort for a hypothetical soil monitoring problem. This is based on a logistical model of sampling cost parameterized from a recent geochemical survey of soil in Donegal, Ireland and on statistical parameters estimated with the aid of a process model for change in soil organic carbon. It is shown how the implicit loss function might provide a basis for reflection on a particular choice of sample size by comparing it with the values attributed to soil properties and functions. Scope for further research to develop and apply the implicit loss function to help decision making by policy makers and regulators is then discussed.
Standard error of estimated average timber volume per acre under point sampling when trees are measured for volume on a subsample of all points.

Treesearch

Floyd A. Johnson

1961-01-01

This report assumes a knowledge of the principles of point sampling as described by Grosenbaugh, Bell and Alexander, and others. Whenever trees are counted at every point in a sample of points (large sample) and measured for volume at a portion (small sample) of these points, the sampling design could be called ratio double sampling. If the large...
Analyzing thematic maps and mapping for accuracy

USGS Publications Warehouse

Rosenfield, G.H.

1982-01-01

Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given. classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors by commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. 
The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map, or both. More rigorous analyses have used data transformations and (or) two-way classification analysis of variance. A more sophisticated approach would be to analyze the entire classification error matrices using the methods of discrete multivariate analysis or of multivariate analysis of variance.
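The row and column conventions described in this abstract can be illustrated with a minimal Python sketch. The function name `accuracy_summary` and the 3-class counts are hypothetical and chosen only for illustration; the sketch assumes rows hold the interpretation, columns hold the verification, and every row and column total is nonzero.

```python
def accuracy_summary(matrix):
    """Summarize a classification error matrix.

    matrix[i][j] = number of sample units interpreted as class i (row)
    and verified as class j (column). Assumes nonzero row/column totals.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))  # diagonal elements
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Off-diagonal elements of a row: errors of commission for that class.
    commission = [(row_totals[i] - matrix[i][i]) / row_totals[i] for i in range(k)]
    # Off-diagonal elements of a column: errors of omission for that class.
    omission = [(col_totals[j] - matrix[j][j]) / col_totals[j] for j in range(k)]
    return {"overall": correct / total, "commission": commission, "omission": omission}


# Hypothetical counts, not taken from the report.
example = [
    [45, 4, 1],
    [6, 38, 6],
    [2, 3, 45],
]
summary = accuracy_summary(example)
print(summary)
```

The per-class proportions returned here correspond to the binomial proportions (diagonal counts divided by row or column totals) that the abstract mentions as inputs to later hypothesis tests.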
Is Coefficient Alpha Robust to Non-Normal Data?

PubMed Central

Sheng, Yanyan; Sheng, Zhaohui

2011-01-01

Coefficient alpha has been a widely used measure by which internal consistency reliability is assessed. In addition to essential tau-equivalence and uncorrelated errors, normality has been noted as another important assumption for alpha. Earlier work on evaluating this assumption considered either exclusively non-normal error score distributions, or limited conditions. In view of this and the availability of advanced methods for generating univariate non-normal data, Monte Carlo simulations were conducted to show that non-normal distributions for true or error scores do create problems for using alpha to estimate the internal consistency reliability. The sample coefficient alpha is affected by leptokurtic true score distributions, or skewed and/or kurtotic error score distributions. Increased sample sizes, not test lengths, help improve the accuracy, bias, or precision of using it with non-normal data. PMID:22363306
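For readers unfamiliar with the statistic under study, the sample coefficient alpha can be computed as in the sketch below. It uses the standard textbook formula (k/(k-1) times one minus the ratio of summed item variances to the variance of total scores); the function name and the five-respondent data set are hypothetical and do not come from the article.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Sample coefficient alpha.

    scores: list of respondents, each a list of k item scores (k >= 2).
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 3-item test taken by 5 respondents.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]
alpha = coefficient_alpha(scores)
print(round(alpha, 3))
```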
Brief communication: the relation between standard error of the estimate and sample size of histomorphometric aging methods.

PubMed

Hennig, Cheryl; Cooper, David

2011-08-01

Histomorphometric aging methods report varying degrees of precision, measured through Standard Error of the Estimate (SEE). These techniques have been developed from variable sample sizes (n) and the impact of n on reported aging precision has not been rigorously examined in the anthropological literature. This brief communication explores the relation between n and SEE through a review of the literature (abstracts, articles, book chapters, theses, and dissertations), predictions based upon sampling theory and a simulation. Published SEE values for age prediction, derived from 40 studies, range from 1.51 to 16.48 years (mean 8.63; sd: 3.81 years). In general, these values are widely distributed for smaller samples and the distribution narrows as n increases--a pattern expected from sampling theory. For the two studies that have samples in excess of 200 individuals, the SEE values are very similar (10.08 and 11.10 years) with a mean of 10.59 years. Assuming this mean value is a 'true' characterization of the error at the population level, the 95% confidence intervals for SEE values from samples of 10, 50, and 150 individuals are on the order of ± 4.2, 1.7, and 1.0 years, respectively. While numerous sources of variation potentially affect the precision of different methods, the impact of sample size cannot be overlooked. The uncertainty associated with SEE values derived from smaller samples complicates the comparison of approaches based upon different methodology and/or skeletal elements. Meaningful comparisons require larger samples than have frequently been used and should ideally be based upon standardized samples. Copyright © 2011 Wiley-Liss, Inc.
Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

PubMed Central

Bondell, Howard D.; Stefanski, Leonard A.

2013-01-01

Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805
A new formula for assessing skeletal age in growing infants and children by measuring carpals and epiphyses of radio and ulna.

PubMed

De Luca, Stefano; Mangiulli, Tatiana; Merelli, Vera; Conforti, Federica; Velandia Palacio, Luz Andrea; Agostini, Susanna; Spinas, Enrico; Cameriere, Roberto

2016-04-01

The aim of this study is to develop a specific formula for the purpose of assessing skeletal age in a sample of Italian growing infants and children by measuring carpals and epiphyses of radio and ulna. A sample of 332 X-rays of left hand-wrist bones (130 boys and 202 girls), aged between 1 and 16 years, was analyzed retrospectively. Analysis of covariance (ANCOVA) was applied to study how sex affects the growth of the ratio Bo/Ca in the boys and girls groups. The regression model, describing age as a linear function of sex and the Bo/Ca ratio for the new Italian sample, yielded the following formula: Age = -1.7702 + 1.0088 g + 14.8166 (Bo/Ca). This model explained 83.5% of total variance (R² = 0.835). The median of the absolute values of residuals (observed age minus predicted age) was -0.38, with a quartile deviation of 2.01 and a standard error of estimate of 1.54. A second sample test of 204 Italian children (108 girls and 96 boys), aged between 1 and 16 years, was used to evaluate the accuracy of the specific regression model. A sample paired t-test was used to analyze the mean differences between the skeletal and chronological age. The mean error for girls is 0.00 and the estimated age is slightly underestimated in boys with a mean error of -0.30 years. The standard deviations are 0.70 years for girls and 0.78 years for boys. The obtained results indicate that there is a high relationship between estimated and chronological ages. Copyright © 2016 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
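The regression formula quoted in this abstract can be evaluated directly, as in the short sketch below. The coefficients are those stated above; the coding of the sex variable g is not given in the abstract, so the sketch assumes g = 1 for boys and g = 0 for girls, and the Bo/Ca value of 0.5 is a hypothetical input.

```python
def predicted_age(bo_ca_ratio, g):
    """Age (years) from the abstract's formula.

    bo_ca_ratio: the Bo/Ca ratio measured on the hand-wrist X-ray.
    g: sex indicator; ASSUMED here to be 1 for boys and 0 for girls
       (the abstract does not state the coding).
    """
    return -1.7702 + 1.0088 * g + 14.8166 * bo_ca_ratio


# Hypothetical ratio, for illustration only.
age_g0 = predicted_age(0.5, g=0)
age_g1 = predicted_age(0.5, g=1)
print(round(age_g0, 4), round(age_g1, 4))
```

Under this reading, the sex term shifts the predicted age by a constant 1.0088 years at any given Bo/Ca ratio.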
Assumption-free estimation of the genetic contribution to refractive error across childhood.

PubMed

Guggenheim, Jeremy A; St Pourcain, Beate; McMahon, George; Timpson, Nicholas J; Evans, David M; Williams, Cathy

2015-01-01

Studies in relatives have generally yielded high heritability estimates for refractive error: twins 75-90%, families 15-70%. However, because related individuals often share a common environment, these estimates are inflated (via misallocation of unique/common environment variance). We calculated a lower-bound heritability estimate for refractive error free from such bias. Between the ages 7 and 15 years, participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) underwent non-cycloplegic autorefraction at regular research clinics. At each age, an estimate of the variance in refractive error explained by single nucleotide polymorphism (SNP) genetic variants was calculated using genome-wide complex trait analysis (GCTA) using high-density genome-wide SNP genotype information (minimum N at each age=3,404). The variance in refractive error explained by the SNPs ("SNP heritability") was stable over childhood: Across age 7-15 years, SNP heritability averaged 0.28 (SE=0.08, p<0.001). The genetic correlation for refractive error between visits varied from 0.77 to 1.00 (all p<0.001) demonstrating that a common set of SNPs was responsible for the genetic contribution to refractive error across this period of childhood. Simulations suggested lack of cycloplegia during autorefraction led to a small underestimation of SNP heritability (adjusted SNP heritability=0.35; SE=0.09). To put these results in context, the variance in refractive error explained (or predicted) by the time participants spent outdoors was <0.005 and by the time spent reading was <0.01, based on a parental questionnaire completed when the child was aged 8-9 years old. Genetic variation captured by common SNPs explained approximately 35% of the variation in refractive error between unrelated subjects. 
This value sets an upper limit for predicting refractive error using existing SNP genotyping arrays, although higher-density genotyping in larger samples and inclusion of interaction effects is expected to raise this figure toward twin- and family-based heritability estimates. The same SNPs influenced refractive error across much of childhood. Notwithstanding the strong evidence of association between time outdoors and myopia, and time reading and myopia, less than 1% of the variance in myopia at age 15 was explained by crude measures of these two risk factors, indicating that their effects may be limited, at least when averaged over the whole population.
Efficient design and inference for multistage randomized trials of individualized treatment policies.

PubMed

Dawson, Ree; Lavori, Philip W

2012-01-01

Clinical demand for individualized "adaptive" treatment policies in diverse fields has spawned development of clinical trial methodology for their experimental evaluation via multistage designs, building upon methods intended for the analysis of naturalistically observed strategies. Because often there is no need to parametrically smooth multistage trial data (in contrast to observational data for adaptive strategies), it is possible to establish direct connections among different methodological approaches. We show by algebraic proof that the maximum likelihood (ML) and optimal semiparametric (SP) estimators of the population mean of the outcome of a treatment policy and its standard error are equal under certain experimental conditions. This result is used to develop a unified and efficient approach to design and inference for multistage trials of policies that adapt treatment according to discrete responses. We derive a sample size formula expressed in terms of a parametric version of the optimal SP population variance. Nonparametric (sample-based) ML estimation performed well in simulation studies, in terms of achieved power, for scenarios most likely to occur in real studies, even though sample sizes were based on the parametric formula. ML outperformed the SP estimator; differences in achieved power predominately reflected differences in their estimates of the population mean (rather than estimated standard errors). Neither methodology could mitigate the potential for overestimated sample sizes when strong nonlinearity was purposely simulated for certain discrete outcomes; however, such departures from linearity may not be an issue for many clinical contexts that make evaluation of competitive treatment policies meaningful.
An integrated study of earth resources in the state of California using remote sensing techniques

NASA Technical Reports Server (NTRS)

Colwell, R. N. (Principal Investigator)

1975-01-01

The author has identified the following significant results. A weighted stratified double sample design using hardcopy LANDSAT-1 and ground data was utilized in developmental studies for snow water content estimation. Study results gave a correlation coefficient of 0.80 between LANDSAT sample units estimates of snow water content and ground subsamples. A basin snow water content estimate allowable error was given as 1.00 percent at the 99 percent confidence level with the same budget level utilized in conventional snow surveys. Several evapotranspiration estimation models were selected for efficient application at each level of data to be sampled. An area estimation procedure for impervious surface types of differing impermeability adjacent to stream channels was developed. This technique employs a double sample of 1:125,000 color infrared high-flight transparency data with ground or large scale photography.

Profile-likelihood Confidence Intervals in Item Response Theory Models.

PubMed

Chalmers, R Philip; Pek, Jolynn; Liu, Yang

2017-01-01

Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
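The contrast between Wald-type and likelihood-based intervals can be seen even in a one-parameter binomial model, which the sketch below uses purely as a stand-in; it is not an item response theory model and not the authors' implementation. With a single parameter the profile likelihood is just the likelihood itself. The function names and the 3-of-20 data are hypothetical, and the sketch assumes 0 < successes < n.

```python
import math
from statistics import NormalDist


def log_lik(p, successes, n):
    """Binomial log-likelihood up to an additive constant."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def wald_ci(successes, n, level=0.95):
    """Estimate +/- z * standard error; may leave the (0, 1) range."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def _bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def profile_ci(successes, n, level=0.95):
    """All p whose log-likelihood is within z^2 / 2 of the maximum."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    target = log_lik(p_hat, successes, n) - z * z / 2
    f = lambda p: log_lik(p, successes, n) - target
    return _bisect(f, 1e-12, p_hat), _bisect(f, p_hat, 1 - 1e-12)


# Hypothetical data: 3 successes in 20 trials.
wald_lo, wald_hi = wald_ci(3, 20)
prof_lo, prof_hi = profile_ci(3, 20)
print((wald_lo, wald_hi), (prof_lo, prof_hi))
```

In this toy case the Wald lower limit falls below zero, while the likelihood-based interval is asymmetric and stays inside the parameter space, which is one kind of distinction the abstract alludes to.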
An affordable cuff-less blood pressure estimation solution.

PubMed

Jain, Monika; Kumar, Niranjan; Deb, Sujay

2016-08-01

This paper presents a cuff-less hypertension pre-screening device that non-invasively monitors the Blood Pressure (BP) and Heart Rate (HR) continuously. The proposed device simultaneously records two clinically significant and highly correlated biomedical signals, viz., Electrocardiogram (ECG) and Photoplethysmogram (PPG). The device provides a common data acquisition platform that can interface with PC/laptop, smartphone/tablet, Raspberry Pi, etc. The hardware stores and processes the recorded ECG and PPG in order to extract the real-time BP and HR using a kernel regression approach. The BP and HR estimation error is measured in terms of normalized mean square error, Error Standard Deviation (ESD) and Mean Absolute Error (MAE), with respect to a clinically proven digital BP monitor (OMRON HBP1300). The computed error falls under the maximum standard allowable error mentioned by the Association for the Advancement of Medical Instrumentation: MAE < 5 mmHg and ESD < 8 mmHg. The results are also validated using a two-tailed dependent sample t-test. The proposed device is a portable, low-cost, home- and clinic-based solution for continuous health monitoring.
Analysis of counting errors in the phase/Doppler particle analyzer

NASA Technical Reports Server (NTRS)

Oldenburg, John R.

1987-01-01

NASA is investigating the application of the Phase Doppler measurement technique to provide improved drop sizing and liquid water content measurements in icing research. The magnitude of counting errors was analyzed because these errors contribute to inaccurate liquid water content measurements. The Phase Doppler Particle Analyzer counting errors due to data transfer losses and coincidence losses were analyzed for data input rates from 10 samples/sec to 70,000 samples/sec. Coincidence losses were calculated by determining the Poisson probability of having more than one event occurring during the droplet signal time. The magnitude of the coincidence loss can be determined, and for less than a 15 percent loss, corrections can be made. The data transfer losses were estimated for representative data transfer rates. With direct memory access enabled, data transfer losses are less than 5 percent for input rates below 2000 samples/sec. With direct memory access disabled, losses exceeded 20 percent at a rate of 50 samples/sec, preventing accurate number density or mass flux measurements. The data transfer losses of a new signal processor were analyzed and found to be less than 1 percent for rates under 65,000 samples/sec.
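The coincidence-loss idea in this abstract can be sketched with the Poisson distribution. The code below computes the probability of more than one event in a window, taking the expected count as arrival rate times droplet signal time. This is one plausible reading of the abstract's description, not the report's actual procedure, and the 10-microsecond signal time is a hypothetical value.

```python
import math


def coincidence_probability(rate_per_sec, signal_time_sec):
    """P(N > 1) for a Poisson count N with mean rate * signal time.

    Written with expm1 so that very small means do not lose precision:
    P(N > 1) = 1 - exp(-mu) - mu * exp(-mu).
    """
    mu = rate_per_sec * signal_time_sec
    return -math.expm1(-mu) - mu * math.exp(-mu)


# Hypothetical droplet signal time of 10 microseconds.
SIGNAL_TIME = 10e-6
for rate in (10, 2000, 70000):
    print(rate, coincidence_probability(rate, SIGNAL_TIME))
```

The probability grows monotonically with input rate, which is consistent with the abstract's point that corrections are only practical while the loss stays small.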
Errors in the estimation method for the rejection of vibrations in adaptive optics systems

NASA Astrophysics Data System (ADS)

Kania, Dariusz

2017-06-01

In recent years, interest in the impact of mechanical vibrations on adaptive optics (AO) systems has been renewed. These vibrations are damped sinusoidal signals and have a deleterious effect on the system. One software solution for rejecting the vibrations is an adaptive method called AVC (Adaptive Vibration Cancellation), whose procedure has three steps: estimation of the perturbation parameters, estimation of the frequency response of the plant, and updating the reference signal to reject or minimize the vibration. In the first step, the choice of estimation method is a very important problem. A very accurate and fast (below 10 ms) estimation method of these three parameters has been presented in several publications in recent years. The method is based on spectrum interpolation and MSD time windows, and it can be used to estimate multifrequency signals. In this paper the estimation method is used in the AVC method to increase the system performance. Several parameters affect the accuracy of the obtained results, e.g. CiR - number of signal periods in a measurement window, N - number of samples in the FFT procedure, H - time window order, SNR, b - number of ADC bits, γ - damping ratio of the tested signal. Systematic errors increase when N, CiR, H decrease and when γ increases. The value for systematic error is approximately 10^-10 Hz/Hz for N = 2048 and CiR = 0.1. This paper presents equations that can be used to estimate maximum systematic errors for given values of H, CiR and N before the start of the estimation process.
Limited sampling strategy models for estimating the AUC of gliclazide in Chinese healthy volunteers.

PubMed

Huang, Ji-Han; Wang, Kun; Huang, Xiao-Hui; He, Ying-Chun; Li, Lu-Jin; Sheng, Yu-Cheng; Yang, Juan; Zheng, Qing-Shan

2013-06-01

The aim of this work is to reduce the cost of required sampling for the estimation of the area under the gliclazide plasma concentration versus time curve within 60 h (AUC0-60t). The limited sampling strategy (LSS) models were established and validated by the multiple regression model within 4 or fewer gliclazide concentration values. Absolute prediction error (APE), root of mean square error (RMSE) and visual prediction check were used as criteria. The results of Jack-Knife validation showed that 10 (25.0 %) of the 40 LSS based on the regression analysis were not within an APE of 15 % using one concentration-time point. 90.2, 91.5 and 92.4 % of the 40 LSS models were capable of prediction using 2, 3 and 4 points, respectively. Limited sampling strategies were developed and validated for estimating AUC0-60t of gliclazide. This study indicates that the implementation of an 80 mg dosage regimen enabled accurate predictions of AUC0-60t by the LSS model. This study shows that 12, 6, 4, 2 h after administration are the key sampling times. The combination of (12, 2 h), (12, 8, 2 h) or (12, 8, 4, 2 h) can be chosen as sampling hours for predicting AUC0-60t in practical application according to requirement.
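The two numerical criteria named in this abstract can be sketched as follows. The abstract does not give the exact definitions used, so the sketch assumes APE is the absolute difference between predicted and observed AUC as a percentage of the observed value, and RMSE is the usual root of the mean squared difference. All AUC values below are invented for illustration.

```python
import math


def absolute_prediction_error(predicted, observed):
    """APE in percent for one subject (assumed definition)."""
    return abs(predicted - observed) / observed * 100.0


def rmse(predicted, observed):
    """Root of mean square error over paired predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical AUC values (arbitrary units), not data from the study.
observed_auc = [100.0, 120.0, 80.0]
predicted_auc = [110.0, 114.0, 88.0]

apes = [absolute_prediction_error(p, o) for p, o in zip(predicted_auc, observed_auc)]
within_15 = sum(ape <= 15.0 for ape in apes) / len(apes)
print(apes, within_15, rmse(predicted_auc, observed_auc))
```

The share of predictions with APE within 15 % mirrors the acceptance threshold quoted in the abstract.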
Natural sampling strategy

NASA Technical Reports Server (NTRS)

Hallum, C. R.; Basu, J. P. (Principal Investigator)

1979-01-01

A natural stratum-based sampling scheme and the aggregation procedures for estimating wheat area, yield, and production and their associated prediction error estimates are described. The methodology utilizes LANDSAT imagery and agrophysical data to permit an improved stratification in foreign areas by ignoring political boundaries and restratifying along boundaries that are more homogeneous with respect to the distribution of agricultural density, soil characteristics, and average climatic conditions. A summary of test results is given including a discussion of the various problems encountered.
Adaptive Green-Kubo estimates of transport coefficients from molecular dynamics based on robust error analysis.

PubMed

Jones, Reese E; Mandadapu, Kranthi K

2012-04-21

We present a rigorous Green-Kubo methodology for calculating transport coefficients based on on-the-fly estimates of: (a) statistical stationarity of the relevant process, and (b) error in the resulting coefficient. The methodology uses time samples efficiently across an ensemble of parallel replicas to yield accurate estimates, which is particularly useful for estimating the thermal conductivity of semi-conductors near their Debye temperatures where the characteristic decay times of the heat flux correlation functions are large. Employing and extending the error analysis of Zwanzig and Ailawadi [Phys. Rev. 182, 280 (1969)] and Frenkel [in Proceedings of the International School of Physics "Enrico Fermi", Course LXXV (North-Holland Publishing Company, Amsterdam, 1980)] to the integral of correlation, we are able to provide tight theoretical bounds for the error in the estimate of the transport coefficient. To demonstrate the performance of the method, four test cases of increasing computational cost and complexity are presented: the viscosity of Ar and water, and the thermal conductivity of Si and GaN. In addition to producing accurate estimates of the transport coefficients for these materials, this work demonstrates precise agreement of the computed variances in the estimates of the correlation and the transport coefficient with the extended theory based on the assumption that fluctuations follow a Gaussian process. The proposed algorithm in conjunction with the extended theory enables the calculation of transport coefficients with the Green-Kubo method accurately and efficiently.
Adaptive Green-Kubo estimates of transport coefficients from molecular dynamics based on robust error analysis

NASA Astrophysics Data System (ADS)

Jones, Reese E.; Mandadapu, Kranthi K.

2012-04-01

We present a rigorous Green-Kubo methodology for calculating transport coefficients based on on-the-fly estimates of: (a) statistical stationarity of the relevant process, and (b) error in the resulting coefficient. The methodology uses time samples efficiently across an ensemble of parallel replicas to yield accurate estimates, which is particularly useful for estimating the thermal conductivity of semi-conductors near their Debye temperatures where the characteristic decay times of the heat flux correlation functions are large. Employing and extending the error analysis of Zwanzig and Ailawadi [Phys. Rev. 182, 280 (1969), doi:10.1103/PhysRev.182.280] and Frenkel [in Proceedings of the International School of Physics "Enrico Fermi", Course LXXV (North-Holland Publishing Company, Amsterdam, 1980)] to the integral of correlation, we are able to provide tight theoretical bounds for the error in the estimate of the transport coefficient. To demonstrate the performance of the method, four test cases of increasing computational cost and complexity are presented: the viscosity of Ar and water, and the thermal conductivity of Si and GaN. In addition to producing accurate estimates of the transport coefficients for these materials, this work demonstrates precise agreement of the computed variances in the estimates of the correlation and the transport coefficient with the extended theory based on the assumption that fluctuations follow a Gaussian process. The proposed algorithm in conjunction with the extended theory enables the calculation of transport coefficients with the Green-Kubo method accurately and efficiently.
Statistical inference involving binomial and negative binomial parameters.

PubMed

García-Pérez, Miguel A; Núñez-Antón, Vicente

2009-05-01

Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Efficiency and precision for estimating timber and non-timber attributes using Landsat-based stratification methods in two-phase sampling in northwest California

Treesearch

Antti T. Kaartinen; Jeremy S. Fried; Paul A. Dunham

2002-01-01

Three Landsat TM-based GIS layers were evaluated as alternatives to conventional, photointerpretation-based stratification of FIA field plots. Estimates for timberland area, timber volume, and volume of down wood were calculated for California's North Coast Survey Unit of 2.5 million hectares. The estimates were compared on the basis of standard errors,...
Validation of Student and Parent Report Data on the Basic Grant Application Form. Final Report. Volume II, Estimated Income: CRT Look-Up Study, Individual Case Follow-Up Study, IRS Sample Study.

ERIC Educational Resources Information Center

Novalis, Carol

The use of estimated income to analyze financial need of applicants to the Basic Educational Opportunity Grant (BEOG) program was investigated. Attention was focused on: how well applicants estimate their income; reasons for errors in estimation, and whether applicants supplying income tax returns supply true versions. For 1,547 eligible BEOG…
Combining satellite lidar, airborne lidar, and ground plots to estimate the amount and distribution of aboveground biomass in the boreal forest of North America

Treesearch

Hank A. Margolis; Ross F. Nelson; Paul M. Montesano; André Beaudoin; Guoqing Sun; Hans-Erik Andersen; Michael A. Wulder

2015-01-01

We report estimates of the amount, distribution, and uncertainty of aboveground biomass (AGB) of the different ecoregions and forest land cover classes within the North American boreal forest, analyze the factors driving the error estimates, and compare our estimates with other reported values. A three-phase sampling strategy was used (i) to tie ground plot AGB to...
Controlling the type I error rate in two-stage sequential adaptive designs when testing for average bioequivalence.

PubMed

Maurer, Willi; Jones, Byron; Chen, Ying

2018-05-10

In a 2×2 crossover trial for establishing average bioequivalence (ABE) of a generic agent and a currently marketed drug, the recommended approach to hypothesis testing is the two one-sided test (TOST) procedure, which depends, among other things, on the estimated within-subject variability. The power of this procedure, and therefore the sample size required to achieve a minimum power, depends on having a good estimate of this variability. When there is uncertainty, it is advisable to plan the design in two stages, with an interim sample size reestimation after the first stage, using an interim estimate of the within-subject variability. One method and 3 variations of doing this were proposed by Potvin et al. Using simulation, the operating characteristics, including the empirical type I error rate, of the 4 variations (called Methods A, B, C, and D) were assessed by Potvin et al and Methods B and C were recommended. However, none of these 4 variations formally controls the type I error rate of falsely claiming ABE, even though the amount of inflation produced by Method C was considered acceptable. A major disadvantage of assessing type I error rate inflation using simulation is that unless all possible scenarios for the intended design and analysis are investigated, it is impossible to be sure that the type I error rate is controlled. Here, we propose an alternative, principled method of sample size reestimation that is guaranteed to control the type I error rate at any given significance level. This method uses a new version of the inverse-normal combination of p-values test, in conjunction with standard group sequential techniques, that is more robust to large deviations in initial assumptions regarding the variability of the pharmacokinetic endpoints. The sample size reestimation step is based on significance levels and power requirements that are conditional on the first-stage results. 
This necessitates a discussion and exploitation of the peculiar properties of the power curve of the TOST testing procedure. We illustrate our approach with an example based on a real ABE study and compare the operating characteristics of our proposed method with those of Method B of Potvin et al. Copyright © 2018 John Wiley & Sons, Ltd.
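As background for the combination test mentioned in this abstract, the sketch below implements the standard textbook inverse-normal combination of two stage-wise one-sided p-values with prespecified weights. It is not the "new version" that the authors propose, whose details the abstract does not give. The function name, the equal default weights, and the example p-values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def inverse_normal_combination(p1, p2, w1=math.sqrt(0.5)):
    """Combine two independent one-sided p-values (each strictly in (0, 1)).

    w1 is the prespecified weight of stage 1; w2 is chosen so that
    w1**2 + w2**2 == 1, which keeps the combined statistic standard
    normal under the null hypothesis.
    """
    w2 = math.sqrt(1.0 - w1 ** 2)
    nd = NormalDist()
    z = w1 * nd.inv_cdf(1.0 - p1) + w2 * nd.inv_cdf(1.0 - p2)
    return z, 1.0 - nd.cdf(z)


# Hypothetical stage-wise p-values.
z_stat, p_combined = inverse_normal_combination(0.05, 0.05)
print(z_stat, p_combined)
```

Because the weights are fixed in advance, the second-stage sample size can depend on first-stage results without changing the null distribution of the combined statistic, which is the property that makes this family of tests attractive for sample size reestimation.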
Numerical Demons in Monte Carlo Estimation of Bayesian Model Evidence with Application to Soil Respiration Models

NASA Astrophysics Data System (ADS)

Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.

2016-12-01

Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analyses such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between the goodness-of-fit and the model complexity. Yet estimating BME is challenging, especially for high dimensional problems with complex sampling space. Estimating BME using Monte Carlo numerical methods is preferred, as the methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone to numerical demons arising from underflow or round-off errors. Although a few studies have alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that finite-precision arithmetic can become a threshold on likelihood values and the Metropolis acceptance ratio, which results in trimming parameter regions (when the likelihood function is less than the smallest floating point number that a computer can represent) and corrupting the empirical measures of the random states of the MCMC sampler (when using the log-likelihood function). We consider two of the most powerful numerical estimators of BME, namely the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, which are the prior sampling arithmetic mean (AM) and posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interestingly, the most biased estimator, namely the HM, turned out to be the least vulnerable.
While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME by investing in computational effort, we show that arithmetic underflow can hamper AM, resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computationally efficient results. These research results are useful for MC simulations to estimate Bayesian model evidence.
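The underflow problem described for the arithmetic mean estimator is easy to reproduce. The sketch below averages likelihoods given as log-likelihoods, first naively and then with the standard log-sum-exp rearrangement. The log-sum-exp trick is a common remedy that is introduced here for illustration; the abstract does not say which remedies the authors used. The function names and the three log-likelihood values are hypothetical.

```python
import math


def naive_log_mean_likelihood(log_likes):
    """log of the arithmetic mean of likelihoods, computed naively.

    exp() of a very negative log-likelihood underflows to 0.0, so the
    mean can collapse to zero and its log to minus infinity.
    """
    mean = sum(math.exp(ll) for ll in log_likes) / len(log_likes)
    return math.log(mean) if mean > 0 else float("-inf")


def stable_log_mean_likelihood(log_likes):
    """Same quantity via log-sum-exp: factor out the largest term first."""
    m = max(log_likes)
    return m + math.log(sum(math.exp(ll - m) for ll in log_likes)) - math.log(len(log_likes))


# Hypothetical log-likelihoods of three prior samples.
log_likes = [-1000.0, -1001.0, -1002.0]
naive = naive_log_mean_likelihood(log_likes)
stable = stable_log_mean_likelihood(log_likes)
print(naive, stable)
```

The naive version reports zero evidence (log evidence of minus infinity), an extreme case of the underestimation the abstract describes, while the rearranged version returns a finite value.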
Trans-dimensional inversion of microtremor array dispersion data with hierarchical autoregressive error models

NASA Astrophysics Data System (ADS)

Dettmer, Jan; Molnar, Sheri; Steininger, Gavin; Dosso, Stan E.; Cassidy, John F.

2012-02-01

This paper applies a general trans-dimensional Bayesian inference methodology and hierarchical autoregressive data-error models to the inversion of microtremor array dispersion data for shear wave velocity (vs) structure. This approach accounts for the limited knowledge of the optimal earth model parametrization (e.g. the number of layers in the vs profile) and of the data-error statistics in the resulting vs parameter uncertainty estimates. The assumed earth model parametrization influences estimates of parameter values and uncertainties due to different parametrizations leading to different ranges of data predictions. The support of the data for a particular model is often non-unique and several parametrizations may be supported. A trans-dimensional formulation accounts for this non-uniqueness by including a model-indexing parameter as an unknown so that groups of models (identified by the indexing parameter) are considered in the results. The earth model is parametrized in terms of a partition model with interfaces given over a depth-range of interest. In this work, the number of interfaces (layers) in the partition model represents the trans-dimensional model indexing. In addition, serial data-error correlations are addressed by augmenting the geophysical forward model with a hierarchical autoregressive error model that can account for a wide range of error processes with a small number of parameters. Hence, the limited knowledge about the true statistical distribution of data errors is also accounted for in the earth model parameter estimates, resulting in more realistic uncertainties and parameter values. Hierarchical autoregressive error models do not rely on point estimates of the model vector to estimate data-error statistics, and have no requirement for computing the inverse or determinant of a data-error covariance matrix. 
This approach is particularly useful for trans-dimensional inverse problems, as point estimates may not be representative of the state space that spans multiple subspaces of different dimensionalities. The order of the autoregressive process required to fit the data is determined here by posterior residual-sample examination and statistical tests. Inference for earth model parameters is carried out on the trans-dimensional posterior probability distribution by considering ensembles of parameter vectors. In particular, vs uncertainty estimates are obtained by marginalizing the trans-dimensional posterior distribution in terms of vs-profile marginal distributions. The methodology is applied to microtremor array dispersion data collected at two sites with significantly different geology in British Columbia, Canada. At both sites, results show excellent agreement with estimates from invasive measurements.
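As a rough illustration of why an autoregressive error model avoids inverting a data-error covariance matrix, the sketch below evaluates a first-order (AR(1)) Gaussian log-likelihood directly from the residuals. It is a simplification under stated assumptions: the coefficient `a` and standard deviation `sigma` are fixed here, whereas the paper treats such quantities as unknowns within a hierarchical model; the zero starting value and the residual values are hypothetical.

```python
import math

def ar1_log_likelihood(residuals, a, sigma):
    """Conditional Gaussian log-likelihood of residuals under an AR(1) error model.

    Each residual is predicted as a * (previous residual); the remaining
    innovation is assumed independent N(0, sigma^2). No covariance matrix
    is formed, inverted, or factorized.
    """
    loglik = 0.0
    prev = 0.0  # assumption: the error process starts from zero
    for r in residuals:
        innovation = r - a * prev
        loglik += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        loglik += -0.5 * (innovation / sigma) ** 2
        prev = r
    return loglik

# Hypothetical, strongly serially correlated residuals.
residuals = [1.0, 1.0, 1.0, 1.0]
uncorrelated = ar1_log_likelihood(residuals, a=0.0, sigma=1.0)
correlated = ar1_log_likelihood(residuals, a=0.9, sigma=1.0)
print(uncorrelated, correlated)
```

With `a = 0` the expression reduces to the usual independent Gaussian likelihood; for serially correlated residuals, a nonzero `a` explains part of each residual and yields a higher likelihood.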
Errors in weight estimation in the emergency department: comparing performance by providers and patients.

PubMed

Hall, William L; Larkin, Gregory L; Trujillo, Mauricio J; Hinds, Jackie L; Delaney, Kathleen A

2004-10-01

To examine biases in weight estimation by Emergency Department (ED) providers and patients, a convenience sample of ED providers (faculty, residents, interns, nurses, medical students, paramedics) and patients was studied. Providers (n = 33), blinded to study hypothesis and patient data, estimated their own weight as well as the weight of 11-20 patients each. An independent sample of patients (n = 95) was used to assess biases in patients' estimation of their own weight. Data are represented as over, under, or within +/- 5 kg, the dose tolerance standard for thrombolytics. Logistic regression analysis revealed that patients are almost nine times more likely to accurately estimate their own weight than providers; yet 22% of patients were unable to estimate their own weight within 5 kg. Of all providers, paramedics were significantly worse estimators of patient weight than other providers. Providers were no better at guessing their own weight than were patients. Though there was no systematic estimate bias by weight, experience level (except paramedic), or gender for providers, those providers under 30 years of age were significantly better estimators of patient weight than older providers. Although patient gender did not create a bias in provider estimation accuracy, providers were more likely to underestimate women's weights than men's. In conclusion, patient self-estimates of weight are significantly better than estimates by providers. Inaccurate estimates by both groups could potentially contribute to medication dosing errors in the ED.
Non-linear matter power spectrum covariance matrix errors and cosmological parameter uncertainties

NASA Astrophysics Data System (ADS)

Blot, L.; Corasaniti, P. S.; Amendola, L.; Kitching, T. D.

2016-06-01

The covariance of the matter power spectrum is a key element of the analysis of galaxy clustering data. Independent realizations of observational measurements can be used to sample the covariance; nevertheless, statistical sampling errors will propagate into the cosmological parameter inference, potentially limiting the capabilities of the upcoming generation of galaxy surveys. The impact of these errors as a function of the number of realizations has been previously evaluated for Gaussian distributed data. However, non-linearities in the late-time clustering of matter cause departures from Gaussian statistics. Here, we address the impact of non-Gaussian errors on the sample covariance and precision matrix errors using a large ensemble of N-body simulations. In the range of modes where finite volume effects are negligible (0.1 ≲ k [h Mpc⁻¹] ≲ 1.2), we find deviations of the variance of the sample covariance with respect to Gaussian predictions above ∼10 per cent at k > 0.3 h Mpc⁻¹. Over the entire range these reduce to about 5 per cent for the precision matrix. Finally, we perform a Fisher analysis to estimate the effect of covariance errors on the cosmological parameter constraints. In particular, assuming Euclid-like survey characteristics, we find that a number of independent realizations larger than 5000 is necessary to reduce the contribution of sampling errors to the cosmological parameter uncertainties to the subpercent level. We also show that restricting the analysis to large scales k ≲ 0.2 h Mpc⁻¹ results in a considerable loss in constraining power, while using the linear covariance to include smaller scales leads to an underestimation of the errors on the cosmological parameters.
Error Distribution Evaluation of the Third Vanishing Point Based on Random Statistical Simulation

NASA Astrophysics Data System (ADS)

Li, C.

2012-07-01

POS, which integrates GPS and INS (Inertial Navigation Systems), has allowed rapid and accurate determination of the position and attitude of remote sensing equipment for MMS (Mobile Mapping Systems). However, INS not only has system error but is also very expensive. Therefore, in this paper the error distributions of vanishing points are studied and tested in order to substitute for INS in MMS in some special land-based scenes, such as ground façades, where usually only two vanishing points can be detected. Thus, the traditional calibration approach based on three orthogonal vanishing points is being challenged. In this article, firstly, the line clusters that are parallel to each other in object space and correspond to the vanishing points are detected based on RANSAC (Random Sample Consensus) and a parallelism geometric constraint. Secondly, condition adjustment with parameters is utilized to estimate the nonlinear error equations of two vanishing points (VX, VY). How to set initial weights for the adjustment solution of single-image vanishing points is presented. The vanishing points are solved and their error distributions estimated based on an iteration method with variable weights, the co-factor matrix and error ellipse theory. Thirdly, given the known error ellipses of two vanishing points (VX, VY) and on the basis of the triangle geometric relationship of the three vanishing points, the error distribution of the third vanishing point (VZ) is calculated and evaluated by random statistical simulation, ignoring camera distortion. Moreover, the Monte Carlo methods utilized for random statistical estimation are presented. Finally, experimental results for the vanishing point coordinates and their error distributions are shown and analyzed.
Creel survey sampling designs for estimating effort in short-duration Chinook salmon fisheries

USGS Publications Warehouse

McCormick, Joshua L.; Quist, Michael C.; Schill, Daniel J.

2013-01-01

Chinook Salmon Oncorhynchus tshawytscha sport fisheries in the Columbia River basin are commonly monitored using roving creel survey designs and require precise, unbiased catch estimates. The objective of this study was to examine the relative bias and precision of total catch estimates using various sampling designs to estimate angling effort under the assumption that mean catch rate was known. We obtained information on angling populations based on direct visual observations of portions of Chinook Salmon fisheries in three Idaho river systems over a 23-d period. Based on the angling population, Monte Carlo simulations were used to evaluate the properties of effort and catch estimates for each sampling design. All sampling designs evaluated were relatively unbiased. Systematic random sampling (SYS) resulted in the most precise estimates. The SYS and simple random sampling designs had mean square error (MSE) estimates that were generally half of those observed with cluster sampling designs. The SYS design was more efficient (i.e., higher accuracy per unit cost) than a two-cluster design. Increasing the number of clusters available for sampling within a day decreased the MSE of estimates of daily angling effort, but the MSE of total catch estimates was variable depending on the fishery. The results of our simulations provide guidelines on the relative influence of sample sizes and sampling designs on parameters of interest in short-duration Chinook Salmon fisheries.
Improving tree age estimates derived from increment cores: a case study of red pine

Treesearch

Shawn Fraver; John B. Bradford; Brian J. Palik

2011-01-01

Accurate tree ages are critical to a range of forestry and ecological studies. However, ring counts from increment cores, if not corrected for the years between the root collar and coring height, can produce sizeable age errors. The magnitude of errors is influenced by both the height at which the core is extracted and the growth rate. We destructively sampled saplings...

Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daly, Don S.; Anderson, Kevin K.; White, Amanda M.

Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require both concentration predictions and credible estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions, especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. 
The spline method simplifies model selection and fitting, and reliably estimates believable prediction errors. For the 50% of the real data sets fit well by both methods, spline and logistic predictions are practically indistinguishable, varying in accuracy by less than 15%. The spline method may be useful when automated prediction across simultaneous assays of numerous proteins must be applied routinely with minimal user intervention.
Satellite inventory of Minnesota forest resources

NASA Technical Reports Server (NTRS)

Bauer, Marvin E.; Burk, Thomas E.; Ek, Alan R.; Coppin, Pol R.; Lime, Stephen D.; Walsh, Terese A.; Walters, David K.; Befort, William; Heinzen, David F.

1993-01-01

The methods and results of using Landsat Thematic Mapper (TM) data to classify and estimate the acreage of forest covertypes in northeastern Minnesota are described. Portions of six TM scenes covering five counties with a total area of 14,679 square miles were classified into six forest and five nonforest classes. The approach involved the integration of cluster sampling, image processing, and estimation. Using cluster sampling, 343 plots, each 88 acres in size, were photo interpreted and field mapped as a source of reference data for classifier training and calibration of the TM data classifications. Classification accuracies of up to 75 percent were achieved; most misclassification was between similar or related classes. An inverse method of calibration, based on the error rates obtained from the classifications of the cluster plots, was used to adjust the classification class proportions for classification errors. The resulting area estimates for total forest land in the five-county area were within 3 percent of the estimate made independently by the USDA Forest Service. Area estimates for conifer and hardwood forest types were within 0.8 and 6.0 percent respectively, of the Forest Service estimates. A trial of a second method of estimating the same classes as the Forest Service resulted in standard errors of 0.002 to 0.015. A study of the use of multidate TM data for change detection showed that forest canopy depletion, canopy increment, and no change could be identified with greater than 90 percent accuracy. The project results have been the basis for the Minnesota Department of Natural Resources and the Forest Service to define and begin to implement an annual system of forest inventory which utilizes Landsat TM data to detect changes in forest cover.
Why GPS makes distances bigger than they are

PubMed Central

Ranacher, Peter; Brunauer, Richard; Trutschnig, Wolfgang; Van der Spek, Stefan; Reich, Siegfried

2016-01-01

Global navigation satellite systems such as the Global Positioning System (GPS) are among the most important sensors for movement analysis. GPS is widely used to record the trajectories of vehicles, animals and human beings. However, all GPS movement data are affected by both measurement and interpolation errors. In this article we show that measurement error causes a systematic bias in distances recorded with a GPS; the distance between two points recorded with a GPS is, on average, bigger than the true distance between these points. This systematic ‘overestimation of distance’ becomes relevant if the influence of interpolation error can be neglected, which in practice is the case for movement sampled at high frequencies. We provide a mathematical explanation of this phenomenon and illustrate that it functionally depends on the autocorrelation of GPS measurement error (C). We argue that C can be interpreted as a quality measure for movement data recorded with a GPS. If there is a strong autocorrelation between any two consecutive position estimates, they have very similar error. This error cancels out when average speed, distance or direction is calculated along the trajectory. Based on our theoretical findings we introduce a novel approach to determine C in real-world GPS movement data sampled at high frequencies. We apply our approach to pedestrian trajectories and car trajectories. We found that the measurement error in the data was strongly spatially and temporally autocorrelated and give a quality estimate of the data. Most importantly, our findings are not limited to GPS alone. The systematic bias and its implications are bound to occur in any movement data collected with absolute positioning if interpolation error can be neglected. PMID: 27019610
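The overestimation effect can be reproduced with a small Monte Carlo sketch. This is an illustration under simplifying assumptions, not the authors' procedure: position errors are taken as zero-mean Gaussian with the same standard deviation `sigma` in each coordinate, and the hypothetical parameter `rho` stands in for the correlation between the errors of two consecutive fixes (a loose proxy for the role C plays in the abstract).

```python
import math
import random

def mean_measured_distance(true_distance, sigma, rho, trials, rng):
    """Average distance between two noisy fixes that are truly `true_distance` apart."""
    scale = math.sqrt(1.0 - rho ** 2)
    total = 0.0
    for _ in range(trials):
        # Error of the first fix.
        e1x, e1y = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        # Error of the second fix, correlated with the first by rho (assumption).
        e2x = rho * e1x + scale * rng.gauss(0.0, sigma)
        e2y = rho * e1y + scale * rng.gauss(0.0, sigma)
        total += math.hypot(true_distance + e2x - e1x, e2y - e1y)
    return total / trials

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_d = 10.0
independent = mean_measured_distance(true_d, 3.0, 0.0, 20000, rng)
correlated = mean_measured_distance(true_d, 3.0, 0.9, 20000, rng)
identical = mean_measured_distance(true_d, 3.0, 1.0, 20000, rng)
print(independent, correlated, identical)
```

Under these assumptions the average measured distance exceeds the true distance when the errors are independent, the excess shrinks as the errors become more correlated, and it vanishes when both fixes share the same error.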
An optimized network for phosphorus load monitoring for Lake Okeechobee, Florida

USGS Publications Warehouse

Gain, W.S.

1997-01-01

Phosphorus load data were evaluated for Lake Okeechobee, Florida, for water years 1982 through 1991. Standard errors for load estimates were computed from available phosphorus concentration and daily discharge data. Components of error were associated with uncertainty in concentration and discharge data and were calculated for existing conditions and for 6 alternative load-monitoring scenarios for each of 48 distinct inflows. Benefit-cost ratios were computed for each alternative monitoring scenario at each site by dividing estimated reductions in load uncertainty by the 5-year average costs of each scenario in 1992 dollars. Absolute and marginal benefit-cost ratios were compared in an iterative optimization scheme to determine the most cost-effective combination of discharge and concentration monitoring scenarios for the lake. If the current (1992) discharge-monitoring network around the lake is maintained, the water-quality sampling at each inflow site twice each year is continued, and the nature of loading remains the same, the standard error of computed mean-annual load is estimated at about 98 metric tons per year compared to an absolute loading rate (inflows and outflows) of 530 metric tons per year. This produces a relative uncertainty of nearly 20 percent. The standard error in load can be reduced to about 20 metric tons per year (4 percent) by adopting an optimized set of monitoring alternatives at a cost of an additional $200,000 per year. The final optimized network prescribes changes to improve both concentration and discharge monitoring. These changes include the addition of intensive sampling with automatic samplers at 11 sites, the initiation of event-based sampling by observers at another 5 sites, the continuation of periodic sampling 12 times per year at 1 site, the installation of acoustic velocity meters to improve discharge gaging at 9 sites, and the improvement of a discharge rating at 1 site.
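The relative uncertainties quoted above follow from dividing each standard error by the absolute loading rate. A short check, using only the figures given in the abstract (the function name is illustrative):

```python
def relative_uncertainty_percent(standard_error, load):
    """Standard error expressed as a percentage of the load."""
    return 100.0 * standard_error / load

absolute_load = 530.0  # metric tons per year (inflows and outflows)
current = relative_uncertainty_percent(98.0, absolute_load)    # existing network
optimized = relative_uncertainty_percent(20.0, absolute_load)  # optimized network
print(round(current, 1), round(optimized, 1))
```

The results, roughly 18.5 and 3.8 percent, match the "nearly 20 percent" and "4 percent" stated in the abstract.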
Estimating instream constituent loads using replicate synoptic sampling, Peru Creek, Colorado

NASA Astrophysics Data System (ADS)

Runkel, Robert L.; Walton-Day, Katherine; Kimball, Briant A.; Verplanck, Philip L.; Nimick, David A.

2013-05-01

The synoptic mass balance approach is often used to evaluate constituent mass loading in streams affected by mine drainage. Spatial profiles of constituent mass load are used to identify sources of contamination and prioritize sites for remedial action. This paper presents a field scale study in which replicate synoptic sampling campaigns are used to quantify the aggregate uncertainty in constituent load that arises from (1) laboratory analyses of constituent and tracer concentrations, (2) field sampling error, and (3) temporal variation in concentration from diel constituent cycles and/or source variation. Consideration of these factors represents an advance in the application of the synoptic mass balance approach by placing error bars on estimates of constituent load and by allowing all sources of uncertainty to be quantified in aggregate; previous applications of the approach have provided only point estimates of constituent load and considered only a subset of the possible errors. Given estimates of aggregate uncertainty, site specific data and expert judgement may be used to qualitatively assess the contributions of individual factors to uncertainty. This assessment can be used to guide the collection of additional data to reduce uncertainty. Further, error bars provided by the replicate approach can aid the investigator in the interpretation of spatial loading profiles and the subsequent identification of constituent source areas within the watershed. The replicate sampling approach is applied to Peru Creek, a stream receiving acidic, metal-rich effluent from the Pennsylvania Mine. Other sources of acidity and metals within the study reach include a wetland area adjacent to the mine and tributary inflow from Cinnamon Gulch. Analysis of data collected under low-flow conditions indicates that concentrations of Al, Cd, Cu, Fe, Mn, Pb, and Zn in Peru Creek exceed aquatic life standards. 
Constituent loading within the study reach is dominated by effluent from the Pennsylvania Mine, with over 50% of the Cd, Cu, Fe, Mn, and Zn loads attributable to a collapsed adit near the top of the study reach. These estimates of mass load may underestimate the effect of the Pennsylvania Mine as leakage from underground mine workings may contribute to metal loads that are currently attributed to the wetland area. This potential leakage confounds the evaluation of remedial options and additional research is needed to determine the magnitude and location of the leakage.
Estimating instream constituent loads using replicate synoptic sampling, Peru Creek, Colorado

USGS Publications Warehouse

Runkel, Robert L.; Walton-Day, Katherine; Kimball, Briant A.; Verplanck, Philip L.; Nimick, David A.

2013-01-01

The synoptic mass balance approach is often used to evaluate constituent mass loading in streams affected by mine drainage. Spatial profiles of constituent mass load are used to identify sources of contamination and prioritize sites for remedial action. This paper presents a field scale study in which replicate synoptic sampling campaigns are used to quantify the aggregate uncertainty in constituent load that arises from (1) laboratory analyses of constituent and tracer concentrations, (2) field sampling error, and (3) temporal variation in concentration from diel constituent cycles and/or source variation. Consideration of these factors represents an advance in the application of the synoptic mass balance approach by placing error bars on estimates of constituent load and by allowing all sources of uncertainty to be quantified in aggregate; previous applications of the approach have provided only point estimates of constituent load and considered only a subset of the possible errors. Given estimates of aggregate uncertainty, site specific data and expert judgement may be used to qualitatively assess the contributions of individual factors to uncertainty. This assessment can be used to guide the collection of additional data to reduce uncertainty. Further, error bars provided by the replicate approach can aid the investigator in the interpretation of spatial loading profiles and the subsequent identification of constituent source areas within the watershed. The replicate sampling approach is applied to Peru Creek, a stream receiving acidic, metal-rich effluent from the Pennsylvania Mine. Other sources of acidity and metals within the study reach include a wetland area adjacent to the mine and tributary inflow from Cinnamon Gulch. Analysis of data collected under low-flow conditions indicates that concentrations of Al, Cd, Cu, Fe, Mn, Pb, and Zn in Peru Creek exceed aquatic life standards. 
Constituent loading within the study reach is dominated by effluent from the Pennsylvania Mine, with over 50% of the Cd, Cu, Fe, Mn, and Zn loads attributable to a collapsed adit near the top of the study reach. These estimates of mass load may underestimate the effect of the Pennsylvania Mine as leakage from underground mine workings may contribute to metal loads that are currently attributed to the wetland area. This potential leakage confounds the evaluation of remedial options and additional research is needed to determine the magnitude and location of the leakage.
Reducing errors in aircraft atmospheric inversion estimates of point-source emissions: the Aliso Canyon natural gas leak as a natural tracer experiment

NASA Astrophysics Data System (ADS)

Gourdji, S. M.; Yadav, V.; Karion, A.; Mueller, K. L.; Conley, S.; Ryerson, T.; Nehrkorn, T.; Kort, E. A.

2018-04-01

Urban greenhouse gas (GHG) flux estimation with atmospheric measurements and modeling, i.e. the ‘top-down’ approach, can potentially support GHG emission reduction policies by assessing trends in surface fluxes and detecting anomalies from bottom-up inventories. Aircraft-collected GHG observations also have the potential to help quantify point-source emissions that may not be adequately sampled by fixed surface tower-based atmospheric observing systems. Here, we estimate CH4 emissions from a known point source, the Aliso Canyon natural gas leak in Los Angeles, CA from October 2015–February 2016, using atmospheric inverse models with airborne CH4 observations from twelve flights ≈4 km downwind of the leak and surface sensitivities from a mesoscale atmospheric transport model. This leak event has been well-quantified previously using various methods by the California Air Resources Board, thereby providing high confidence in the mass-balance leak rate estimates of Conley et al (2016), used here for comparison to inversion results. Inversions with an optimal setup are shown to provide estimates of the leak magnitude, on average, within a third of the mass balance values, with remaining errors in estimated leak rates predominantly explained by modeled wind speed errors of up to 10 m s⁻¹, quantified by comparing airborne meteorological observations with modeled values along the flight track. An inversion setup using scaled observational wind speed errors in the model-data mismatch covariance matrix is shown to significantly reduce the influence of transport model errors on spatial patterns and estimated leak rates from the inversions. In sum, this study takes advantage of a natural tracer release experiment (i.e. 
the Aliso Canyon natural gas leak) to identify effective approaches for reducing the influence of transport model error on atmospheric inversions of point-source emissions, while suggesting future potential for integrating surface tower and aircraft atmospheric GHG observations in top-down urban emission monitoring systems.
On the asymptotic standard error of a class of robust estimators of ability in dichotomous item response models.

PubMed

Magis, David

2014-11-01

In item response theory, the classical estimators of ability are highly sensitive to response disturbances and can return strongly biased estimates of the true underlying ability level. Robust methods were introduced to lessen the impact of such aberrant responses on the estimation process. The computation of asymptotic (i.e., large-sample) standard errors (ASE) for these robust estimators, however, has not yet been fully considered. This paper focuses on a broad class of robust ability estimators, defined by an appropriate selection of the weight function and the residual measure, for which the ASE is derived from the theory of estimating equations. The maximum likelihood (ML) and the robust estimators, together with their estimated ASEs, are then compared in a simulation study by generating random guessing disturbances. It is concluded that both the estimators and their ASE perform similarly in the absence of random guessing, while the robust estimator and its estimated ASE are less biased and outperform their ML counterparts in the presence of random guessing with large impact on the item response process. © 2013 The British Psychological Society.
How many stakes are required to measure the mass balance of a glacier?

USGS Publications Warehouse

Fountain, A.G.; Vecchia, A.

1999-01-01

Glacier mass balance is estimated for South Cascade Glacier and Maclure Glacier using a one-dimensional regression of mass balance with altitude as an alternative to the traditional approach of contouring mass balance values. One attractive feature of regression is that it can be applied to sparse data sets where contouring is not possible and can provide an objective error of the resulting estimate. Regression methods yielded mass balance values equivalent to contouring methods. The effect of the number of mass balance measurements on the final value for the glacier showed that sample sizes as small as five stakes provided reasonable estimates, although the error estimates were greater than for larger sample sizes. Different spatial patterns of measurement locations showed no appreciable influence on the final value as long as different surface altitudes were intermittently sampled over the altitude range of the glacier. Two different regression equations were examined, a quadratic, and a piecewise linear spline, and comparison of results showed little sensitivity to the type of equation. These results point to the dominant effect of the gradient of mass balance with altitude of alpine glaciers compared to transverse variations. The number of mass balance measurements required to determine the glacier balance appears to be scale invariant for small glaciers and five to ten stakes are sufficient.
Accuracy assessment in the Large Area Crop Inventory Experiment

NASA Technical Reports Server (NTRS)

Houston, A. G.; Pitts, D. E.; Feiveson, A. H.; Badhwar, G.; Ferguson, M.; Hsu, E.; Potter, J.; Chhikara, R.; Rader, M.; Ahlers, C.

1979-01-01

The Accuracy Assessment System (AAS) of the Large Area Crop Inventory Experiment (LACIE) was responsible for determining the accuracy and reliability of LACIE estimates of wheat production, area, and yield, made at regular intervals throughout the crop season, and for investigating the various LACIE error sources, quantifying these errors, and relating them to their causes. Some results of using the AAS during the three years of LACIE are reviewed. As the program culminated, AAS was able not only to meet the goal of obtaining accurate statistical estimates of sampling and classification accuracy, but also the goal of evaluating component labeling errors. Furthermore, the ground-truth data processing matured from collecting data for one crop (small grains) to collecting, quality-checking, and archiving data for all crops in a LACIE small segment.
Sample sizes to control error estimates in determining soil bulk density in California forest soils

Treesearch

Youzhi Han; Jianwei Zhang; Kim G. Mattson; Weidong Zhang; Thomas A. Weber

2016-01-01

Characterizing forest soil properties with high variability is challenging, sometimes requiring large numbers of soil samples. Soil bulk density is a standard variable needed along with element concentrations to calculate nutrient pools. This study aimed to determine the optimal sample size, the number of observations (n), for predicting the soil bulk density with a...
Tests of Independence in Contingency Tables with Small Samples: A Comparison of Statistical Power.

ERIC Educational Resources Information Center

Parshall, Cynthia G.; Kromrey, Jeffrey D.

1996-01-01

Power and Type I error rates were estimated for contingency tables with small sample sizes for the following four types of tests: (1) Pearson's chi-square; (2) chi-square with Yates's continuity correction; (3) the likelihood ratio test; and (4) Fisher's Exact Test. Various marginal distributions, sample sizes, and effect sizes were examined. (SLD)
Statistical properties of Fourier-based time-lag estimates

NASA Astrophysics Data System (ADS)

Epitropakis, A.; Papadakis, I. E.

2016-06-01

Context. The study of X-ray time-lag spectra in active galactic nuclei (AGN) is currently an active research area, since it has the potential to illuminate the physics and geometry of the innermost region (i.e. close to the putative super-massive black hole) in these objects. To obtain reliable information from these studies, the statistical properties of time-lags estimated from data must be known as accurately as possible. Aims: We investigated the statistical properties of Fourier-based time-lag estimates (i.e. based on the cross-periodogram), using evenly sampled time series with no missing points. Our aim is to provide practical "guidelines" on estimating time-lags that are minimally biased (i.e. whose mean is close to their intrinsic value) and have known errors. Methods: Our investigation is based on both analytical work and extensive numerical simulations. The latter consisted of generating artificial time series with various signal-to-noise ratios and sampling patterns/durations similar to those offered by AGN observations with present and past X-ray satellites. We also considered a range of different model time-lag spectra commonly assumed in X-ray analyses of compact accreting systems. Results: Discrete sampling, binning and finite light curve duration cause the mean of the time-lag estimates to have a smaller magnitude than their intrinsic values. Smoothing (i.e. binning over consecutive frequencies) of the cross-periodogram can add extra bias at low frequencies. The use of light curves with low signal-to-noise ratio reduces the intrinsic coherence, and can introduce a bias to the sample coherence, time-lag estimates, and their predicted error. Conclusions: Our results have direct implications for X-ray time-lag studies in AGN, but can also be applied to similar studies in other research fields.
We find that: a) time-lags should be estimated at frequencies lower than ≈ 1/2 the Nyquist frequency to minimise the effects of discrete binning of the observed time series; b) smoothing of the cross-periodogram should be avoided, as this may introduce significant bias to the time-lag estimates, which can be taken into account by assuming a model cross-spectrum (and not just a model time-lag spectrum); c) time-lags should be estimated by dividing observed time series into a number, say m, of shorter data segments and averaging the resulting cross-periodograms; d) if the data segments have a duration ≳ 20 ks, the time-lag bias is ≲15% of its intrinsic value for the model cross-spectra and power-spectra considered in this work. This bias should be estimated in practise (by considering possible intrinsic cross-spectra that may be applicable to the time-lag spectra at hand) to assess the reliability of any time-lag analysis; e) the effects of experimental noise can be minimised by only estimating time-lags in the frequency range where the sample coherence is larger than 1.2/(1 + 0.2m). In this range, the amplitude of noise variations caused by measurement errors is smaller than the amplitude of the signal's intrinsic variations. As long as m ≳ 20, time-lags estimated by averaging over individual data segments have analytical error estimates that are within 95% of the true scatter around their mean, and their distribution is similar, albeit not identical, to a Gaussian.
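Two of the guidelines above reduce to simple arithmetic: the frequency ceiling in (a) and the coherence floor in (e). The sketch below encodes them as a frequency filter. The function names, the time bin size, and the example coherence values are hypothetical, and the sample coherence is assumed to have been computed elsewhere from m averaged segments.

```python
def coherence_threshold(m):
    """Coherence floor 1.2 / (1 + 0.2 m) for m averaged segments, guideline (e)."""
    return 1.2 / (1.0 + 0.2 * m)


def max_lag_frequency(dt):
    """Half the Nyquist frequency for time bin size dt, guideline (a)."""
    nyquist = 1.0 / (2.0 * dt)
    return 0.5 * nyquist


def usable_frequencies(freqs, coherence, m, dt):
    """Keep frequencies below the ceiling whose sample coherence exceeds the floor."""
    f_max = max_lag_frequency(dt)
    c_min = coherence_threshold(m)
    return [f for f, c in zip(freqs, coherence) if f < f_max and c > c_min]


# Hypothetical example: 100 s bins, m = 20 segments.
freqs = [1e-4, 5e-4, 1e-3, 2e-3, 4e-3]
coherence = [0.90, 0.60, 0.30, 0.20, 0.50]
print(coherence_threshold(20))   # floor for m = 20
print(max_lag_frequency(100.0))  # ceiling in Hz
print(usable_frequencies(freqs, coherence, 20, 100.0))
```

For m = 20 the floor is 1.2 / 5 = 0.24, so in this made-up example the bin with coherence 0.20 is rejected for low coherence and the 4e-3 Hz bin is rejected for lying above half the Nyquist frequency.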
Combined Uncertainty and A-Posteriori Error Bound Estimates for General CFD Calculations: Theory and Software Implementation

NASA Technical Reports Server (NTRS)

Barth, Timothy J.

2014-01-01

This workshop presentation discusses the design and implementation of numerical methods for the quantification of statistical uncertainty, including a-posteriori error bounds, for output quantities computed using CFD methods. Hydrodynamic realizations often contain numerical error arising from finite-dimensional approximation (e.g. numerical methods using grids, basis functions, particles) and statistical uncertainty arising from incomplete information and/or statistical characterization of model parameters and random fields. The first task at hand is to derive formal error bounds for statistics given realizations containing finite-dimensional numerical error [1]. The error in computed output statistics contains contributions from both realization error and the error resulting from the calculation of statistics integrals using a numerical method. A second task is to devise computable a-posteriori error bounds by numerically approximating all terms arising in the error bound estimates. For the same reason that CFD calculations including error bounds but omitting uncertainty modeling are only of limited value, CFD calculations including uncertainty modeling but omitting error bounds are only of limited value. To gain maximum value from CFD calculations, a general software package for uncertainty quantification with quantified error bounds has been developed at NASA. The package provides implementations for a suite of numerical methods used in uncertainty quantification: Dense tensorization basis methods [3] and a subscale recovery variant [1] for non-smooth data, Sparse tensorization methods[2] utilizing node-nested hierarchies, Sampling methods[4] for high-dimensional random variable spaces.
Mixtures of Berkson and classical covariate measurement error in the linear mixed model: Bias analysis and application to a study on ultrafine particles.

PubMed

Deffner, Veronika; Küchenhoff, Helmut; Breitner, Susanne; Schneider, Alexandra; Cyrys, Josef; Peters, Annette

2018-05-01

The ultrafine particle measurements in the Augsburger Umweltstudie, a panel study conducted in Augsburg, Germany, exhibit measurement error from various sources. Measurements of mobile devices show classical possibly individual-specific measurement error; Berkson-type error, which may also vary individually, occurs, if measurements of fixed monitoring stations are used. The combination of fixed site and individual exposure measurements results in a mixture of the two error types. We extended existing bias analysis approaches to linear mixed models with a complex error structure including individual-specific error components, autocorrelated errors, and a mixture of classical and Berkson error. Theoretical considerations and simulation results show, that autocorrelation may severely change the attenuation of the effect estimations. Furthermore, unbalanced designs and the inclusion of confounding variables influence the degree of attenuation. Bias correction with the method of moments using data with mixture measurement error partially yielded better results compared to the usage of incomplete data with classical error. Confidence intervals (CIs) based on the delta method achieved better coverage probabilities than those based on Bootstrap samples. Moreover, we present the application of these new methods to heart rate measurements within the Augsburger Umweltstudie: the corrected effect estimates were slightly higher than their naive equivalents. The substantial measurement error of ultrafine particle measurements has little impact on the results. The developed methodology is generally applicable to longitudinal data with measurement error. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Avoiding treatment bias of REDD+ monitoring by sampling with partial replacement.

PubMed

Köhl, Michael; Scott, Charles T; Lister, Andrew J; Demon, Inez; Plugge, Daniel

2015-12-01

Implementing REDD+ renders the development of a measurement, reporting and verification (MRV) system necessary to monitor carbon stock changes. MRV systems generally apply a combination of remote sensing techniques and in-situ field assessments. In-situ assessments can be based on 1) permanent plots, which are assessed on all successive occasions, 2) temporary plots, which are assessed only once, and 3) a combination of both. The current study focuses on in-situ assessments and addresses the effect of treatment bias, which is introduced by managing permanent sampling plots differently than the surrounding forests. Temporary plots are not subject to treatment bias, but are associated with large sampling errors and low cost-efficiency. Sampling with partial replacement (SPR) utilizes both permanent and temporary plots. We apply a scenario analysis with different intensities of deforestation and forest degradation to show that SPR combines cost-efficiency with the handling of treatment bias. Without treatment bias, permanent plots generally provide lower sampling errors for change estimates than SPR and temporary plots, but they do not provide reliable estimates if treatment bias occurs. SPR allows for change estimates that are comparable to those provided by permanent plots, offers the flexibility to adjust sample sizes in the course of time, and allows data on permanent versus temporary plots to be compared for detecting treatment bias. Equivalence of biomass or carbon stock estimates between permanent and temporary plots serves as an indication for the absence of treatment bias while differences suggest that there is evidence for treatment bias. SPR is a flexible tool for estimating emission factors from successive measurements. It does not entirely depend on sample plots that are installed at the first occasion but allows for the adjustment of sample sizes and placement of new plots at any occasion.
This ensures that in-situ samples provide representative estimates over time. SPR offers the possibility to increase sampling intensity in areas with high degradation intensities or to establish new plots in areas where permanent plots are lost due to deforestation. SPR is also an ideal approach to mitigate concerns about treatment bias.
Correcting intensity loss errors in the absence of texture-free reference samples during pole figure measurement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saleh, Ahmed A., E-mail: asaleh@uow.edu.au

Even with the use of X-ray polycapillary lenses, sample tilting during pole figure measurement results in a decrease in the recorded X-ray intensity. The magnitude of this error is affected by the sample size and/or the finite detector size. These errors can be typically corrected by measuring the intensity loss as a function of the tilt angle using a texture-free reference sample (ideally made of the same alloy as the investigated material). Since texture-free reference samples are not readily available for all alloys, the present study employs an empirical procedure to estimate the correction curve for a particular experimental configuration. It involves the use of real texture-free reference samples that pre-exist in any X-ray diffraction laboratory to first establish the empirical correlations between X-ray intensity, sample tilt and their Bragg angles and thereafter generate correction curves for any Bragg angle. It will be shown that the empirically corrected textures are in very good agreement with the experimentally corrected ones. Highlights: • Sample tilting during X-ray pole figure measurement leads to intensity loss errors. • Texture-free reference samples are typically used to correct the pole figures. • An empirical correction procedure is proposed in the absence of reference samples. • The procedure relies on reference samples that pre-exist in any texture laboratory. • Experimentally and empirically corrected textures are in very good agreement.
Improving the analysis of composite endpoints in rare disease trials.

PubMed

McMenamin, Martina; Berglind, Anna; Wason, James M S

2018-05-22

Composite endpoints are recommended in rare diseases to increase power and/or to sufficiently capture complexity. Often, they are in the form of responder indices which contain a mixture of continuous and binary components. Analyses of these outcomes typically treat them as binary, thus only using the dichotomisations of continuous components. The augmented binary method offers a more efficient alternative and is therefore especially useful for rare diseases. Previous work has indicated the method may have poorer statistical properties when the sample size is small. Here we investigate small sample properties and implement small sample corrections. We re-sample from a previous trial with sample sizes varying from 30 to 80. We apply the standard binary and augmented binary methods and determine the power, type I error rate, coverage and average confidence interval width for each of the estimators. We implement Firth's adjustment for the binary component models and a small sample variance correction for the generalized estimating equations, applying the small sample adjusted methods to each sub-sample as before for comparison. For the log-odds treatment effect the power of the augmented binary method is 20-55% compared to 12-20% for the standard binary method. Both methods have approximately nominal type I error rates. The difference in response probabilities exhibit similar power but both unadjusted methods demonstrate type I error rates of 6-8%. The small sample corrected methods have approximately nominal type I error rates. On both scales, the reduction in average confidence interval width when using the adjusted augmented binary method is 17-18%. This is equivalent to requiring a 32% smaller sample size to achieve the same statistical power. The augmented binary method with small sample corrections provides a substantial improvement for rare disease trials using composite endpoints. 
We recommend the use of the method for the primary analysis in relevant rare disease trials. We emphasise that the method should be used alongside other efforts in improving the quality of evidence generated from rare disease trials rather than replace them.
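The link between a 17-18% narrower confidence interval and a roughly 32% smaller sample size can be checked with a short calculation. The sketch assumes the usual large-sample scaling in which confidence interval width is proportional to 1/sqrt(n); the function name is hypothetical and the abstract does not state which scaling the authors used.

```python
def equivalent_sample_size_reduction(width_reduction):
    """Fractional cut in n that gives the same CI width, assuming width ~ 1/sqrt(n).

    If a method narrows the interval by a fraction r at fixed n, the same
    width is reached with n * (1 - r)**2 observations.
    """
    return 1.0 - (1.0 - width_reduction) ** 2


for r in (0.17, 0.18):
    saving = equivalent_sample_size_reduction(r)
    print(f"width reduction {r:.0%} -> sample size reduction {saving:.1%}")
```

Under this assumption the two ends of the reported range give reductions of about 31% and 33%, which brackets the 32% quoted in the abstract.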
Do missing data influence the accuracy of divergence-time estimation with BEAST?

PubMed

Zheng, Yuchi; Wiens, John J

2015-04-01

Time-calibrated phylogenies have become essential to evolutionary biology. A recurrent and unresolved question for dating analyses is whether genes with missing data cells should be included or excluded. This issue is particularly unclear for the most widely used dating method, the uncorrelated lognormal approach implemented in BEAST. Here, we test the robustness of this method to missing data. We compare divergence-time estimates from a nearly complete dataset (20 nuclear genes for 32 species of squamate reptiles) to those from subsampled matrices, including those with 5 or 2 complete loci only and those with 5 or 8 incomplete loci added. In general, missing data had little impact on estimated dates (mean error of ∼5Myr per node or less, given an overall age of ∼220Myr in squamates), even when 80% of sampled genes had 75% missing data. Mean errors were somewhat higher when all genes were 75% incomplete (∼17Myr). However, errors increased dramatically when only 2 of 9 fossil calibration points were included (∼40Myr), regardless of missing data. Overall, missing data (and even numbers of genes sampled) may have only minor impacts on the accuracy of divergence dating with BEAST, relative to the dramatic effects of fossil calibrations. Copyright © 2015 Elsevier Inc. All rights reserved.
Population viability analysis with species occurrence data from museum collections.

PubMed

Skarpaas, Olav; Stabbetorp, Odd E

2011-06-01

The most comprehensive data on many species come from scientific collections. Thus, we developed a method of population viability analysis (PVA) in which this type of occurrence data can be used. In contrast to classical PVA, our approach accounts for the inherent observation error in occurrence data and allows the estimation of the population parameters needed for viability analysis. We tested the sensitivity of the approach to spatial resolution of the data, length of the time series, sampling effort, and detection probability with simulated data and conducted PVAs for common, rare, and threatened species. We compared the results of these PVAs with results of standard method PVAs in which observation error is ignored. Our method provided realistic estimates of population growth terms and quasi-extinction risk in cases in which the standard method without observation error could not. For low values of any of the sampling variables we tested, precision decreased, and in some cases biased estimates resulted. The results of our PVAs with the example species were consistent with information in the literature on these species. Our approach may facilitate PVA for a wide range of species of conservation concern for which demographic data are lacking but occurrence data are readily available. ©2011 Society for Conservation Biology.

Functional Mixed Effects Model for Small Area Estimation.

PubMed

Maiti, Tapabrata; Sinha, Samiran; Zhong, Ping-Shou

2016-09-01

Functional data analysis has become an important area of research due to its ability of handling high dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models, and in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area level data, and fit a varying coefficient linear mixed effect model where the varying coefficients are semi-parametrically modeled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using a standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors, and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.
Improvements to photometry. Part 1: Better estimation of derivatives in extinction and transformation equations

NASA Technical Reports Server (NTRS)

Young, Andrew T.

1988-01-01

Atmospheric extinction in wideband photometry is examined both analytically and through numerical simulations. If the derivatives that appear in the Stromgren-King theory are estimated carefully, it appears that wideband measurements can be transformed to outside the atmosphere with errors no greater than a millimagnitude. A numerical analysis approach is used to estimate derivatives of both the stellar and atmospheric extinction spectra, avoiding previous assumptions that the extinction follows a power law. However, it is essential to satisfy the requirements of the sampling theorem to keep aliasing errors small. Typically, this means that band separations cannot exceed half of the full width at half-peak response. Further work is needed to examine higher order effects, which may well be significant.
Procrustes-based geometric morphometrics on MRI images: An example of inter-operator bias in 3D landmarks and its impact on big datasets.

PubMed

Daboul, Amro; Ivanovska, Tatyana; Bülow, Robin; Biffar, Reiner; Cardini, Andrea

2018-01-01

Using 3D anatomical landmarks from adult human head MRIs, we assessed the magnitude of inter-operator differences in Procrustes-based geometric morphometric analyses. An in depth analysis of both absolute and relative error was performed in a subsample of individuals with replicated digitization by three different operators. The effect of inter-operator differences was also explored in a large sample of more than 900 individuals. Although absolute error was not unusual for MRI measurements, including bone landmarks, shape was particularly affected by differences among operators, with up to more than 30% of sample variation accounted for by this type of error. The magnitude of the bias was such that it dominated the main pattern of bone and total (all landmarks included) shape variation, largely surpassing the effect of sex differences between hundreds of men and women. In contrast, however, we found higher reproducibility in soft-tissue nasal landmarks, despite relatively larger errors in estimates of nasal size. Our study exemplifies the assessment of measurement error using geometric morphometrics on landmarks from MRIs and stresses the importance of relating it to total sample variance within the specific methodological framework being used. In summary, precise landmarks may not necessarily imply negligible errors, especially in shape data; indeed, size and shape may be differentially impacted by measurement error and different types of landmarks may have relatively larger or smaller errors. Importantly, and consistently with other recent studies using geometric morphometrics on digital images (which, however, were not specific to MRI data), this study showed that inter-operator biases can be a major source of error in the analysis of large samples, as those that are becoming increasingly common in the 'era of big data'.
Procrustes-based geometric morphometrics on MRI images: An example of inter-operator bias in 3D landmarks and its impact on big datasets

PubMed Central

Ivanovska, Tatyana; Bülow, Robin; Biffar, Reiner; Cardini, Andrea

2018-01-01

Using 3D anatomical landmarks from adult human head MRIs, we assessed the magnitude of inter-operator differences in Procrustes-based geometric morphometric analyses. An in depth analysis of both absolute and relative error was performed in a subsample of individuals with replicated digitization by three different operators. The effect of inter-operator differences was also explored in a large sample of more than 900 individuals. Although absolute error was not unusual for MRI measurements, including bone landmarks, shape was particularly affected by differences among operators, with up to more than 30% of sample variation accounted for by this type of error. The magnitude of the bias was such that it dominated the main pattern of bone and total (all landmarks included) shape variation, largely surpassing the effect of sex differences between hundreds of men and women. In contrast, however, we found higher reproducibility in soft-tissue nasal landmarks, despite relatively larger errors in estimates of nasal size. Our study exemplifies the assessment of measurement error using geometric morphometrics on landmarks from MRIs and stresses the importance of relating it to total sample variance within the specific methodological framework being used. In summary, precise landmarks may not necessarily imply negligible errors, especially in shape data; indeed, size and shape may be differentially impacted by measurement error and different types of landmarks may have relatively larger or smaller errors. Importantly, and consistently with other recent studies using geometric morphometrics on digital images (which, however, were not specific to MRI data), this study showed that inter-operator biases can be a major source of error in the analysis of large samples, as those that are becoming increasingly common in the 'era of big data'. PMID:29787586
Estimation of true incidence of polio: overcoming misclassification errors due to stool culture insensitivity.

PubMed

Srinivas, V; Puliyel, Jacob M

2007-08-01

The diagnosis of polio depends on culturing the virus in stool samples of children with AFP. Using data obtained under the "Right to Information Act" of instances where only one of the two samples was positive for polio, it was possible to estimate the sensitivity of the system to detect cases of polio. The calculations suggest that there were 1625 (95% CI 1528 to 1725) cases of polio in India in 2006 rather than the 674 reported widely!
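One way such a sensitivity estimate could work is sketched below. The abstract does not give the authors' formula, so the model here is an assumption: each of the two stool cultures is taken to detect a true case independently with the same probability s, and a case is reported if at least one culture is positive. The function names and the counts in the example are hypothetical and are not the study's data.

```python
def per_sample_sensitivity(n_both, n_one):
    """Estimate s from counts of cases with both cultures or exactly one positive.

    Under independence, E[n_both] is proportional to s**2 and E[n_one] to
    2*s*(1 - s), so n_one / n_both = 2*(1 - s)/s and s = 2*n_both / (2*n_both + n_one).
    """
    return 2.0 * n_both / (2.0 * n_both + n_one)


def system_sensitivity(s):
    """Probability that at least one of two independent cultures is positive."""
    return 1.0 - (1.0 - s) ** 2


def estimated_true_cases(reported, n_both, n_one):
    """Scale reported cases by the estimated two-sample system sensitivity."""
    s = per_sample_sensitivity(n_both, n_one)
    return reported / system_sensitivity(s)


# Hypothetical counts, for illustration only.
s = per_sample_sensitivity(n_both=50, n_one=100)
print(s, system_sensitivity(s), estimated_true_cases(300, 50, 100))
```

In this made-up example s = 0.5, the two-sample system detects 75% of true cases, and 300 reported cases would correspond to about 400 true cases.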
Estimating the probability that the sample mean is within a desired fraction of the standard deviation of the true mean.

PubMed

Schillaci, Michael A; Schillaci, Mario E

2009-02-01

The use of small sample sizes in human and primate evolutionary research is commonplace. Estimating how well small samples represent the underlying population, however, is not commonplace. Because the accuracy of determinations of taxonomy, phylogeny, and evolutionary process is dependent upon how well the study sample represents the population of interest, characterizing the uncertainty, or potential error, associated with analyses of small sample sizes is essential. We present a method for estimating the probability that the sample mean is within a desired fraction of the standard deviation of the true mean using small (n < 10) or very small (n ≤ 5) sample sizes. This method can be used by researchers to determine post hoc the probability that their sample is a meaningful approximation of the population parameter. We tested the method using a large craniometric data set commonly used by researchers in the field. Given our results, we suggest that sample estimates of the population mean can be reasonable and meaningful even when based on small, and perhaps even very small, sample sizes.
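A simplified version of this probability can be computed from normal theory. The sketch assumes independent draws from a normal population with known standard deviation, so the sample mean is normal with standard error sigma / sqrt(n). The published method may differ, for example by using the t distribution to allow for an estimated standard deviation, and the function name is hypothetical.

```python
from statistics import NormalDist


def prob_mean_within(k, n):
    """P(|sample mean - true mean| < k * sigma) for n normal observations.

    The sample mean has standard error sigma / sqrt(n), so the event is
    |Z| < k * sqrt(n) for a standard normal Z.
    """
    z = k * n ** 0.5
    return 2.0 * NormalDist().cdf(z) - 1.0


for n in (3, 5, 10):
    print(n, round(prob_mean_within(0.5, n), 3))
```

With n = 4 and k = 0.5 the event is |Z| < 1, giving the familiar probability of about 0.683.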
PREST-plus identifies pedigree errors and cryptic relatedness in the GAW18 sample using genome-wide SNP data.

PubMed

Sun, Lei; Dimitromanolakis, Apostolos

2014-01-01

Pedigree errors and cryptic relatedness often appear in families or population samples collected for genetic studies. If not identified, these issues can lead to either increased false negatives or false positives in both linkage and association analyses. To identify pedigree errors and cryptic relatedness among individuals from the 20 San Antonio Family Studies (SAFS) families and cryptic relatedness among the 157 putatively unrelated individuals, we apply PREST-plus to the genome-wide single-nucleotide polymorphism (SNP) data and analyze estimated identity-by-descent (IBD) distributions for all pairs of genotyped individuals. Based on the given pedigrees alone, PREST-plus identifies the following putative pairs: 1091 full-sib, 162 half-sib, 360 grandparent-grandchild, 2269 avuncular, 2717 first cousin, 402 half-avuncular, 559 half-first cousin, 2 half-sib+first cousin, 957 parent-offspring and 440,546 unrelated. Using the genotype data, PREST-plus detects 7 mis-specified relative pairs, with their IBD estimates clearly deviating from the null expectations, and it identifies 4 cryptic related pairs involving 7 individuals from 6 families.
Spatial averaging errors in creating hemispherical reflectance (albedo) maps from directional reflectance data

NASA Technical Reports Server (NTRS)

Kimes, D. S.; Kerber, A. G.; Sellers, P. J.

1993-01-01

Spatial averaging errors which may occur when creating hemispherical reflectance maps for different cover types from direct nadir technique to estimate the hemispherical reflectance are assessed by comparing the results with those obtained with a knowledge-based system called VEG (Kimes et al., 1991, 1992). It was found that hemispherical reflectance errors provided by using VEG are much less than those using the direct nadir techniques, depending on conditions. Suggestions are made concerning sampling and averaging strategies for creating hemispherical reflectance maps for photosynthetic, carbon cycle, and climate change studies.
An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

NASA Astrophysics Data System (ADS)

Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

2018-03-01

In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
[The methodology and sample description of the National Survey on Addiction Problems in Hungary 2015 (NSAPH 2015)].

PubMed

Paksi, Borbala; Demetrovics, Zsolt; Magi, Anna; Felvinczi, Katalin

2017-06-01

This paper introduces the methods and methodological findings of the National Survey on Addiction Problems in Hungary (NSAPH 2015). Use patterns of smoking, alcohol use and other psychoactive substances were measured as well as that of certain behavioural addictions (problematic gambling - PGSI, DSM-V, eating disorders - SCOFF, problematic internet use - PIUQ, problematic on-line gaming - POGO, problematic social media use - FAS, exercise addictions - EAI-HU, work addiction - BWAS, compulsive buying - CBS). The paper describes the applied measurement techniques, sample selection, recruitment of respondents and the data collection strategy as well. Methodological results of the survey including reliability and validity of the measures are reported. The NSAPH 2015 research was carried out on a nationally representative sample of the Hungarian adult population aged 16-64 yrs (gross sample 2477, net sample 2274 persons) with the age group of 18-34 being overrepresented. Statistical analysis of the weight-distribution suggests that weighting did not create any artificial distortion in the database leaving the representativeness of the sample unaffected. The size of the weighted sample of the 18-64 years old adult population is 1490 persons. The extent of the theoretical margin of error in the weighted sample is ±2.5%, at a reliability level of 95% which is in line with the original data collection plans. Based on the analysis of reliability and the extent of errors beyond sampling within the context of the database we conclude that inconsistencies create relatively minor distortions in cumulative prevalence rates; consequently the database makes possible the reliable estimation of risk factors related to different substance use behaviours. The reliability indexes of measurements used for prevalence estimates of behavioural addictions proved to be appropriate, though the psychometric features in some cases suggest the presence of redundant items.
The comparison of parameters of errors beyond sample selection in the current and previous data collections indicates that trend estimates and their interpretation require outstanding attention and in some cases even correction procedures might become necessary.
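The theoretical margin of error quoted for the weighted sample of 1490 persons can be reproduced with the standard formula for a proportion. The sketch assumes simple random sampling and the worst-case proportion p = 0.5, and it ignores any design effect from weighting; the abstract does not say exactly how the survey team computed its figure, and the function name is hypothetical.

```python
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return z * (p * (1.0 - p) / n) ** 0.5


moe = margin_of_error(1490)
print(f"margin of error for n = 1490: +/-{moe:.1%}")
```

Under these assumptions the half-width is about 2.5 percentage points, in line with the figure reported in the abstract.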
Monte-Carlo-based phase retardation estimator for polarization sensitive optical coherence tomography

NASA Astrophysics Data System (ADS)

Duan, Lian; Makita, Shuichi; Yamanari, Masahiro; Lim, Yiheng; Yasuno, Yoshiaki

2011-08-01

A Monte-Carlo-based phase retardation estimator is developed to correct the systematic error in phase retardation measurement by polarization sensitive optical coherence tomography (PS-OCT). Recent research has revealed that the phase retardation measured by PS-OCT has a distribution that is neither symmetric nor centered at the true value. Hence, a standard mean estimator gives us erroneous estimations of phase retardation, and it degrades the performance of PS-OCT for quantitative assessment. In this paper, the noise property in phase retardation is investigated in detail by Monte-Carlo simulation and experiments. A distribution transform function is designed to eliminate the systematic error by using the result of the Monte-Carlo simulation. This distribution transformation is followed by a mean estimator. This process provides a significantly better estimation of phase retardation than a standard mean estimator. This method is validated both by numerical simulations and experiments. The application of this method to in vitro and in vivo biological samples is also demonstrated.
Self-calibration method without joint iteration for distributed small satellite SAR systems

NASA Astrophysics Data System (ADS)

Xu, Qing; Liao, Guisheng; Liu, Aifei; Zhang, Juan

2013-12-01

The performance of distributed small satellite synthetic aperture radar systems degrades significantly due to the unavoidable array errors, including gain, phase, and position errors, in real operating scenarios. In the conventional method proposed in (IEEE T Aero. Elec. Sys. 42:436-451, 2006), the spectrum components within one Doppler bin are considered as calibration sources. However, it is found in this article that the gain error estimation and the position error estimation in the conventional method can interact with each other. The conventional method may converge to suboptimal solutions when position errors are large, since it requires the joint iteration between gain-phase error estimation and position error estimation. In addition, it is also found that phase errors can be estimated well regardless of position errors when the zero Doppler bin is chosen. In this article, we propose a method obtained by modifying the conventional one, based on these two observations. In this modified method, gain errors are first estimated and compensated, which eliminates the interaction between gain error estimation and position error estimation. Then, by using the zero Doppler bin data, the phase error estimation can be performed well independent of position errors. Finally, position errors are estimated based on the Taylor-series expansion. Meanwhile, the joint iteration between gain-phase error estimation and position error estimation is not required. Therefore, the problem of suboptimal convergence, which occurs in the conventional method, can be avoided at low computational cost. The modified method offers faster convergence and lower estimation error compared to the conventional one. Theoretical analysis and computer simulation results verified the effectiveness of the modified method.
Impacts of sampling design and estimation methods on nutrient leaching of intensively monitored forest plots in the Netherlands.

PubMed

de Vries, W; Wieggers, H J J; Brus, D J

2010-08-05

Element fluxes through forest ecosystems are generally based on measurements of concentrations in soil solution at regular time intervals at plot locations sampled in a regular grid. Here we present spatially averaged annual element leaching fluxes in three Dutch forest monitoring plots using a new sampling strategy in which both sampling locations and sampling times are selected by probability sampling. Locations were selected by stratified random sampling with compact geographical blocks of equal surface area as strata. In each sampling round, six composite soil solution samples were collected, consisting of five aliquots, one per stratum. The plot-mean concentration was estimated by linear regression, so that the bias due to one or more strata being not represented in the composite samples is eliminated. The sampling times were selected in such a way that the cumulative precipitation surplus of the time interval between two consecutive sampling times was constant, using an estimated precipitation surplus averaged over the past 30 years. The spatially averaged annual leaching flux was estimated by using the modeled daily water flux as an ancillary variable. An important advantage of the new method is that the uncertainty in the estimated annual leaching fluxes due to spatial and temporal variation and resulting sampling errors can be quantified. Results of this new method were compared with the reference approach in which daily leaching fluxes were calculated by multiplying daily interpolated element concentrations with daily water fluxes and then aggregated to a year. Results show that the annual fluxes calculated with the reference method for the period 2003-2005, including all plots, elements and depths, lies only in 53% of the cases within the range of the average +/-2 times the standard error of the new method. 
Despite the differences in results, both methods indicate comparable N retention and strong Al mobilization in all plots, with Al leaching being nearly equal to the leaching of SO(4) and NO(3) with fluxes expressed in mol(c) ha(-1) yr(-1). This illustrates that Al release, which is the clearest signal of soil acidification, is mainly due to the external input of SO(4) and NO(3).
Survey methods for assessing land cover map accuracy

USGS Publications Warehouse

Nusser, S.M.; Klaas, E.E.

2003-01-01

The increasing availability of digital photographic materials has fueled efforts by agencies and organizations to generate land cover maps for states, regions, and the United States as a whole. Regardless of the information sources and classification methods used, land cover maps are subject to numerous sources of error. In order to understand the quality of the information contained in these maps, it is desirable to generate statistically valid estimates of accuracy rates describing misclassification errors. We explored a full sample survey framework for creating accuracy assessment study designs that balance statistical and operational considerations in relation to study objectives for a regional assessment of GAP land cover maps. We focused not only on appropriate sample designs and estimation approaches, but on aspects of the data collection process, such as gaining cooperation of land owners and using pixel clusters as an observation unit. The approach was tested in a pilot study to assess the accuracy of Iowa GAP land cover maps. A stratified two-stage cluster sampling design addressed sample size requirements for land covers and the need for geographic spread while minimizing operational effort. Recruitment methods used for private land owners yielded high response rates, minimizing a source of nonresponse error. Collecting data for a 9-pixel cluster centered on the sampled pixel was simple to implement, and provided better information on rarer vegetation classes as well as substantial gains in precision relative to observing data at a single-pixel.
The Relation Between Inflation in Type-I and Type-II Error Rate and Population Divergence in Genome-Wide Association Analysis of Multi-Ethnic Populations.

PubMed

Derks, E M; Zwinderman, A H; Gamazon, E R

2017-05-01

Population divergence impacts the degree of population stratification in Genome Wide Association Studies. We aim to: (i) investigate type-I error rate as a function of population divergence (F_ST) in multi-ethnic (admixed) populations; (ii) evaluate the statistical power and effect size estimates; and (iii) investigate the impact of population stratification on the results of gene-based analyses. Quantitative phenotypes were simulated. Type-I error rate was investigated for Single Nucleotide Polymorphisms (SNPs) with varying levels of F_ST between the ancestral European and African populations. Type-II error rate was investigated for a SNP characterized by a high value of F_ST. In all tests, genomic MDS components were included to correct for population stratification. Type-I and type-II error rates were adequately controlled in a population that included two distinct ethnic populations but not in admixed samples. Statistical power was reduced in the admixed samples. Gene-based tests showed no residual inflation in type-I error rate.
An Investigation of Sample Size Splitting on ATFIND and DIMTEST

ERIC Educational Resources Information Center

Socha, Alan; DeMars, Christine E.

2013-01-01

Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Carbon Monitoring System Flux Estimation and Attribution: Impact of ACOS-GOSAT X(CO2) Sampling on the Inference of Terrestrial Biospheric Sources and Sinks

NASA Technical Reports Server (NTRS)

Liu, Junjie; Bowman, Kevin W.; Lee, Memong; Henze, David K.; Bousserez, Nicolas; Brix, Holger; Collatz, G. James; Menemenlis, Dimitris; Ott, Lesley; Pawson, Steven;

2014-01-01

Using an Observing System Simulation Experiment (OSSE), we investigate the impact of JAXA Greenhouse gases Observing SATellite 'IBUKI' (GOSAT) sampling on the estimation of terrestrial biospheric flux with the NASA Carbon Monitoring System Flux (CMS-Flux) estimation and attribution strategy. The simulated observations in the OSSE use the actual column carbon dioxide (X(CO2)) b2.9 retrieval sensitivity and quality control for the year 2010 processed through the Atmospheric CO2 Observations from Space algorithm. CMS-Flux is a variational inversion system that uses the GEOS-Chem forward and adjoint model forced by a suite of observationally constrained fluxes from ocean, land and anthropogenic models. We investigate the impact of GOSAT sampling on flux estimation in two aspects: 1) random error uncertainty reduction and 2) the global and regional bias in posterior flux resulted from the spatiotemporally biased GOSAT sampling. Based on Monte Carlo calculations, we find that global average flux uncertainty reduction ranges from 25% in September to 60% in July. When aggregated to the 11 land regions designated by the phase 3 of the Atmospheric Tracer Transport Model Intercomparison Project, the annual mean uncertainty reduction ranges from 10% over North American boreal to 38% over South American temperate, which is driven by observational coverage and the magnitude of prior flux uncertainty. The uncertainty reduction over the South American tropical region is 30%, even with sparse observation coverage. We show that this reduction results from the large prior flux uncertainty and the impact of non-local observations. Given the assumed prior error statistics, the degree of freedom for signal is approx.1132 for 1-yr of the 74 055 GOSAT X(CO2) observations, which indicates that GOSAT provides approx.1132 independent pieces of information about surface fluxes. 
We quantify the impact of GOSAT's spatiotemporal sampling on the posterior flux and find that a bias of 0.7 gigatons of carbon in the global annual posterior flux results from the seasonally and diurnally biased sampling when a diagonal prior flux error covariance is used.

Imperfect pathogen detection from non-invasive skin swabs biases disease inference

USGS Publications Warehouse

DiRenzo, Graziella V.; Grant, Evan H. Campbell; Longo, Ana; Che-Castaldo, Christian; Zamudio, Kelly R.; Lips, Karen

2018-01-01

1. Conservation managers rely on accurate estimates of disease parameters, such as pathogen prevalence and infection intensity, to assess disease status of a host population. However, these disease metrics may be biased if low-level infection intensities are missed by sampling methods or laboratory diagnostic tests. These false negatives underestimate pathogen prevalence and overestimate mean infection intensity of infected individuals. 2. Our objectives were two-fold. First, we quantified false negative error rates of Batrachochytrium dendrobatidis on non-invasive skin swabs collected from an amphibian community in El Copé, Panama. We swabbed amphibians twice in sequence, and we used a recently developed hierarchical Bayesian estimator to assess disease status of the population. Second, we developed a novel hierarchical Bayesian model to simultaneously account for imperfect pathogen detection from field sampling and laboratory diagnostic testing. We evaluated the performance of the model using simulations and varying sampling design to quantify the magnitude of bias in estimates of pathogen prevalence and infection intensity. 3. We show that Bd detection probability from skin swabs was related to host infection intensity, where Bd infections < 10 zoospores have < 95% probability of being detected. If imperfect Bd detection was not considered, then Bd prevalence was underestimated by as much as 16%. In the Bd-amphibian system, this indicates a need to correct for imperfect pathogen detection caused by skin swabs in persisting host communities with low-level infections. More generally, our results have implications for study designs in other disease systems, particularly those with similar objectives, biology, and sampling decisions. 4. Uncertainty in pathogen detection is an inherent property of most sampling protocols and diagnostic tests, where the magnitude of bias depends on the study system, type of infection, and false negative error rates. 
Given that it may be difficult to know this information in advance, we advocate that the most cautious approach is to assume all errors are possible and to accommodate them by adjusting sampling designs. The modeling framework presented here improves the accuracy in estimating pathogen prevalence and infection intensity.
Relative Performance of Rescaling and Resampling Approaches to Model Chi Square and Parameter Standard Error Estimation in Structural Equation Modeling.

ERIC Educational Resources Information Center

Nevitt, Johnathan; Hancock, Gregory R.

Though common structural equation modeling (SEM) methods are predicated upon the assumption of multivariate normality, applied researchers often find themselves with data clearly violating this assumption and without sufficient sample size to use distribution-free estimation methods. Fortunately, promising alternatives are being integrated into…
The role of global cloud climatologies in validating numerical models

NASA Technical Reports Server (NTRS)

HARSHVARDHAN

1993-01-01

The purpose of this work is to estimate sampling errors of area-time averaged rain rate due to temporal samplings by satellites. In particular, the sampling errors of the proposed low inclination orbit satellite of the Tropical Rainfall Measuring Mission (TRMM) (35 deg inclination and 350 km altitude), one of the sun synchronous polar orbiting satellites of NOAA series (98.89 deg inclination and 833 km altitude), and two simultaneous sun synchronous polar orbiting satellites--assumed to carry a perfect passive microwave sensor for direct rainfall measurements--will be estimated. This estimate is done by performing a study of the satellite orbits and the autocovariance function of the area-averaged rain rate time series. A model based on an exponential fit of the autocovariance function is used for actual calculations. Varying visiting intervals and total coverage of averaging area on each visit by the satellites are taken into account in the model. The data are generated by a General Circulation Model (GCM). The model has a diurnal cycle and parameterized convective processes. A special run of the GCM was made at NASA/GSFC in which the rainfall and precipitable water fields were retained globally for every hour of the run for the whole year.
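The role of the exponential autocovariance model can be illustrated with a short calculation. The sketch below is not the authors' model: it assumes a stationary area-averaged rain rate with autocovariance var * exp(-|lag| / tau), equally spaced visits at the midpoints of equal sub-intervals, and full coverage of the averaging area on every visit. The 30-day period and 6-hour correlation time are hypothetical values chosen only for illustration.

```python
import math

def sampling_error_variance(n_visits, period_h, tau_h, var=1.0):
    """Variance of (mean of n visits - true time mean) over one averaging period.

    Assumes autocovariance var * exp(-|lag| / tau_h), visits at the midpoints
    of n equal sub-intervals, and full area coverage on every visit.
    """
    dt = period_h / n_visits
    rho = math.exp(-dt / tau_h)
    # Variance of the discrete sample mean.
    var_s = var / n_visits**2 * (
        n_visits + 2.0 * sum((n_visits - k) * rho**k for k in range(1, n_visits))
    )
    # Variance of the continuous time average over [0, period_h].
    var_a = 2.0 * var * tau_h / period_h**2 * (
        period_h - tau_h * (1.0 - math.exp(-period_h / tau_h))
    )
    # Covariance between the sample mean and the continuous time average.
    cov = 0.0
    for i in range(n_visits):
        t = (i + 0.5) * dt
        cov += tau_h * (2.0 - math.exp(-t / tau_h) - math.exp(-(period_h - t) / tau_h))
    cov *= var / (n_visits * period_h)
    return var_s - 2.0 * cov + var_a

# Hypothetical month (720 h) with a 6 h correlation time.
for n in (60, 120, 720):
    rms = math.sqrt(sampling_error_variance(n, 720.0, 6.0))
    print(f"{n:4d} visits: RMS sampling error = {rms:.4f} (in units of the rain-rate std)")
```

Varying visit intervals and partial coverage, both of which the abstract says the model accounts for, are left out of this sketch.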

Toward a Framework for Systematic Error Modeling of NASA Spaceborne Radar with NOAA/NSSL Ground Radar-Based National Mosaic QPE

NASA Technical Reports Server (NTRS)

Kirstettier, Pierre-Emmanual; Honh, Y.; Gourley, J. J.; Chen, S.; Flamig, Z.; Zhang, J.; Howard, K.; Schwaller, M.; Petersen, W.; Amitai, E.

2011-01-01

Characterization of the error associated with satellite rainfall estimates is a necessary component of deterministic and probabilistic frameworks involving space-borne passive and active microwave measurements for applications ranging from water budget studies to forecasting natural hazards related to extreme rainfall events. We focus here on the error structure of NASA's Tropical Rainfall Measurement Mission (TRMM) Precipitation Radar (PR) quantitative precipitation estimation (QPE) at ground. The problem is addressed by comparison of PR QPEs with reference values derived from ground-based measurements using NOAA/NSSL ground radar-based National Mosaic and QPE system (NMQ/Q2). A preliminary investigation of this subject has been carried out at the PR estimation scale (instantaneous and 5 km) using a three-month data sample in the southern part of US. The primary contribution of this study is the presentation of the detailed steps required to derive a trustworthy reference rainfall dataset from Q2 at the PR pixel resolution. It relies on a bias correction and a radar quality index, both of which provide a basis to filter out the less trustworthy Q2 values. Several aspects of PR errors are revealed and quantified including sensitivity to the processing steps with the reference rainfall, comparisons of rainfall detectability and rainfall rate distributions, spatial representativeness of error, and separation of systematic biases and random errors. The methodology and framework developed herein apply more generally to rainfall rate estimates from other sensors onboard low-earth orbiting satellites such as microwave imagers and dual-wavelength radars such as with the Global Precipitation Measurement (GPM) mission.
Model dependence and its effect on ensemble projections in CMIP5

NASA Astrophysics Data System (ADS)

Abramowitz, G.; Bishop, C.

2013-12-01

Conceptually, the notion of model dependence within climate model ensembles is relatively simple - modelling groups share a literature base, parametrisations, data sets and even model code - the potential for dependence in sampling different climate futures is clear. How though can this conceptual problem inform a practical solution that demonstrably improves the ensemble mean and ensemble variance as an estimate of system uncertainty? While some research has already focused on error correlation or error covariance as a candidate to improve ensemble mean estimates, a complete definition of independence must at least implicitly subscribe to an ensemble interpretation paradigm, such as the 'truth-plus-error', 'indistinguishable', or more recently 'replicate Earth' paradigm. Using a definition of model dependence based on error covariance within the replicate Earth paradigm, this presentation will show that accounting for dependence in surface air temperature gives cooler projections in CMIP5 - by as much as 20% globally in some RCPs - although results differ significantly for each RCP, especially regionally. The fact that the change afforded by accounting for dependence across different RCPs is different is not an inconsistent result. Different numbers of submissions to each RCP by different modelling groups mean that differences in projections from different RCPs are not entirely about RCP forcing conditions - they also reflect different sampling strategies.
GNSS Clock Error Impacts on Radio Occultation Retrievals

NASA Astrophysics Data System (ADS)

Weiss, Jan; Sokolovskiy, Sergey; Schreiner, Bill; Yoon, Yoke

2017-04-01

We assess the impacts of GPS and GLONASS clock errors on radio occultation retrieval of bending angle, refractivity, and temperature from low Earth orbit. The major contributing factor is the interpretation of GNSS clock offsets sampled at 30 sec or longer intervals. Using 1 Hz GNSS clock estimates as truth we apply several interpolation and fitting schemes to evaluate how they affect the accuracy of atmospheric retrieval products. The results are organized by GPS and GLONASS space vehicle and the GNSS clock interpolation/fitting scheme. We find that bending angle error is roughly similar for all current GPS transmitters (about 0.7 mcrad) but note some differences related to the type of atomic oscillator onboard the transmitter satellite. GLONASS bending angle errors show more variation over the constellation and are approximately two times larger than GPS. An investigation of the transmitter clock spectra reveals this is due to more power in periods between 2-10 sec. Retrieved refractivity and temperature products show clear differences between GNSS satellite generations, and indicate that GNSS clocks sampled at intervals smaller than 5 sec significantly improve accuracy, particularly for GLONASS. We conclude by summarizing the tested GNSS clock estimation and application strategies in the context of current and future radio occultation missions.
Body composition in Nepalese children using isotope dilution: the production of ethnic-specific calibration equations and an exploration of methodological issues.

PubMed

Devakumar, Delan; Grijalva-Eternod, Carlos S; Roberts, Sebastian; Chaube, Shiva Shankar; Saville, Naomi M; Manandhar, Dharma S; Costello, Anthony; Osrin, David; Wells, Jonathan C K

2015-01-01

Background. Body composition is important as a marker of both current and future health. Bioelectrical impedance (BIA) is a simple and accurate method for estimating body composition, but requires population-specific calibration equations. Objectives. (1) To generate population-specific calibration equations to predict lean mass (LM) from BIA in Nepalese children aged 7-9 years. (2) To explore methodological changes that may extend the range and improve accuracy. Methods. BIA measurements were obtained from 102 Nepalese children (52 girls) using the Tanita BC-418. Isotope dilution with deuterium oxide was used to measure total body water and to estimate LM. Prediction equations for estimating LM from BIA data were developed using linear regression, and estimates were compared with those obtained from the Tanita system. We assessed the effects of flexing the arms of children to extend the range of coverage towards lower weights. We also estimated potential error if the number of children included in the study was reduced. Findings. Prediction equations were generated, incorporating height, impedance index, weight and sex as predictors (R² = 93%). The Tanita system tended to under-estimate LM, with a mean error of 2.2%, but extending up to 25.8%. Flexing the arms to 90° increased the lower weight range, but produced a small error that was not significant when applied to children <16 kg (p = 0.42). Reducing the number of children increased the error at the tails of the weight distribution. Conclusions. Population-specific isotope calibration of BIA for Nepalese children has high accuracy. Arm position is important and can be used to extend the range of low weight covered. Smaller samples reduce resource requirements, but lead to large errors at the tails of the weight distribution.
Quantitative endoscopy: initial accuracy measurements.

PubMed

Truitt, T O; Adelman, R A; Kelly, D H; Willging, J P

2000-02-01

The geometric optics of an endoscope can be used to determine the absolute size of an object in an endoscopic field without knowing the actual distance from the object. This study explores the accuracy of a technique that estimates absolute object size from endoscopic images. Quantitative endoscopy involves calibrating a rigid endoscope to produce size estimates from 2 images taken with a known traveled distance between the images. The heights of 12 samples, ranging in size from 0.78 to 11.80 mm, were estimated with this calibrated endoscope. Backup distances of 5 mm and 10 mm were used for comparison. The mean percent error for all estimated measurements when compared with the actual object sizes was 1.12%. The mean errors for 5-mm and 10-mm backup distances were 0.76% and 1.65%, respectively. The mean errors for objects <2 mm and ≥2 mm were 0.94% and 1.18%, respectively. Quantitative endoscopy estimates endoscopic image size to within 5% of the actual object size. This method remains promising for quantitatively evaluating object size from endoscopic images. It does not require knowledge of the absolute distance of the endoscope from the object, rather, only the distance traveled by the endoscope between images.
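One way to see why the absolute distance drops out is a simple pinhole model, which is an assumption made here for illustration and not necessarily the calibration the authors used. If an object of height H at distance d produces an image of height h = f * H / d, then two images separated by a known backup distance give H without d. The focal length f and all numerical values below are hypothetical.

```python
def object_size(h_near, h_far, backup, focal_length):
    """Object height from two image heights under a pinhole model.

    h_near, h_far: image heights before and after backing the endoscope away.
    backup: known distance travelled by the endoscope between the two images.
    From h = focal_length * H / d it follows that
    H = backup * h_near * h_far / (focal_length * (h_near - h_far)).
    """
    if h_near <= h_far:
        raise ValueError("the image must shrink when the endoscope backs away")
    return backup * h_near * h_far / (focal_length * (h_near - h_far))

# Synthetic check: a 5 mm object 20 mm away, f = 2 mm, 5 mm backup distance.
f, true_height, distance, backup = 2.0, 5.0, 20.0, 5.0
h1 = f * true_height / distance
h2 = f * true_height / (distance + backup)
print(object_size(h1, h2, backup, f))  # recovers about 5 mm without using `distance`
```

Because the estimate divides by the difference of two image heights, small measurement errors are amplified when the backup distance is short relative to the working distance.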
Measurements of stem diameter: implications for individual- and stand-level errors.

PubMed

Paul, Keryn I; Larmour, John S; Roxburgh, Stephen H; England, Jacqueline R; Davies, Micah J; Luck, Hamish D

2017-08-01

Stem diameter is one of the most common measurements made to assess the growth of woody vegetation, and the commercial and environmental benefits that it provides (e.g. wood or biomass products, carbon sequestration, landscape remediation). Yet inconsistency in its measurement is a continuing source of error in estimates of stand-scale measures such as basal area, biomass, and volume. Here we assessed errors in stem diameter measurement through repeated measurements of individual trees and shrubs of varying size and form (i.e. single- and multi-stemmed) across a range of contrasting stands, from complex mixed-species plantings to commercial single-species plantations. We compared a standard diameter tape with a Stepped Diameter Gauge (SDG) for time efficiency and measurement error. Measurement errors in diameter were slightly (but significantly) influenced by size and form of the tree or shrub, and stem height at which the measurement was made. Compared to standard tape measurement, the mean systematic error with SDG measurement was only -0.17 cm, but varied between -0.10 and -0.52 cm. Similarly, random error was relatively large, with standard deviations (and percentage coefficients of variation) averaging only 0.36 cm (and 3.8%), but varying between 0.14 and 0.61 cm (and 1.9 and 7.1%). However, at the stand scale, sampling errors (i.e. how well individual trees or shrubs selected for measurement of diameter represented the true stand population in terms of the average and distribution of diameter) generally had at least a tenfold greater influence on random errors in basal area estimates than errors in diameter measurements. This supports the use of diameter measurement tools that have high efficiency, such as the SDG. Use of the SDG almost halved the time required for measurements compared to the diameter tape. 
Based on these findings, recommendations include the following: (i) use of a tape to maximise accuracy when developing allometric models, or when monitoring relatively small changes in permanent sample plots (e.g. National Forest Inventories), noting that care is required in irregular-shaped, large-single-stemmed individuals, and (ii) use of a SDG to maximise efficiency when using inventory methods to assess basal area, and hence biomass or wood volume, at the stand scale (i.e. in studies of impacts of management or site quality) where there are budgetary constraints, noting the importance of sufficient sample sizes to ensure that the population sampled represents the true population.
[Study of spatial stratified sampling strategy of Oncomelania hupensis snail survey based on plant abundance].

PubMed

Xun-Ping, W; An, Z

2017-07-27

Objective: To optimize and simplify the survey method of Oncomelania hupensis snails in marshland endemic regions of schistosomiasis, so as to improve the precision, efficiency and economy of the snail survey. Methods: A snail sampling strategy (Spatial Sampling Scenario of Oncomelania based on Plant Abundance, SOPA) which took the plant abundance as auxiliary variable was explored and an experimental study in a 50 m×50 m plot in a marshland in the Poyang Lake region was performed. Firstly, the push broom surveyed data was stratified into 5 layers by the plant abundance data; then, the required numbers of optimal sampling points of each layer through Hammond McCullagh equation were calculated; thirdly, every sample point in line with the Multiple Directional Interpolation (MDI) placement scheme was pinpointed; and finally, the comparison study among the outcomes of the spatial random sampling strategy, the traditional systematic sampling method, the spatial stratified sampling method, Sandwich spatial sampling and inference and SOPA was performed. Results: The method (SOPA) proposed in this study had the minimal absolute error of 0.2138; and the traditional systematic sampling method had the largest estimate, and the absolute error was 0.9244. Conclusion: The snail sampling strategy (SOPA) proposed in this study achieves higher estimation accuracy than the other four methods.
Estimation of infection prevalence and sensitivity in a stratified two-stage sampling design employing highly specific diagnostic tests when there is no gold standard.

PubMed

Miller, Ezer; Huppert, Amit; Novikov, Ilya; Warburg, Alon; Hailu, Asrat; Abbasi, Ibrahim; Freedman, Laurence S

2015-11-10

In this work, we describe a two-stage sampling design to estimate the infection prevalence in a population. In the first stage, an imperfect diagnostic test was performed on a random sample of the population. In the second stage, a different imperfect test was performed in a stratified random sample of the first sample. To estimate infection prevalence, we assumed conditional independence between the diagnostic tests and develop method of moments estimators based on expectations of the proportions of people with positive and negative results on both tests that are functions of the tests' sensitivity, specificity, and the infection prevalence. A closed-form solution of the estimating equations was obtained assuming a specificity of 100% for both tests. We applied our method to estimate the infection prevalence of visceral leishmaniasis according to two quantitative polymerase chain reaction tests performed on blood samples taken from 4756 patients in northern Ethiopia. The sensitivities of the tests were also estimated, as well as the standard errors of all estimates, using a parametric bootstrap. We also examined the impact of departures from our assumptions of 100% specificity and conditional independence on the estimated prevalence. Copyright © 2015 John Wiley & Sons, Ltd.
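The closed-form idea can be sketched for a simplified case. The code below assumes that both tests are applied to the same simple random sample, that both have 100% specificity, and that they are conditionally independent given infection status; the stratified second stage described in the abstract would require reweighting that is not shown. The function name and the counts are hypothetical.

```python
def prevalence_two_tests(n_total, n_pos1, n_pos2, n_pos_both):
    """Method-of-moments estimates of prevalence and both sensitivities.

    With prevalence pi and sensitivities s1, s2, perfect specificity gives
        P(T1+) = pi * s1,  P(T2+) = pi * s2,  P(T1+ and T2+) = pi * s1 * s2,
    so pi = P(T1+) * P(T2+) / P(both+), s1 = P(both+) / P(T2+),
    and s2 = P(both+) / P(T1+).
    """
    if n_pos_both == 0:
        raise ValueError("no doubly positive subjects: the estimator is undefined")
    p1 = n_pos1 / n_total
    p2 = n_pos2 / n_total
    p12 = n_pos_both / n_total
    return p1 * p2 / p12, p12 / p2, p12 / p1

# Hypothetical counts: 1000 people, 160 positive on test 1, 100 on test 2, 80 on both.
prevalence, sens1, sens2 = prevalence_two_tests(1000, 160, 100, 80)
print(f"prevalence={prevalence:.3f}, sensitivity 1={sens1:.3f}, sensitivity 2={sens2:.3f}")
```

Standard errors are not produced by this sketch; the abstract obtains them with a parametric bootstrap.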
Flow tilt angle measurements using lidar anemometry

NASA Astrophysics Data System (ADS)

Dellwik, Ebba; Mann, Jakob

2010-05-01

A new way of estimating near-surface mean flow tilt angles from ground based Doppler lidar measurements is presented. The results are compared with traditional mast based in-situ sonic anemometry. The tilt angle assessed with the lidar is based on 10 or 30 minute mean values of the velocity field from a conically scanning lidar. In this mode of measurement, the lidar beam is rotated in a circle by a prism with a fixed angle to the vertical at varying focus distances. By fitting a trigonometric function to the scans, the mean vertical velocity can be estimated. Lidar measurements from (1) a fetch-limited beech forest site taken at 48-175m above ground level, (2) a reference site in flat agricultural terrain and (3) a second reference site in very complex terrain are presented. The method to derive flow tilt angles and mean vertical velocities from lidar has several advantages compared to sonic anemometry; there is no flow distortion caused by the instrument itself, there are no temperature effects and the instrument misalignment can be corrected for by comparing tilt estimates at various heights. Contrary to mast-based instruments, the lidar measures the wind field with the exact same alignment error at a multitude of heights. Disadvantages with estimating vertical velocities from a lidar compared to mast-based measurements are slightly increased levels of statistical errors due to limited sampling time, because the sampling is disjunct and a requirement for homogeneous flow. The estimated mean vertical velocity is biased if the flow over the scanned circle is not homogeneous. However, the error on the mean vertical velocity due to flow inhomogeneity can be approximated by a function of the angle of the lidar beam to the vertical, the measurement height and the vertical gradient of the mean vertical velocity, whereas the error due to flow inhomogeneity on the horizontal mean wind speed is independent of the lidar beam angle. 
For the presented measurements over forest, it is evaluated that the systematic error due to the inhomogeneity of the flow is less than 0.2 degrees. Other possibilities for utilizing lidars for flow tilt angle and mean vertical velocities are discussed.
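The trigonometric fit can be sketched as follows, assuming homogeneous flow over the scanned circle and equally spaced azimuth angles covering a full revolution. Under those assumptions the radial velocity is modeled as v_r = u sin(phi) cos(theta) + v sin(phi) sin(theta) + w cos(phi), where phi is the beam angle to the vertical, and the least-squares harmonic coefficients reduce to simple averages. The sign convention, the 30 degree cone angle and the wind components below are assumptions for illustration, not values from the study.

```python
import math

def fit_conical_scan(radial_velocities, cone_angle_deg):
    """Least-squares first-harmonic fit to one full conical scan.

    radial_velocities: samples at equally spaced azimuths 2*pi*k/n, k = 0..n-1.
    Returns (u, v, w, tilt_deg), where tilt is the angle of the mean flow
    to the horizontal plane.
    """
    n = len(radial_velocities)
    if n < 3:
        raise ValueError("at least three azimuth angles are needed")
    phi = math.radians(cone_angle_deg)
    thetas = [2.0 * math.pi * k / n for k in range(n)]
    # For equally spaced azimuths the harmonics are orthogonal, so the
    # least-squares coefficients are plain averages.
    offset = sum(radial_velocities) / n
    cos_amp = 2.0 / n * sum(vr * math.cos(t) for vr, t in zip(radial_velocities, thetas))
    sin_amp = 2.0 / n * sum(vr * math.sin(t) for vr, t in zip(radial_velocities, thetas))
    u = cos_amp / math.sin(phi)
    v = sin_amp / math.sin(phi)
    w = offset / math.cos(phi)
    tilt_deg = math.degrees(math.atan2(w, math.hypot(u, v)))
    return u, v, w, tilt_deg

# Synthetic scan with hypothetical wind components (m/s) and a 30 degree cone angle.
phi = math.radians(30.0)
scan = [
    8.0 * math.sin(phi) * math.cos(t) + 3.0 * math.sin(phi) * math.sin(t) + 0.2 * math.cos(phi)
    for t in (2.0 * math.pi * k / 36 for k in range(36))
]
u, v, w, tilt = fit_conical_scan(scan, 30.0)
print(f"u={u:.2f} v={v:.2f} w={w:.2f} tilt={tilt:.2f} deg")
```

If the flow is not homogeneous over the circle, the offset term no longer isolates the vertical velocity, which corresponds to the bias discussed in the abstract.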
Airborne Lidar-Based Estimates of Tropical Forest Structure in Complex Terrain: Opportunities and Trade-Offs for REDD+

NASA Technical Reports Server (NTRS)

Leitold, Veronika; Keller, Michael; Morton, Douglas C.; Cook, Bruce D.; Shimabukuro, Yosio E.

2015-01-01

Background: Carbon stocks and fluxes in tropical forests remain large sources of uncertainty in the global carbon budget. Airborne lidar remote sensing is a powerful tool for estimating aboveground biomass, provided that lidar measurements penetrate dense forest vegetation to generate accurate estimates of surface topography and canopy heights. Tropical forest areas with complex topography present a challenge for lidar remote sensing. Results: We compared digital terrain models (DTM) derived from airborne lidar data from a mountainous region of the Atlantic Forest in Brazil to 35 ground control points measured with survey grade GNSS receivers. The terrain model generated from full-density (approx. 20 returns/sq m) data was highly accurate (mean signed error of 0.19 +/-0.97 m), while those derived from reduced-density datasets (8/sq m, 4/sq m, 2/sq m and 1/sq m) were increasingly less accurate. Canopy heights calculated from reduced-density lidar data declined as data density decreased due to the inability to accurately model the terrain surface. For lidar return densities below 4/sq m, the bias in height estimates translated into errors of 80-125 Mg/ha in predicted aboveground biomass. Conclusions: Given the growing emphasis on the use of airborne lidar for forest management, carbon monitoring, and conservation efforts, the results of this study highlight the importance of careful survey planning and consistent sampling for accurate quantification of aboveground biomass stocks and dynamics. Approaches that rely primarily on canopy height to estimate aboveground biomass are sensitive to DTM errors from variability in lidar sampling density.
Semiparametric Bayesian analysis of gene-environment interactions with error in measurement of environmental covariates and missing genetic data.

PubMed

Lobach, Iryna; Mallick, Bani; Carroll, Raymond J

2011-01-01

Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve quality of parameter estimates, which cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function; therefore, conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.
Airborne lidar-based estimates of tropical forest structure in complex terrain: opportunities and trade-offs for REDD+

PubMed

Leitold, Veronika; Keller, Michael; Morton, Douglas C; Cook, Bruce D; Shimabukuro, Yosio E

2015-12-01

Carbon stocks and fluxes in tropical forests remain large sources of uncertainty in the global carbon budget. Airborne lidar remote sensing is a powerful tool for estimating aboveground biomass, provided that lidar measurements penetrate dense forest vegetation to generate accurate estimates of surface topography and canopy heights. Tropical forest areas with complex topography present a challenge for lidar remote sensing. We compared digital terrain models (DTM) derived from airborne lidar data from a mountainous region of the Atlantic Forest in Brazil to 35 ground control points measured with survey grade GNSS receivers. The terrain model generated from full-density (~20 returns m⁻²) data was highly accurate (mean signed error of 0.19 ± 0.97 m), while those derived from reduced-density datasets (8 m⁻², 4 m⁻², 2 m⁻² and 1 m⁻²) were increasingly less accurate. Canopy heights calculated from reduced-density lidar data declined as data density decreased due to the inability to accurately model the terrain surface. For lidar return densities below 4 m⁻², the bias in height estimates translated into errors of 80-125 Mg ha⁻¹ in predicted aboveground biomass. Given the growing emphasis on the use of airborne lidar for forest management, carbon monitoring, and conservation efforts, the results of this study highlight the importance of careful survey planning and consistent sampling for accurate quantification of aboveground biomass stocks and dynamics. Approaches that rely primarily on canopy height to estimate aboveground biomass are sensitive to DTM errors from variability in lidar sampling density.
Phase error statistics of a phase-locked loop synchronized direct detection optical PPM communication system

NASA Technical Reports Server (NTRS)

Natarajan, Suresh; Gardner, C. S.

1987-01-01

Receiver timing synchronization of an optical Pulse-Position Modulation (PPM) communication system can be achieved using a phase-locked loop (PLL), provided the photodetector output is suitably processed. The magnitude of the PLL phase error is a good indicator of the timing error at the receiver decoder. The statistics of the phase error are investigated while varying several key system parameters such as PPM order, signal and background strengths, and PLL bandwidth. A practical optical communication system utilizing a laser diode transmitter and an avalanche photodiode in the receiver is described, and the sampled phase error data are presented. A linear regression analysis is applied to the data to obtain estimates of the relational constants involving the phase error variance and incident signal power.
Adaptive Resampling Particle Filters for GPS Carrier-Phase Navigation and Collision Avoidance System

NASA Astrophysics Data System (ADS)

Hwang, Soon Sik

This dissertation addresses three problems: 1) adaptive resampling technique (ART) for Particle Filters, 2) precise relative positioning using Global Positioning System (GPS) Carrier-Phase (CP) measurements applied to nonlinear integer resolution problem for GPS CP navigation using Particle Filters, and 3) collision detection system based on GPS CP broadcasts. First, Monte Carlo filters, called Particle Filters (PF), are widely used where the system is non-linear and non-Gaussian. In real-time applications, their estimation accuracies and efficiencies are significantly affected by the number of particles and the scheduling of relocating weights and samples, the so-called resampling step. In this dissertation, the appropriate number of particles is estimated adaptively such that the errors of the sample mean and variance stay within bounds. These bounds are given by the confidence interval of a normal probability distribution for a multi-variate state. Two required sample numbers, which keep the mean and variance errors within the bounds, are derived. The time of resampling is determined when the required sample number for the variance error crosses the required sample number for the mean error. Second, the PF using GPS CP measurements with adaptive resampling is applied to precise relative navigation between two GPS antennas. In order to make use of CP measurements for navigation, the unknown number of cycles between GPS antennas, the so-called integer ambiguity, should be resolved. The PF is applied to this integer ambiguity resolution problem where the relative navigation states estimation involves nonlinear observations and nonlinear dynamics equation. Using the PF, the probability density function of the states is estimated by sampling from the position and velocity space and the integer ambiguities are resolved without using the usual hypothesis tests to search for the integer ambiguity. 
The ART manages the number of position samples and the frequency of the resampling step for real-time kinematics GPS navigation. The experimental results demonstrate the performance of the ART and the insensitivity of the proposed approach to GPS CP cycle-slips. Third, the GPS has great potential for the development of new collision avoidance systems and is being considered for the next generation Traffic alert and Collision Avoidance System (TCAS). The current TCAS equipment is capable of broadcasting GPS code information to nearby airplanes, and collision avoidance systems using navigation information based on GPS code have been studied by researchers. In this dissertation, the aircraft collision detection system using GPS CP information is addressed. The PF with position samples is employed for the CP based relative position estimation problem and the same algorithm can be used to determine the vehicle attitude if multiple GPS antennas are used. For a reliable and enhanced collision avoidance system, three dimensional trajectories are projected using the estimates of the relative position, velocity, and the attitude. It is shown that the performance of GPS CP based collision detecting algorithm meets the accuracy requirements for a precise approach of flight for auto landing with significantly fewer unnecessary collision false alarms and no missed alarms.
Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

PubMed Central

Li, Peng; Redden, David T.

2014-01-01

SUMMARY The sandwich estimator in generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z test should be avoided in the analyses of CRTs with few clusters even when bias-corrected sandwich estimators are used. With t-distribution approximation, the Kauermann and Carroll (KC)-correction can keep the test size to nominal levels even when the number of clusters is as low as 10, and is robust to the moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)-correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t test and KC-correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that with appropriate control of type I error rates under small sample sizes, we recommend the use of GEE approach in CRTs with binary outcomes due to fewer assumptions and robustness to the misspecification of the covariance structure. PMID:25345738
The Flynn Effect: A Meta-analysis

PubMed Central

Trahan, Lisa; Stuebing, Karla K.; Hiscock, Merril K.; Fletcher, Jack M.

2014-01-01

The “Flynn effect” refers to the observed rise in IQ scores over time, resulting in norms obsolescence. Although the Flynn effect is widely accepted, most approaches to estimating it have relied upon “scorecard” approaches that make estimates of its magnitude and error of measurement controversial and prevent determination of factors that moderate the Flynn effect across different IQ tests. We conducted a meta-analysis to determine the magnitude of the Flynn effect with a higher degree of precision, to determine the error of measurement, and to assess the impact of several moderator variables on the mean effect size. Across 285 studies (N = 14,031) since 1951 with administrations of two intelligence tests with different normative bases, the meta-analytic mean was 2.31, 95% CI [1.99, 2.64], standard score points per decade. The mean effect size for 53 comparisons (N = 3,951) (excluding three atypical studies that inflate the estimates) involving modern (since 1972) Stanford-Binet and Wechsler IQ tests (2.93, 95% CI [2.3, 3.5], IQ points per decade) was comparable to previous estimates of about 3 points per decade, but not consistent with the hypothesis that the Flynn effect is diminishing. For modern tests, study sample (larger increases for validation research samples vs. test standardization samples) and order of administration explained unique variance in the Flynn effect, but age and ability level were not significant moderators. These results supported previous estimates of the Flynn effect and its robustness across different age groups, measures, samples, and levels of performance. PMID:24979188
Estimation of the simple correlation coefficient.

PubMed

Shieh, Gwowen

2010-11-01

This article investigates some unfamiliar properties of the Pearson product-moment correlation coefficient for the estimation of the simple correlation coefficient. Although Pearson's r is biased, except for limited situations, and the minimum variance unbiased estimator has been proposed in the literature, researchers routinely employ the sample correlation coefficient in their practical applications, because of its simplicity and popularity. In order to support such practice, this study examines the mean squared errors of r and several prominent formulas. The results reveal specific situations in which the sample correlation coefficient performs better than the unbiased and nearly unbiased estimators, facilitating recommendation of r as an effect size index for the strength of linear association between two variables. In addition, related issues of estimating the squared simple correlation coefficient are also considered.
Bias correction for selecting the minimal-error classifier from many machine learning models.

PubMed

Ding, Ying; Tang, Shaowu; Liao, Serena G; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C

2014-11-15

Supervised machine learning is commonly applied in genomic research to construct a classifier from the training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists in general. It has been a common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30-60) risk reporting a falsely small cross-validation error rate that could not be validated later in independent cohorts. In this article, we illustrated the probabilistic framework of the problem and explored the statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and Tibshirani-Tibshirani procedure. All methods were compared in simulation datasets, five moderate size real datasets and two large breast cancer datasets. The result showed that IPL outperforms the other methods in bias correction with smaller variance, and it has an additional advantage to extrapolate error estimates for larger sample sizes, a practical feature to recommend whether more samples should be recruited to improve the classifier and accuracy. An R package 'MLbias' and all source files are publicly available at tsenglab.biostat.pitt.edu/software.htm. Contact: ctseng@pitt.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
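The selection bias described in the abstract above can be made concrete with a small simulation. This is a sketch under simplifying assumptions, not the paper's IPL correction or any part of the 'MLbias' package: every "method" guesses labels at random (true error rate 0.5), the methods' errors are treated as independent, and the function names are hypothetical.

```python
import random

def min_apparent_error(n_samples, n_methods, rng):
    """Smallest apparent error among n_methods classifiers that guess at random."""
    errors = []
    for _ in range(n_methods):
        wrong = sum(rng.random() < 0.5 for _ in range(n_samples))
        errors.append(wrong / n_samples)
    return min(errors)

def mean_reported_error(n_samples, n_methods, n_repeats=2000, seed=0):
    """Average, over repeated studies, of the error rate that would be reported."""
    rng = random.Random(seed)
    total = sum(min_apparent_error(n_samples, n_methods, rng) for _ in range(n_repeats))
    return total / n_repeats

# One method versus the best of 20 methods, on studies with n = 40 samples.
single_method = mean_reported_error(40, 1)
best_of_twenty = mean_reported_error(40, 20)
```

Although every method has a true error rate of 0.5, reporting only the best of 20 yields a noticeably smaller average error. Real cross-validation errors computed on the same data are correlated across methods, so the size of the effect in practice would differ from this idealized case.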
On Time/Space Aggregation of Fine-Scale Error Estimates (Invited)

NASA Astrophysics Data System (ADS)

Huffman, G. J.

2013-12-01

Estimating errors inherent in fine time/space-scale satellite precipitation data sets is still an on-going problem and a key area of active research. Complicating features of these data sets include the intrinsic intermittency of the precipitation in space and time and the resulting highly skewed distribution of precipitation rates. Additional issues arise from the subsampling errors that satellites introduce, the errors due to retrieval algorithms, and the correlated error that retrieval and merger algorithms sometimes introduce. Several interesting approaches have been developed recently that appear to make progress on these long-standing issues. At the same time, the monthly averages over 2.5°x2.5° grid boxes in the Global Precipitation Climatology Project (GPCP) Satellite-Gauge (SG) precipitation data set follow a very simple sampling-based error model (Huffman 1997) with coefficients that are set using coincident surface and GPCP SG data. This presentation outlines the unsolved problem of how to aggregate the fine-scale errors (discussed above) to an arbitrary time/space averaging volume for practical use in applications, reducing in the limit to simple Gaussian expressions at the monthly 2.5°x2.5° scale. Scatter diagrams with different time/space averaging show that the relationship between the satellite and validation data improves due to the reduction in random error. One of the key, and highly non-linear, issues is that fine-scale estimates tend to have large numbers of cases with points near the axes on the scatter diagram (one of the values is exactly or nearly zero, while the other value is higher). Averaging 'pulls' the points away from the axes and towards the 1:1 line, which usually happens for higher precipitation rates before lower rates. Given this qualitative observation of how aggregation affects error, we observe that existing aggregation rules, such as the Steiner et al. (2003) power law, only depend on the aggregated precipitation rate. 
Is this sufficient, or is it necessary to aggregate the precipitation error estimates across the time/space data cube used for averaging? At least for small time/space data cubes it would seem that the detailed variables that affect each precipitation error estimate in the aggregation, such as sensor type, land/ocean surface type, convective/stratiform type, and so on, drive variations that must be accounted for explicitly.
A Monte Carlo Study of Levene's Test of Homogeneity of Variance: Empirical Frequencies of Type I Error in Normal Distributions.

ERIC Educational Resources Information Center

Neel, John H.; Stallings, William M.

An influential statistics text recommends a Levene test for homogeneity of variance. A recent note suggests that Levene's test is upwardly biased for small samples. Another report shows inflated Alpha estimates and low power. Neither study utilized more than two sample sizes. This Monte Carlo study involved sampling from a normal population for…

Dealing with AFLP genotyping errors to reveal genetic structure in Plukenetia volubilis (Euphorbiaceae) in the Peruvian Amazon

PubMed Central

Vašek, Jakub; Viehmannová, Iva; Ocelák, Martin; Cachique Huansi, Danter; Vejl, Pavel

2017-01-01

An analysis of the population structure and genetic diversity for any organism often depends on one or more molecular marker techniques. Nonetheless, these techniques are not absolutely reliable because of various sources of errors arising during the genotyping process. Thus, a complex analysis of genotyping error was carried out with the AFLP method in 169 samples of the oil seed plant Plukenetia volubilis L. from small isolated subpopulations in the Peruvian Amazon. Samples were collected in nine localities from the region of San Martin. Analysis was done in eight datasets with a genotyping error from 0 to 5%. Using eleven primer combinations, 102 to 275 markers were obtained according to the dataset. It was found that it is only possible to obtain the most reliable and robust results through a multiple-level filtering process. Genotyping error and software set up influence both the estimation of population structure and genetic diversity, where in our case population number (K) varied between 2–9 depending on the dataset and statistical method used. Surprisingly, discrepancies in K number were caused more by statistical approaches than by genotyping errors themselves. However, for estimation of genetic diversity, the degree of genotyping error was critical because descriptive parameters (He, FST, PLP 5%) varied substantially (by at least 25%). Due to low gene flow, P. volubilis mostly consists of small isolated subpopulations (ΦPT = 0.252–0.323) with some degree of admixture given by socio-economic connectivity among the sites; a direct link between the genetic and geographic distances was not confirmed. The study illustrates the successful application of AFLP to infer genetic structure in non-model plants. PMID:28910307
Methods for estimating streamflow at mountain fronts in southern New Mexico

USGS Publications Warehouse

Waltemeyer, S.D.

1994-01-01

The infiltration of streamflow is potential recharge to alluvial-basin aquifers at or near mountain fronts in southern New Mexico. Data for 13 streamflow-gaging stations were used to determine a relation between mean annual streamflow and basin and climatic conditions. Regression analysis was used to develop an equation that can be used to estimate mean annual streamflow on the basis of drainage areas and mean annual precipitation. The average standard error of estimate for this equation is 46 percent. Regression analysis also was used to develop an equation to estimate mean annual streamflow on the basis of active-channel width. Measurements of the width of active channels were determined for 6 of the 13 gaging stations. The average standard error of estimate for this relation is 29 percent. Streamflow estimates made using a regression equation based on channel geometry are considered more reliable than estimates made from an equation based on regional relations of basin and climatic conditions. The sample size used to develop these relations was small, however, and the reported standard error of estimate may not represent that of the entire population. Active-channel-width measurements were made at 23 ungaged sites along the Rio Grande upstream from Elephant Butte Reservoir. Data for additional sites would be needed for a more comprehensive assessment of mean annual streamflow in southern New Mexico.
Statistical methodology for estimating the mean difference in a meta-analysis without study-specific variance information.

PubMed

Sangnawakij, Patarawan; Böhning, Dankmar; Adams, Stephen; Stanton, Michael; Holling, Heinz

2017-04-30

Statistical inference for analyzing the results from several independent studies on the same quantity of interest has been investigated frequently in recent decades. Typically, any meta-analytic inference requires that the quantity of interest is available from each study together with an estimate of its variability. The current work is motivated by a meta-analysis on comparing two treatments (thoracoscopic and open) of congenital lung malformations in young children. Quantities of interest include continuous end-points such as length of operation or number of chest tube days. As studies only report mean values (and no standard errors or confidence intervals), the question arises how meta-analytic inference can be developed. We suggest two methods to estimate study-specific variances in such a meta-analysis, where only sample means and sample sizes are available in the treatment arms. A general likelihood ratio test is derived for testing equality of variances in two groups. By means of simulation studies, the bias and estimated standard error of the overall mean difference from both methodologies are evaluated and compared with two existing approaches: complete study analysis only and partial variance information. The performance of the test is evaluated in terms of type I error. Additionally, we illustrate these methods in the meta-analysis on comparing thoracoscopic and open surgery for congenital lung malformations and in a meta-analysis on the change in renal function after kidney donation. Copyright © 2017 John Wiley & Sons, Ltd.
Quantitative, Comparable Coherent Anti-Stokes Raman Scattering (CARS) Spectroscopy: Correcting Errors in Phase Retrieval

PubMed Central

Camp, Charles H.; Lee, Young Jong; Cicerone, Marcus T.

2017-01-01

Coherent anti-Stokes Raman scattering (CARS) microspectroscopy has demonstrated significant potential for biological and materials imaging. To date, however, the primary mechanism of disseminating CARS spectroscopic information is through pseudocolor imagery, which explicitly neglects a vast majority of the hyperspectral data. Furthermore, current paradigms in CARS spectral processing do not lend themselves to quantitative sample-to-sample comparability. The primary limitation stems from the need to accurately measure the so-called nonresonant background (NRB) that is used to extract the chemically-sensitive Raman information from the raw spectra. Measurement of the NRB on a pixel-by-pixel basis is a nontrivial task; thus, reference NRB from glass or water are typically utilized, resulting in error between the actual and estimated amplitude and phase. In this manuscript, we present a new methodology for extracting the Raman spectral features that significantly suppresses these errors through phase detrending and scaling. Classic methods of error-correction, such as baseline detrending, are demonstrated to be inaccurate and to simply mask the underlying errors. The theoretical justification is presented by re-developing the theory of phase retrieval via the Kramers-Kronig relation, and we demonstrate that these results are also applicable to maximum entropy method-based phase retrieval. This new error-correction approach is experimentally applied to glycerol spectra and tissue images, demonstrating marked consistency between spectra obtained using different NRB estimates, and between spectra obtained on different instruments. Additionally, in order to facilitate implementation of these approaches, we have made many of the tools described herein available free for download. PMID:28819335
Minimax Quantum Tomography: Estimators and Relative Entropy Bounds

DOE PAGES

Ferrie, Christopher; Blume-Kohout, Robin

2016-03-04

A minimax estimator has the minimum possible error (“risk”) in the worst case. Here we construct the first minimax estimators for quantum state tomography with relative entropy risk. The minimax risk of nonadaptive tomography scales as $O(1/\sqrt{N})$, in contrast to that of classical probability estimation, which is $O(1/N)$, where N is the number of copies of the quantum state used. We trace this deficiency to sampling mismatch: future observations that determine risk may come from a different sample space than the past data that determine the estimate. Lastly, this makes minimax estimators very biased, and we propose a computationally tractable alternative with similar behavior in the worst case, but superior accuracy on most states.
Incorporating GIS and remote sensing for census population disaggregation

NASA Astrophysics Data System (ADS)

Wu, Shuo-Sheng 'Derek'

Census data are the primary source of demographic data for a variety of research and applications. For confidentiality issues and administrative purposes, census data are usually released to the public by aggregated areal units. In the United States, the smallest census unit is the census block. Due to data aggregation, users of census data may have problems in visualizing population distribution within census blocks and estimating population counts for areas not coinciding with census block boundaries. The main purpose of this study is to develop methodology for estimating sub-block areal populations and assessing the estimation errors. The City of Austin, Texas was used as a case study area. Based on tax parcel boundaries and parcel attributes derived from ancillary GIS and remote sensing data, detailed urban land use classes were first classified using a per-field approach. After that, statistical models by land use classes were built to infer population density from other predictor variables, including four census demographic statistics (the Hispanic percentage, the married percentage, the unemployment rate, and per capita income) and three physical variables derived from remote sensing images and building footprints vector data (a landscape heterogeneity statistic, a building pattern statistic, and a building volume statistic). In addition to statistical models, deterministic models were proposed to directly infer populations from building volumes and three housing statistics, including the average space per housing unit, the housing unit occupancy rate, and the average household size. After population models were derived or proposed, how well the models predict populations for another set of sample blocks was assessed. The results show that deterministic models were more accurate than statistical models. 
Further, by simulating the base unit for modeling from aggregating blocks, I assessed how well the deterministic models estimate sub-unit-level populations. I also assessed the aggregation effects and the rescaling effects on sub-unit estimates. Lastly, from another set of mixed-land-use sample blocks, a mixed-land-use model was derived and compared with a residential-land-use model. The results of per-field land use classification are satisfactory with a Kappa accuracy statistic of 0.747. Model assessments by land use show that population estimates for multi-family land use areas have higher errors than those for single-family land use areas, and population estimates for mixed land use areas have higher errors than those for residential land use areas. The assessments of sub-unit estimates using a simulation approach indicate that smaller areas show higher estimation errors, estimation errors do not relate to the base unit size, and rescaling improves all levels of sub-unit estimates.
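The deterministic population model named in the abstract above (building volume combined with average space per housing unit, occupancy rate, and average household size) can be sketched as a single chain of multiplications. This is an illustrative reading, not the dissertation's actual model: it assumes "space per housing unit" is expressed as a volume in the same units as the building volume, and the function name and sample values are hypothetical.

```python
def estimate_population(building_volume, volume_per_unit, occupancy_rate, household_size):
    """Estimate residents from building volume and three housing statistics.

    housing units  = building volume / average volume per housing unit
    occupied units = housing units * occupancy rate
    population     = occupied units * average household size
    """
    if volume_per_unit <= 0:
        raise ValueError("volume_per_unit must be positive")
    if not 0.0 <= occupancy_rate <= 1.0:
        raise ValueError("occupancy_rate must lie between 0 and 1")
    housing_units = building_volume / volume_per_unit
    return housing_units * occupancy_rate * household_size

# Illustrative values only: 12,000 cubic metres of residential building volume,
# 400 cubic metres per unit, 90% occupancy, 2.5 persons per household.
example_population = estimate_population(12000.0, 400.0, 0.9, 2.5)
```

Because each factor enters multiplicatively, a relative error in any one housing statistic carries through to the population estimate in the same proportion.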
An evaluation of potential sampling locations in a reservoir with emphasis on conserved spatial correlation structure.

PubMed

Yenilmez, Firdes; Düzgün, Sebnem; Aksoy, Aysegül

2015-01-01

In this study, kernel density estimation (KDE) was coupled with ordinary two-dimensional kriging (OK) to reduce the number of sampling locations in measurement and kriging of dissolved oxygen (DO) concentrations in Porsuk Dam Reservoir (PDR). Conservation of the spatial correlation structure in the DO distribution was a target. KDE was used as a tool to aid in identification of the sampling locations that would be removed from the sampling network in order to decrease the total number of samples. Accordingly, several networks were generated in which sampling locations were reduced from 65 to 10 in increments of 4 or 5 points at a time based on kernel density maps. DO variograms were constructed, and DO values in PDR were kriged. The performance of the networks in DO estimation was evaluated through various error metrics, standard error maps (SEM), and whether the spatial correlation structure was conserved or not. Results indicated that a smaller number of sampling points resulted in loss of information in regard to spatial correlation structure in DO. The minimum representative number of sampling points for PDR was 35. Efficacy of the sampling location selection method was tested against the networks generated by experts. It was shown that the evaluation approach proposed in this study provided a better sampling network design in which the spatial correlation structure of DO was sustained for kriging.
Use of attribute association error probability estimates to evaluate quality of medical record geocodes.

PubMed

Klaus, Christian A; Carrasco, Luis E; Goldberg, Daniel W; Henry, Kevin A; Sherman, Recinda L

2015-09-15

The utility of patient attributes associated with the spatiotemporal analysis of medical records lies not just in their values but also the strength of association between them. Estimating the extent to which a hierarchy of conditional probability exists between patient attribute associations such as patient identifying fields, patient and date of diagnosis, and patient and address at diagnosis is fundamental to estimating the strength of association between patient and geocode, and patient and enumeration area. We propose a hierarchy for the attribute associations within medical records that enable spatiotemporal relationships. We also present a set of metrics that store attribute association error probability (AAEP), to estimate error probability for all attribute associations upon which certainty in a patient geocode depends. A series of experiments were undertaken to understand how error estimation could be operationalized within health data and what levels of AAEP in real data reveal themselves using these methods. Specifically, the goals of this evaluation were to (1) assess if the concept of our error assessment techniques could be implemented by a population-based cancer registry; (2) apply the techniques to real data from a large health data agency and characterize the observed levels of AAEP; and (3) demonstrate how detected AAEP might impact spatiotemporal health research. We present an evaluation of AAEP metrics generated for cancer cases in a North Carolina county. We show examples of how we estimated AAEP for selected attribute associations and circumstances. We demonstrate the distribution of AAEP in our case sample across attribute associations, and demonstrate ways in which disease registry specific operations influence the prevalence of AAEP estimates for specific attribute associations. 
The effort to detect and store estimates of AAEP is worthwhile because of the increase in confidence fostered by the attribute association level approach to the assessment of uncertainty in patient geocodes, relative to existing geocoding related uncertainty metrics.
A bias correction for covariance estimators to improve inference with generalized estimating equations that use an unstructured correlation matrix.

PubMed

Westgate, Philip M

2013-07-20

Generalized estimating equations (GEEs) are routinely used for the marginal analysis of correlated data. The efficiency of GEE depends on how closely the working covariance structure resembles the true structure, and therefore accurate modeling of the working correlation of the data is important. A popular approach is the use of an unstructured working correlation matrix, as it is not as restrictive as simpler structures such as exchangeable and AR-1 and thus can theoretically improve efficiency. However, because of the potential for having to estimate a large number of correlation parameters, variances of regression parameter estimates can be larger than theoretically expected when utilizing the unstructured working correlation matrix. Therefore, standard error estimates can be negatively biased. To account for this additional finite-sample variability, we derive a bias correction that can be applied to typical estimators of the covariance matrix of parameter estimates. Via simulation and in application to a longitudinal study, we show that our proposed correction improves standard error estimation and statistical inference. Copyright © 2012 John Wiley & Sons, Ltd.
A one-step method for modelling longitudinal data with differential equations.

PubMed

Hu, Yueqin; Treinen, Raymond

2018-04-06

Differential equation models are frequently used to describe non-linear trajectories of longitudinal data. This study proposes a new approach to estimate the parameters in differential equation models. Instead of estimating derivatives from the observed data first and then fitting a differential equation to the derivatives, our new approach directly fits the analytic solution of a differential equation to the observed data, and therefore simplifies the procedure and avoids bias from derivative estimations. A simulation study indicates that the analytic solutions of differential equations (ASDE) approach obtains unbiased estimates of parameters and their standard errors. Compared with other approaches that estimate derivatives first, ASDE has smaller standard error, larger statistical power and accurate Type I error. Although ASDE obtains biased estimation when the system has sudden phase change, the bias is not serious and a solution is also provided to solve the phase problem. The ASDE method is illustrated and applied to a two-week study on consumers' shopping behaviour after a sale promotion, and to a set of public data tracking participants' grammatical facial expression in sign language. R codes for ASDE, recommendations for sample size and starting values are provided. Limitations and several possible expansions of ASDE are also discussed. © 2018 The British Psychological Society.
A minimalist approach to bias estimation for passive sensor measurements with targets of opportunity

NASA Astrophysics Data System (ADS)

Belfadel, Djedjiga; Osborne, Richard W.; Bar-Shalom, Yaakov

2013-09-01

In order to carry out data fusion, registration error correction is crucial in multisensor systems. This requires estimation of the sensor measurement biases. It is important to correct for these bias errors so that the multiple sensor measurements and/or tracks can be referenced as accurately as possible to a common tracking coordinate system. This paper provides a solution for bias estimation for the minimum number of passive sensors (two), when only targets of opportunity are available. The sensor measurements are assumed time-coincident (synchronous) and perfectly associated. Since these sensors provide only line of sight (LOS) measurements, the formation of a single composite Cartesian measurement obtained from fusing the LOS measurements from different sensors is needed to avoid the need for nonlinear filtering. We evaluate the Cramer-Rao Lower Bound (CRLB) on the covariance of the bias estimate, i.e., the quantification of the available information about the biases. Statistical tests on the results of simulations show that this method is statistically efficient, even for small sample sizes (as few as two sensors and six points on the trajectory of a single target of opportunity). We also show that the RMS position error is significantly improved with bias estimation compared with the target position estimation using the original biased measurements.
Developing a generalized allometric equation for aboveground biomass estimation

NASA Astrophysics Data System (ADS)

Xu, Q.; Balamuta, J. J.; Greenberg, J. A.; Li, B.; Man, A.; Xu, Z.

2015-12-01

A key potential uncertainty in estimating carbon stocks across multiple scales stems from the use of empirically calibrated allometric equations, which estimate aboveground biomass (AGB) from plant characteristics such as diameter at breast height (DBH) and/or height (H). The equations themselves contain significant and, at times, poorly characterized errors. Species-specific equations may be missing. Plant responses to their local biophysical environment may lead to spatially varying allometric relationships. The structural predictor may be difficult or impossible to measure accurately, particularly when derived from remote sensing data. All of these issues may lead to significant and spatially varying uncertainties in the estimation of AGB that are unexplored in the literature. We sought to quantify the errors in predicting AGB at the tree and plot level for vegetation plots in California. To accomplish this, we derived a generalized allometric equation (GAE) which we used to model the AGB on a full set of tree information such as DBH, H, taxonomy, and biophysical environment. The GAE was derived using published allometric equations in the GlobAllomeTree database. The equations were sparse in details about the error since authors provide the coefficient of determination (R2) and the sample size. A more realistic simulation of tree AGB should also contain the noise that was not captured by the allometric equation. We derived an empirically corrected variance estimate for the amount of noise to represent the errors in the real biomass. Also, we accounted for the hierarchical relationship between different species by treating each taxonomic level as a covariate nested within a higher taxonomic level (e.g. species < genus). This approach provides estimation under incomplete tree information (e.g. missing species) or blurred information (e.g. conjecture of species), plus the biophysical environment. 
The GAE allowed us to quantify the contribution of each covariate in estimating the AGB of trees. Lastly, we applied the GAE to an existing vegetation plot database - Forest Inventory and Analysis database - to derive per-tree and per-plot AGB estimations, their errors, and how much of the error could be attributed to the original equations, the plant's taxonomy, and their biophysical environment.
A contribution to the calculation of measurement uncertainty and optimization of measuring strategies in coordinate measurement

NASA Astrophysics Data System (ADS)

Waeldele, F.

1983-01-01

The influence of sample shape deviations on the measurement uncertainties and the optimization of computer aided coordinate measurement were investigated for a circle and a cylinder. Using the complete error propagation law in matrix form the parameter uncertainties are calculated, taking the correlation between the measurement points into account. Theoretical investigations show that the measuring points have to be equidistantly distributed and that for a cylindrical body a measuring point distribution along a cross section is better than along a helical line. The theoretically obtained expressions to calculate the uncertainties prove to be a good estimation basis. The simple error theory is not satisfactory for estimation. The complete statistical data analysis theory helps to avoid aggravating measurement errors and to adjust the number of measuring points to the required measuring uncertainty.
Number-counts slope estimation in the presence of Poisson noise

NASA Technical Reports Server (NTRS)

Schmitt, Juergen H. M. M.; Maccacaro, Tommaso

1986-01-01

The slope determination of a power-law number flux relationship is considered for the case of photon-limited sampling. This case is important for high-sensitivity X-ray surveys with imaging telescopes, where the error in an individual source measurement depends on integrated flux and is Poisson, rather than Gaussian, distributed. A bias-free method of slope estimation is developed that takes into account the exact error distribution, the influence of background noise, and the effects of varying limiting sensitivities. It is shown that the resulting bias corrections are quite insensitive to the bias correction procedures applied, as long as only sources with signal-to-noise ratio five or greater are considered. However, if sources with signal-to-noise ratio five or less are included, the derived bias corrections depend sensitively on the shape of the error distribution.
Estimation after classification using lot quality assurance sampling: corrections for curtailed sampling with application to evaluating polio vaccination campaigns.

PubMed

Olives, Casey; Valadez, Joseph J; Pagano, Marcello

2014-03-01

To assess the bias incurred when curtailment of Lot Quality Assurance Sampling (LQAS) is ignored, to present unbiased estimators, to consider the impact of cluster sampling by simulation and to apply our method to published polio immunization data from Nigeria. We present estimators of coverage when using two kinds of curtailed LQAS strategies: semicurtailed and curtailed. We study the proposed estimators with independent and clustered data using three field-tested LQAS designs for assessing polio vaccination coverage, with samples of size 60 and decision rules of 9, 21 and 33, and compare them to biased maximum likelihood estimators. Lastly, we present estimates of polio vaccination coverage from previously published data in 20 local government authorities (LGAs) from five Nigerian states. Simulations illustrate substantial bias if one ignores the curtailed sampling design. Proposed estimators show no bias. Clustering does not affect the bias of these estimators. Across simulations, standard errors show signs of inflation as clustering increases. Neither sampling strategy nor LQAS design influences estimates of polio vaccination coverage in 20 Nigerian LGAs. When coverage is low, semicurtailed LQAS strategies considerably reduce the sample size required to make a decision. Curtailed LQAS designs further reduce the sample size when coverage is high. Results presented dispel the misconception that curtailed LQAS data are unsuitable for estimation. These findings augment the utility of LQAS as a tool for monitoring vaccination efforts by demonstrating that unbiased estimation using curtailed designs is not only possible but these designs also reduce the sample size. © 2014 John Wiley & Sons Ltd.
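The bias that arises when curtailment is ignored can be illustrated with a small exact calculation. The sketch below is not the authors' method and does not reproduce their unbiased estimators. It assumes a semicurtailed stopping rule in which sampling stops as soon as d + 1 unvaccinated children have been observed and otherwise continues to the full sample of n. The function name and this stopping rule are assumptions made for illustration. The naive estimator is the observed proportion vaccinated at the moment sampling stops.

```python
from math import comb


def naive_coverage_expectation(p, n=60, d=9):
    """Expected value of the naive coverage estimate under an assumed
    semicurtailed rule: stop at the (d+1)-th unvaccinated child, else
    sample all n.

    Returns (expected naive estimate, total probability of all outcomes).
    """
    q = 1.0 - p  # probability that a sampled child is unvaccinated
    expected = 0.0
    total_prob = 0.0
    # Early stop: the (d+1)-th unvaccinated child is the m-th child sampled.
    for m in range(d + 1, n + 1):
        prob = comb(m - 1, d) * q ** (d + 1) * p ** (m - 1 - d)
        expected += prob * (m - d - 1) / m  # vaccinated share among m sampled
        total_prob += prob
    # Full sample: at most d unvaccinated children among all n.
    for x in range(d + 1):
        prob = comb(n, x) * q ** x * p ** (n - x)
        expected += prob * (n - x) / n
        total_prob += prob
    return expected, total_prob


# True coverage of 50 percent with the n = 60, d = 9 design from the abstract.
expected_naive, prob_check = naive_coverage_expectation(0.5)
bias = expected_naive - 0.5
```

Under this assumed rule, the naive estimate at 50 percent true coverage is biased downward, because stopping on an unvaccinated child over-represents them in the shortened sample.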
Drug-drug interaction predictions with PBPK models and optimal multiresponse sampling time designs: application to midazolam and a phase I compound. Part 1: comparison of uniresponse and multiresponse designs using PopDes.

PubMed

Chenel, Marylore; Bouzom, François; Aarons, Leon; Ogungbenro, Kayode

2008-12-01

To determine the optimal sampling time design of a drug-drug interaction (DDI) study for the estimation of apparent clearances (CL/F) of two co-administered drugs (SX, a phase I compound, potentially a CYP3A4 inhibitor, and MDZ, a reference CYP3A4 substrate) without any in vivo data using physiologically based pharmacokinetic (PBPK) predictions, population PK modelling and multiresponse optimal design. PBPK models were developed with AcslXtreme using only in vitro data to simulate PK profiles of both drugs when they were co-administered. Then, using simulated data, population PK models were developed with NONMEM and optimal sampling times were determined by optimizing the determinant of the population Fisher information matrix with PopDes using either two uniresponse designs (UD) or a multiresponse design (MD) with joint sampling times for both drugs. Finally, the D-optimal sampling time designs were evaluated by simulation and re-estimation with NONMEM by computing the relative root mean squared error (RMSE) and empirical relative standard errors (RSE) of CL/F. There were four and five optimal sampling times (=nine different sampling times) in the UDs for SX and MDZ, respectively, whereas there were only five sampling times in the MD. Whatever design and compound, CL/F was well estimated (RSE < 20% for MDZ and <25% for SX) and expected RSEs from PopDes were in the same range as empirical RSEs. Moreover, there was no bias in CL/F estimation. Since MD required only five sampling times compared to the two UDs, D-optimal sampling times of the MD were included into a full empirical design for the proposed clinical trial. A joint paper compares the designs with real data. This global approach including PBPK simulations, population PK modelling and multiresponse optimal design allowed, without any in vivo data, the design of a clinical trial, using sparse sampling, capable of estimating CL/F of the CYP3A4 substrate and potential inhibitor when co-administered together.
On the impact of relatedness on SNP association analysis.

PubMed

Gross, Arnd; Tönjes, Anke; Scholz, Markus

2017-12-06

When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicated manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. We derive explicit and easy-to-apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels.
When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate.
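As background to the genomic control correction discussed in the abstract above, the following is a minimal sketch of the usual procedure: estimate an inflation factor as the ratio of the median observed 1-df chi-square statistic to its theoretical median, then divide every statistic by that factor. This is a generic illustration, not the authors' variance inflation formula. The function name and the convention of never deflating (factor floored at 1) are assumptions.

```python
import statistics

# Theoretical median of a chi-square distribution with 1 degree of freedom,
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Return (inflation factor, corrected statistics) for 1-df chi-square tests."""
    inflation = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: do not inflate statistics when the factor is below 1.
    inflation = max(inflation, 1.0)
    corrected = [stat / inflation for stat in chi2_stats]
    return inflation, corrected


# Hypothetical statistics whose median is twice the theoretical median.
observed = [0.2, 2 * CHI2_1DF_MEDIAN, 5.0]
inflation_factor, corrected_stats = genomic_control(observed)
```

Because every statistic is scaled down by the same factor, truly associated SNPs are penalized along with the null ones, which is consistent with the loss of power the abstract describes.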
Comparison of Two Methods for Estimating the Sampling-Related Uncertainty of Satellite Rainfall Averages Based on a Large Radar Data Set

NASA Technical Reports Server (NTRS)

Lau, William K. M. (Technical Monitor); Bell, Thomas L.; Steiner, Matthias; Zhang, Yu; Wood, Eric F.

2002-01-01

The uncertainty of rainfall estimated from averages of discrete samples collected by a satellite is assessed using a multi-year radar data set covering a large portion of the United States. The sampling-related uncertainty of rainfall estimates is evaluated for all combinations of 100 km, 200 km, and 500 km space domains, 1 day, 5 day, and 30 day rainfall accumulations, and regular sampling time intervals of 1 h, 3 h, 6 h, 8 h, and 12 h. These extensive analyses are combined to characterize the sampling uncertainty as a function of space and time domain, sampling frequency, and rainfall characteristics by means of a simple scaling law. Moreover, it is shown that both parametric and non-parametric statistical techniques of estimating the sampling uncertainty produce comparable results. Sampling uncertainty estimates, however, do depend on the choice of technique for obtaining them. They can also vary considerably from case to case, reflecting the great variability of natural rainfall, and should therefore be expressed in probabilistic terms. Rainfall calibration errors are shown to affect comparison of results obtained by studies based on data from different climate regions and/or observation platforms.
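One simple, non-parametric way to see how regular sampling intervals produce sampling uncertainty is to sub-sample a full-resolution series at every possible starting offset and compare each sub-sampled mean with the full mean. The sketch below is an illustration of that idea only. It is not the procedure used in the study, and the function name and the synthetic series are hypothetical.

```python
import math


def sampling_rms_error(series, step):
    """RMS difference between sub-sampled means and the full-series mean.

    `series` holds equally spaced values (for example hourly area-average
    rain rates) and `step` is the sampling interval in the same units.
    Requires len(series) >= step so that every offset yields a sample.
    """
    full_mean = sum(series) / len(series)
    squared_errors = []
    for offset in range(step):
        subsample = series[offset::step]  # one possible overpass schedule
        sub_mean = sum(subsample) / len(subsample)
        squared_errors.append((sub_mean - full_mean) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic series alternating between dry and wet hours (illustration only).
alternating = [0.0, 2.0] * 12
error_every_2h = sampling_rms_error(alternating, 2)
error_every_1h = sampling_rms_error(alternating, 1)
```

Sampling every value gives zero error by construction. The alternating series is a deliberately extreme case in which a 2-step schedule sees only dry or only wet hours.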
On the error in crop acreage estimation using satellite (LANDSAT) data

NASA Technical Reports Server (NTRS)

Chhikara, R. (Principal Investigator)

1983-01-01

The problem of crop acreage estimation using satellite data is discussed. Bias and variance of a crop proportion estimate in an area segment obtained from the classification of its multispectral sensor data are derived as functions of the means, variances, and covariance of error rates. The linear discriminant analysis and the class proportion estimation for the two class case are extended to include a third class of measurement units, where these units are mixed on ground. Special attention is given to the investigation of mislabeling in training samples and its effect on crop proportion estimation. It is shown that the bias and variance of the estimate of a specific crop acreage proportion increase as the disparity in mislabeling rates between two classes increases. Some interaction is shown to take place, causing the bias and the variance to decrease at first and then to increase, as the mixed unit class varies in size from 0 to 50 percent of the total area segment.
TECHNICAL ADVANCES: Effects of genotyping protocols on success and errors in identifying individual river otters (Lontra canadensis) from their faeces.

PubMed

Hansen, Heidi; Ben-David, Merav; McDonald, David B

2008-03-01

In noninvasive genetic sampling, when genotyping error rates are high and recapture rates are low, misidentification of individuals can lead to overestimation of population size. Thus, estimating genotyping errors is imperative. Nonetheless, conducting multiple polymerase chain reactions (PCRs) at multiple loci is time-consuming and costly. To address the controversy regarding the minimum number of PCRs required for obtaining a consensus genotype, we compared consumer-style the performance of two genotyping protocols (multiple-tubes and 'comparative method') with respect to genotyping success and error rates. Our results from 48 faecal samples of river otters (Lontra canadensis) collected in Wyoming in 2003, and from blood samples of five captive river otters amplified with four different primers, suggest that use of the comparative genotyping protocol can minimize the number of PCRs per locus. For all but five samples at one locus, the same consensus genotypes were reached with fewer PCRs and with reduced error rates with this protocol compared to the multiple-tubes method. This finding is reassuring because genotyping errors can occur at relatively high rates even in tissues such as blood and hair. In addition, we found that loci that amplify readily and yield consensus genotypes may still exhibit high error rates (7-32%) and that amplification with different primers resulted in different types and rates of error. Thus, assigning a genotype based on a single PCR for several loci could result in misidentification of individuals. We recommend that programs designed to statistically assign consensus genotypes should be modified to allow the different treatment of heterozygotes and homozygotes intrinsic to the comparative method. © 2007 The Authors.

Assessing the accuracy of body mass estimation equations from pelvic and femoral variables among modern British women of known mass.

PubMed

Young, Mariel; Johannesdottir, Fjola; Poole, Ken; Shaw, Colin; Stock, J T

2018-02-01

Femoral head diameter is commonly used to estimate body mass from the skeleton. The three most frequently employed methods, designed by Ruff, Grine, and McHenry, were developed using different populations to address different research questions. They were not specifically designed for application to female remains, and their accuracy for this purpose has rarely been assessed or compared in living populations. This study analyzes the accuracy of these methods using a sample of modern British women through the use of pelvic CT scans (n = 97) and corresponding information about the individuals' known height and weight. Results showed that all methods provided reasonably accurate body mass estimates (average percent prediction errors under 20%) for the normal weight and overweight subsamples, but were inaccurate for the obese and underweight subsamples (average percent prediction errors over 20%). When women of all body mass categories were combined, the methods provided reasonable estimates (average percent prediction errors between 16 and 18%). The results demonstrate that different methods provide more accurate results within specific body mass index (BMI) ranges. The McHenry Equation provided the most accurate estimation for women of small body size, while the original Ruff Equation is most likely to be accurate if the individual was obese or severely obese. The refined Ruff Equation was the most accurate predictor of body mass on average for the entire sample, indicating that it should be utilized when there is no knowledge of the individual's body size or if the individual is assumed to be of a normal body size. The study also revealed a correlation between pubis length and body mass, and an equation for body mass estimation using pubis length was accurate in a dummy sample, suggesting that pubis length can also be used to acquire reliable body mass estimates. 
This has implications for how we interpret body mass in fossil hominins and has particular relevance to the interpretation of the long pubic ramus that is characteristic of Neandertals. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cost–Effective Prediction of Gender-Labeling Errors and Estimation of Gender-Labeling Error Rates in Candidate-Gene Association Studies

PubMed Central

Qu, Conghui; Schuetz, Johanna M.; Min, Jeong Eun; Leach, Stephen; Daley, Denise; Spinelli, John J.; Brooks-Wilson, Angela; Graham, Jinko

2011-01-01

We describe a statistical approach to predict gender-labeling errors in candidate-gene association studies, when Y-chromosome markers have not been included in the genotyping set. The approach adds value to methods that consider only the heterozygosity of X-chromosome SNPs, by incorporating available information about the intensity of X-chromosome SNPs in candidate genes relative to autosomal SNPs from the same individual. To our knowledge, no published methods formalize a framework in which heterozygosity and relative intensity are simultaneously taken into account. Our method offers the advantage that, in the genotyping set, no additional space is required beyond that already assigned to X-chromosome SNPs in the candidate genes. We also show how the predictions can be used in a two-phase sampling design to estimate the gender-labeling error rates for an entire study, at a fraction of the cost of a conventional design. PMID:22303327
Accounting for Relatedness in Family Based Genetic Association Studies

PubMed Central

McArdle, P.F.; O’Connell, J.R.; Pollin, T.I.; Baumgarten, M.; Shuldiner, A.R.; Peyser, P.A.; Mitchell, B.D.

2007-01-01

Objective: Assess the differences in point estimates, power and type 1 error rates when accounting for and ignoring family structure in genetic tests of association. Methods: We compare by simulation the performance of analytic models using variance components to account for family structure and regression models that ignore relatedness for a range of possible family based study designs (i.e., sib pairs vs. large sibships vs. nuclear families vs. extended families). Results: Our analyses indicate that effect size estimates and power are not significantly affected by ignoring family structure. Type 1 error rates increase when family structure is ignored, as density of family structures increases, and as trait heritability increases. For discrete traits with moderate levels of heritability and across many common sampling designs, type 1 error rates rise from a nominal 0.05 to 0.11. Conclusion: Ignoring family structure may be useful in screening although it comes at a cost of an increased type 1 error rate, the magnitude of which depends on trait heritability and pedigree configuration. PMID:17570925
Bias in the Wagner-Nelson estimate of the fraction of drug absorbed.

PubMed

Wang, Yibin; Nedelman, Jerry

2002-04-01

To examine and quantify bias in the Wagner-Nelson estimate of the fraction of drug absorbed resulting from the estimation error of the elimination rate constant (k), measurement error of the drug concentration, and the truncation error in the area under the curve. Bias in the Wagner-Nelson estimate was derived as a function of post-dosing time (t), k, ratio of absorption rate constant to k (r), and the coefficient of variation for estimates of k (CVk), or CV% for the observed concentration, by assuming a one-compartment model and using an independent estimate of k. The derived functions were used for evaluating the bias with r = 0.5, 3, or 6; k = 0.1 or 0.2; CVk = 0.2 or 0.4; and CV of the observed concentration = 0.2 or 0.4; for t = 0 to 30 or 60. Estimation error of k resulted in an upward bias in the Wagner-Nelson estimate that could lead to the estimate of the fraction absorbed being greater than unity. The bias resulting from the estimation error of k inflates the fraction of absorption vs. time profiles mainly in the early post-dosing period. The magnitude of the bias in the Wagner-Nelson estimate resulting from estimation error of k was mainly determined by CVk. The bias in the Wagner-Nelson estimate resulting from estimation error in k can be dramatically reduced by use of the mean of several independent estimates of k, as in studies for development of an in vivo-in vitro correlation. The truncation error in the area under the curve can introduce a negative bias in the Wagner-Nelson estimate. This can partially offset the bias resulting from estimation error of k in the early post-dosing period. Measurement error of concentration does not introduce bias in the Wagner-Nelson estimate. Estimation error of k results in an upward bias in the Wagner-Nelson estimate, mainly in the early drug absorption phase. The truncation error in AUC can result in a downward bias, which may partially offset the upward bias due to estimation error of k in the early absorption phase.
Measurement error of concentration does not introduce bias. The joint effect of estimation error of k and truncation error in AUC can result in a non-monotonic fraction-of-drug-absorbed-vs-time profile. However, only estimation error of k can lead to the Wagner-Nelson estimate of fraction of drug absorbed greater than unity.
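The effect of an error in k on the Wagner-Nelson estimate can be illustrated with a short calculation. The sketch below assumes a one-compartment model with first-order absorption, noise-free concentrations, and exactly known areas under the curve, so that the only error is in the independent estimate of k. The function names are hypothetical, and this is an illustration rather than the authors' derivation. The estimate is F(t) = (C(t) + k AUC(0,t)) / (k AUC(0,inf)).

```python
import math


def concentration(t, k, ka):
    """Concentration (up to a constant scale) for first-order absorption."""
    return math.exp(-k * t) - math.exp(-ka * t)


def auc_to_t(t, k, ka):
    """Exact area under the concentration curve from 0 to t (same scale)."""
    return (1 - math.exp(-k * t)) / k - (1 - math.exp(-ka * t)) / ka


def auc_to_infinity(k, ka):
    """Exact area under the concentration curve from 0 to infinity."""
    return 1 / k - 1 / ka


def wagner_nelson(t, k_true, ka, k_hat):
    """Wagner-Nelson fraction absorbed using an independent estimate k_hat."""
    c = concentration(t, k_true, ka)
    numerator = c + k_hat * auc_to_t(t, k_true, ka)
    return numerator / (k_hat * auc_to_infinity(k_true, ka))


# k = 0.1 and r = ka / k = 3 are among the settings listed in the abstract.
k, ka, t = 0.1, 0.3, 10.0
exact = wagner_nelson(t, k, ka, k_hat=k)            # k known exactly
with_low_k = wagner_nelson(t, k, ka, k_hat=0.05)    # k underestimated
with_high_k = wagner_nelson(t, k, ka, k_hat=0.2)    # k overestimated
```

With the true k the estimate reduces to 1 - exp(-ka t). In this sketch the estimate depends on k_hat only through 1/k_hat, so underestimates of k raise it more than equal overestimates lower it, and an underestimate can push it above unity.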
A performance model for GPUs with caches

DOE PAGES

Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...

2014-06-24

To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, Radeon HD 6970, the model estimates runtime with an error rate of 8 percent. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
Corrected score estimation in the proportional hazards model with misclassified discrete covariates

PubMed Central

Zucker, David M.; Spiegelman, Donna

2013-01-01

SUMMARY We consider Cox proportional hazards regression when the covariate vector includes error-prone discrete covariates along with error-free covariates, which may be discrete or continuous. The misclassification in the discrete error-prone covariates is allowed to be of any specified form. Building on the work of Nakamura and his colleagues, we present a corrected score method for this setting. The method can handle all three major study designs (internal validation design, external validation design, and replicate measures design), both functional and structural error models, and time-dependent covariates satisfying a certain ‘localized error’ condition. We derive the asymptotic properties of the method and indicate how to adjust the covariance matrix of the regression coefficient estimates to account for estimation of the misclassification matrix. We present the results of a finite-sample simulation study under Weibull survival with a single binary covariate having known misclassification rates. The performance of the method described here was similar to that of related methods we have examined in previous works. Specifically, our new estimator performed as well as or, in a few cases, better than the full Weibull maximum likelihood estimator. We also present simulation results for our method for the case where the misclassification probabilities are estimated from an external replicate measures study. Our method generally performed well in these simulations. The new estimator has a broader range of applicability than many other estimators proposed in the literature, including those described in our own earlier work, in that it can handle time-dependent covariates with an arbitrary misclassification structure. We illustrate the method on data from a study of the relationship between dietary calcium intake and distal colon cancer. PMID:18219700
Blinded sample size re-estimation in three-arm trials with 'gold standard' design.

PubMed

Mütze, Tobias; Friede, Tim

2017-10-15

In this article, we study blinded sample size re-estimation in the 'gold standard' design with internal pilot study for normally distributed outcomes. The 'gold standard' design is a three-arm clinical trial design that includes an active and a placebo control in addition to an experimental treatment. We focus on the absolute margin approach to hypothesis testing in three-arm trials at which the non-inferiority of the experimental treatment and the assay sensitivity are assessed by pairwise comparisons. We compare several blinded sample size re-estimation procedures in a simulation study assessing operating characteristics including power and type I error. We find that sample size re-estimation based on the popular one-sample variance estimator results in overpowered trials. Moreover, sample size re-estimation based on unbiased variance estimators such as the Xing-Ganju variance estimator results in underpowered trials, as expected, because an overestimation of the variance and thus the sample size is in general required for the re-estimation procedure to eventually meet the target power. To overcome this problem, we propose an inflation factor for the sample size re-estimation with the Xing-Ganju variance estimator and show that this approach results in adequately powered trials. Because of favorable features of the Xing-Ganju variance estimator such as unbiasedness and a distribution independent of the group means, the inflation factor does not depend on the nuisance parameter and, therefore, can be calculated prior to a trial. Moreover, we prove that the sample size re-estimation based on the Xing-Ganju variance estimator does not bias the effect estimate. Copyright © 2017 John Wiley & Sons, Ltd.
Syzygies, Pluricanonical Maps, and the Birational Geometry of Varieties of Maximal Albanese Dimension

NASA Astrophysics Data System (ADS)

Tesfagiorgis, Kibrewossen B.

Satellite Precipitation Estimates (SPEs) may be the only available source of information for operational hydrologic and flash flood prediction due to spatial limitations of radar and gauge products in mountainous regions. The present work develops an approach to seamlessly blend satellite, available radar, climatological and gauge precipitation products to fill gaps in ground-based radar precipitation field. To mix different precipitation products, the error of any of the products relative to each other should be removed. For bias correction, the study uses a new ensemble-based method which aims to estimate spatially varying multiplicative biases in SPEs using a radar-gauge precipitation product. Bias factors were calculated for a randomly selected sample of rainy pixels in the study area. Spatial fields of estimated bias were generated taking into account spatial variation and random errors in the sampled values. In addition to biases, sometimes there is also spatial error between the radar and satellite precipitation estimates; one of them has to be geometrically corrected with reference to the other. A set of corresponding raining points between SPE and radar products are selected to apply linear registration using a regularized least square technique to minimize the dislocation error in SPEs with respect to available radar products. A weighted Successive Correction Method (SCM) is used to make the merging between error corrected satellite and radar precipitation estimates. In addition to SCM, we use a combination of SCM and Bayesian spatial method for merging the rain gauges and climatological precipitation sources with radar and SPEs. We demonstrated the method using two satellite-based, CPC Morphing (CMORPH) and Hydro-Estimator (HE), two radar-gauge based, Stage-II and ST-IV, a climatological product PRISM and rain gauge dataset for several rain events from 2006 to 2008 over different geographical locations of the United States. 
Results show that: (a) the method of ensembles helped reduce biases in SPEs significantly; (b) the SCM method in combination with the Bayesian spatial model produced a precipitation product in good agreement with independent measurements. The study implies that, using the available radar pixels surrounding the gap area together with rain gauge, PRISM and satellite products, a radar-like product is achievable over radar gap areas, which benefits the operational meteorology and hydrology community.
Performance of Bootstrapping Approaches To Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling.

ERIC Educational Resources Information Center

Nevitt, Jonathan; Hancock, Gregory R.

2001-01-01

Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…
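As a rough illustration of the resampling idea this record refers to, the sketch below computes a nonparametric bootstrap standard error for a simple statistic. It is not the structural equation modeling procedure evaluated in the study; the function name `bootstrap_se`, the toy data, and the choice of 2000 resamples are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of `statistic` over `data`.

    Hypothetical helper for illustration; `n_boot` and `seed` are arbitrary.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicates estimates the statistic's sampling error.
    return statistics.stdev(replicates)


# Toy sample (made up for the example).
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
se_mean = bootstrap_se(sample, statistics.mean)

# Textbook standard error of the mean, for comparison.
analytic = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE: {se_mean:.3f}, analytic SE: {analytic:.3f}")
```

For the sample mean, the bootstrap value should land close to the analytic one; the bootstrap becomes more useful for statistics with no simple closed-form standard error.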
Improved estimates of ocean heat content from 1960 to 2015

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cheng, Lijing; Trenberth, Kevin E.; Fasullo, John

Earth’s energy imbalance (EEI) drives the ongoing global warming and can best be assessed across the historical record (that is, since 1960) from ocean heat content (OHC) changes. An accurate assessment of OHC is a challenge, mainly because of insufficient and irregular data coverage. We provide here updated OHC estimates with the goal of minimizing associated sampling error. We performed a subsample test, in which subsets of data during the data-rich Argo era are colocated with locations of earlier ocean observations, to quantify this error. Our results provide a new OHC estimate with an unbiased mean sampling error and with variability on decadal and multidecadal time scales (signal) that can be reliably distinguished from sampling error (noise) with signal-to-noise ratios higher than 3. The inferred integrated EEI is greater than that reported in previous assessments and is consistent with a reconstruction of the radiative imbalance at the top of atmosphere starting in 1985. We found that changes in OHC are relatively small before about 1980; since then, OHC has increased fairly steadily and, since 1990, has increasingly involved deeper layers of the ocean. In addition, OHC changes in six major oceans are reliable on decadal timescales. All ocean basins examined have experienced significant warming since 1998, with the greatest warming in the southern oceans, the tropical/subtropical Pacific Ocean, and the tropical/subtropical Atlantic Ocean. This new look at OHC and EEI changes over time provides greater confidence than previously possible, and the data sets produced are a valuable resource for further study.
Improved estimates of ocean heat content from 1960 to 2015

DOE PAGES

Cheng, Lijing; Trenberth, Kevin E.; Fasullo, John; ...

2017-03-10

Earth’s energy imbalance (EEI) drives the ongoing global warming and can best be assessed across the historical record (that is, since 1960) from ocean heat content (OHC) changes. An accurate assessment of OHC is a challenge, mainly because of insufficient and irregular data coverage. We provide here updated OHC estimates with the goal of minimizing associated sampling error. We performed a subsample test, in which subsets of data during the data-rich Argo era are colocated with locations of earlier ocean observations, to quantify this error. Our results provide a new OHC estimate with an unbiased mean sampling error and with variability on decadal and multidecadal time scales (signal) that can be reliably distinguished from sampling error (noise) with signal-to-noise ratios higher than 3. The inferred integrated EEI is greater than that reported in previous assessments and is consistent with a reconstruction of the radiative imbalance at the top of atmosphere starting in 1985. We found that changes in OHC are relatively small before about 1980; since then, OHC has increased fairly steadily and, since 1990, has increasingly involved deeper layers of the ocean. In addition, OHC changes in six major oceans are reliable on decadal timescales. All ocean basins examined have experienced significant warming since 1998, with the greatest warming in the southern oceans, the tropical/subtropical Pacific Ocean, and the tropical/subtropical Atlantic Ocean. This new look at OHC and EEI changes over time provides greater confidence than previously possible, and the data sets produced are a valuable resource for further study.
Model-based inference for small area estimation with sampling weights

PubMed Central

Vandendijck, Y.; Faes, C.; Kirby, R.S.; Lawson, A.; Hens, N.

2017-01-01

Obtaining reliable estimates about health outcomes for areas or domains where only few to no samples are available is the goal of small area estimation (SAE). Often, we rely on health surveys to obtain information about health outcomes. Such surveys are often characterised by a complex design, stratification, and unequal sampling weights as common features. Hierarchical Bayesian models are well recognised in SAE as a spatial smoothing method, but often ignore the sampling weights that reflect the complex sampling design. In this paper, we focus on data obtained from a health survey where the sampling weights of the sampled individuals are the only information available about the design. We develop a predictive model-based approach to estimate the prevalence of a binary outcome for both the sampled and non-sampled individuals, using hierarchical Bayesian models that take into account the sampling weights. A simulation study is carried out to compare the performance of our proposed method with other established methods. The results indicate that our proposed method achieves great reductions in mean squared error when compared with standard approaches. It performs equally well or better when compared with more elaborate methods when there is a relationship between the responses and the sampling weights. The proposed method is applied to estimate asthma prevalence across districts. PMID:28989860
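To make the role of sampling weights concrete, here is a minimal sketch of a design-weighted prevalence estimate (a ratio of weighted sums, often called a Hájek-type estimator). This is only a simple design-based baseline of the kind such methods are compared against, not the hierarchical Bayesian model proposed in the paper; the function name and the toy data are assumptions.

```python
def weighted_prevalence(outcomes, weights):
    """Design-weighted prevalence: sum(w_i * y_i) / sum(w_i) for binary y_i.

    Hypothetical helper for illustration only.
    """
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


# Toy district sample (made up): cases were oversampled, so they carry
# smaller weights than non-cases.
outcomes = [1, 0, 0, 1, 0]
weights = [10, 50, 50, 10, 30]

unweighted = sum(outcomes) / len(outcomes)            # ignores the design
weighted = weighted_prevalence(outcomes, weights)     # respects the design
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy case the unweighted proportion is 0.4 while the weighted estimate is 20/150, which shows how ignoring unequal weights can distort a prevalence estimate.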
Estimating variation in a landscape simulation of forest structure.

Treesearch

S. Hummel; P. Cunningham

2006-01-01

Modern technology makes it easy to show how forested landscapes might change with time but it remains difficult to estimate how sampling error affects landscape simulation results. To address this problem we used two methods to project the area in late-seral forest (LSF) structure for the same 6070-hectare (ha) study site over 30 years. The site was stratified into...
The Performance of ML, GLS, and WLS Estimation in Structural Equation Modeling under Conditions of Misspecification and Nonnormality.

ERIC Educational Resources Information Center

Olsson, Ulf Henning; Foss, Tron; Troye, Sigurd V.; Howell, Roy D.

2000-01-01

Used simulation to demonstrate how the choice of estimation method affects indexes of fit and parameter bias for different sample sizes when nested models vary in terms of specification error and the data demonstrate different levels of kurtosis. Discusses results for maximum likelihood (ML), generalized least squares (GLS), and weighted least…
Bayesian Methods for the Physical Sciences. Learning from Examples in Astronomy and Physics.

NASA Astrophysics Data System (ADS)

Andreon, Stefano; Weaver, Brian

2015-05-01

Chapter 1: This chapter presents some basic steps for performing a good statistical analysis, all summarized in about one page. Chapter 2: This short chapter introduces the basics of probability theory in an intuitive fashion using simple examples. It also illustrates, again with examples, how to propagate errors and the difference between marginal and profile likelihoods. Chapter 3: This chapter introduces the computational tools and methods that we use for sampling from the posterior distribution. Since all numerical computations, and Bayesian ones are no exception, may end in errors, we also provide a few tips to check that the numerical computation is sampling from the posterior distribution. Chapter 4: Many of the concepts of building, running, and summarizing the results of a Bayesian analysis are described with this step-by-step guide using a basic (Gaussian) model. The chapter also introduces examples using Poisson and Binomial likelihoods, and how to combine repeated independent measurements. Chapter 5: All statistical analyses make assumptions, and Bayesian analyses are no exception. This chapter emphasizes that results depend on data and priors (assumptions). We illustrate this concept with examples where the prior plays greatly different roles, from major to negligible. We also provide some advice on how to look for information useful for sculpting the prior. Chapter 6: In this chapter we consider examples for which we want to estimate more than a single parameter. These common problems include estimating location and spread. We also consider examples that require the modeling of two populations (one we are interested in and a nuisance population) or averaging incompatible measurements. We also introduce quite complex examples dealing with upper limits and with a larger-than-expected scatter. Chapter 7: Rarely is a sample randomly selected from the population we wish to study.
Often, samples are affected by selection effects, e.g., easier-to-collect events or objects are over-represented in samples and difficult-to-collect are under-represented if not missing altogether. In this chapter we show how to account for non-random data collection to infer the properties of the population from the studied sample. Chapter 8: In this chapter we introduce regression models, i.e., how to fit (regress) one, or more quantities, against each other through a functional relationship and estimate any unknown parameters that dictate this relationship. Questions of interest include: how to deal with samples affected by selection effects? How does a rich data structure influence the fitted parameters? And what about non-linear multiple-predictor fits, upper/lower limits, measurements errors of different amplitudes and an intrinsic variety in the studied populations or an extra source of variability? A number of examples illustrate how to answer these questions and how to predict the value of an unavailable quantity by exploiting the existence of a trend with another, available, quantity. Chapter 9: This chapter provides some advice on how the careful scientist should perform model checking and sensitivity analysis, i.e., how to answer the following questions: is the considered model at odds with the current available data (the fitted data), for example because it is over-simplified compared to some specific complexity pointed out by the data? Furthermore, are the data informative about the quantity being measured or are results sensibly dependent on details of the fitted model? And, finally, what about if assumptions are uncertain? A number of examples illustrate how to answer these questions. 
Chapter 10: This chapter compares the performance of Bayesian methods against simple, non-Bayesian alternatives, such as maximum likelihood, minimal chi square, ordinary and weighted least square, bivariate correlated errors and intrinsic scatter, and robust estimates of location and scale. Performances are evaluated in terms of quality of the prediction, accuracy of the estimates, and fairness and noisiness of the quoted errors. We also focus on three failures of maximum likelihood methods occurring with small samples, with mixtures, and with regressions with errors in the predictor quantity.
Quantifying error of lidar and sodar Doppler beam swinging measurements of wind turbine wakes using computational fluid dynamics

DOE PAGES

Lundquist, J. K.; Churchfield, M. J.; Lee, S.; ...

2015-02-23

Wind-profiling lidars are now regularly used in boundary-layer meteorology and in applications such as wind energy and air quality. Lidar wind profilers exploit the Doppler shift of laser light backscattered from particulates carried by the wind to measure a line-of-sight (LOS) velocity. The Doppler beam swinging (DBS) technique, used by many commercial systems, considers measurements of this LOS velocity in multiple radial directions in order to estimate horizontal and vertical winds. The method relies on the assumption of homogeneous flow across the region sampled by the beams. Using such a system in inhomogeneous flow, such as wind turbine wakes or complex terrain, will result in errors. To quantify the errors expected from such violation of the assumption of horizontal homogeneity, we simulate inhomogeneous flow in the atmospheric boundary layer, notably stably stratified flow past a wind turbine, with a mean wind speed of 6.5 m s⁻¹ at the turbine hub-height of 80 m. This slightly stable case results in 15° of wind direction change across the turbine rotor disk. The resulting flow field is sampled in the same fashion that a lidar samples the atmosphere with the DBS approach, including the lidar range weighting function, enabling quantification of the error in the DBS observations. The observations from the instruments located upwind have small errors, which are ameliorated with time averaging. However, the downwind observations, particularly within the first two rotor diameters downwind from the wind turbine, suffer from errors due to the heterogeneity of the wind turbine wake. Errors in the stream-wise component of the flow approach 30% of the hub-height inflow wind speed close to the rotor disk.
Errors in the cross-stream and vertical velocity components are also significant: cross-stream component errors are on the order of 15% of the hub-height inflow wind speed (1.0 m s⁻¹) and errors in the vertical velocity measurement exceed the actual vertical velocity. By three rotor diameters downwind, DBS-based assessments of wake wind speed deficits based on the stream-wise velocity can be relied on even within the near wake within 1.0 m s⁻¹ (or 15% of the hub-height inflow wind speed), and the cross-stream velocity error is reduced to 8% while vertical velocity estimates are compromised. Furthermore, measurements of inhomogeneous flow such as wind turbine wakes are susceptible to these errors, and interpretations of field observations should account for this uncertainty.
Quantifying error of lidar and sodar Doppler beam swinging measurements of wind turbine wakes using computational fluid dynamics

NASA Astrophysics Data System (ADS)

Lundquist, J. K.; Churchfield, M. J.; Lee, S.; Clifton, A.

2015-02-01

Wind-profiling lidars are now regularly used in boundary-layer meteorology and in applications such as wind energy and air quality. Lidar wind profilers exploit the Doppler shift of laser light backscattered from particulates carried by the wind to measure a line-of-sight (LOS) velocity. The Doppler beam swinging (DBS) technique, used by many commercial systems, considers measurements of this LOS velocity in multiple radial directions in order to estimate horizontal and vertical winds. The method relies on the assumption of homogeneous flow across the region sampled by the beams. Using such a system in inhomogeneous flow, such as wind turbine wakes or complex terrain, will result in errors. To quantify the errors expected from such violation of the assumption of horizontal homogeneity, we simulate inhomogeneous flow in the atmospheric boundary layer, notably stably stratified flow past a wind turbine, with a mean wind speed of 6.5 m s⁻¹ at the turbine hub-height of 80 m. This slightly stable case results in 15° of wind direction change across the turbine rotor disk. The resulting flow field is sampled in the same fashion that a lidar samples the atmosphere with the DBS approach, including the lidar range weighting function, enabling quantification of the error in the DBS observations. The observations from the instruments located upwind have small errors, which are ameliorated with time averaging. However, the downwind observations, particularly within the first two rotor diameters downwind from the wind turbine, suffer from errors due to the heterogeneity of the wind turbine wake. Errors in the stream-wise component of the flow approach 30% of the hub-height inflow wind speed close to the rotor disk. Errors in the cross-stream and vertical velocity components are also significant: cross-stream component errors are on the order of 15% of the hub-height inflow wind speed (1.0 m s⁻¹) and errors in the vertical velocity measurement exceed the actual vertical velocity.
By three rotor diameters downwind, DBS-based assessments of wake wind speed deficits based on the stream-wise velocity can be relied on even within the near wake within 1.0 m s⁻¹ (or 15% of the hub-height inflow wind speed), and the cross-stream velocity error is reduced to 8% while vertical velocity estimates are compromised. Measurements of inhomogeneous flow such as wind turbine wakes are susceptible to these errors, and interpretations of field observations should account for this uncertainty.
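The homogeneity assumption behind DBS can be sketched with a toy four-beam geometry. The code below is an illustration only, not the simulation or instrument model used in the study: it assumes beams pointing north, east, south and west at a 30° zenith angle, takes LOS velocity as positive away from the instrument, and uses a made-up wake speed of 4.5 m s⁻¹ in one beam. All function names are hypothetical.

```python
import math


def los_velocity(u, v, w, azimuth_deg, zenith_deg):
    """LOS velocity seen by a beam (u east, v north, w up; positive = away).

    Azimuth is measured clockwise from north. Assumed convention.
    """
    az = math.radians(azimuth_deg)
    ze = math.radians(zenith_deg)
    horizontal = u * math.sin(az) + v * math.cos(az)
    return horizontal * math.sin(ze) + w * math.cos(ze)


def dbs_retrieve(vr_n, vr_e, vr_s, vr_w, zenith_deg):
    """Recover (u, v, w) assuming the same wind in all four beams."""
    ze = math.radians(zenith_deg)
    u = (vr_e - vr_w) / (2.0 * math.sin(ze))
    v = (vr_n - vr_s) / (2.0 * math.sin(ze))
    w = (vr_n + vr_e + vr_s + vr_w) / (4.0 * math.cos(ze))
    return u, v, w


ZENITH = 30.0            # assumed beam zenith angle, degrees
truth = (6.5, 0.0, 0.0)  # uniform westerly inflow, m/s

# Homogeneous flow: every beam samples the same wind, retrieval is exact.
beams = [los_velocity(*truth, az, ZENITH) for az in (0, 90, 180, 270)]
u_hat, v_hat, w_hat = dbs_retrieve(*beams, ZENITH)

# Inhomogeneous flow: the east (downwind) beam samples a slower wake.
vr_e_wake = los_velocity(4.5, 0.0, 0.0, 90, ZENITH)
u_bad, v_bad, w_bad = dbs_retrieve(beams[0], vr_e_wake, beams[2], beams[3], ZENITH)
print(f"homogeneous: u={u_hat:.2f}, w={w_hat:.2f}")
print(f"with wake:   u={u_bad:.2f}, w={w_bad:.2f}")
```

With the wake in one beam, the retrieval returns a horizontal speed between the two true values and a nonzero vertical velocity even though the true vertical velocity is zero, which is the kind of error the abstract describes.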
An Unbiased Estimator of Gene Diversity with Improved Variance for Samples Containing Related and Inbred Individuals of any Ploidy

PubMed Central

Harris, Alexandre M.; DeGiorgio, Michael

2016-01-01

Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H∼BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H∼BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H∼BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
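For orientation, the classical sample-size-corrected estimator of gene diversity that the abstract calls the "original unbiased estimator" can be written as H = n/(n-1) * (1 - sum of squared allele frequencies), where n is the number of allele copies sampled. The sketch below implements only that classical form, which assumes independent allele copies; it is not the BLUE-based estimator introduced in the paper and is unrelated to the authors' BestHet script. The function name and toy data are assumptions.

```python
from collections import Counter


def gene_diversity(alleles):
    """Classical estimator: n / (n - 1) * (1 - sum(p_i ** 2)).

    `alleles` is a list of sampled allele copies at one locus; n is its
    length. Assumes the copies are independent (no relatives, no inbreeding).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    # Sum of squared sample allele frequencies (sample homozygosity).
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


# Toy locus (made up): 10 allele copies with frequencies 0.6, 0.3, 0.1.
alleles = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
h = gene_diversity(alleles)
print(f"estimated gene diversity: {h:.3f}")
```

Because the correction factor depends only on n, this form cannot account for dependence among allele copies from related or inbred individuals, which is the limitation the abstract highlights.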
Modification of the Sandwich Estimator in Generalized Estimating Equations with Correlated Binary Outcomes in Rare Event and Small Sample Settings

PubMed Central

Rogers, Paul; Stoner, Julie

2016-01-01

Regression models for correlated binary outcomes are commonly fit using a Generalized Estimating Equations (GEE) methodology. GEE uses the Liang and Zeger sandwich estimator to produce unbiased standard error estimators for regression coefficients in large sample settings even when the covariance structure is misspecified. The sandwich estimator performs optimally in balanced designs when the number of participants is large, and there are few repeated measurements. The sandwich estimator is not without drawbacks; its asymptotic properties do not hold in small sample settings. In these situations, the sandwich estimator is biased downwards, underestimating the variances. In this project, a modified form for the sandwich estimator is proposed to correct this deficiency. The performance of this new sandwich estimator is compared to the traditional Liang and Zeger estimator as well as alternative forms proposed by Morel, Pan and Mancl and DeRouen. The performance of each estimator was assessed with 95% coverage probabilities for the regression coefficient estimators using simulated data under various combinations of sample sizes and outcome prevalence values with an Independence (IND), Autoregressive (AR) and Compound Symmetry (CS) correlation structure. This research is motivated by investigations involving rare-event outcomes in aviation data. PMID:26998504
Technical note: estimating sex using cervical canine odontometrics: a test using a known sex sample.

PubMed

Hassett, Brenna

2011-11-01

The size of the permanent human canine tooth is one of the few sexually dimorphic features to be present in childhood and as such offers the opportunity to assist in the identification of sex in remains where no other appropriate criteria exist, such as in subadults. However, canine odontometrics are often associated with high levels of interobserver error and can be difficult to access if dentition is in situ. Additionally, appropriate points of measurement can be difficult to identify if the tooth is worn. Alternate measurements of the cervical canine diameters have been proposed as solutions to these issues, but the utility of these measurements in estimating sex has not been conclusively demonstrated. This study uses the buccolingual and mesiodistal cervical diameter of the canines from a known-sex sample from St. Bride's Church, London and a partially known-sex sample from the Old Church, Chelsea, London to classify individuals as male or female. A discriminant function classification using these diameters successfully identifies sex in 93.8% of the known-sex assemblage and 95% of the partially osteologically estimated sex assemblage. It is suggested that cervical canine diameters are highly repeatable measurements with low interobserver error, can be obtained on worn and in situ teeth, and provide as good or better guidance on estimating sex in human remains as standard maximal diameters. Copyright © 2011 Wiley-Liss, Inc.
