Sample records for estimation sampling modeling

  1. Validation of abundance estimates from mark–recapture and removal techniques for rainbow trout captured by electrofishing in small streams

    USGS Publications Warehouse

    Rosenberger, Amanda E.; Dunham, Jason B.

    2005-01-01

    Estimation of fish abundance in streams using the removal model or the Lincoln–Petersen mark–recapture model is a common practice in fisheries. These models produce misleading results if their assumptions are violated. We evaluated the assumptions of these two models via electrofishing of rainbow trout Oncorhynchus mykiss in central Idaho streams. For one-, two-, three-, and four-pass sampling effort in closed sites, we evaluated the influences of fish size and habitat characteristics on sampling efficiency and the accuracy of removal abundance estimates. We also examined the use of models to generate unbiased estimates of fish abundance through adjustment of total catch or biased removal estimates. Our results suggested that the assumptions of the mark–recapture model were satisfied and that abundance estimates based on this approach were unbiased. In contrast, the removal model assumptions were not met. Decreasing sampling efficiencies over removal passes resulted in underestimated population sizes and overestimates of sampling efficiency. This bias decreased, but was not eliminated, with increased sampling effort. Biased removal estimates based on different levels of effort were highly correlated with each other but were less correlated with unbiased mark–recapture estimates. Stream size decreased sampling efficiency, and stream size and instream wood increased the negative bias of removal estimates. We found that reliable estimates of population abundance could be obtained from models of sampling efficiency for different levels of effort. Validation of abundance estimates requires extra attention to routine sampling considerations but can help fisheries biologists avoid pitfalls associated with biased data and facilitate standardized comparisons among studies that employ different sampling methods.
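
    The two estimators compared in this record can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes Chapman's bias-corrected form of the Lincoln–Petersen estimator and the standard two-pass (Seber–Le Cren) removal formula, and the function names are hypothetical.

```python
def chapman_petersen(marked: int, captured: int, recaptured: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen abundance estimate.

    marked: fish marked and released on the marking pass (M)
    captured: fish caught on the recapture pass (C)
    recaptured: marked fish among the recapture catch (R)
    """
    return (marked + 1) * (captured + 1) / (recaptured + 1) - 1


def two_pass_removal(c1: int, c2: int) -> tuple[float, float]:
    """Two-pass removal estimate of abundance and per-pass capture probability.

    Assumes the same capture probability on both passes, which is the
    assumption the study found to be violated.
    """
    if c2 >= c1:
        raise ValueError("removal estimate undefined unless the catch declines (c1 > c2)")
    n_hat = c1 ** 2 / (c1 - c2)
    p_hat = (c1 - c2) / c1
    return n_hat, p_hat
```

    For example, with a true abundance of 100 fish and sampling efficiencies of 0.6 on the first pass and 0.4 on the second, the expected catches are 60 and 16, and two_pass_removal(60, 16) returns about 81.8 fish and an efficiency of about 0.73: the same direction of bias (underestimated abundance, overestimated efficiency) that the study reports when efficiency declines across passes.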

  2. Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution.

    PubMed

    Baele, Guy; Lemey, Philippe; Vansteelandt, Stijn

    2013-03-06

    Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. 
Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
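
    Stepping-stone sampling estimates a marginal likelihood as a product of ratios between successive power posteriors, p(y|θ)^β p(θ), along a schedule 0 = β_0 < β_1 < ... < β_K = 1. The sketch below assumes a toy conjugate normal model (not a model of sequence evolution) so that each power posterior can be sampled exactly rather than by MCMC, estimates one log marginal likelihood, and compares it with the closed-form value. The β schedule spaced by quantiles of a Beta(0.3, 1) distribution is an assumed, commonly used choice; the direct model-to-model path for Bayes factors studied in this record is not reproduced.

```python
import numpy as np

def log_likelihood(theta, y, sigma):
    # log p(y | theta) for each draw in theta, with y_i ~ N(theta, sigma^2)
    sq = ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
    return -0.5 * y.size * np.log(2 * np.pi * sigma**2) - 0.5 * sq / sigma**2

def power_posterior_draws(beta, y, sigma, mu0, tau0, size, rng):
    # Exact draws from prior(theta) * likelihood(theta)^beta (normal prior N(mu0, tau0^2))
    prec = 1 / tau0**2 + beta * y.size / sigma**2
    mean = (mu0 / tau0**2 + beta * y.sum() / sigma**2) / prec
    return rng.normal(mean, 1 / np.sqrt(prec), size)

def stepping_stone_log_ml(y, sigma, mu0, tau0, n_steps=32, draws=2000, alpha=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Schedule concentrated near beta = 0 (quantiles of a Beta(alpha, 1) distribution)
    betas = (np.arange(n_steps + 1) / n_steps) ** (1 / alpha)
    log_ml = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        theta = power_posterior_draws(b_prev, y, sigma, mu0, tau0, draws, rng)
        w = (b_next - b_prev) * log_likelihood(theta, y, sigma)
        # log of mean(exp(w)), computed stably by factoring out the maximum
        w_max = w.max()
        log_ml += w_max + np.log(np.mean(np.exp(w - w_max)))
    return log_ml

def exact_log_ml(y, sigma, mu0, tau0):
    # Marginally, y ~ N(mu0 * 1, sigma^2 I + tau0^2 * 1 1^T)
    n = y.size
    cov = sigma**2 * np.eye(n) + tau0**2 * np.ones((n, n))
    diff = y - mu0
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))
```

    A log Bayes factor computed this way would be the difference of two independent estimates, which is the route whose accumulated error motivates the direct approach described in this record.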

  3. Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

    PubMed Central

    2013-01-01

    Background Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model’s marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. Results We here assess the original ‘model-switch’ path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model’s marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. Conclusions We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. 
Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation. PMID:23497171

  4. Utilizing Adjoint-Based Error Estimates for Surrogate Models to Accurately Predict Probabilities of Events

    DOE PAGES

    Butler, Troy; Wildey, Timothy

    2018-01-01

    In this study, we develop a procedure to utilize error estimates for samples of a surrogate model to compute robust upper and lower bounds on estimates of probabilities of events. We show that these error estimates can also be used in an adaptive algorithm to simultaneously reduce the computational cost and increase the accuracy in estimating probabilities of events using computationally expensive high-fidelity models. Specifically, we introduce the notion of reliability of a sample of a surrogate model, and we prove that utilizing the surrogate model for the reliable samples and the high-fidelity model for the unreliable samples gives precisely the same estimate of the probability of the output event as would be obtained by evaluation of the original model for each sample. The adaptive algorithm uses the additional evaluations of the high-fidelity model for the unreliable samples to locally improve the surrogate model near the limit state, which significantly reduces the number of high-fidelity model evaluations as the limit state is resolved. Numerical results based on a recently developed adjoint-based approach for estimating the error in samples of a surrogate are provided to demonstrate (1) the robustness of the bounds on the probability of an event, and (2) that the adaptive enhancement algorithm provides a more accurate estimate of the probability of the quantity-of-interest (QoI) event than standard response surface approximation methods, at a lower computational cost.
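
    The reliability idea can be illustrated with a toy one-dimensional problem. In this sketch the "high-fidelity model" is exp(x), the surrogate is its cubic Taylor polynomial, and a Lagrange remainder bound stands in for the adjoint-based error estimate used in the paper; all three are assumptions made for illustration. A sample counts as reliable when the surrogate value lies farther from the event threshold than its error bound, so only the remaining samples need the expensive model, and the resulting probability matches the one obtained by evaluating the expensive model on every sample.

```python
import numpy as np

def surrogate(x):
    # Hypothetical cheap model: cubic Taylor polynomial of exp(x) around 0
    return 1 + x + x**2 / 2 + x**3 / 6

def error_bound(x):
    # Lagrange remainder bound: |exp(x) - surrogate(x)| <= exp(|x|) * x**4 / 24
    return np.exp(np.abs(x)) * x**4 / 24

def event_probability(x, threshold, high_fidelity=np.exp):
    """Estimate P(Q(x) > threshold), calling high_fidelity only on unreliable samples."""
    q_s = surrogate(x)
    in_event = q_s > threshold
    # Unreliable: the error bound could carry the true value across the threshold
    unreliable = np.abs(q_s - threshold) <= error_bound(x)
    in_event[unreliable] = high_fidelity(x[unreliable]) > threshold
    return in_event.mean(), int(unreliable.sum())

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)
p_hat, n_expensive = event_probability(x, threshold=1.5)
```

    In a real application the guarantee holds only insofar as the error estimate is a valid bound; here it is exact by construction.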

  5. Utilizing Adjoint-Based Error Estimates for Surrogate Models to Accurately Predict Probabilities of Events

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Butler, Troy; Wildey, Timothy

  6. Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter.

    PubMed

    Lord, Dominique

    2006-07-01

    There has been considerable research conducted on the development of statistical models for predicting crashes on highway facilities. Despite numerous advancements made for improving the estimation tools of statistical models, the most common probabilistic structure used for modeling motor vehicle crashes remains the traditional Poisson and Poisson-gamma (or Negative Binomial) distribution; when crash data exhibit over-dispersion, the Poisson-gamma model is usually the model most favored by transportation safety modelers. Crash data collected for safety studies often have the unusual attribute of being characterized by low sample mean values. Studies have shown that the goodness-of-fit of statistical models produced from such datasets can be significantly affected. This issue has been defined as the "low mean problem" (LMP). Despite recent developments on methods to circumvent the LMP and test the goodness-of-fit of models developed using such datasets, no work has so far examined how the LMP affects the fixed dispersion parameter of Poisson-gamma models used for modeling motor vehicle crashes. The dispersion parameter plays an important role in many types of safety studies and should, therefore, be reliably estimated. The primary objective of this research project was to verify whether the LMP affects the estimation of the dispersion parameter and, if it does, to determine the magnitude of the problem. The secondary objective consisted of determining the effects of an unreliably estimated dispersion parameter on common analyses performed in highway safety studies. To accomplish the objectives of the study, a series of Poisson-gamma distributions were simulated using different values describing the mean, the dispersion parameter, and the sample size.
Three estimators commonly used by transportation safety modelers for estimating the dispersion parameter of Poisson-gamma models were evaluated: the method of moments, the weighted regression, and the maximum likelihood method. To complement the outcome of the simulation study, Poisson-gamma models were fitted to crash data collected in Toronto, Ontario, characterized by a low sample mean and small sample size. The study shows that a low sample mean combined with a small sample size can seriously affect the estimation of the dispersion parameter, no matter which estimator is used within the estimation process. The probability that the dispersion parameter becomes unreliably estimated increases significantly as the sample mean and sample size decrease. Consequently, the results show that an unreliably estimated dispersion parameter can significantly undermine empirical Bayes (EB) estimates as well as the estimation of confidence intervals for the gamma mean and predicted response. The paper ends with recommendations about minimizing the likelihood of producing Poisson-gamma models with an unreliable dispersion parameter for modeling motor vehicle crashes.
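
    The simulation design described here can be reproduced in outline with the method-of-moments estimator, one of the three estimators the study compares. The sketch assumes the common parameterization Var(Y) = μ + αμ², under which α = (s² − ȳ)/ȳ²; the means, sample sizes, and replicate count in the demonstration are arbitrary choices, not the study's settings.

```python
import numpy as np

def simulate_poisson_gamma(mu, alpha, n, rng):
    # Gamma-distributed rates with mean mu and variance alpha * mu**2, then Poisson counts
    rates = rng.gamma(shape=1.0 / alpha, scale=alpha * mu, size=n)
    return rng.poisson(rates)

def mom_dispersion(y):
    # Method of moments: solve s^2 = ybar + alpha * ybar^2 for alpha
    ybar = y.mean()
    if ybar == 0:
        return np.nan  # all-zero sample: dispersion is not estimable
    return (y.var(ddof=1) - ybar) / ybar**2

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    for mu, n in [(0.5, 20), (0.5, 100), (5.0, 1000)]:
        est = np.array([mom_dispersion(simulate_poisson_gamma(mu, 1.0, n, rng)) for _ in range(500)])
        share_unusable = np.mean(~(est > 0))  # non-positive or undefined estimates
        print(f"mu={mu}, n={n}: share unusable={share_unusable:.2f}, median={np.nanmedian(est):.2f}")
```

    A non-positive estimate, or none at all from an all-zero sample, is the kind of unreliable value the study associates with low sample means and small samples.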

  7. Effects of sample size on estimates of population growth rates calculated with matrix models.

    PubMed

    Fiske, Ian J; Bruna, Emilio M; Bolker, Benjamin M

    2008-08-28

    Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda: Jensen's inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated whether sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly large populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
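
    The sampling-variance effect described here can be explored with a small simulation. The sketch assumes a hypothetical two-stage life cycle (juveniles and adults) with fixed fecundity, estimates each survival rate as a binomial proportion from n sampled individuals, and reports the mean of the resulting lambda estimates minus the true lambda. It is not the study's plant model, and the sign and size of the bias depend on the curvature of lambda in each vital rate.

```python
import numpy as np

def dominant_eigenvalue(A):
    # For a nonnegative projection matrix the spectral radius is a real eigenvalue
    return np.max(np.linalg.eigvals(A).real)

def projection_matrix(s_juv, s_adult, fecundity):
    # Hypothetical life cycle: adults produce juveniles; juveniles mature, adults survive
    return np.array([[0.0, fecundity], [s_juv, s_adult]])

def lambda_bias(n_sampled, s_juv=0.5, s_adult=0.5, fecundity=1.5, reps=2000, seed=0):
    """Mean estimated lambda minus true lambda when survival is estimated from n_sampled individuals."""
    rng = np.random.default_rng(seed)
    true_lambda = dominant_eigenvalue(projection_matrix(s_juv, s_adult, fecundity))
    sj_hat = rng.binomial(n_sampled, s_juv, reps) / n_sampled
    sa_hat = rng.binomial(n_sampled, s_adult, reps) / n_sampled
    lambdas = [dominant_eigenvalue(projection_matrix(a, b, fecundity)) for a, b in zip(sj_hat, sa_hat)]
    return np.mean(lambdas) - true_lambda
```

    Comparing, for instance, lambda_bias(10) with lambda_bias(1000) indicates how quickly the bias should shrink with sample size in this toy model.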

  8. Network Model-Assisted Inference from Respondent-Driven Sampling Data

    PubMed Central

    Gile, Krista J.; Handcock, Mark S.

    2015-01-01

    Summary Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328
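
    For contrast with the model-assisted approach, the sketch below shows the simpler degree-weighted estimator commonly used with respondent-driven sampling data (often called the Volz–Heckathorn or RDS-II estimator), which assumes each respondent's inclusion probability is proportional to their reported network degree. It is a standard existing estimator shown only for illustration; it is not the estimator derived in this paper, and no working network model is involved.

```python
import numpy as np

def degree_weighted_mean(values, degrees):
    """Hajek-style mean with inclusion probabilities assumed proportional to degree.

    values: outcome for each respondent (e.g. 1 = HIV positive, 0 = negative)
    degrees: each respondent's reported network degree (must be positive)
    """
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(degrees, dtype=float)
    return np.sum(weights * values) / np.sum(weights)
```

    With values [1, 0, 0] and degrees [1, 2, 2], the low-degree respondent is up-weighted and the estimate is 0.5 rather than the unweighted 1/3.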

  9. Network Model-Assisted Inference from Respondent-Driven Sampling Data.

    PubMed

    Gile, Krista J; Handcock, Mark S

    2015-06-01

  10. Change-in-ratio estimators for populations with more than two subclasses

    USGS Publications Warehouse

    Udevitz, Mark S.; Pollock, Kenneth H.

    1991-01-01

    Change-in-ratio methods have been developed to estimate the size of populations with two or three population subclasses. Most of these methods require the often unreasonable assumption of equal sampling probabilities for individuals in all subclasses. This paper presents new models based on the weaker assumption that ratios of sampling probabilities are constant over time for populations with three or more subclasses. Estimation under these models requires that a value be assumed for one of these ratios when there are two samples. Explicit expressions are given for the maximum likelihood estimators under models for two samples with three or more subclasses and for three samples with two subclasses. A numerical method using readily available statistical software is described for obtaining the estimators and their standard errors under all of the models. Likelihood ratio tests that can be used in model selection are discussed. Emphasis is on the two-sample, three-subclass models for which Monte Carlo simulation results and an illustrative example are presented.
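
    The classical two-sample, two-subclass change-in-ratio estimator that these models generalize can be written directly from the subclass proportions observed before and after a known removal. The sketch below makes the equal-sampling-probability assumption that the paper relaxes; the function name is illustrative.

```python
def cir_two_subclass(p1: float, p2: float, removed_x: int, removed_total: int) -> float:
    """Classical change-in-ratio estimate of pre-removal population size.

    p1, p2: proportion of subclass x in the samples before and after the removal
    removed_x: number of subclass-x animals removed
    removed_total: total number of animals removed
    Follows from p2 = (p1 * N - removed_x) / (N - removed_total), solved for N.
    """
    if p1 == p2:
        raise ValueError("estimator undefined when the subclass proportion does not change")
    return (removed_x - p2 * removed_total) / (p1 - p2)
```

    For example, a population of 1,000 with 400 animals of subclass x, from which 200 x-type and 50 other animals are removed, has proportions 0.4 before and 200/750 after, and cir_two_subclass(0.4, 200 / 750, 200, 250) recovers 1,000.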

  11. Performance of Random Effects Model Estimators under Complex Sampling Designs

    ERIC Educational Resources Information Center

    Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan

    2011-01-01

    In this article, we consider estimation of parameters of random effects models from samples collected via complex multistage designs. Incorporation of sampling weights is one way to reduce estimation bias due to unequal probabilities of selection. Several weighting methods have been proposed in the literature for estimating the parameters of…

  12. Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas

    USGS Publications Warehouse

    Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.

    2003-01-01

    The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimate constituent loads is high-frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited and discrete water quality samples are collected on fixed intervals and sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means to determine constituent loads and other load estimation techniques such as regression models are used. The objective of this work was to determine a minimum number of water quality samples needed to provide constituent concentration data adequate to estimate constituent loads at a large stream. Twenty sets of water quality samples with and without supplemental storm samples were randomly selected at various fixed intervals from a database at the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies lasting less than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
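
    A minimal regression load estimator of the general kind evaluated here fits log concentration against log discharge on the discrete samples and applies the fit to the continuous discharge record. The sketch assumes a simple two-parameter rating curve with Duan's smearing correction for retransformation bias; the study's actual models (for example, with seasonal terms) are not reproduced, and the unit convention in the docstring is an assumption.

```python
import numpy as np

def regression_load(q_sampled, c_sampled, q_daily, seconds_per_day=86_400):
    """Load from a log-log concentration-discharge regression.

    Assumed units: discharge in m^3/s and concentration in g/m^3 (mg/L),
    so the returned load is in grams over the days in q_daily.
    """
    x, z = np.log(q_sampled), np.log(c_sampled)
    slope, intercept = np.polyfit(x, z, 1)  # fit ln C = b0 + b1 ln Q
    residuals = z - (intercept + slope * x)
    smearing = np.mean(np.exp(residuals))  # Duan's bias correction for back-transforming
    c_daily = smearing * np.exp(intercept + slope * np.log(q_daily))
    return np.sum(c_daily * q_daily * seconds_per_day)
```

    Subsampling the sample set at different fixed intervals, refitting, and comparing the resulting loads against a reference load reproduces the outline of the experiment described above.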

  13. A parametric generalization of the Hayne estimator for line transect sampling

    USGS Publications Warehouse

    Burnham, Kenneth P.

    1979-01-01

    The Hayne model for line transect sampling is generalized by using an elliptical (rather than circular) flushing model for animal detection. By assuming that the ratio of major to minor axis lengths is constant for all animals, a model results which allows estimation of population density based directly upon sighting distances and sighting angles. The derived estimator of animal density is a generalization of the Hayne estimator for line transect sampling.
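
    For reference, the Hayne estimator that this record generalizes can be computed from sighting distances alone. The sketch below implements the classical circular-flushing form, D = n/(2L) × mean(1/r); Burnham's elliptical generalization, which additionally uses sighting angles and the axis ratio, is not reproduced.

```python
def hayne_density(sighting_distances, transect_length):
    """Hayne line transect density estimate: n / (2L) * mean(1 / r).

    sighting_distances: radial distances r_i from observer to each flushed animal
    transect_length: total transect length L, in the same length units
    Returns animals per unit area.
    """
    n = len(sighting_distances)
    mean_inverse_r = sum(1.0 / r for r in sighting_distances) / n
    return n / (2.0 * transect_length) * mean_inverse_r
```

    For example, four sightings at distances 1, 2, 4, and 4 on a transect of length 10 give a density of 0.1 animals per unit area.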

  14. Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST)

    PubMed Central

    Xu, Chonggang; Gertner, George

    2013-01-01

    Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, FAST analysis has mainly been confined to the estimation of partial variances contributed by the main effects of model parameters and has not allowed for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimations. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions and the calculation of their corresponding estimation errors in different sampling schemes can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements. PMID:24143037
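
    The classic search-curve form of FAST can be sketched as follows, assuming inputs uniform on [0, 1] and integer frequencies chosen by hand (11 and 35 in the example, an illustrative choice that avoids interference among the harmonics used). It estimates only main-effect (first-order) indices; the interaction terms and the simple random and random balance design variants analyzed in the paper are not covered.

```python
import numpy as np

def fast_first_order(model, omegas, harmonics=4):
    """First-order sensitivity indices by classic search-curve FAST.

    model: maps an array of shape (k, n), one row per input, to n outputs
    omegas: distinct integer frequencies, assumed interference-free up to `harmonics`
    Inputs follow x_i(s) = 1/2 + arcsin(sin(omega_i * s)) / pi, uniform on [0, 1].
    """
    omegas = np.asarray(omegas, dtype=int)
    n = 2 * harmonics * int(omegas.max()) + 1  # standard minimum sample size
    s = np.pi * (2 * np.arange(1, n + 1) - n - 1) / n  # equally spaced on (-pi, pi)
    x = 0.5 + np.arcsin(np.sin(np.outer(omegas, s))) / np.pi
    y = model(x)
    freqs = np.arange(1, (n - 1) // 2 + 1)
    a = (y * np.cos(np.outer(freqs, s))).mean(axis=1)
    b = (y * np.sin(np.outer(freqs, s))).mean(axis=1)
    spectrum = a**2 + b**2  # spectrum[j] is the power at frequency j + 1
    total_variance = 2 * spectrum.sum()  # equals the sample variance of y (Parseval)
    indices = [
        2 * sum(spectrum[p * w - 1] for p in range(1, harmonics + 1)) / total_variance
        for w in omegas
    ]
    return np.array(indices)

# Toy linear model y = x1 + 2 * x2, whose exact first-order indices are 0.2 and 0.8
S = fast_first_order(lambda x: x[0] + 2 * x[1], omegas=[11, 35])
```

    Because only a few harmonics of each frequency are summed, the estimates in this toy case are close to, but not exactly, the analytical values.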

  15. Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST).

    PubMed

    Xu, Chonggang; Gertner, George

    2011-01-01

  16. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    USGS Publications Warehouse

    Aulenbach, Brent T.

    2013-01-01

    A regression-model-based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads: the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precision became progressively worse at shorter reporting periods, from annual to monthly. Serial correlation in model residuals caused observed AMLE precision to be significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high-flow sampling coverage, resulting in unrepresentative models. Increasing sampling frequency and/or targeted high-flow sampling are more efficient approaches than increasing calibration period length to ensure sufficient sampling and to avoid poorly performing models.
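
    The composite method adjusts regression predictions with interpolated residuals. The sketch below assumes linear interpolation of residuals in concentration units (practical implementations may work in log space instead) and adds a lag-1 autocorrelation helper for checking serial correlation in residuals; both function names are hypothetical. Note that np.interp holds the first and last residuals constant outside the sampled period.

```python
import numpy as np

def composite_adjust(t_daily, predicted_daily, t_sampled, observed_sampled):
    """Add linearly interpolated sample residuals to daily regression predictions.

    The adjusted series passes through every observation; between samples it
    shifts the regression prediction by an interpolated residual.
    """
    predicted_at_samples = np.interp(t_sampled, t_daily, predicted_daily)
    residuals = observed_sampled - predicted_at_samples
    return predicted_daily + np.interp(t_daily, t_sampled, residuals)

def lag1_autocorrelation(residuals):
    # Sample lag-1 autocorrelation of a residual series ordered in time
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return np.sum(r[1:] * r[:-1]) / np.sum(r * r)
```

    When residuals are strongly autocorrelated, neighboring residuals carry information about unsampled days, which is why interpolating them can help at short reporting periods.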

  17. Profile local linear estimation of generalized semiparametric regression model for longitudinal data.

    PubMed

    Sun, Yanqing; Sun, Liuquan; Zhou, Jie

    2013-07-01

    This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some covariates and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without explicitly modelling such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit of the data. Large-sample properties of the proposed estimators are investigated. Large-sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite-sample performance of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
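
    The local linear idea underlying the nonparametric components can be shown with a plain kernel-weighted least squares fit at a single time point. This is a generic smoother under an assumed Epanechnikov kernel and a fixed bandwidth, not the paper's profile estimating equations with a link function for longitudinal data.

```python
import numpy as np

def local_linear(t, y, t0, bandwidth):
    """Local linear estimate of a smooth mean function and its slope at t0.

    Fits y on (1, t - t0) by weighted least squares with Epanechnikov weights;
    the intercept estimates the function value at t0, the slope its derivative.
    """
    u = (t - t0) / bandwidth
    w = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)
    X = np.column_stack([np.ones_like(t), t - t0])
    XtW = X.T * w  # scales column j of X.T by w[j]
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0], beta[1]
```

    Evaluating the estimate over a grid of t0 values traces out a time-varying coefficient curve; the bandwidth controls the bias-variance trade-off that cross-validation is used to tune.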

  18. Estimating parasitic sea lamprey abundance in Lake Huron from heterogenous data sources

    USGS Publications Warehouse

    Young, Robert J.; Jones, Michael L.; Bence, James R.; McDonald, Rodney B.; Mullett, Katherine M.; Bergstedt, Roger A.

    2003-01-01

    The Great Lakes Fishery Commission uses time series of transformer, parasitic, and spawning population estimates to evaluate the effectiveness of its sea lamprey (Petromyzon marinus) control program. This study used an inverse variance weighting method to integrate Lake Huron sea lamprey population estimates derived from two estimation procedures: 1) prediction of the lake-wide spawning population from a regression model based on stream size, and 2) whole-lake mark and recapture estimates. In addition, we used a re-sampling procedure to evaluate the effect of trading off sampling effort between the regression and mark-recapture models. Population estimates derived from the regression model ranged from 132,000 to 377,000, while mark-recapture estimates of marked recently metamorphosed juveniles and parasitic sea lampreys ranged from 536,000 to 634,000 and 484,000 to 1,608,000, respectively. The precision of the estimates varied greatly among estimation procedures and years. The integrated estimate of the mark-recapture and spawner regression procedures ranged from 252,000 to 702,000 transformers. The re-sampling procedure indicated that the regression model is more sensitive to reduction in sampling effort than the mark-recapture model. Reliance on either the regression or mark-recapture model alone could produce misleading estimates of abundance of sea lampreys and of the effect of the control program on sea lamprey abundance. These analyses indicate that the precision of the lakewide population estimate can be maximized by re-allocating sampling effort from marking sea lampreys to trapping additional streams.
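
    Inverse variance weighting, used here to integrate the two population estimates, gives each estimate a weight proportional to the reciprocal of its variance. A minimal sketch, assuming the estimates are independent and their variances are known:

```python
import numpy as np

def inverse_variance_combine(estimates, variances):
    """Combine independent estimates of one quantity with weights 1 / variance.

    Returns the combined estimate and its variance, 1 / sum(weights).
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    combined = np.sum(weights * estimates) / np.sum(weights)
    return combined, 1.0 / np.sum(weights)
```

    For two hypothetical estimates of 100 and 200 with variances 1 and 4, the combined estimate is 120 with variance 0.8: it is pulled toward the more precise estimate and is more precise than either input.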

  19. Estimating the effectiveness of further sampling in species inventories

    USGS Publications Warehouse

    Keating, K.A.; Quinn, J.F.; Ivie, M.A.; Ivie, L.L.

    1998-01-01

    Estimators of the number of additional species expected in the next Δn samples offer a potentially important tool for improving cost-effectiveness of species inventories but are largely untested. We used Monte Carlo methods to compare 11 such estimators, across a range of community structures and sampling regimes, and validated our results, where possible, using empirical data from vascular plant and beetle inventories from Glacier National Park, Montana, USA. We found that B. Efron and R. Thisted's 1976 negative binomial estimator was most robust to differences in community structure and that it was among the most accurate estimators when sampling was from model communities with structures resembling the large, heterogeneous communities that are the likely targets of major inventory efforts. Other estimators may be preferred under specific conditions, however. For example, when sampling was from model communities with highly even species-abundance distributions, estimates based on the Michaelis-Menten model were most accurate; when sampling was from moderately even model communities with S=10 species or communities with highly uneven species-abundance distributions, estimates based on Gleason's (1922) species-area model were most accurate. We suggest that use of such methods in species inventories can help improve cost-effectiveness by providing an objective basis for redirecting sampling to more-productive sites, methods, or time periods as the expectation of detecting additional species becomes unacceptably low.
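    To make "expected new species in further sampling" concrete, here is a sketch of the classic Good-Toulmin series, which predicts new species from f_k, the number of species seen exactly k times. It is a related, simpler nonparametric estimator and is not the Efron-Thisted negative binomial estimator the study found most robust; restricting the effort ratio t to at most 1 is an assumption made because the raw series becomes unstable beyond that.

```python
from collections import Counter

def good_toulmin(counts, t):
    """Expected number of new species if sampling effort increases by a fraction t (0 < t <= 1).
    counts: number of times each observed species was recorded."""
    if not 0 < t <= 1:
        raise ValueError("the unsmoothed series is only reliable for 0 < t <= 1")
    freq = Counter(c for c in counts if c > 0)  # f_k: species recorded exactly k times
    return sum((-1) ** (k + 1) * t ** k * f_k for k, f_k in freq.items())

# Hypothetical inventory: 10 singletons, 4 doubletons, 2 tripletons.
counts = [1] * 10 + [2] * 4 + [3] * 2
```

For doubling the effort (t = 1) the series gives 10 - 4 + 2 = 8 new species; for half again the effort (t = 0.5) it gives 4.25.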

  20. Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology

    PubMed Central

    Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal

    2009-01-01

    The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase-two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model-based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design-based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators. PMID:20174455
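    The two ingredients described here, a Horvitz-Thompson (IPW) estimate of a cohort total and calibration of the weights to a known auxiliary total, can be sketched in a few lines. This is a simplified Python illustration, not the R survey package's implementation; it assumes a single auxiliary variable and uses ratio calibration, the simplest special case of calibration.

```python
def horvitz_thompson_total(y, pi):
    """Estimate a finite-cohort total from a phase-two sample with known inclusion probabilities pi."""
    return sum(yi / p for yi, p in zip(y, pi))

def ratio_calibrated_weights(x, pi, x_total):
    """Rescale the design weights 1/pi so the weighted sample total of auxiliary x equals its known cohort total."""
    g = x_total / horvitz_thompson_total(x, pi)
    return [g / p for p in pi]

# Hypothetical phase-two sample: outcome y, inclusion probabilities pi, auxiliary x with known total 4.
y, pi, x = [1, 2, 3], [0.5, 0.5, 1.0], [1, 1, 1]
w = ratio_calibrated_weights(x, pi, 4)
calibrated_total = sum(wi * yi for wi, yi in zip(w, y))
```

The unadjusted IPW total here is 9, whereas the calibrated weights reproduce the known auxiliary total exactly and give 7.2; when the auxiliary variable is correlated with the quantity being totalled, this adjustment is what reduces the design-based variance component.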

  1. Uncertainty of streamwater solute fluxes in five contrasting headwater catchments including model uncertainty and natural variability (Invited)

    NASA Astrophysics Data System (ADS)

    Aulenbach, B. T.; Burns, D. A.; Shanley, J. B.; Yanai, R. D.; Bae, K.; Wild, A.; Yang, Y.; Dong, Y.

    2013-12-01

    There are many sources of uncertainty in estimates of streamwater solute flux. Flux is the product of discharge and concentration (summed over time), each of which has measurement uncertainty of its own. Discharge can be measured almost continuously, but concentrations are usually determined from discrete samples, which introduces uncertainty that depends on sampling frequency and on how concentrations are assigned for the periods between samples. Gaps between samples can be estimated by linear interpolation or by models that use the relations between concentration and continuously measured or known variables such as discharge, season, temperature, and time. For this project, developed in cooperation with QUEST (Quantifying Uncertainty in Ecosystem Studies), we evaluated uncertainty for three flux estimation methods and three different sampling frequencies (monthly, weekly, and weekly plus event). The constituents investigated were dissolved NO3, Si, SO4, and dissolved organic carbon (DOC), solutes whose concentration dynamics exhibit strongly contrasting behavior. The evaluation was completed for a 10-year period at five small, forested watersheds in Georgia, New Hampshire, New York, Puerto Rico, and Vermont. Concentration regression models were developed for each solute at each of the three sampling frequencies for all five watersheds. Fluxes were then calculated using (1) a linear interpolation approach, (2) a regression-model method, and (3) the composite method, which combines the regression-model method for estimating concentrations and the linear interpolation method for correcting model residuals to the observed sample concentrations. We considered the best estimates of flux to be derived using the composite method at the highest sampling frequencies.
We also evaluated the effect of sampling frequency and estimation method on flux estimate uncertainty; flux uncertainty depended on the variability characteristics of each solute and varied for different reporting periods (e.g., the 10-year study period vs. annual vs. monthly). The usefulness of the two regression-model-based flux estimation approaches depended on the amount of variance in concentrations the regression models could explain. Our results can guide the development of optimal sampling strategies by weighing sampling frequency against improvements in uncertainty in stream flux estimates for solutes with particular characteristics of variability. The appropriate flux estimation method depends on a combination of sampling frequency and the strength of concentration regression models. Sites: Biscuit Brook (Frost Valley, NY), Hubbard Brook Experimental Forest and LTER (West Thornton, NH), Luquillo Experimental Forest and LTER (Luquillo, Puerto Rico), Panola Mountain (Stockbridge, GA), Sleepers River Research Watershed (Danville, VT)
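    The composite method described above can be sketched as: predict daily concentrations with a regression model, compute residuals at the sampled days, linearly interpolate those residuals between samples, add them back, and sum discharge times concentration. A minimal sketch, assuming daily time steps, sorted sample days that fall on the daily grid, and unit conversions omitted; the regression model itself is taken as given, and all inputs are hypothetical.

```python
import numpy as np

def composite_flux(days, discharge, model_conc, sample_days, sample_conc):
    """Composite-method flux: regression concentrations corrected by linearly
    interpolated residuals, then summed as discharge * concentration per day."""
    days = np.asarray(days, dtype=float)
    model_conc = np.asarray(model_conc, dtype=float)
    model_at_samples = np.interp(sample_days, days, model_conc)
    residuals = np.asarray(sample_conc, dtype=float) - model_at_samples
    # Residuals are interpolated between samples and held constant beyond the end samples.
    corrected = model_conc + np.interp(days, sample_days, residuals)
    flux = float(np.sum(np.asarray(discharge, dtype=float) * corrected))
    return flux, corrected
```

By construction the corrected series passes through every observed concentration, while between samples it keeps the shape predicted by the regression model.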

  2. Improving riverine constituent concentration and flux estimation by accounting for antecedent discharge conditions

    NASA Astrophysics Data System (ADS)

    Zhang, Qian; Ball, William P.

    2017-04-01

    Regression-based approaches are often employed to estimate riverine constituent concentrations and fluxes based on typically sparse concentration observations. One such approach is the recently developed WRTDS ("Weighted Regressions on Time, Discharge, and Season") method, which has been shown to provide more accurate estimates than prior approaches in a wide range of applications. Centered on WRTDS, this work was aimed at developing improved models for constituent concentration and flux estimation by accounting for antecedent discharge conditions. Twelve modified models were developed and tested, each of which contains one additional flow variable to represent antecedent conditions and which can be directly derived from the daily discharge record. High-resolution (∼daily) data at nine diverse monitoring sites were used to evaluate the relative merits of the models for estimation of six constituents - chloride (Cl), nitrate-plus-nitrite (NOx), total Kjeldahl nitrogen (TKN), total phosphorus (TP), soluble reactive phosphorus (SRP), and suspended sediment (SS). For each site-constituent combination, 30 concentration subsets were generated from the original data through Monte Carlo subsampling and then used to evaluate model performance. For the subsampling, three sampling strategies were adopted: (A) 1 random sample each month (12/year), (B) 12 random monthly samples plus additional 8 random samples per year (20/year), and (C) flow-stratified sampling with 12 regular (non-storm) and 8 storm samples per year (20/year). Results reveal that estimation performance varies with both model choice and sampling strategy. In terms of model choice, the modified models show general improvement over the original model under all three sampling strategies. Major improvements were achieved for NOx by the long-term flow-anomaly model and for Cl by the ADF (average discounted flow) model and the short-term flow-anomaly model. 
Moderate improvements were achieved for SS, TP, and TKN by the ADF model. By contrast, none of the proposed models improved estimation for SRP. In terms of sampling strategy, performance of all models (including the original) was generally best using strategy C and worst using strategy A, especially for SS, TP, and SRP, confirming the value of routinely collecting stormflow samples. Overall, this work provides a comprehensive set of statistical evidence supporting the incorporation of antecedent discharge conditions into the WRTDS model for estimation of constituent concentration and flux, thereby combining the advantages of two recent developments in water quality modeling.
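    An average discounted flow (ADF) variable summarizes antecedent discharge by giving recent days more weight than older ones. The abstract does not define the exact formula or discount factor, so the sketch below assumes one common exponential-discounting form, ADF_t = (1 - δ)·Q_t + δ·ADF_{t-1}, with a hypothetical δ and the series initialized at the first day's discharge.

```python
def average_discounted_flow(q, delta=0.95):
    """Exponentially discounted average of daily discharge q (assumed parameterisation).
    Larger delta gives antecedent conditions a longer memory."""
    adf = []
    prev = q[0]  # assumed initialisation at the first observed discharge
    for qt in q:
        prev = (1 - delta) * qt + delta * prev
        adf.append(prev)
    return adf
```

In a WRTDS-style regression this series would enter as one extra explanatory variable alongside time, discharge and season.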

  3. Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation

    NASA Astrophysics Data System (ADS)

    Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.

    2016-12-01

    With the growing impacts of climate change and human activities on the water resources cycle, a growing body of research focuses on quantifying modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining the predictions of all plausible models, each assigned a weight determined by its prior weight and its marginal likelihood. Thus, estimating each model's marginal likelihood is crucial for reliable and accurate BMA prediction. The nested sampling estimator (NSE) is a recently proposed method for marginal likelihood estimation. NSE searches the parameter space gradually from low-likelihood to high-likelihood regions, and this evolution proceeds iteratively via a local sampling procedure. The efficiency of NSE is therefore dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm is often used for local sampling. However, M-H is not an efficient sampling algorithm for high-dimensional or complicated parameter spaces. To improve the efficiency of NSE, the robust and efficient sampling algorithm DREAMzs was incorporated into the local sampling of NSE. The comparison results demonstrated that the improved NSE significantly increased the efficiency of marginal likelihood estimation. However, both the improved and the original NSE suffer from heavy instability. In addition, the heavy computational cost of the large number of model executions is overcome by using adaptive sparse-grid surrogates.
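    The basic nested sampling loop can be sketched on a one-dimensional toy problem: repeatedly discard the lowest-likelihood live point, credit its likelihood times the shrinking prior mass, and replace it with a prior draw constrained to higher likelihood. This sketch assumes plain rejection sampling for the local step (neither M-H nor DREAMzs), the deterministic prior-mass schedule X_i = exp(-i/N), and a simple stopping tolerance; the toy likelihood and prior are hypothetical.

```python
import math
import random

def nested_sampling(log_like, prior_sample, n_live=200, tol=1e-2, seed=0):
    """Estimate the marginal likelihood Z = integral of L(theta) over the prior."""
    rng = random.Random(seed)
    live_ll = [log_like(prior_sample(rng)) for _ in range(n_live)]
    z, x_prev, i = 0.0, 1.0, 0
    while True:
        i += 1
        worst = min(range(n_live), key=live_ll.__getitem__)
        ll_min = live_ll[worst]
        x_i = math.exp(-i / n_live)  # expected prior mass still enclosed
        z += math.exp(ll_min) * (x_prev - x_i)
        x_prev = x_i
        # Local sampling step (rejection): draw from the prior until L > L_min.
        while True:
            cand_ll = log_like(prior_sample(rng))
            if cand_ll > ll_min:
                break
        live_ll[worst] = cand_ll
        # Stop once the remaining prior mass can add only a small fraction to Z.
        if math.exp(max(live_ll)) * x_i < tol * z:
            break
    # Add the contribution of the final live points.
    return z + x_i * sum(math.exp(ll) for ll in live_ll) / n_live

def gaussian_log_like(theta, sigma=0.1):
    return -0.5 * ((theta - 0.5) / sigma) ** 2

def uniform_prior(rng):
    return rng.random()  # Uniform(0, 1) prior

z_hat = nested_sampling(gaussian_log_like, uniform_prior)
z_true = 0.1 * math.sqrt(2 * math.pi)  # Gaussian mass inside [0, 1] is essentially 1
```

Rejection sampling becomes exponentially wasteful as the likelihood constraint tightens, which is exactly why more efficient local samplers matter in higher dimensions.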

  4. Sub-sampling genetic data to estimate black bear population size: A case study

    USGS Publications Warehouse

    Tredick, C.A.; Vaughan, M.R.; Stauffer, D.F.; Simek, S.L.; Eason, T.

    2007-01-01

    Costs for genetic analysis of hair samples collected for individual identification of bears average approximately US$50 [2004] per sample. This can easily exceed budgetary allowances for large-scale studies or studies of high-density bear populations. We used 2 genetic datasets from 2 areas in the southeastern United States to explore how reducing costs of analysis by sub-sampling affected precision and accuracy of resulting population estimates. We used several sub-sampling scenarios to create subsets of the full datasets and compared summary statistics, population estimates, and precision of estimates generated from these subsets to estimates generated from the complete datasets. Our results suggested that bias and precision of estimates improved as the proportion of total samples used increased, and heterogeneity models (e.g., Mh[CHAO]) were more robust to reduced sample sizes than other models (e.g., behavior models). We recommend that only high-quality samples (>5 hair follicles) be used when budgets are constrained, and efforts should be made to maximize capture and recapture rates in the field.
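    The sub-sampling comparison can be sketched with Chao's heterogeneity estimator for model Mh in its commonly cited form N = S + f1²/(2·f2), where S is the number of distinct individuals, f1 the number detected once and f2 the number detected twice. This is a simplified illustration, not the program output used in the study: it assumes each genotyped hair sample counts as one detection, uses the bias-corrected form when f2 = 0, and sub-samples at random with a hypothetical seed.

```python
import random
from collections import Counter

def chao_mh(capture_counts):
    """Chao's Mh estimate of population size from per-individual detection counts (all >= 1)."""
    freq = Counter(capture_counts)
    s_obs = len(capture_counts)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        # Bias-corrected form avoids division by zero when no individual was detected twice.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 ** 2 / (2 * f2)

def subsample_estimate(sample_ids, fraction, seed=0):
    """Genotype only a random fraction of hair samples (each labelled by individual ID), then re-estimate N."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(sample_ids)))
    kept = rng.sample(sample_ids, k)
    return chao_mh(list(Counter(kept).values()))
```

Repeating `subsample_estimate` over many seeds and fractions gives the spread of estimates that a reduced genotyping budget would produce.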

  5. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    NASA Astrophysics Data System (ADS)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of a binary probit regression model are commonly estimated by Maximum Likelihood Estimation (MLE). However, the MLE method has a limitation when the binary data contain separation, the condition in which one or more independent variables perfectly separate the categories of the binary response. Under separation the MLE fails to converge, so the estimates cannot be used for modeling. One remedy for separation is to use Firth's approach instead. This research has two aims: first, to compare the chance of separation occurring in the binary probit regression model under the MLE method and under Firth's approach; second, to compare the performance of the binary probit regression estimators obtained by the MLE method and by Firth's approach using the RMSE criterion. Both comparisons were performed by simulation under different sample sizes. The results showed that for small sample sizes the chance of separation is higher with the MLE method than with Firth's approach, whereas for larger sample sizes the probability decreased and was similar for the two methods. Meanwhile, Firth's estimators had smaller RMSE than the MLEs, especially for smaller sample sizes, while for larger sample sizes the RMSEs were not much different. This indicates that Firth's estimators outperformed the MLE estimators.
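    With one covariate plus an intercept, complete separation means some threshold on x splits the 0s from the 1s exactly, and quasi-complete separation means the two groups touch only at a shared boundary value. A minimal check under that single-covariate assumption is sketched below; with several covariates, detecting separation generally requires solving a linear program instead.

```python
def separation_status(x, y):
    """Classify separation for a single covariate x and binary response y (0/1)."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return "degenerate"  # only one response category: the intercept MLE diverges
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"  # a threshold on x predicts y perfectly
    if max(x0) <= min(x1) or max(x1) <= min(x0):
        return "quasi-complete"  # perfect except at a shared boundary value
    return "none"
```

When this returns "complete" or "quasi-complete", the probit slope MLE diverges, whereas Firth-type penalization keeps the estimate finite.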

  6. Post-stratification sampling in small area estimation (SAE) model for unemployment rate estimation by Bayes approach

    NASA Astrophysics Data System (ADS)

    Hanike, Yusrianti; Sadik, Kusman; Kurnia, Anang

    2016-02-01

    This research modeled the unemployment rate in Indonesia using a Poisson distribution, estimated by combining post-stratification with a Small Area Estimation (SAE) model. Post-stratification is a sampling technique in which strata are formed after the survey data have been collected; it is used when the survey design does not support estimation for the domain of interest. Here, the domain of interest was the education level of the unemployed, divided into seven categories. The data came from the National Labour Force Survey (Sakernas) conducted by BPS-Statistics Indonesia, which yields samples too small for district-level estimation; SAE models offer one alternative for this problem. We therefore combined post-stratification sampling with an SAE model and developed two main post-stratification models: Model I treated the education category as a dummy variable, and Model II treated the education category as an area random effect. Neither model satisfied the Poisson assumption. Using a Poisson-Gamma model, the overdispersion in Model I (chi-square/df of 1.23) was reduced to 0.91, and the underdispersion in Model II (chi-square/df of 0.35) was corrected to 0.94. Empirical Bayes was applied to estimate the proportion of the unemployed in each education category. Using the Bayesian Information Criterion (BIC), Model I had a smaller mean square error (MSE) than Model II.
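    The two quantities the abstract relies on, the Pearson chi-square/df dispersion statistic and the Poisson-Gamma posterior mean used in empirical Bayes, can be sketched as follows. The Gamma hyperparameters are passed in directly here as an assumption; in an actual empirical Bayes analysis they would be estimated from all areas, and the inputs shown in the test are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / df: clearly above 1 suggests overdispersion, below 1 underdispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def poisson_gamma_posterior_mean(y, exposure, alpha, beta):
    """Posterior mean rate per area under y ~ Poisson(exposure * theta), theta ~ Gamma(alpha, rate=beta)."""
    return [(yi + alpha) / (ei + beta) for yi, ei in zip(y, exposure)]
```

The posterior mean shrinks each area's raw rate y/exposure toward the prior mean alpha/beta, more strongly for areas with small exposure, which is what stabilizes estimates for districts with tiny samples.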

  7. An Improved Nested Sampling Algorithm for Model Selection and Assessment

    NASA Astrophysics Data System (ADS)

    Zeng, X.; Ye, M.; Wu, J.; WANG, D.

    2017-12-01

    The multimodel strategy is a general approach for treating model structure uncertainty in recent research. The unknown groundwater system is represented by several plausible conceptual models, each assigned a weight representing the plausibility of that model. In a Bayesian framework, the posterior model weight is proportional to the product of the model's prior weight and its marginal likelihood (also termed model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. The nested sampling estimator (NSE) is a recently proposed algorithm for marginal likelihood estimation. NSE searches the parameter space gradually from low-likelihood to high-likelihood regions, and this evolution proceeds iteratively via a local sampling procedure. The efficiency of NSE is therefore dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood functions. To improve the performance of NSE, it could be feasible to integrate a more efficient and elaborate sampling algorithm, DREAMzs, into the local sampling. In addition, to overcome the computational burden of the large number of repeated model executions in marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build surrogates for the original groundwater model.
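    Turning marginal likelihoods into posterior model weights is a normalization step: multiply each prior weight by its evidence and divide by the sum. Because evidences are usually reported on the log scale and can differ by hundreds of units, the sketch below works in logs with the log-sum-exp shift; the inputs in the test are hypothetical.

```python
import math

def posterior_model_weights(log_evidence, prior_weights):
    """Posterior model probabilities proportional to prior weight * marginal likelihood."""
    logs = [math.log(p) + le for p, le in zip(prior_weights, log_evidence)]
    m = max(logs)  # shift to avoid overflow/underflow in exp
    unnorm = [math.exp(l - m) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

With equal priors, a model whose evidence is three times larger receives weight 0.75 against 0.25.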

  8. Estimating thermal performance curves from repeated field observations

    USGS Publications Warehouse

    Childress, Evan; Letcher, Benjamin H.

    2017-01-01

    Estimating thermal performance of organisms is critical for understanding population distributions and dynamics and predicting responses to climate change. Typically, performance curves are estimated using laboratory studies to isolate temperature effects, but other abiotic and biotic factors influence temperature-performance relationships in nature, reducing these models' predictive ability. We present a model for estimating thermal performance curves from repeated field observations that includes environmental and individual variation. We fit the model in a Bayesian framework using MCMC sampling, which allowed for estimation of unobserved latent growth while propagating uncertainty. Fitting the model to simulated data varying in sampling design and parameter values demonstrated that the parameter estimates were accurate, precise, and unbiased. Fitting the model to individual growth data from wild trout revealed high out-of-sample predictive ability relative to laboratory-derived models, which produced more biased predictions for field performance. The field-based estimates of thermal maxima were lower than those based on laboratory studies. Under warming temperature scenarios, field-derived performance models predicted stronger declines in body size than laboratory-derived models, suggesting that laboratory-based models may underestimate climate change effects. The presented model estimates true, realized field performance, avoiding assumptions required for applying laboratory-based models to field performance, which should improve estimates of performance under climate change and advance thermal ecology.
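    The abstract does not give the functional form of its performance curve. For orientation only, here is one widely used asymmetric shape: a Gaussian rise up to the optimum temperature and a quadratic decline to zero at the critical thermal maximum. The form and parameter values are assumptions, not the study's model.

```python
import math

def thermal_performance(temp, t_opt, ct_max, sigma):
    """Relative performance in [0, 1]: Gaussian rise below t_opt,
    quadratic fall reaching zero at ct_max (assumed curve shape)."""
    if temp <= t_opt:
        return math.exp(-((temp - t_opt) / (2 * sigma)) ** 2)
    return max(0.0, 1 - ((temp - t_opt) / (t_opt - ct_max)) ** 2)
```

In a hierarchical field model, a curve like this would scale each individual's expected growth by the temperatures it experienced, with t_opt, ct_max and sigma estimated rather than fixed.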

  9. Sample Size and Item Parameter Estimation Precision When Utilizing the One-Parameter "Rasch" Model

    ERIC Educational Resources Information Center

    Custer, Michael

    2015-01-01

    This study examines the relationship between sample size and item parameter estimation precision when utilizing the one-parameter model. Item parameter estimates are examined relative to "true" values by evaluating the decline in root mean squared deviation (RMSD) and the number of outliers as sample size increases. This occurs across…
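    The two quantities named here can be written down directly: the one-parameter (Rasch) response probability, and the root mean squared deviation of estimated item difficulties from their generating ("true") values. A minimal sketch; the item values in the test are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rmsd(estimates, true_values):
    """Root mean squared deviation of item parameter estimates from their true values."""
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, true_values)) / n)
```

In a simulation of this kind, responses are drawn with `rasch_prob`, items are re-estimated at each sample size, and `rmsd` is tracked as the sample grows.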

  10. Estimation in a discrete tail rate family of recapture sampling models

    NASA Technical Reports Server (NTRS)

    Gupta, Rajan; Lee, Larry D.

    1990-01-01

    In the context of recapture sampling designs for debugging experiments, the problem of estimating the error or hitting rate of the faults remaining in a system is considered. Moment estimators are derived for a family of models in which the rate parameters are assumed proportional to the tail probabilities of a discrete distribution on the positive integers. The estimators are shown to be asymptotically normal and fully efficient. Their fixed sample properties are compared, through simulation, with those of the conditional maximum likelihood estimators.

  11. Model-based inference for small area estimation with sampling weights

    PubMed Central

    Vandendijck, Y.; Faes, C.; Kirby, R.S.; Lawson, A.; Hens, N.

    2017-01-01

    Obtaining reliable estimates of health outcomes for areas or domains where few or no samples are available is the goal of small area estimation (SAE). Often, we rely on health surveys to obtain information about health outcomes. Such surveys commonly feature a complex design, stratification, and unequal sampling weights. Hierarchical Bayesian models are well recognised in SAE as a spatial smoothing method, but they often ignore the sampling weights that reflect the complex sampling design. In this paper, we focus on data obtained from a health survey where the sampling weights of the sampled individuals are the only information available about the design. We develop a predictive model-based approach to estimate the prevalence of a binary outcome for both the sampled and non-sampled individuals, using hierarchical Bayesian models that take the sampling weights into account. A simulation study is carried out to compare the performance of our proposed method with other established methods. The results indicate that our proposed method achieves large reductions in mean squared error compared with standard approaches. It performs equally well or better than more elaborate methods when there is a relationship between the responses and the sampling weights. The proposed method is applied to estimate asthma prevalence across districts. PMID:28989860
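    A useful baseline for comparison is the design-weighted direct estimator of prevalence in each area (the Hajek ratio of weighted cases to total weight). The sketch below shows that baseline only; it is not the paper's hierarchical Bayesian method, and the record layout `(area, y, weight)` is a hypothetical convention.

```python
def hajek_prevalence(y, w):
    """Design-weighted (Hajek) direct estimate of a prevalence from binary y and weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def direct_estimates(records):
    """records: iterable of (area, y, weight) -> {area: weighted prevalence}."""
    by_area = {}
    for area, y, w in records:
        ys, ws = by_area.setdefault(area, ([], []))
        ys.append(y)
        ws.append(w)
    return {a: hajek_prevalence(ys, ws) for a, (ys, ws) in by_area.items()}
```

Such direct estimates become very noisy, or undefined, in areas with few or no respondents, which is the gap that model-based SAE is designed to fill.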

  12. A hierarchical model for spatial capture-recapture data

    USGS Publications Warehouse

    Royle, J. Andrew; Young, K.V.

    2008-01-01

    Estimating density is a fundamental objective of many animal population studies. Application of methods for estimating population size from ostensibly closed populations is widespread, but ineffective for estimating absolute density because most populations are subject to short-term movements or so-called temporary emigration. This phenomenon invalidates the resulting estimates because the effective sample area is unknown. A number of methods involving the adjustment of estimates based on heuristic considerations are in widespread use. In this paper, a hierarchical model of spatially indexed capture-recapture data is proposed for sampling based on area searches of spatial sample units subject to uniform sampling intensity. The hierarchical model contains explicit models for the distribution of individuals and their movements, in addition to an observation model that is conditional on the location of individuals during sampling. Bayesian analysis of the hierarchical model is achieved by the use of data augmentation, which allows for a straightforward implementation in the freely available software WinBUGS. We present results of a simulation study that was carried out to evaluate the operating characteristics of the Bayesian estimator under variable densities and movement patterns of individuals. An application of the model is presented for survey data on the flat-tailed horned lizard (Phrynosoma mcallii) in Arizona, USA.
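    The three layers of such a hierarchical model (where individuals live, where they move during each occasion, and whether a searched plot detects them) can be illustrated by simulating data from it. This is a forward-simulation sketch under stated assumptions (uniform home-range centres in a square region, Gaussian movement around each centre, rectangular plots, and a constant detection probability inside a plot); it does not fit the model or implement the WinBUGS data-augmentation analysis.

```python
import numpy as np

def simulate_scr(n_ind, n_occ, plots, p_detect, move_sd, region=(0.0, 10.0), seed=0):
    """Simulate spatially indexed capture histories.
    plots: list of (xmin, xmax, ymin, ymax) searched sample units.
    Returns an (n_ind, n_occ, n_plots) array of 0/1 detections."""
    rng = np.random.default_rng(seed)
    lo, hi = region
    centres = rng.uniform(lo, hi, size=(n_ind, 2))  # distribution of individuals
    det = np.zeros((n_ind, n_occ, len(plots)), dtype=int)
    for t in range(n_occ):
        # Movement model: location on occasion t scatters around each home-range centre.
        loc = centres + rng.normal(0.0, move_sd, size=(n_ind, 2))
        for j, (x0, x1, y0, y1) in enumerate(plots):
            inside = (loc[:, 0] >= x0) & (loc[:, 0] < x1) & (loc[:, 1] >= y0) & (loc[:, 1] < y1)
            # Observation model: uniform search intensity, so detection only needs presence in the plot.
            det[:, t, j] = inside & (rng.uniform(size=n_ind) < p_detect)
    return det
```

Individuals whose centres lie near a plot are caught on some occasions and missed on others, which is exactly the temporary emigration that leaves the effective sample area unknown for non-spatial estimators.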

  13. Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Thompson, Bruce; Fan, Xitao

    This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…
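    The bootstrap bias estimate is the mean of a statistic over resamples minus its value on the original sample. A generic sketch, assuming simple case resampling and using the plug-in variance as a hypothetical example statistic because its bias (about -σ²/n) is known; an SEM application would substitute a fitted-model statistic such as a fit index.

```python
import random
import statistics

def bootstrap_bias(data, stat, n_boot=2000, seed=0):
    """Bootstrap estimate of bias: mean of stat over resamples minus stat on the original data."""
    rng = random.Random(seed)
    theta_hat = stat(data)
    boot = [stat(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.fmean(boot) - theta_hat

data = [float(v) for v in range(1, 11)]
bias = bootstrap_bias(data, statistics.pvariance)  # plug-in variance is biased downward
```

For these ten values the plug-in variance is 8.25, so the estimated bias should land near -0.825.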

  14. Improving Riverine Constituent Concentration and Flux Estimation by Accounting for Antecedent Discharge Conditions

    NASA Astrophysics Data System (ADS)

    Zhang, Q.; Ball, W. P.

    2016-12-01

    Regression-based approaches are often employed to estimate riverine constituent concentrations and fluxes based on typically sparse concentration observations. One such approach is the WRTDS ("Weighted Regressions on Time, Discharge, and Season") method, which has been shown to provide more accurate estimates than prior approaches. Centered on WRTDS, this work was aimed at developing improved models for constituent concentration and flux estimation by accounting for antecedent discharge conditions. Twelve modified models were developed and tested, each of which contains one additional variable to represent antecedent conditions. High-resolution (∼daily) data at nine monitoring sites were used to evaluate the relative merits of the models for estimation of six constituents - chloride (Cl), nitrate-plus-nitrite (NOx), total Kjeldahl nitrogen (TKN), total phosphorus (TP), soluble reactive phosphorus (SRP), and suspended sediment (SS). For each site-constituent combination, 30 concentration subsets were generated from the original data through Monte Carlo sub-sampling and then used to evaluate model performance. For the sub-sampling, three sampling strategies were adopted: (A) 1 random sample each month (12/year), (B) 12 random monthly samples plus additional 8 random samples per year (20/year), and (C) 12 regular (non-storm) and 8 storm samples per year (20/year). The modified models show general improvement over the original model under all three sampling strategies. Major improvements were achieved for NOx by the long-term flow-anomaly model and for Cl by the ADF (average discounted flow) model and the short-term flow-anomaly model. Moderate improvements were achieved for SS, TP, and TKN by the ADF model. By contrast, none of the proposed models improved estimation for SRP.
In terms of sampling strategy, performance of all models was generally best using strategy C and worst using strategy A, and especially so for SS, TP, and SRP, confirming the value of routinely collecting storm-flow samples. Overall, this work provides a comprehensive set of statistical evidence for supporting the incorporation of antecedent discharge conditions into WRTDS for constituent concentration and flux estimation, thereby combining the advantages of two recent developments in water quality modeling.
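    The three sub-sampling strategies can be sketched for one year of daily records. Assumptions, labelled because the abstract does not specify them: the 12 regular samples in strategy C are taken one per month from non-storm days, storm days are supplied as a precomputed set, and the extra days in strategy B are drawn without replacement from days not already chosen.

```python
import random

def draw_year(days_by_month, storm_days, strategy, rng):
    """One year of sample days under strategy "A", "B" or "C"."""
    months = list(days_by_month.values())
    if strategy == "C":
        # C: one regular (non-storm) day per month plus 8 storm days.
        regular = [rng.choice([d for d in ds if d not in storm_days]) for ds in months]
        extra = rng.sample(sorted(storm_days), 8)
    elif strategy in ("A", "B"):
        # A and B: one random day per month.
        regular = [rng.choice(ds) for ds in months]
        if strategy == "A":
            return sorted(regular)
        # B: plus 8 further random days from the rest of the year.
        chosen = set(regular)
        remaining = [d for ds in months for d in ds if d not in chosen]
        extra = rng.sample(remaining, 8)
    else:
        raise ValueError("strategy must be 'A', 'B' or 'C'")
    return sorted(regular + extra)

def monte_carlo_subsets(days_by_month, storm_days, strategy, n_subsets=30, seed=0):
    """Repeated random subsets for evaluating a concentration model (30 per the abstract)."""
    rng = random.Random(seed)
    return [draw_year(days_by_month, storm_days, strategy, rng) for _ in range(n_subsets)]
```

Fitting the model to each subset and scoring it against the full daily record gives the performance distribution compared across strategies.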

  15. Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

    PubMed Central

    Karcher, Michael D.; Palacios, Julia A.; Bedford, Trevor; Suchard, Marc A.; Minin, Vladimir N.

    2016-01-01

    Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task is to formulate an observed sequence data likelihood using a coalescent model for the sampled individuals’ genealogy and then integrate over all possible genealogies via Monte Carlo or, less efficiently, condition on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically relevant, seasonal human influenza examples. PMID:26938243
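    Preferential sampling can be simulated by drawing sampling times from an inhomogeneous Poisson process whose intensity follows effective population size. The sketch below uses Lewis-Shedler thinning and assumes the simplest dependence, λ(t) = c·Ne(t) with a known upper bound on Ne; the paper's model may use a more general link, and the seasonal Ne trajectory in the usage lines is hypothetical.

```python
import math
import random

def sample_times_thinning(ne, t_max, c, ne_max, seed=0):
    """Sampling times on [0, t_max] from an inhomogeneous Poisson process with
    intensity c * ne(t), simulated by thinning (requires ne(t) <= ne_max)."""
    rng = random.Random(seed)
    lam_max = c * ne_max
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)  # next candidate from a homogeneous process
        if t > t_max:
            return times
        if rng.random() < c * ne(t) / lam_max:  # keep with probability lambda(t) / lambda_max
            times.append(t)

def seasonal_ne(t):
    return 1.5 + math.sin(2 * math.pi * t)  # hypothetical seasonal trajectory, max 2.5

times = sample_times_thinning(seasonal_ne, 3.0, 20.0, 2.5)
```

Sampling times cluster around the seasonal peaks, so the times themselves carry information about Ne, which is the signal that a preferential-sampling model exploits and that ignoring it can turn into bias.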

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertholon, François; Harant, Olivier; Bourlon, Bertrand

    This article introduces a joint Bayesian estimation of gas samples eluting from a gas chromatography (GC) column coupled with a NEMS sensor, based on the Giddings-Eyring microscopic molecular stochastic model. The posterior distribution is sampled using Markov chain Monte Carlo with Gibbs sampling. Parameters are estimated using the posterior mean. Finally, this estimation scheme is applied to simulated and real datasets using this molecular stochastic forward model.
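    The abstract names the Giddings-Eyring model as the forward model but gives no equations. A minimal forward-simulation sketch, assuming the textbook picture in which each molecule spends a fixed dead time in the mobile phase plus a Poisson number of adsorption events, each with an exponentially distributed sojourn; all parameter values are hypothetical, and the paper's Bayesian/Gibbs estimation is not reproduced.

```python
import numpy as np

def giddings_eyring_times(n_mol, t_dead, mean_adsorptions, mean_sojourn, seed=0):
    """Simulate retention times: dead time plus a Poisson number of adsorptions,
    each with an exponential sojourn time in the stationary phase."""
    rng = np.random.default_rng(seed)
    n_ads = rng.poisson(mean_adsorptions, size=n_mol)
    # A sum of n i.i.d. Exponential(scale) times is Gamma(n, scale); zero adsorptions add nothing.
    stuck = np.where(n_ads > 0, rng.gamma(np.maximum(n_ads, 1), mean_sojourn), 0.0)
    return t_dead + stuck

times = giddings_eyring_times(20_000, 1.0, 10.0, 0.5)
```

Under these assumptions the peak has mean t_dead + k·τ and variance 2·k·τ² (here 6.0 and 5.0), which is the kind of moment structure an estimator inverts to recover the adsorption parameters.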

  17. Estimating species - area relationships by modeling abundance and frequency subject to incomplete sampling.

    PubMed

    Yamaura, Yuichi; Connor, Edward F; Royle, J Andrew; Itoh, Katsuo; Sato, Kiyoshi; Taki, Hisatomo; Mishima, Yoshio

    2016-07-01

    Models and data used to describe species-area relationships confound sampling with ecological process as they fail to acknowledge that estimates of species richness arise due to sampling. This compromises our ability to make ecological inferences from and about species-area relationships. We develop and illustrate hierarchical community models of abundance and frequency to estimate species richness. The models we propose separate sampling from ecological processes by explicitly accounting for the fact that sampled patches are seldom completely covered by sampling plots and that individuals present in the sampling plots are imperfectly detected. We propose a multispecies abundance model in which community assembly is treated as the summation of an ensemble of species-level Poisson processes and estimate patch-level species richness as a derived parameter. We use sampling process models appropriate for specific survey methods. We propose a multispecies frequency model that treats the number of plots in which a species occurs as a binomial process. We illustrate these models using data collected in surveys of early-successional bird species and plants in young forest plantation patches. Results indicate that only mature forest plant species deviated from the constant density hypothesis, but the null model suggested that the deviations were too small to alter the form of species-area relationships. Nevertheless, results from simulations clearly show that the aggregate pattern of individual species density-area relationships and occurrence probability-area relationships can alter the form of species-area relationships. The plant community model estimated that only half of the species present in the regional species pool were encountered during the survey. The modeling framework we propose explicitly accounts for sampling processes so that ecological processes can be examined free of sampling artefacts. 
Our modeling approach is extensible and could be applied to a variety of study designs and allows the inclusion of additional environmental covariates.
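    Treating community assembly as a sum of species-level Poisson processes gives a simple expression for expected richness: a species with density λ is present in a patch of area A with probability 1 - exp(-λA). The sketch below adds imperfect detection by assuming each individual is detected independently with probability p, so detected counts are Poisson(λAp). This illustrates the idea only; the paper's hierarchical models also handle incomplete plot coverage and are fitted to data, and the densities in the test are hypothetical.

```python
import math

def expected_richness(densities, area, detect_prob=1.0):
    """Expected number of species recorded in a patch.
    Each species' count is Poisson(density * area); with independent detection
    of individuals, detected counts are Poisson(density * area * detect_prob)."""
    return sum(1.0 - math.exp(-lam * area * detect_prob) for lam in densities)
```

Comparing `expected_richness(d, A)` with `expected_richness(d, A, p)` across patch areas shows how incomplete detection alone can change the apparent species-area curve, separating the sampling artefact from the ecological process.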

  18. Estimating species – area relationships by modeling abundance and frequency subject to incomplete sampling

    USGS Publications Warehouse

    Yamaura, Yuichi; Connor, Edward F.; Royle, Andy; Itoh, Katsuo; Sato, Kiyoshi; Taki, Hisatomo; Mishima, Yoshio

    2016-01-01

    Models and data used to describe species–area relationships confound sampling with ecological process as they fail to acknowledge that estimates of species richness arise due to sampling. This compromises our ability to make ecological inferences from and about species–area relationships. We develop and illustrate hierarchical community models of abundance and frequency to estimate species richness. The models we propose separate sampling from ecological processes by explicitly accounting for the fact that sampled patches are seldom completely covered by sampling plots and that individuals present in the sampling plots are imperfectly detected. We propose a multispecies abundance model in which community assembly is treated as the summation of an ensemble of species-level Poisson processes and estimate patch-level species richness as a derived parameter. We use sampling process models appropriate for specific survey methods. We propose a multispecies frequency model that treats the number of plots in which a species occurs as a binomial process. We illustrate these models using data collected in surveys of early-successional bird species and plants in young forest plantation patches. Results indicate that only mature forest plant species deviated from the constant density hypothesis, but the null model suggested that the deviations were too small to alter the form of species–area relationships. Nevertheless, results from simulations clearly show that the aggregate pattern of individual species density–area relationships and occurrence probability–area relationships can alter the form of species–area relationships. The plant community model estimated that only half of the species present in the regional species pool were encountered during the survey. The modeling framework we propose explicitly accounts for sampling processes so that ecological processes can be examined free of sampling artefacts. 
Our modeling approach is extensible and could be applied to a variety of study designs and allows the inclusion of additional environmental covariates.
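
    The constant-density logic behind the multispecies abundance model can be illustrated with a short calculation. This is a minimal sketch, not the authors' hierarchical model. It assumes that each species' count in a patch is Poisson with mean density × area and that individuals are detected independently with one shared probability, and it uses made-up densities. A Poisson count thinned by detection is still Poisson, so expected richness is a sum of 1 − exp(−λ) terms, and patch richness appears as a derived quantity.

```python
import math

def expected_richness(densities, area, detection=1.0):
    """Expected number of species recorded in a patch.

    Each species k is assumed Poisson with mean densities[k] * area; each
    individual is detected independently with probability `detection`, so the
    detected count is Poisson(densities[k] * area * detection). A species is
    recorded when that count is at least 1.
    """
    return sum(1.0 - math.exp(-d * area * detection) for d in densities)

# Hypothetical densities (individuals per ha): three common and five rare species
densities = [2.0, 1.5, 1.0, 0.05, 0.05, 0.02, 0.02, 0.01]

for area in (1, 5, 25):
    present = expected_richness(densities, area)
    recorded = expected_richness(densities, area, detection=0.5)
    print(f"area={area:>2} ha  expected present={present:.2f}  expected recorded={recorded:.2f}")
```

    Under these assumptions, richness rises with area even though no species' density changes. Halving detection lowers expected recorded richness at every area.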

  19. Estimation of pyrethroid pesticide intake using regression modeling of food groups based on composite dietary samples

    EPA Science Inventory

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation ...

  2. Population pharmacokinetic characterization of BAY 81-8973, a full-length recombinant factor VIII: lessons learned - importance of including samples with factor VIII levels below the quantitation limit.

    PubMed

    Garmann, D; McLeay, S; Shah, A; Vis, P; Maas Enriquez, M; Ploeger, B A

    2017-07-01

    The pharmacokinetics (PK), safety and efficacy of BAY 81-8973, a full-length, unmodified, recombinant human factor VIII (FVIII), were evaluated in the LEOPOLD trials. The aim of this study was to develop a population PK model based on pooled data from the LEOPOLD trials and to investigate the importance of including samples with FVIII levels below the limit of quantitation (BLQ) to estimate half-life. The analysis included 1535 PK observations (measured by the chromogenic assay) from 183 male patients with haemophilia A aged 1–61 years from the 3 LEOPOLD trials. The limit of quantitation was 1.5 IU dL⁻¹ for the majority of samples. Population PK models that included or excluded BLQ samples were used for FVIII half-life estimations, and simulations were performed using both estimates to explore the influence on the time below a determined FVIII threshold. In the data set used, approximately 16.5% of samples were BLQ, which is not uncommon for FVIII PK data sets. The structural model to describe the PK of BAY 81-8973 was a two-compartment model similar to that seen for other FVIII products. If BLQ samples were excluded from the model, FVIII half-life estimates were longer than those from a model that included BLQ samples. It is essential to assess the importance of BLQ samples when performing population PK estimates of half-life for any FVIII product. Exclusion of BLQ data from half-life estimations based on population PK models may result in an overestimation of half-life and underestimation of time under a predetermined FVIII threshold, resulting in potential underdosing of patients. © 2017 Bayer AG. Haemophilia Published by John Wiley & Sons Ltd.
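
    A common way to keep BLQ records in a likelihood-based fit is to treat them as left-censored. Each BLQ record then contributes the probability that a measurement would fall below the limit, an approach often called the M3 method. The abstract does not state which method the authors used. The sketch below assumes additive normal residual error on whatever scale the model is fitted, and the function names and data are hypothetical.

```python
import math

def normal_logpdf(x, mu, sigma):
    # log density of N(mu, sigma^2) evaluated at x
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)

def normal_logcdf(x, mu, sigma):
    # log P(X < x) for X ~ N(mu, sigma^2); erfc keeps the left tail accurate,
    # although it still underflows for extremely negative z
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return math.log(0.5 * math.erfc(-z))

def loglik_with_blq(observed, predicted, sigma, loq):
    """Residual log-likelihood in which BLQ records (None) are left-censored at loq."""
    total = 0.0
    for y, f in zip(observed, predicted):
        if y is None:
            total += normal_logcdf(loq, f, sigma)  # P(measurement < LOQ | prediction f)
        else:
            total += normal_logpdf(y, f, sigma)
    return total

# Hypothetical FVIII activities (IU/dL) with LOQ = 1.5; None marks a BLQ sample
observed = [80.0, 35.0, 9.0, 2.1, None]
predicted = [78.0, 36.5, 8.4, 2.0, 0.9]
print(loglik_with_blq(observed, predicted, sigma=1.0, loq=1.5))
```

    Dropping the None records instead leaves the late, low concentrations represented only by values that happened to exceed the limit. This is one way the terminal phase, and with it the half-life, can be biased.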

  3. Accounting for Incomplete Species Detection in Fish Community Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A; Orth, Donald J; Jager, Yetta

    2013-01-01

    Riverine fish assemblages are heterogeneous and very difficult to characterize with a one-size-fits-all approach to sampling. Furthermore, detecting changes in fish assemblages over time requires accounting for variation in sampling designs. We present a modeling approach that permits heterogeneous sampling by accounting for site and sampling covariates (including method) in a model-based framework for estimation (versus a sampling-based framework). We snorkeled during three surveys and electrofished during a single survey in a suite of delineated habitats stratified by reach types. We developed single-species occupancy models to determine covariates influencing patch occupancy and species detection probabilities, whereas community occupancy models estimated species richness in light of incomplete detections. For most species, information-theoretic criteria showed higher support for models that included patch size and reach as covariates of occupancy. In addition, models including patch size and sampling method as covariates of detection probabilities also had higher support. Detection probability estimates for snorkeling surveys were higher for larger non-benthic species, whereas electrofishing was more effective at detecting smaller benthic species. The number of sites and sampling occasions required to accurately estimate occupancy varied among fish species. For rare benthic species, our results suggested that a higher number of occasions, and especially the addition of electrofishing, may be required to improve detection probabilities and obtain accurate occupancy estimates. Community models suggested that richness was 41% higher than the number of species actually observed and that the addition of an electrofishing survey increased estimated richness by 13%. These results can be useful to future fish assemblage monitoring efforts by informing sampling designs, such as site selection (e.g., stratifying based on patch size) and determining the effort required (e.g., number of sites versus occasions).
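
    At the core of a single-species occupancy model is the likelihood of each site's detection history, which distinguishes "present but missed" from "absent". This minimal sketch holds occupancy (psi) and detection (p) constant, with no covariates. The helper surveys_needed is an illustrative addition showing why species with low detection probability need more sampling occasions.

```python
import math

def site_likelihood(history, psi, p):
    """Likelihood of one site's 0/1 detection history in a single-season occupancy model.

    psi: probability the site is occupied; p: per-survey detection probability
    (both held constant here; covariates would normally enter through logit links).
    """
    given_occupied = 1.0
    for y in history:
        given_occupied *= p if y else (1.0 - p)
    if any(history):
        return psi * given_occupied
    # never detected: either occupied and missed every time, or unoccupied
    return psi * given_occupied + (1.0 - psi)

def log_likelihood(histories, psi, p):
    """Sum of log site likelihoods across independent sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)

def surveys_needed(p, target=0.95):
    """Smallest K with 1 - (1 - p)**K >= target (detect an occupied site at least once)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

print(log_likelihood([[1, 0, 1], [0, 0, 0], [0, 1, 0]], psi=0.6, p=0.5))
```

    For example, surveys_needed(0.2) returns 14, whereas a species with p = 0.6 needs only 4.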

  4. Combining band recovery data and Pollock's robust design to model temporary and permanent emigration

    USGS Publications Warehouse

    Lindberg, M.S.; Kendall, W.L.; Hines, J.E.; Anderson, M.G.

    2001-01-01

    Capture-recapture models are widely used to estimate demographic parameters of marked populations. Recently, this statistical theory has been extended to modeling dispersal of open populations. Multistate models can be used to estimate movement probabilities among subdivided populations if multiple sites are sampled. Frequently, however, sampling is limited to a single site. Models described by Burnham (1993, in Marked Individuals in the Study of Bird Populations, 199-213), which combined open population capture-recapture and band-recovery models, can be used to estimate permanent emigration when sampling is limited to a single population. Similarly, Kendall, Nichols, and Hines (1997, Ecology 78, 563-578) developed models to estimate temporary emigration under Pollock's (1982, Journal of Wildlife Management 46, 757-760) robust design. We describe a likelihood-based approach to simultaneously estimate temporary and permanent emigration when sampling is limited to a single population. We use a sampling design that combines the robust design and recoveries of individuals obtained immediately following each sampling period. We present a general form for our model where temporary emigration is a first-order Markov process, and we discuss more restrictive models. We illustrate these models with analysis of data on marked Canvasback ducks. Our analysis indicates that the probability of permanent emigration for adult female Canvasbacks was 0.193 (SE = 0.082) and that birds that were present at the study area in year i − 1 had a higher probability of presence in year i than birds that were not present in year i − 1.
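
    When temporary emigration is a first-order Markov process, the year-to-year presence of a surviving bird forms a two-state Markov chain. The sketch below uses the gamma''/gamma' parameterization common in robust-design work, with illustrative names. It ignores survival and permanent emigration and uses hypothetical values. It shows how "present last year, more likely present this year" (gamma'' < gamma') translates into a long-run absence rate.

```python
def stationary_absence(gamma_dd, gamma_d):
    """Long-run probability that a surviving individual is temporarily absent.

    gamma_dd: P(absent in year i | present in year i - 1)   (gamma'')
    gamma_d:  P(absent in year i | absent in year i - 1)    (gamma')
    Solves pi = (1 - pi) * gamma_dd + pi * gamma_d for the absent-state probability pi.
    """
    return gamma_dd / (gamma_dd + 1.0 - gamma_d)

def absence_after(years, gamma_dd, gamma_d, p_absent=0.0):
    """Propagate the absence probability forward one year at a time."""
    for _ in range(years):
        p_absent = (1.0 - p_absent) * gamma_dd + p_absent * gamma_d
    return p_absent

# Hypothetical values: birds present last year seldom leave (0.2),
# birds absent last year tend to stay away (0.6)
print(stationary_absence(0.2, 0.6), absence_after(5, 0.2, 0.6))
```

    Setting gamma'' equal to gamma' removes the Markov dependence, and the long-run absence rate then equals that common value.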

  5. The Influence of Mark-Recapture Sampling Effort on Estimates of Rock Lobster Survival

    PubMed Central

    Kordjazi, Ziya; Frusher, Stewart; Buxton, Colin; Gardner, Caleb; Bird, Tomas

    2016-01-01

    Five annual capture-mark-recapture surveys on Jasus edwardsii were used to evaluate the effect of sample size and fishing effort on the precision of estimated survival probability. Datasets of different numbers of individual lobsters (ranging from 200 to 1,000 lobsters) were created by random subsampling from each annual survey. This process of random subsampling was also used to create 12 datasets of different levels of effort based on three levels of the number of traps (15, 30 and 50 traps per day) and four levels of the number of sampling-days (2, 4, 6 and 7 days). The most parsimonious Cormack-Jolly-Seber (CJS) model for estimating survival probability shifted from a constant model towards sex-dependent models with increasing sample size and effort. A sample of 500 lobsters, or 50 traps used on four consecutive sampling-days, was required to obtain precise survival estimates for males and females separately. A reduced sampling effort of 30 traps over four sampling days was adequate if a survival estimate for both sexes combined was sufficient for management of the fishery. PMID:26990561
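
    CJS models condition on first capture and estimate apparent survival (phi) and recapture probability (p). The sketch below computes the probability of one capture history with constant parameters. The constant and sex-dependent models compared above differ only in letting phi and p vary by group. Parameter values are hypothetical.

```python
def cjs_history_prob(history, phi, p):
    """Probability of a 0/1 capture history under a constant-parameter CJS model,
    conditional on the first capture.

    phi: apparent survival between occasions; p: recapture probability.
    """
    first = history.index(1)
    last = len(history) - 1 - history[::-1].index(1)
    prob = 1.0
    # between first and last capture the animal is known to be alive
    for y in history[first + 1:last + 1]:
        prob *= phi * (p if y else 1.0 - p)
    # chi: probability of never being seen again after the last capture,
    # built backward from chi = 1 at the final occasion
    chi = 1.0
    for _ in range(len(history) - 1 - last):
        chi = (1.0 - phi) + phi * (1.0 - p) * chi
    return prob * chi

for h in ([1, 0, 1], [1, 1, 0], [0, 1, 1]):
    print(h, cjs_history_prob(h, phi=0.8, p=0.5))
```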

  6. An open-population hierarchical distance sampling model

    USGS Publications Warehouse

    Sollmann, Rahel; Gardner, Beth; Chandler, Richard B.; Royle, J. Andrew; Sillett, T. Scott

    2015-01-01

    Modeling population dynamics while accounting for imperfect detection is essential to monitoring programs. Distance sampling allows estimation of population size while accounting for imperfect detection, but existing methods do not allow for direct estimation of demographic parameters. We develop a model that uses temporal correlation in abundance arising from underlying population dynamics to estimate demographic parameters from repeated distance sampling surveys. Using a simulation study motivated by designing a monitoring program for island scrub-jays (Aphelocoma insularis), we investigated the power of this model to detect population trends. We generated temporally autocorrelated abundance and distance sampling data over six surveys, using population rates of change of 0.95 and 0.90. We fit the data-generating Markovian model and a misspecified model with a log-linear time effect on abundance, and derived post hoc trend estimates from a model estimating abundance for each survey separately. We performed these analyses for varying numbers of survey points. Power to detect population changes was consistently greater under the Markov model than under the alternatives, particularly for reduced numbers of survey points. The model can readily be extended to more complex demographic processes than considered in our simulations. This novel framework can be widely adopted for wildlife population monitoring.
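
    The abstract does not specify a detection function or survey geometry. The sketch assumes a half-normal detection function and circular plots around survey points truncated at radius w. Under those assumptions the average detection probability has a closed form, and under a Markov model with rate of change λ (growth in the code) the expected count shrinks by λ per survey. The sketch covers only these expectations, not the hierarchical likelihood. σ, w, and the starting abundance are hypothetical.

```python
import math

def halfnormal(r, sigma):
    """Half-normal detection function g(r) = exp(-r^2 / (2 sigma^2))."""
    return math.exp(-r * r / (2.0 * sigma * sigma))

def mean_detection_point(sigma, w):
    """Average detection probability within radius w of a survey point:
    (2 / w^2) * integral_0^w g(r) r dr, in closed form for the half-normal."""
    return (2.0 * sigma**2 / w**2) * (1.0 - math.exp(-w**2 / (2.0 * sigma**2)))

def mean_detection_point_numeric(sigma, w, n=10_000):
    """Midpoint-rule check of the same integral."""
    h = w / n
    s = sum(halfnormal((i + 0.5) * h, sigma) * (i + 0.5) * h for i in range(n))
    return 2.0 * s * h / w**2

def expected_counts(n0, growth, surveys, sigma, w):
    """Expected detections per survey if E[N_t] = growth * E[N_{t-1}]."""
    pbar = mean_detection_point(sigma, w)
    return [n0 * growth**t * pbar for t in range(surveys)]

# Hypothetical design: sigma = 50 m, truncation at 100 m, rate of change 0.9, six surveys
print(expected_counts(n0=40, growth=0.9, surveys=6, sigma=50.0, w=100.0))
```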

  7. An open-population hierarchical distance sampling model.

    PubMed

    Sollmann, Rahel; Gardner, Beth; Chandler, Richard B; Royle, J Andrew; Sillett, T Scott

    2015-02-01

    Modeling population dynamics while accounting for imperfect detection is essential to monitoring programs. Distance sampling allows estimating population size while accounting for imperfect detection, but existing methods do not allow for estimation of demographic parameters. We develop a model that uses temporal correlation in abundance arising from underlying population dynamics to estimate demographic parameters from repeated distance sampling surveys. Using a simulation study motivated by designing a monitoring program for Island Scrub-Jays (Aphelocoma insularis), we investigated the power of this model to detect population trends. We generated temporally autocorrelated abundance and distance sampling data over six surveys, using population rates of change of 0.95 and 0.90. We fit the data-generating Markovian model and a misspecified model with a log-linear time effect on abundance, and derived post hoc trend estimates from a model estimating abundance for each survey separately. We performed these analyses for varying numbers of survey points. Power to detect population changes was consistently greater under the Markov model than under the alternatives, particularly for reduced numbers of survey points. The model can readily be extended to more complex demographic processes than considered in our simulations. This novel framework can be widely adopted for wildlife population monitoring.

  8. Sampling through time and phylodynamic inference with coalescent and birth–death models

    PubMed Central

    Volz, Erik M.; Frost, Simon D. W.

    2014-01-01

    Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth–death-sampling model (BDM), in the context of estimating population size and birth rates in a population growing exponentially according to the birth–death branching process. For sequences sampled at a single time, we found the coalescent and the BDM gave virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth–death model estimators are subject to large bias if the sampling process is misspecified. Since BDMs incorporate a model of the sampling process, we show how much of the statistical power of BDMs arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information. PMID:25401173

  9. Development and verification of a model for estimating the screening utility in the detection of PCBs in transformer oil.

    PubMed

    Terakado, Shingo; Glass, Thomas R; Sasaki, Kazuhiro; Ohmura, Naoya

    2014-01-01

    A simple new model for estimating the screening performance (false positive and false negative rates) of a given test for a specific sample population is presented. The model is shown to give good results on a test population, and is used to estimate the performance on a sampled population. Using the model developed in conjunction with regulatory requirements and the relative costs of the confirmatory and screening tests allows evaluation of the screening test's utility in terms of cost savings. Testers can use the methods developed to estimate the utility of a screening program using available screening tests with their own sample populations.
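
    The abstract does not give the model's equations. As a hypothetical illustration of how false positive and false negative rates combine with test costs, the sketch below assumes that every sample is screened and that only screen-positives go to a perfect confirmatory test. All parameter names and values are invented.

```python
def screening_summary(n, prevalence, sensitivity, specificity, c_screen, c_confirm):
    """Expected outcomes of screen-then-confirm versus confirming every sample.

    Hypothetical cost model: the confirmatory test is assumed perfect, and
    only samples that screen positive are sent for confirmation.
    """
    positives = n * prevalence
    negatives = n - positives
    true_pos = positives * sensitivity
    false_neg = positives - true_pos           # contaminated samples the screen misses
    false_pos = negatives * (1.0 - specificity)
    cost_screening = n * c_screen + (true_pos + false_pos) * c_confirm
    cost_confirm_all = n * c_confirm
    return {
        "false_negatives": false_neg,
        "false_positives": false_pos,
        "savings": cost_confirm_all - cost_screening,
    }

print(screening_summary(1000, 0.1, 0.9, 0.95, c_screen=10.0, c_confirm=200.0))
```

    Savings alone are not the whole picture. Any regulatory limit on tolerable false negatives would also constrain which screening tests are acceptable.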

  10. WEIGHTED LIKELIHOOD ESTIMATION UNDER TWO-PHASE SAMPLING

    PubMed Central

    Saegusa, Takumi; Wellner, Jon A.

    2013-01-01

    We develop asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement. We also consider several variants of WLEs involving estimated weights and calibration. A set of empirical process tools is developed, including a Glivenko–Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling and sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite-dimensional parameter in a general semiparametric model in which the nuisance parameter is estimable at either regular or nonregular rates. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase. PMID:24563559
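
    The weighting idea underlying the WLE can be seen in its simplest case, a weighted mean. Under two-phase stratified sampling, a phase-two unit in stratum s is drawn with probability n_s/N_s, so it receives weight N_s/n_s. The sketch below is illustrative only. It does not implement estimated or calibrated weights or the asymptotic theory, and the data are hypothetical.

```python
def ipw_mean(phase2, phase1_counts):
    """Inverse-probability-weighted mean of y from a two-phase stratified design.

    phase2: list of (stratum, y) for units measured at phase two.
    phase1_counts: dict mapping stratum -> number of phase-one units N_s.
    Each phase-two unit in stratum s gets weight N_s / n_s.
    """
    n2 = {}
    for s, _ in phase2:
        n2[s] = n2.get(s, 0) + 1
    total_w = 0.0
    total_wy = 0.0
    for s, y in phase2:
        w = phase1_counts[s] / n2[s]
        total_w += w
        total_wy += w * y
    return total_wy / total_w

# Hypothetical design: cases are oversampled at phase two relative to controls
phase2 = [("case", 4.0), ("case", 6.0), ("control", 1.0), ("control", 2.0), ("control", 3.0)]
print(ipw_mean(phase2, {"case": 10, "control": 90}))
```

    The unweighted mean of these five values is 3.2. Reweighting by stratum restores the phase-one proportions and gives 2.3.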

  11. A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling

    PubMed Central

    Huang, Chiung-Yu; Qin, Jing; Follmann, Dean A.

    2012-01-01

    This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. PMID:23843659
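
    The structure of length-biased sampling is easiest to see without covariates or censoring. If durations T have density f and mean μ, a prevalent sample observes them with density t·f(t)/μ. Under that density E[1/T] = 1/μ, so the harmonic mean of observed durations estimates μ. This is only a sketch of the bias being corrected, not the pseudo-profile likelihood estimator for the Cox model.

```python
def mean_from_length_biased(sample):
    """Estimate the population mean duration from length-biased observations.

    Under length-biased sampling the observed density is t * f(t) / mu, so
    E[1 / T_observed] = 1 / mu and the harmonic mean estimates mu.
    Ignores censoring and covariates.
    """
    return len(sample) / sum(1.0 / t for t in sample)

observed = [1.0, 3.0, 3.0, 3.0]
print(sum(observed) / len(observed), mean_from_length_biased(observed))
```

    In a population that is half 1s and half 3s (μ = 2), length-biased sampling over-represents the 3s three to one. The naive mean of [1, 3, 3, 3] is 2.5, while the harmonic mean recovers 2.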

  12. Mars Rover/Sample Return - Phase A cost estimation

    NASA Technical Reports Server (NTRS)

    Stancati, Michael L.; Spadoni, Daniel J.

    1990-01-01

    This paper presents a preliminary cost estimate for the design and development of the Mars Rover/Sample Return (MRSR) mission. The estimate was generated using a modeling tool specifically built to provide useful cost estimates from design parameters of the type and fidelity usually available during early phases of mission design. The model approach and its application to MRSR are described.

  13. Numerical Demons in Monte Carlo Estimation of Bayesian Model Evidence with Application to Soil Respiration Models

    NASA Astrophysics Data System (ADS)

    Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.

    2016-12-01

    Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analyses, such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between goodness-of-fit and model complexity. Yet estimating BME is challenging, especially for high-dimensional problems with complex sampling spaces. Monte Carlo numerical methods are preferred for estimating BME because they yield higher accuracy than semi-analytical solutions (e.g., Laplace approximations, BIC, KIC). However, numerical methods are prone to numerical demons arising from arithmetic underflow and round-off errors. Although a few studies have alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that finite-precision arithmetic can impose a threshold on likelihood values and on the Metropolis acceptance ratio. This results in trimming of parameter regions (when the likelihood function is smaller than the smallest floating-point number that a computer can represent) and corruption of the empirical measures of the random states of the MCMC sampler (when using the log-likelihood function). We consider two of the most powerful numerical estimators of BME: the path sampling method of thermodynamic integration (TI) and the importance sampling method of stepping-stone sampling (SS). We also consider the two most widely used numerical estimators: the prior sampling arithmetic mean (AM) and the posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interestingly, the most biased estimator, namely the HM, turned out to be the least vulnerable. While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME given enough computational effort, we show that arithmetic underflow can hamper AM, resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computationally efficient results. These results are useful for Monte Carlo simulations that estimate Bayesian model evidence.
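
    One standard remedy for likelihood underflow is to keep every likelihood on the log scale and average with the log-sum-exp trick. The sketch below applies this to the AM and HM estimators. The log-likelihood values are hypothetical. This is not the study's analysis code, and it does not address the TI or SS issues.

```python
import math

def log_mean_exp(log_values):
    """log(mean(exp(v))) via the log-sum-exp trick, avoiding underflow."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values)) - math.log(len(log_values))

def log_bme_arithmetic(loglik_prior_draws):
    """AM estimator: BME is approximated by the mean likelihood over prior draws."""
    return log_mean_exp(loglik_prior_draws)

def log_bme_harmonic(loglik_posterior_draws):
    """HM estimator: 1/BME is approximated by the mean of 1/L over posterior draws."""
    return -log_mean_exp([-v for v in loglik_posterior_draws])

# Hypothetical log-likelihoods of prior draws
loglik = [-800.0, -801.0, -805.0]
naive = sum(math.exp(v) for v in loglik) / len(loglik)
print(naive)                       # every exp() underflows, so the naive mean is 0.0
print(log_bme_arithmetic(loglik))  # finite log-BME computed in log space
```

    Taking math.log of the naive mean would raise an error here, whereas the log-space estimate lies between the largest and smallest log-likelihoods, as a log mean must.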

  14. Stemflow estimation in a redwood forest using model-based stratified random sampling

    Treesearch

    Jack Lewis

    2003-01-01

    Model-based stratified sampling is illustrated by a case study of stemflow volume in a redwood forest. The approach is actually a model-assisted sampling design in which auxiliary information (tree diameter) is utilized in the design of stratum boundaries to optimize the efficiency of a regression or ratio estimator. The auxiliary information is utilized in both the...
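
    The abstract is truncated, so the estimator actually used is not shown. As a hedged illustration of a ratio estimator with an auxiliary variable, the sketch below estimates total stemflow volume from sampled trees. It uses a per-stratum ratio to a known auxiliary total, for example a quantity derived from tree diameter. The variable choices and numbers are hypothetical.

```python
def ratio_estimate_total(sample_y, sample_x, x_total):
    """Ratio estimator of a total: (sum of y / sum of x in the sample) * known total of x."""
    return sum(sample_y) / sum(sample_x) * x_total

def stratified_ratio_total(strata):
    """Separate ratio estimator: add the ratio estimates computed within each stratum.

    strata: list of (sample_y, sample_x, x_total) tuples, one per stratum.
    """
    return sum(ratio_estimate_total(y, x, X) for y, x, X in strata)

# Hypothetical strata of small and large trees:
# (stemflow volumes, auxiliary x for sampled trees, stratum total of x)
small = ([2.0, 4.0], [1.0, 2.0], 30.0)
large = ([9.0], [3.0], 10.0)
print(stratified_ratio_total([small, large]))
```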

  15. System health monitoring using multiple-model adaptive estimation techniques

    NASA Astrophysics Data System (ADS)

    Sifford, Stanley Ryan

    Monitoring system health for fault detection and diagnosis by tracking system parameters concurrently with state estimates is approached using a new multiple-model adaptive estimation (MMAE) method. This novel method is called GRid-based Adaptive Parameter Estimation (GRAPE). GRAPE expands existing MMAE methods by using new techniques to sample the parameter space. GRAPE expands on MMAE with the hypothesis that sample models can be applied and resampled without relying on a predefined set of models. GRAPE is initially implemented in a linear framework using Kalman filter models. A more generalized GRAPE formulation is presented using extended Kalman filter (EKF) models to represent nonlinear systems. GRAPE can handle both time invariant and time varying systems as it is designed to track parameter changes. Two techniques are presented to generate parameter samples for the parallel filter models. The first approach is called selected grid-based stratification (SGBS). SGBS divides the parameter space into equally spaced strata. The second approach uses Latin Hypercube Sampling (LHS) to determine the parameter locations and minimize the total number of required models. LHS is particularly useful when the parameter dimensions grow. Adding more parameters does not require the model count to increase for LHS. Each resample is independent of the prior sample set other than the location of the parameter estimate. SGBS and LHS can be used for both the initial sample and subsequent resamples. Furthermore, resamples are not required to use the same technique. Both techniques are demonstrated for both linear and nonlinear frameworks. The GRAPE framework further formalizes the parameter tracking process through a general approach for nonlinear systems. 
These additional methods allow GRAPE to either narrow the focus to converged values within a parameter range or expand the range in the appropriate direction to track the parameters outside the current parameter range boundary. Customizable rules define the specific resample behavior when the GRAPE parameter estimates converge. Convergence itself is determined from the derivatives of the parameter estimates using a simple moving average window to filter out noise. The system can be tuned to match the desired performance goals by making adjustments to parameters such as the sample size, convergence criteria, resample criteria, initial sampling method, resampling method, confidence in prior sample covariances, sample delay, and others.
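
    Latin hypercube sampling, as described above, splits each parameter's range into as many equal-width strata as there are samples. It then places exactly one sample in each stratum of every dimension, so adding parameters does not force more samples. This minimal sketch draws uniformly within strata. It is not GRAPE's implementation.

```python
import random

def latin_hypercube(n_samples, bounds, rng=None):
    """Draw n_samples points with exactly one point per equal-width stratum in every dimension.

    bounds: list of (low, high) per parameter dimension.
    """
    if rng is None:
        rng = random.Random()
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # one uniform draw inside each stratum, then shuffle which sample gets which stratum
        col = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[j] for col in columns) for j in range(n_samples)]

# Five parameter samples over two hypothetical parameter ranges
for point in latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)], random.Random(0)):
    print(point)
```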

  16. Unified framework to evaluate panmixia and migration direction among multiple sampling locations.

    PubMed

    Beerli, Peter; Palczewski, Michal

    2010-05-01

    For many biological investigations, groups of individuals are genetically sampled from several geographic locations. These sampling locations often do not reflect the genetic population structure. We describe a framework using marginal likelihoods to compare and order structured population models, such as testing whether the sampling locations belong to the same randomly mating population or comparing unidirectional and multidirectional gene flow models. In the context of inferences employing Markov chain Monte Carlo methods, the accuracy of the marginal likelihoods depends heavily on the approximation method used to calculate the marginal likelihood. Two methods, modified thermodynamic integration and a stabilized harmonic mean estimator, are compared. With finite Markov chain Monte Carlo run lengths, the harmonic mean estimator may not be consistent. Thermodynamic integration, in contrast, delivers considerably better estimates of the marginal likelihood. The choice of prior distributions does not influence the order and choice of the better models when the marginal likelihood is estimated using thermodynamic integration, whereas with the harmonic mean estimator the influence of the prior is pronounced and the order of the models changes. The approximation of marginal likelihood using thermodynamic integration in MIGRATE allows the evaluation of complex population genetic models, not only of whether sampling locations belong to a single panmictic population, but also of competing complex structured population models.
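
    Thermodynamic integration estimates the log marginal likelihood as the integral, over β from 0 to 1, of the expected log-likelihood under the power posterior (prior × likelihood^β). The sketch below applies a plain trapezoid rule to hypothetical per-temperature means that would normally come from MCMC runs. MIGRATE's modified version is not reproduced here.

```python
def thermodynamic_integration(betas, mean_loglik):
    """Trapezoid-rule estimate of log m = integral_0^1 E_beta[log L] d beta.

    betas: increasing temperatures from 0 to 1.
    mean_loglik: estimated E_beta[log L] at each temperature.
    """
    if betas[0] != 0.0 or betas[-1] != 1.0:
        raise ValueError("temperature ladder must run from 0 to 1")
    total = 0.0
    for i in range(1, len(betas)):
        total += 0.5 * (betas[i] - betas[i - 1]) * (mean_loglik[i] + mean_loglik[i - 1])
    return total

# Hypothetical ladder and per-temperature means
print(thermodynamic_integration([0.0, 0.5, 1.0], [-10.0, -6.0, -4.0]))
```

    Because E_β[log L] typically changes fastest near β = 0, ladders are often spaced more densely there than a uniform grid would be.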

  17. A re-evaluation of a case-control model with contaminated controls for resource selection studies

    Treesearch

    Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski

    2013-01-01

    A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...

  18. Hierarchical model analysis of the Atlantic Flyway Breeding Waterfowl Survey

    USGS Publications Warehouse

    Sauer, John R.; Zimmerman, Guthrie S.; Klimstra, Jon D.; Link, William A.

    2014-01-01

    We used log-linear hierarchical models to analyze data from the Atlantic Flyway Breeding Waterfowl Survey. The survey has been conducted by state biologists each year since 1989 in the northeastern United States from Virginia north to New Hampshire and Vermont. Although yearly population estimates from the survey are used by the United States Fish and Wildlife Service for estimating regional waterfowl population status for mallards (Anas platyrhynchos), black ducks (Anas rubripes), wood ducks (Aix sponsa), and Canada geese (Branta canadensis), they are not routinely adjusted to control for time of day effects and other survey design issues. The hierarchical model analysis permits estimation of year effects and population change while accommodating the repeated sampling of plots and controlling for time of day effects in counting. We compared population estimates from the current stratified random sample analysis to population estimates from hierarchical models with alternative model structures that describe year to year changes as random year effects, a trend with random year effects, or year effects modeled as 1-year differences. Patterns of population change from the hierarchical model results generally were similar to the patterns described by stratified random sample estimates, but significant visibility differences occurred between twilight and midday counts in all species. Controlling for the effects of time of day resulted in larger population estimates for all species in the hierarchical model analysis relative to the stratified random sample analysis. The hierarchical models also provided a convenient means of estimating population trend as derived statistics from the analysis. We detected significant declines in mallards and American black ducks and significant increases in wood ducks and Canada geese, trends that had not been significant for 3 of these 4 species in the prior analysis.
We recommend using hierarchical models for analysis of the Atlantic Flyway Breeding Waterfowl Survey.
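
    The abstract notes that trend can be estimated as a derived statistic. One common choice in log-linear survey analyses is the geometric mean annual rate of change between the first and last yearly indices. That choice is assumed here because the abstract does not define the statistic. It is computed for each posterior draw, so its uncertainty follows directly from the posterior. Index values are hypothetical.

```python
def trend_from_indices(indices):
    """Geometric mean annual rate of change between the first and last yearly indices."""
    years = len(indices) - 1
    return (indices[-1] / indices[0]) ** (1.0 / years)

def trend_posterior(draws):
    """Apply the derived statistic to every posterior draw of the yearly indices."""
    return [trend_from_indices(d) for d in draws]

# Hypothetical posterior draws of yearly population indices (three years shown)
draws = [[100.0, 110.0, 121.0], [100.0, 104.0, 112.0], [100.0, 95.0, 102.0]]
rates = trend_posterior(draws)
print([round(100.0 * (r - 1.0), 2) for r in rates])  # percent change per year
```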

  19. Estimation of signal-dependent noise level function in transform domain via a sparse recovery model.

    PubMed

    Yang, Jingyu; Gan, Ziqiao; Wu, Zhaoyang; Hou, Chunping

    2015-05-01

    This paper proposes a novel algorithm to estimate the noise level function (NLF) of signal-dependent noise (SDN) from a single image based on the sparse representation of NLFs. Noise level samples are estimated from the high-frequency discrete cosine transform (DCT) coefficients of nonlocal-grouped low-variation image patches. Then, an NLF recovery model based on the sparse representation of NLFs under a trained basis is constructed to recover NLF from the incomplete noise level samples. Confidence levels of the NLF samples are incorporated into the proposed model to promote reliable samples and weaken unreliable ones. We investigate the behavior of the estimation performance with respect to the block size, sampling rate, and confidence weighting. Simulation results on synthetic noisy images show that our method outperforms existing state-of-the-art schemes. The proposed method is evaluated on real noisy images captured by three types of commodity imaging devices, and shows consistently excellent SDN estimation performance. The estimated NLFs are incorporated into two well-known denoising schemes, nonlocal means and BM3D, and show significant improvements in denoising SDN-polluted images.

  20. Profile-likelihood Confidence Intervals in Item Response Theory Models.

    PubMed

    Chalmers, R Philip; Pek, Jolynn; Liu, Yang

    2017-01-01

    Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
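
    For a single parameter the profile likelihood is just the likelihood, which makes the construction easy to show. The 95% PL CI collects the values whose likelihood-ratio statistic stays below the χ²₁ 0.95 quantile (about 3.84). The sketch below does this for a binomial proportion by bisection and compares the result with the Wald interval. It is an illustration, not IRT code. In an IRT model, the remaining item parameters would be re-maximized at each trial value.

```python
import math

def binom_loglik(p, k, n):
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def profile_ci(k, n, crit=3.841458820694124):
    """95% likelihood-ratio interval for a binomial proportion (requires 0 < k < n)."""
    p_hat = k / n
    l_max = binom_loglik(p_hat, k, n)

    def excess(p):
        # positive outside the interval, negative inside
        return 2.0 * (l_max - binom_loglik(p, k, n)) - crit

    def bisect(lo, hi):
        # excess(lo) and excess(hi) have opposite signs
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (excess(lo) > 0) == (excess(mid) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    eps = 1e-12
    return bisect(eps, p_hat), bisect(p_hat, 1.0 - eps)

def wald_ci(k, n, z=1.959963984540054):
    p_hat = k / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

print(profile_ci(2, 20), wald_ci(2, 20))
```

    With 2 successes in 20 trials, the Wald interval extends below zero. The profile interval stays inside (0, 1) and is asymmetric about 0.1.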

  1. The impact of transport model differences on CO2 surface flux estimates from OCO-2 retrievals of column average CO2

    NASA Astrophysics Data System (ADS)

    Basu, Sourish; Baker, David F.; Chevallier, Frédéric; Patra, Prabir K.; Liu, Junjie; Miller, John B.

    2018-05-01

    We estimate the uncertainty of CO2 flux estimates in atmospheric inversions stemming from differences among global transport models. Using a set of observing system simulation experiments (OSSEs), we estimate this uncertainty as represented by the spread between five different state-of-the-art global transport models (ACTM, LMDZ, GEOS-Chem, PCTM and TM5), for both traditional in situ CO2 inversions and inversions of XCO2 estimates from the Orbiting Carbon Observatory 2 (OCO-2). We find that, in the absence of relative biases between in situ CO2 and OCO-2 XCO2, OCO-2 estimates of terrestrial flux for TRANSCOM-scale land regions can be more robust to transport model differences than corresponding in situ CO2 inversions. This is due to a combination of the increased spatial coverage of OCO-2 samples and the total column nature of OCO-2 estimates. We separate the two effects by constructing hypothetical in situ networks with the coverage of OCO-2 but with only near-surface samples. We also find that the transport-driven uncertainty in fluxes is comparable between well-sampled northern temperate regions and poorly sampled tropical regions. Furthermore, we find that spatiotemporal differences in sampling, such as between OCO-2 land and ocean soundings, coupled with imperfect transport, can produce differences in flux estimates that are larger than flux uncertainties due to transport model differences. This highlights the need for sampling with as complete a spatial and temporal coverage as possible (e.g., using both land and ocean retrievals together for OCO-2) to minimize the impact of selective sampling. Finally, our annual and monthly estimates of transport-driven uncertainties can be used to evaluate the robustness of conclusions drawn from real OCO-2 and in situ CO2 inversions.

  2. Estimating species richness and accumulation by modeling species occurrence and detectability

    USGS Publications Warehouse

    Dorazio, R.M.; Royle, J. Andrew; Söderström, B.; Glimskär, A.

    2006-01-01

    A statistical model is developed for estimating species richness and accumulation by formulating these community-level attributes as functions of model-based estimators of species occurrence while accounting for imperfect detection of individual species. The model requires a sampling protocol wherein repeated observations are made at a collection of sample locations selected to be representative of the community. This temporal replication provides the data needed to resolve the ambiguity between species absence and nondetection when species are unobserved at sample locations. Estimates of species richness and accumulation are computed for two communities, an avian community and a butterfly community. Our model-based estimates suggest that detection failures in many bird species were attributed to low rates of occurrence, as opposed to simply low rates of detection. We estimate that the avian community contains a substantial number of uncommon species and that species richness greatly exceeds the number of species actually observed in the sample. In fact, predictions of species accumulation suggest that even doubling the number of sample locations would not have revealed all of the species in the community. In contrast, our analysis of the butterfly community suggests that many species are relatively common and that the estimated richness of species in the community is nearly equal to the number of species actually detected in the sample. Our predictions of species accumulation suggest that the number of sample locations actually used in the butterfly survey could have been cut in half and the asymptotic richness of species still would have been attained. Our approach of developing occurrence-based summaries of communities while allowing for imperfect detection of species is broadly applicable and should prove useful in the design and analysis of surveys of biodiversity.
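    The link between occurrence, detection and accumulation can be written down directly. A species goes unobserved at a site either because it is absent or because it is present but missed on every visit, so with occurrence probability psi, per-visit detection probability p, K visits per site and J independent sites, P(observed) = 1 - (1 - psi(1 - (1 - p)^K))^J. The sketch below evaluates this for a hypothetical community with known psi and p; in the paper these quantities are estimated from the repeat-visit data rather than assumed, and the function names are illustrative.

```python
def prob_species_observed(psi, p, visits, sites):
    """P(a species is detected at least once in `sites` sites, each visited `visits` times).

    At one site the species must be present (psi) and detected on at least one
    visit (1 - (1 - p)**visits); sites are treated as independent.
    """
    per_site = psi * (1 - (1 - p) ** visits)
    return 1 - (1 - per_site) ** sites

def expected_species_observed(community, visits, sites):
    """Expected number of species observed; `community` is a list of (psi, p) pairs."""
    return sum(prob_species_observed(psi, p, visits, sites) for psi, p in community)

# Hypothetical community: 10 common, easily detected species and 30 rare, cryptic ones.
community = [(0.8, 0.6)] * 10 + [(0.05, 0.3)] * 30
curve = {j: expected_species_observed(community, visits=3, sites=j) for j in (10, 20, 40, 80)}
```

    Even at 80 sites the expected count stays below the 40 species present, because the rare, hard-to-detect species keep accumulating slowly.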

  3. A Portuguese value set for the SF-6D.

    PubMed

    Ferreira, Lara N; Ferreira, Pedro L; Pereira, Luis N; Brazier, John; Rowen, Donna

    2010-08-01

    The SF-6D is a preference-based measure of health derived from the SF-36 that can be used for cost-effectiveness analysis using cost-per-quality-adjusted-life-year analysis. This study seeks to estimate a set of system weights for the SF-6D for Portugal and to compare the results with the UK system weights. A sample of 55 health states defined by the SF-6D was valued by a representative random sample of the Portuguese population, stratified by sex and age (n = 140), using the Standard Gamble (SG). Several models are estimated at both the individual and aggregate levels for predicting health-state valuations. Models with main effects, with interaction effects and with the constant forced to unity are presented. Random effects (RE) models are estimated using generalized least squares (GLS) regressions. Generalized estimating equations (GEE) are used to estimate RE models with the constant forced to unity. Estimations at the individual level were performed using 630 health-state valuations. Alternative functional forms are considered to account for the skewed distribution of health-state valuations. The models are analyzed in terms of their coefficients, overall fit, and ability to predict the SG values. The RE models estimated using GLS and through GEE produce significant coefficients, which are robust across model specifications. However, there are concerns regarding some inconsistent estimates, so parsimonious consistent models were estimated. There is evidence of underprediction in some states assigned to poor health. The results are consistent with the UK results. The estimated models provide preference-based quality-of-life weights for the Portuguese population when health status data have been collected using the SF-36. Although the sample was randomly drawn, findings should be treated with caution given the small sample size, even though the models were estimated at the individual level.

  4. Pharmacokinetic Studies in Neonates: The Utility of an Opportunistic Sampling Design.

    PubMed

    Leroux, Stéphanie; Turner, Mark A; Guellec, Chantal Barin-Le; Hill, Helen; van den Anker, Johannes N; Kearns, Gregory L; Jacqz-Aigrain, Evelyne; Zhao, Wei

    2015-12-01

    The use of an opportunistic (also called scavenged) sampling strategy in a prospective pharmacokinetic study combined with population pharmacokinetic modelling has been proposed as an alternative strategy to conventional methods for accomplishing pharmacokinetic studies in neonates. However, the reliability of this approach in this particular paediatric population has not been evaluated. The objective of the present study was to evaluate the performance of an opportunistic sampling strategy for population pharmacokinetic estimation, as well as dose prediction, and to compare this strategy with a predetermined pharmacokinetic sampling approach. Three population pharmacokinetic models were derived for ciprofloxacin from opportunistic blood samples (SC model), predetermined (i.e. scheduled) samples (TR model) and all samples (full model, previously used to characterize ciprofloxacin pharmacokinetics), using NONMEM software. The predictive performance of the developed models was evaluated in an independent group of patients. Pharmacokinetic data from 60 newborns were obtained with a total of 430 samples available for analysis; 265 collected at predetermined times and 165 that were scavenged from those obtained as part of clinical care. All datasets were fit using a two-compartment model with first-order elimination. The SC model could identify the most significant covariates and provided reasonable estimates of population pharmacokinetic parameters (clearance and steady-state volume of distribution) compared with the TR and full models. Their predictive performances were further confirmed in an external validation by Bayesian estimation, and showed similar results. Monte Carlo simulation based on area under the concentration-time curve from zero to 24 h (AUC24)/minimum inhibitory concentration (MIC) using either the SC or the TR model gave similar dose prediction for ciprofloxacin.
Blood samples scavenged in the course of caring for neonates can be used to estimate ciprofloxacin pharmacokinetic parameters and therapeutic dose requirements.
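    The Monte Carlo dose-prediction step can be sketched as a probability-of-target-attainment (PTA) calculation. This is a hedged illustration, not the study's NONMEM model: it assumes linear pharmacokinetics at steady state, so that AUC24 equals the daily dose divided by clearance regardless of the number of compartments, and log-normal between-subject variability in clearance. The clearance, variability, MIC and AUC24/MIC target below are hypothetical placeholders, not ciprofloxacin estimates from the paper, and the function names are illustrative.

```python
import math
import random

def probability_of_target_attainment(daily_dose, mic, cl_pop, omega, target,
                                     n_sim=10000, seed=1):
    """Fraction of simulated patients with AUC24 / MIC >= target.

    Assumes linear PK at steady state (AUC24 = daily dose / CL) and
    CL_i = cl_pop * exp(eta_i) with eta_i ~ N(0, omega^2).
    Units: dose in mg/kg/day, CL in L/h/kg, so AUC24 is in mg*h/L.
    """
    rng = random.Random(seed)  # same seed -> same simulated patients for every dose
    hits = 0
    for _ in range(n_sim):
        cl = cl_pop * math.exp(rng.gauss(0.0, omega))
        if daily_dose / cl / mic >= target:
            hits += 1
    return hits / n_sim

def lowest_adequate_dose(candidates, min_pta=0.9, **model):
    """Smallest candidate daily dose whose PTA reaches min_pta, or None."""
    for dose in sorted(candidates):
        if probability_of_target_attainment(dose, **model) >= min_pta:
            return dose
    return None

# Hypothetical inputs, not estimates from the study.
model = dict(mic=0.125, cl_pop=0.2, omega=0.4, target=125)
pta_by_dose = {d: probability_of_target_attainment(d, **model) for d in (5, 10, 15, 20)}
chosen = lowest_adequate_dose([5, 10, 15, 20], **model)
```

    Reusing one seed for every candidate dose means each dose is evaluated on the same simulated patients, so PTA can only increase with dose.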

  5. The Effects of Model Misspecification and Sample Size on LISREL Maximum Likelihood Estimates.

    ERIC Educational Resources Information Center

    Baldwin, Beatrice

    The robustness of LISREL computer program maximum likelihood estimates under specific conditions of model misspecification and sample size was examined. The population model used in this study contains one exogenous variable; three endogenous variables; and eight indicator variables, two for each latent variable. Conditions of model…

  6. Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling

    NASA Astrophysics Data System (ADS)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-04-01

    Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood), the normalizing constant in the denominator of Bayes' theorem; while it is fundamental for model selection, the evidence is not required for Bayesian inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as information criteria do; the larger a model's evidence, the more support it receives among a collection of hypotheses, because its simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models, whereas information criteria that incorporate only the likelihood maximum can favor over-fitted ones. Since the evidence is not easy to estimate in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is called Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009).
Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
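    The idea of averaging the likelihood over the prior, and of using a proposal fitted to posterior samples, can be illustrated with a toy normal-mean model whose evidence is known in closed form. This is a simplification, not GMIS: it uses plain importance sampling with a single Gaussian proposal instead of bridge sampling with a Gaussian mixture, and exact posterior draws stand in for DREAM output. All names and settings are illustrative.

```python
import math
import random

def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def log_likelihood(theta, data):
    """Model: y_i ~ N(theta, 1), independent."""
    return sum(log_normal_pdf(y, theta, 1.0) for y in data)

def exact_log_evidence(data, tau):
    """Closed form for the prior theta ~ N(0, tau^2): y ~ N(0, I + tau^2 * 11^T)."""
    n, s = len(data), sum(data)
    quad = sum(y * y for y in data) - tau ** 2 * s ** 2 / (1 + n * tau ** 2)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau ** 2) - 0.5 * quad

def importance_log_evidence(data, tau, posterior_draws, n_draws, rng):
    """Importance-sampling estimate of log p(y) with a normal proposal fitted to posterior draws."""
    m = len(posterior_draws)
    mu = sum(posterior_draws) / m
    sd = math.sqrt(sum((t - mu) ** 2 for t in posterior_draws) / (m - 1))
    log_w = []
    for _ in range(n_draws):
        theta = rng.gauss(mu, sd)
        # weight = likelihood * prior / proposal, kept on the log scale
        log_w.append(log_likelihood(theta, data) + log_normal_pdf(theta, 0.0, tau)
                     - log_normal_pdf(theta, mu, sd))
    top = max(log_w)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(w - top) for w in log_w) / n_draws)

rng = random.Random(1)
data = [rng.gauss(0.7, 1.0) for _ in range(20)]
tau = 2.0
post_prec = len(data) + 1 / tau ** 2
post_mean = sum(data) / post_prec
# Exact posterior draws stand in for MCMC output (e.g. from DREAM).
posterior_draws = [rng.gauss(post_mean, math.sqrt(1 / post_prec)) for _ in range(4000)]
exact = exact_log_evidence(data, tau)
estimate = importance_log_evidence(data, tau, posterior_draws, 4000, rng)
wide_prior_log_evidence = exact_log_evidence(data, 100.0)
```

    The closed form also shows the Occam effect described above: widening the prior (tau = 100 instead of 2) lowers the evidence for the same data.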

  7. Nonparametric Transfer Function Models

    PubMed Central

    Liu, Jun M.; Chen, Rong; Yao, Qiwei

    2009-01-01

    In this paper a class of nonparametric transfer function models is proposed to model nonlinear relationships between ‘input’ and ‘output’ time series. The transfer function is smooth, with an unknown functional form, and the noise is assumed to be a stationary autoregressive-moving average (ARMA) process. The nonparametric transfer function is estimated jointly with the ARMA parameters. By modeling the correlation in the noise, the transfer function can be estimated more efficiently. The parsimonious ARMA structure improves the estimation efficiency in finite samples. The asymptotic properties of the estimators are investigated. The finite-sample properties are illustrated through simulations and one empirical example. PMID:20628584

  8. The Evaluation of Bias of the Weighted Random Effects Model Estimators. Research Report. ETS RR-11-13

    ERIC Educational Resources Information Center

    Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan

    2011-01-01

    Estimation of parameters of random effects models from samples collected via complex multistage designs is considered. One way to reduce estimation bias due to unequal probabilities of selection is to incorporate sampling weights. Many researchers have proposed various weighting methods (Korn & Graubard, 2003; Pfeffermann, Skinner,…

  9. Estimating the Expected Value of Sample Information Using the Probabilistic Sensitivity Analysis Sample

    PubMed Central

    Oakley, Jeremy E.; Brennan, Alan; Breeze, Penny

    2015-01-01

    Health economic decision-analytic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value to the decision maker of reducing uncertainty through collecting new data. In the context of a particular decision problem, the value of a proposed research design can be quantified by its expected value of sample information (EVSI). EVSI is commonly estimated via a 2-level Monte Carlo procedure in which plausible data sets are generated in an outer loop, and then, conditional on these, the parameters of the decision model are updated via Bayes' rule and sampled in an inner loop. At each iteration of the inner loop, the decision model is evaluated. This is computationally demanding and may be difficult if the posterior distribution of the model parameters conditional on sampled data is hard to sample from. We describe a fast nonparametric regression-based method for estimating per-patient EVSI that requires only the probabilistic sensitivity analysis sample (i.e., the set of samples drawn from the joint distribution of the parameters and the corresponding net benefits). The method avoids the need to sample from the posterior distributions of the parameters and avoids the need to rerun the model. The only requirement is that sample data sets can be generated. The method is applicable with a model of any complexity and with any specification of model parameter distribution. We demonstrate in a case study the superior efficiency of the regression method over the 2-level Monte Carlo method. PMID:25810269
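    The regression idea can be sketched for a deliberately simple decision problem: two options, with the incremental net benefit (INB) of the new option treated directly as the uncertain PSA quantity (so the "decision model" is the identity), and a proposed study summarized by its sample mean. Ordinary least squares stands in for the nonparametric regression the authors describe; it is adequate here only because, in this normal-normal setup, E[INB | data] is linear in the sample mean. All names and numbers are hypothetical, and a closed-form EVSI is included purely as a check.

```python
import math
import random

def regression_evsi(prior_mean, prior_sd, sample_sd, n, n_psa=50000, seed=7):
    """Regression-based EVSI for a two-option decision (adopt if INB > 0).

    PSA sample: INB_i ~ N(prior_mean, prior_sd^2). For each draw a hypothetical
    study summary xbar_i ~ N(INB_i, sample_sd^2 / n) is simulated; INB is then
    regressed on xbar, and the fitted values approximate E[INB | xbar].
    """
    rng = random.Random(seed)
    inb = [rng.gauss(prior_mean, prior_sd) for _ in range(n_psa)]
    xbar = [rng.gauss(t, sample_sd / math.sqrt(n)) for t in inb]
    mx, my = sum(xbar) / n_psa, sum(inb) / n_psa
    sxy = sum((x - mx) * (y - my) for x, y in zip(xbar, inb))
    sxx = sum((x - mx) ** 2 for x in xbar)
    slope = sxy / sxx
    fitted = [my + slope * (x - mx) for x in xbar]
    value_with_info = sum(max(0.0, g) for g in fitted) / n_psa  # decide after seeing data
    value_now = max(0.0, my)                                    # decide on current information
    return value_with_info - value_now

def analytic_evsi(prior_mean, prior_sd, sample_sd, n):
    """Closed form for the normal-normal case, used only as a check."""
    tau = prior_sd ** 2 / math.sqrt(prior_sd ** 2 + sample_sd ** 2 / n)  # sd of E[INB | xbar]
    z = prior_mean / tau
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return prior_mean * cdf + tau * pdf - max(0.0, prior_mean)

estimate = regression_evsi(100.0, 300.0, 1000.0, 100)
reference = analytic_evsi(100.0, 300.0, 1000.0, 100)
```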

  10. Remote sensing-aided systems for snow qualification, evapotranspiration estimation, and their application in hydrologic models

    NASA Technical Reports Server (NTRS)

    Korram, S.

    1977-01-01

    The design of general remote sensing-aided methodologies was studied to provide estimates of several important inputs to water yield forecast models. These input parameters are snow area extent, snow water content, and evapotranspiration. The study area is the Feather River Watershed (780,000 hectares), Northern California. The general approach involved a stepwise sequence of identification of the required information, sample design, measurement/estimation, and evaluation of results. All relevant and available information types needed in the estimation process are being defined. These include Landsat, meteorological satellite, and aircraft imagery, topographic and geologic data, ground truth data, and climatic data from ground stations. A cost-effective multistage sampling approach was employed in quantification of all the required parameters. The physical and statistical models for both snow quantification and evapotranspiration estimation were developed. These models use information obtained from aerial and ground data through an appropriate statistical sampling design.

  11. Combining inferences from models of capture efficiency, detectability, and suitable habitat to classify landscapes for conservation of threatened bull trout

    USGS Publications Warehouse

    Peterson, J.; Dunham, J.B.

    2003-01-01

    Effective conservation efforts for at-risk species require knowledge of the locations of existing populations. Species presence can be estimated directly by conducting field-sampling surveys or alternatively by developing predictive models. Direct surveys can be expensive and inefficient, particularly for rare and difficult-to-sample species, and models of species presence may produce biased predictions. We present a Bayesian approach that combines sampling and model-based inferences for estimating species presence. The accuracy and cost-effectiveness of this approach were compared to those of sampling surveys and predictive models for estimating the presence of the threatened bull trout (Salvelinus confluentus) via simulation with existing models and empirical sampling data. Simulations indicated that a sampling-only approach would be the most effective and would result in the lowest presence and absence misclassification error rates for three thresholds of detection probability. When sampling effort was considered, however, the combined approach resulted in the lowest error rates per unit of sampling effort. Hence, lower probability-of-detection thresholds can be specified with the combined approach, resulting in lower misclassification error rates and improved cost-effectiveness.
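    The core of combining a model prediction with field sampling is a Bayes update: a model-based probability of presence serves as the prior, and each survey that fails to detect the species lowers it. A minimal sketch, assuming independent surveys with a constant per-survey detection probability and no false detections; the functions and numbers are hypothetical and not necessarily the authors' exact formulation.

```python
def posterior_presence(prior, p_detect, n_surveys):
    """P(present | no detection in n_surveys independent surveys)."""
    missed = prior * (1 - p_detect) ** n_surveys  # present but missed every time
    return missed / (missed + (1 - prior))

def surveys_needed(prior, p_detect, threshold, max_surveys=100):
    """Smallest number of empty surveys that pushes P(present) to <= threshold."""
    for k in range(max_surveys + 1):
        if posterior_presence(prior, p_detect, k) <= threshold:
            return k
    return None

neutral = surveys_needed(0.5, 0.5, 0.1)    # no model information
informed = surveys_needed(0.2, 0.5, 0.1)   # model predicts low suitability
```

    With a detection probability of 0.5 and a 0.1 threshold, a neutral prior of 0.5 needs four empty surveys, while a model-informed prior of 0.2 needs two, which is the kind of effort saving the combined approach exploits.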

  12. Estimating the Term Structure With a Semiparametric Bayesian Hierarchical Model: An Application to Corporate Bonds.

    PubMed

    Cruz-Marcelo, Alejandro; Ensor, Katherine B; Rosner, Gary L

    2011-06-01

    The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material.

  13. Estimating the Term Structure With a Semiparametric Bayesian Hierarchical Model: An Application to Corporate Bonds

    PubMed Central

    Cruz-Marcelo, Alejandro; Ensor, Katherine B.; Rosner, Gary L.

    2011-01-01

    The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material. PMID:21765566

  14. Efficient estimation of abundance for patchily distributed populations via two-phase, adaptive sampling.

    USGS Publications Warehouse

    Conroy, M.J.; Runge, J.P.; Barker, R.J.; Schofield, M.R.; Fonnesbeck, C.J.

    2008-01-01

    Many organisms are patchily distributed, with some patches occupied at high density, others at lower densities, and others not occupied. Estimation of overall abundance can be difficult and is inefficient via intensive approaches such as capture-mark-recapture (CMR) or distance sampling. We propose a two-phase sampling scheme and model in a Bayesian framework to estimate abundance for patchily distributed populations. In the first phase, occupancy is estimated by binomial detection samples taken on all selected sites, where the selected sites may be all available sites or a random sample of sites. Detection can be by visual surveys, detection of sign, physical captures, or other approaches. In the second phase, if a detection threshold is achieved, CMR or other intensive sampling is conducted via standard procedures (grids or webs) to estimate abundance. Detection and CMR data are then used in a joint likelihood to model probability of detection in the occupancy sample via an abundance-detection model. CMR modeling is used to estimate abundance for the abundance-detection relationship, which in turn is used to predict abundance at the remaining sites, where only detection data are collected. We present a full Bayesian modeling treatment of this problem, in which posterior inference on abundance and other parameters (detection, capture probability) is obtained under a variety of assumptions about spatial and individual sources of heterogeneity. We apply the approach to abundance estimation for two species of voles (Microtus spp.) in Montana, USA. We also use a simulation study to evaluate the frequentist properties of our procedure given known patterns in abundance and detection among sites as well as design criteria. For most population characteristics and designs considered, bias and mean-square error (MSE) were low, and coverage of true parameter values by Bayesian credibility intervals was near nominal.
Our two-phase, adaptive approach allows efficient estimation of abundance of rare and patchily distributed species and is particularly appropriate when sampling in all patches is impossible, but a global estimate of abundance is required.

  15. Inference for finite-sample trajectories in dynamic multi-state site-occupancy models using hidden Markov model smoothing

    USGS Publications Warehouse

    Fiske, Ian J.; Royle, J. Andrew; Gross, Kevin

    2014-01-01

    Ecologists and wildlife biologists increasingly use latent variable models to study patterns of species occurrence when detection is imperfect. These models have recently been generalized to accommodate both a more expansive description of state than simple presence or absence, and Markovian dynamics in the latent state over successive sampling seasons. In this paper, we write these multi-season, multi-state models as hidden Markov models to find both maximum likelihood estimates of model parameters and finite-sample estimators of the trajectory of the latent state over time. These estimators are especially useful for characterizing population trends in species of conservation concern. We also develop parametric bootstrap procedures that allow formal inference about latent trend. We examine model behavior through simulation, and we apply the model to data from the North American Amphibian Monitoring Program.
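    Writing a dynamic occupancy model as a hidden Markov model means the latent trajectory can be smoothed with the standard scaled forward-backward algorithm. A minimal sketch, assuming only two latent states (unoccupied/occupied) rather than the multi-state case, known parameters (initial occupancy psi1, colonization gamma, extinction eps, per-visit detection p) and no false positives; the function name and parameterization are ours, not the authors'.

```python
def smoothed_occupancy(detections, visits, psi1, gamma, eps, p):
    """Smoothed P(occupied in season t | all seasons' data) via scaled forward-backward.

    States: 0 = unoccupied, 1 = occupied. detections[t] is the number of detections
    in `visits` visits during season t; no false positives are assumed.
    """
    trans = [[1 - gamma, gamma],  # unoccupied -> (unoccupied, occupied): colonization
             [eps, 1 - eps]]      # occupied -> (unoccupied, occupied): extinction

    def emit(d):
        # Binomial coefficient omitted: it is the same for both states and cancels.
        return [1.0 if d == 0 else 0.0, p ** d * (1 - p) ** (visits - d)]

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    n = len(detections)
    e0 = emit(detections[0])
    alpha = [normalize([(1 - psi1) * e0[0], psi1 * e0[1]])]  # forward (filtered) pass
    for t in range(1, n):
        e = emit(detections[t])
        prev = alpha[-1]
        alpha.append(normalize([e[j] * sum(prev[i] * trans[i][j] for i in range(2))
                                for j in range(2)]))
    beta = [[1.0, 1.0] for _ in range(n)]  # backward pass, rescaled each step
    for t in range(n - 2, -1, -1):
        e = emit(detections[t + 1])
        beta[t] = normalize([sum(trans[i][j] * e[j] * beta[t + 1][j] for j in range(2))
                             for i in range(2)])
    return [normalize([alpha[t][k] * beta[t][k] for k in range(2)])[1] for t in range(n)]

trajectory = smoothed_occupancy([2, 0, 0, 1, 0], visits=3, psi1=0.5, gamma=0.1, eps=0.2, p=0.4)
```

    Seasons with at least one detection get probability 1; seasons without detections are informed by their neighbours through the transition probabilities.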

  16. Accounting for imperfect detection of groups and individuals when estimating abundance.

    PubMed

    Clement, Matthew J; Converse, Sarah J; Royle, J Andrew

    2017-09-01

    If animals are independently detected during surveys, many methods exist for estimating animal abundance despite detection probabilities <1. Common estimators include double-observer models, distance sampling models and combined double-observer and distance sampling models (known as mark-recapture-distance-sampling models; MRDS). When animals reside in groups, however, the assumption of independent detection is violated. In this case, the standard approach is to account for imperfect detection of groups, while assuming that individuals within groups are detected perfectly. However, this assumption is often unsupported. We introduce an abundance estimator for grouped animals when detection of groups is imperfect and group size may be under-counted, but not over-counted. The estimator combines an MRDS model with an N-mixture model to account for imperfect detection of individuals. The new MRDS-Nmix model requires the same data as an MRDS model (independent detection histories, an estimate of distance to transect, and an estimate of group size), plus a second estimate of group size provided by the second observer. We extend the model to situations in which detection of individuals within groups declines with distance. We simulated 12 data sets and used Bayesian methods to compare the performance of the new MRDS-Nmix model to an MRDS model. Abundance estimates generated by the MRDS-Nmix model exhibited minimal bias and nominal coverage levels. In contrast, MRDS abundance estimates were biased low and exhibited poor coverage. Many species of conservation interest reside in groups and could benefit from an estimator that better accounts for imperfect detection. Furthermore, the ability to relax the assumption of perfect detection of individuals within detected groups may allow surveyors to re-allocate resources toward detection of new groups instead of extensive surveys of known groups. 
We believe the proposed estimator is feasible because the only additional field data required are a second estimate of group size.
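    The N-mixture component, which lets two observers' counts inform the true size of a detected group, can be illustrated on its own. A minimal sketch, assuming a zero-truncated Poisson prior on group size, independent binomial under-counting by each observer with a common probability p, and known parameter values; the paper instead estimates these within a Bayesian MRDS-Nmix model, and the distance-sampling and double-observer detection parts are omitted here. The function name and numbers are hypothetical.

```python
import math

def group_size_posterior(count1, count2, lam, p, k_max=200):
    """Posterior P(N = k | two independent observer counts) for a detected group.

    Assumes a zero-truncated Poisson(lam) prior on true group size N and
    count_j | N ~ Binomial(N, p) for each observer (under-counting only).
    Returns {k: probability} over k = max(1, count1, count2) .. k_max.
    """
    weights = {}
    for k in range(max(1, count1, count2), k_max + 1):
        log_prior = k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(1 - math.exp(-lam))
        like = 1.0
        for c in (count1, count2):
            like *= math.comb(k, c) * p ** c * (1 - p) ** (k - c)
        weights[k] = math.exp(log_prior) * like
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

post = group_size_posterior(3, 5, lam=4.0, p=0.7)
posterior_mean = sum(k * w for k, w in post.items())
perfect = group_size_posterior(4, 4, lam=4.0, p=1.0)
```

    When p = 1 (perfect counts) and both observers agree, the posterior collapses onto the observed count, which is the assumption a standard MRDS analysis makes about individuals within detected groups.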

  17. Accounting for imperfect detection of groups and individuals when estimating abundance

    USGS Publications Warehouse

    Clement, Matthew J.; Converse, Sarah J.; Royle, J. Andrew

    2017-01-01

    If animals are independently detected during surveys, many methods exist for estimating animal abundance despite detection probabilities <1. Common estimators include double-observer models, distance sampling models and combined double-observer and distance sampling models (known as mark-recapture-distance-sampling models; MRDS). When animals reside in groups, however, the assumption of independent detection is violated. In this case, the standard approach is to account for imperfect detection of groups, while assuming that individuals within groups are detected perfectly. However, this assumption is often unsupported. We introduce an abundance estimator for grouped animals when detection of groups is imperfect and group size may be under-counted, but not over-counted. The estimator combines an MRDS model with an N-mixture model to account for imperfect detection of individuals. The new MRDS-Nmix model requires the same data as an MRDS model (independent detection histories, an estimate of distance to transect, and an estimate of group size), plus a second estimate of group size provided by the second observer. We extend the model to situations in which detection of individuals within groups declines with distance. We simulated 12 data sets and used Bayesian methods to compare the performance of the new MRDS-Nmix model to an MRDS model. Abundance estimates generated by the MRDS-Nmix model exhibited minimal bias and nominal coverage levels. In contrast, MRDS abundance estimates were biased low and exhibited poor coverage. Many species of conservation interest reside in groups and could benefit from an estimator that better accounts for imperfect detection. Furthermore, the ability to relax the assumption of perfect detection of individuals within detected groups may allow surveyors to re-allocate resources toward detection of new groups instead of extensive surveys of known groups. 
We believe the proposed estimator is feasible because the only additional field data required are a second estimate of group size.

  18. A review of single-sample-based models and other approaches for radiocarbon dating of dissolved inorganic carbon in groundwater

    USGS Publications Warehouse

    Han, L. F.; Plummer, Niel

    2016-01-01

    Numerous methods have been proposed to estimate the pre-nuclear-detonation 14C content of dissolved inorganic carbon (DIC) recharged to groundwater, corrected/adjusted for geochemical processes in the absence of radioactive decay (14C0), a quantity that is essential for estimating the radiocarbon age of DIC in groundwater. The models/approaches most commonly used are grouped as follows: (1) single-sample-based models, (2) a statistical approach based on the observed (curved) relationship between 14C and δ13C data for the aquifer, and (3) the geochemical mass-balance approach, which constructs adjustment models accounting for all the geochemical reactions known to occur along a groundwater flow path. This review first discusses the geochemical processes behind each of the single-sample-based models, followed by discussions of the statistical approach and the geochemical mass-balance approach. Finally, the applications, advantages and limitations of the three groups of models/approaches are discussed. The single-sample-based models constitute the prevailing use of 14C data in hydrogeology and hydrological studies. This is partly because the models are applied to an individual water sample to estimate its 14C age, so the required measurement data are easily available. These models have been shown to provide realistic radiocarbon ages in many studies. However, they are usually limited to simple carbonate aquifers, and the choice of model can significantly affect 14C0, often resulting in a wide range of 14C age estimates. Of the single-sample-based models, four are recommended for the estimation of 14C0 of DIC in groundwater: Pearson's model (Ingerson and Pearson, 1964; Pearson and White, 1967), Han & Plummer's model (Han and Plummer, 2013), the IAEA model (Gonfiantini, 1972; Salem et al., 1980), and Oeschger's model (Geyh, 2000).
These four models include all processes considered in single-sample-based models, and can be used in different ranges of δ13C values. In contrast to the single-sample-based models, the extended Gonfiantini & Zuppi model (Gonfiantini and Zuppi, 2003; Han et al., 2014) is a statistical approach. This approach can be used to estimate 14C ages when a curved relationship between the 14C and δ13C values of the DIC data is observed. In addition to estimating groundwater ages, the relationship between 14C and δ13C data can be used to interpret hydrogeological characteristics of the aquifer (e.g., estimating apparent rates of geochemical reactions and revealing the complexity of the geochemical environment) and to identify samples that are not affected by the same set of reactions/processes as the rest of the dataset. The investigated water samples may span a wide range of ages, and for waters with very low 14C values, the statistics-based model may give more reliable age estimates than single-sample-based models. In the extended Gonfiantini & Zuppi model, a representative system-wide value of the initial 14C content is derived from the 14C and δ13C data of DIC and can differ from that used in single-sample-based models. Therefore, the extended Gonfiantini & Zuppi model usually avoids the effect of modern water components that might retain ‘bomb’ pulse signatures. The geochemical mass-balance approach constructs an adjustment model that accounts for all the geochemical reactions known to occur along an aquifer flow path (Plummer et al., 1983; Wigley et al., 1978; Plummer et al., 1994; Plummer and Glynn, 2013), and includes, in addition to DIC, dissolved organic carbon (DOC) and methane (CH4). If sufficient chemical, mineralogical and isotopic data are available, the geochemical mass-balance method can yield the most accurate estimates of the adjusted radiocarbon age.
The main limitation of this approach is that it requires complete chemical, mineralogical and isotopic data, which are often limited. Failure to recognize the limitations and underlying assumptions on which the various models and approaches are based can result in a wide range of estimates of 14C0 and limit the usefulness of radiocarbon as a dating tool for groundwater. In each of the three generalized approaches (single-sample-based models, statistical approach, and geochemical mass-balance approach), successful application depends on scrutiny of the isotopic (14C and 13C) and chemical data to conceptualize the reactions and processes that affect the 14C content of DIC in aquifers. The recently developed graphical analysis method is shown to aid in determining which approach is most appropriate for the isotopic and chemical data from a groundwater system.
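
Pearson's model is, at its core, a two-end-member δ13C mixing calculation: the δ13C of DIC fixes the fraction of carbon derived from soil CO2, and that fraction scales the initial 14C activity. The sketch below assumes DIC is a simple mix of soil CO2 and 14C-free carbonate, uses illustrative end-member defaults (not values from this review), ignores the fractionation corrections used in practice, and computes a decay-only age with the 5730-year half-life. Function names are hypothetical.

```python
import math

# Mean life for the 5730-yr half-life: 5730 / ln 2, about 8267 yr.
MEAN_LIFE_YR = 5730 / math.log(2)


def pearson_14c0(d13c_dic, d13c_soil_co2, d13c_carbonate=0.0,
                 a_soil_co2=100.0, a_carbonate=0.0):
    """Initial 14C activity (pmc) from a two-end-member delta13C mixing model.

    Defaults (soil CO2 at 100 pmc; 14C-dead carbonate at 0 pmc and 0 per mil)
    are illustrative assumptions, not recommended values.
    """
    # Fraction of DIC carbon derived from soil CO2, from the delta13C balance.
    q = (d13c_dic - d13c_carbonate) / (d13c_soil_co2 - d13c_carbonate)
    return q * (a_soil_co2 - a_carbonate) + a_carbonate


def adjusted_age(a_measured, a0):
    """Decay-only age (years) from measured and initial 14C activities."""
    return MEAN_LIFE_YR * math.log(a0 / a_measured)


a0 = pearson_14c0(d13c_dic=-12.0, d13c_soil_co2=-24.0)
print(a0)                       # 50.0 (half of the DIC carbon is soil-derived)
print(adjusted_age(25.0, a0))   # about 5730: one half-life of decay from 50 to 25 pmc
```

Without the adjustment (taking 14C0 = 100 pmc), the same 25 pmc sample would appear two half-lives old, which is why the choice of 14C0 model matters so much for the age.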

  19. Estimating site occupancy rates for aquatic plants using spatial sub-sampling designs when detection probabilities are less than one

    USGS Publications Warehouse

    Nielson, Ryan M.; Gray, Brian R.; McDonald, Lyman L.; Heglund, Patricia J.

    2011-01-01

    Estimation of site occupancy rates when detection probabilities are <1 is well established in wildlife science. Data from multiple visits to a sample of sites are used to estimate detection probabilities and the proportion of sites occupied by focal species. In this article we describe how site occupancy methods can be applied to estimate occupancy rates of plants and other sessile organisms. We illustrate this approach and the pitfalls of ignoring incomplete detection using spatial data for 2 aquatic vascular plants collected under the Upper Mississippi River's Long Term Resource Monitoring Program (LTRMP). Site occupancy models considered include: a naïve model that ignores incomplete detection, a simple site occupancy model assuming a constant occupancy rate and a constant probability of detection across sites, several models that allow site occupancy rates and probabilities of detection to vary with habitat characteristics, and mixture models that allow for unexplained variation in detection probabilities. We used information theoretic methods to rank competing models and bootstrapping to evaluate the goodness-of-fit of the final models. Results of our analysis confirm that ignoring incomplete detection can result in biased estimates of occupancy rates. Estimates of site occupancy rates for 2 aquatic plant species were 19–36% higher than naïve estimates that ignored probabilities of detection <1. Simulations indicate that final models have little bias when 50 or more sites are sampled, and little gain in precision could be expected for sample sizes >300. We recommend applying site occupancy methods for monitoring presence of aquatic species.
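
To make the naïve versus site-occupancy contrast concrete, here is a minimal sketch of the simplest model listed above (constant occupancy psi, constant detection probability p), fitted by brute-force grid search rather than a numerical optimizer. The detection counts are hypothetical, not LTRMP data, and the function names are illustrative.

```python
import math


def site_log_lik(y, k, psi, p):
    """Log-likelihood for one site with y detections in k visits."""
    # Occupied sites follow a binomial detection model; an unoccupied site can
    # only produce y == 0, so zeros mix "absent" and "present but missed".
    occupied = psi * math.comb(k, y) * p ** y * (1 - p) ** (k - y)
    unoccupied = (1 - psi) if y == 0 else 0.0
    return math.log(occupied + unoccupied)


def fit_constant_occupancy(counts, k, step=0.01):
    """Crude grid-search MLE of (psi, p) under constant occupancy and detection."""
    best_ll, best = -math.inf, (None, None)
    n = round(1 / step)
    for i in range(1, n):
        for j in range(1, n):
            psi, p = i * step, j * step
            ll = sum(site_log_lik(y, k, psi, p) for y in counts)
            if ll > best_ll:
                best_ll, best = ll, (psi, p)
    return best


def naive_occupancy(counts):
    """Share of sites with at least one detection; ignores detection < 1."""
    return sum(1 for y in counts if y > 0) / len(counts)


# Hypothetical detection counts from 10 sites, each surveyed k = 3 times.
counts = [0, 0, 0, 1, 2, 0, 1, 0, 3, 0]
psi_hat, p_hat = fit_constant_occupancy(counts, k=3)
print(naive_occupancy(counts), round(psi_hat, 2), round(p_hat, 2))
```

With these counts the naïve estimate is 0.4, and the fitted psi comes out higher because some all-zero sites are attributed to missed detections rather than true absence.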

  20. Area estimation using multiyear designs and partial crop identification

    NASA Technical Reports Server (NTRS)

    Sielken, R. L., Jr.

    1983-01-01

    Progress is reported for the following areas: (1) estimating the stratum's crop acreage proportion using the multiyear area estimation model; (2) assessment of multiyear sampling designs; and (3) development of statistical methodology for incorporating partially identified sample segments into crop area estimation.

  1. The Impact of Sample Size and Other Factors When Estimating Multilevel Logistic Models

    ERIC Educational Resources Information Center

    Schoeneberger, Jason A.

    2016-01-01

    The design of research studies utilizing binary multilevel models must necessarily incorporate knowledge of multiple factors, including estimation method, variance component size, and number of predictors, in addition to sample sizes. This Monte Carlo study examined the performance of random effect binary outcome multilevel models under varying…
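
A Monte Carlo study of this kind needs a data-generating step for each design cell. The sketch below assumes the simplest case, a random-intercept logistic model with no predictors, and expresses the variance component through the latent-scale intraclass correlation (the level-1 logistic residual has variance π²/3). It is a hypothetical illustration, not the study's actual design; function names and parameter values are assumptions.

```python
import math
import random


def latent_icc(tau2):
    """Latent-scale ICC of a random-intercept logistic model.

    The level-1 latent residual is standard logistic with variance pi^2 / 3."""
    return tau2 / (tau2 + math.pi ** 2 / 3)


def simulate_two_level_binary(n_groups, group_size, beta0, tau2, seed=0):
    """Return (group_id, y) pairs from a random-intercept logistic model."""
    rng = random.Random(seed)
    data = []
    for g in range(n_groups):
        u = rng.gauss(0.0, math.sqrt(tau2))        # group-level random intercept
        p = 1.0 / (1.0 + math.exp(-(beta0 + u)))    # success probability in group g
        for _ in range(group_size):
            data.append((g, 1 if rng.random() < p else 0))
    return data


# One hypothetical design cell: 30 groups of 10, latent ICC of 0.1.
tau2 = 0.1 * (math.pi ** 2 / 3) / 0.9
sample = simulate_two_level_binary(30, 10, beta0=-0.5, tau2=tau2)
print(len(sample), round(latent_icc(tau2), 3))
```

Looping such a generator over a grid of group counts, group sizes and tau2 values, and refitting the model each time, is the basic shape of a sample-size simulation.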

  2. Occupancy Modeling for Improved Accuracy and Understanding of Pathogen Prevalence and Dynamics

    PubMed Central

    Colvin, Michael E.; Peterson, James T.; Kent, Michael L.; Schreck, Carl B.

    2015-01-01

    Most pathogen detection tests are imperfect, with a sensitivity < 100%, creating the potential for false negatives, where a pathogen is present but not detected. False negatives in a sample inflate the number of non-detections, negatively biasing estimates of pathogen prevalence. Histological examination of tissues can be advantageous as a diagnostic test because multiple pathogens can be examined and it provides important information on associated pathological changes in the host. However, it is usually less sensitive than molecular or microbiological tests for specific pathogens. Our study objectives were to 1) develop a hierarchical occupancy model to examine pathogen prevalence in spring Chinook salmon Oncorhynchus tshawytscha and their distribution among host tissues, 2) use the model to estimate pathogen-specific test sensitivities and infection rates, and 3) illustrate the effect of using replicate within-host sampling on the sample sizes required to detect a pathogen. We examined histological sections of replicate tissue samples from spring Chinook salmon O. tshawytscha collected after spawning for common pathogens seen in this population: Apophallus/echinostome metacercariae, Parvicapsula minibicornis, Nanophyetus salmincola/metacercariae, and Renibacterium salmoninarum. A hierarchical occupancy model was developed to estimate pathogen- and tissue-specific test sensitivities and to provide unbiased estimates of host- and organ-level infection rates. Model-estimated sensitivities and host- and organ-level infection rates varied among pathogens, and the model-estimated infection rate was higher than prevalence unadjusted for test sensitivity, confirming that the unadjusted prevalence was negatively biased.
The model provided an analytical means of using hierarchically structured pathogen detection data from lower-sensitivity diagnostic tests, such as histology, to obtain unbiased pathogen prevalence estimates with associated uncertainties. Accounting for test sensitivity using within-host replicate samples also required fewer individual fish to be sampled. This approach is useful for evaluating pathogen or microbe community dynamics when test sensitivity is <100%. PMID:25738709

  3. Occupancy modeling for improved accuracy and understanding of pathogen prevalence and dynamics

    USGS Publications Warehouse

    Colvin, Michael E.; Peterson, James T.; Kent, Michael L.; Schreck, Carl B.

    2015-01-01

    Most pathogen detection tests are imperfect, with a sensitivity < 100%, creating the potential for false negatives, where a pathogen is present but not detected. False negatives in a sample inflate the number of non-detections, negatively biasing estimates of pathogen prevalence. Histological examination of tissues can be advantageous as a diagnostic test because multiple pathogens can be examined and it provides important information on associated pathological changes in the host. However, it is usually less sensitive than molecular or microbiological tests for specific pathogens. Our study objectives were to 1) develop a hierarchical occupancy model to examine pathogen prevalence in spring Chinook salmon Oncorhynchus tshawytscha and their distribution among host tissues, 2) use the model to estimate pathogen-specific test sensitivities and infection rates, and 3) illustrate the effect of using replicate within-host sampling on the sample sizes required to detect a pathogen. We examined histological sections of replicate tissue samples from spring Chinook salmon O. tshawytscha collected after spawning for common pathogens seen in this population: Apophallus/echinostome metacercariae, Parvicapsula minibicornis, Nanophyetus salmincola/metacercariae, and Renibacterium salmoninarum. A hierarchical occupancy model was developed to estimate pathogen- and tissue-specific test sensitivities and to provide unbiased estimates of host- and organ-level infection rates. Model-estimated sensitivities and host- and organ-level infection rates varied among pathogens, and the model-estimated infection rate was higher than prevalence unadjusted for test sensitivity, confirming that the unadjusted prevalence was negatively biased.
The model provided an analytical means of using hierarchically structured pathogen detection data from lower-sensitivity diagnostic tests, such as histology, to obtain unbiased pathogen prevalence estimates with associated uncertainties. Accounting for test sensitivity using within-host replicate samples also required fewer individual fish to be sampled. This approach is useful for evaluating pathogen or microbe community dynamics when test sensitivity is <100%.
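
Why replicate within-host samples reduce the number of fish needed can be seen with a simplified detection calculation. This sketch assumes replicate tissue samples from an infected fish detect the pathogen independently, each with the same per-replicate sensitivity, and ignores the organ-level structure of the hierarchical model; the function name and inputs are hypothetical.

```python
import math


def fish_needed(prevalence, sensitivity, replicates, confidence=0.95):
    """Fish to sample so that P(at least one positive fish) >= confidence.

    Assumes replicate samples from an infected fish are independent, each
    detecting the pathogen with probability `sensitivity`."""
    per_fish_sens = 1 - (1 - sensitivity) ** replicates   # detected in >= 1 replicate
    per_fish_detect = prevalence * per_fish_sens          # random fish tests positive
    return math.ceil(math.log(1 - confidence) / math.log(1 - per_fish_detect))


print(fish_needed(0.10, 0.5, replicates=1))   # 59
print(fish_needed(0.10, 0.5, replicates=3))   # 33
```

With a 10% true prevalence and a 50% per-replicate sensitivity, examining three replicates per fish cuts the required sample from 59 to 33 fish under these assumptions.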

  4. Methods for estimating population density in data-limited areas: evaluating regression and tree-based models in Peru.

    PubMed

    Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

    2014-01-01

    Obtaining accurate small-area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In the absence of population data, small-area estimation models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.

  5. Methods for Estimating Population Density in Data-Limited Areas: Evaluating Regression and Tree-Based Models in Peru

    PubMed Central

    Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

    2014-01-01

    Obtaining accurate small-area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In the absence of population data, small-area estimation models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657

  6. An estimator of the survival function based on the semi-Markov model under dependent censorship.

    PubMed

    Lee, Seung-Yeoun; Tsai, Wei-Yann

    2005-06-01

    Lee and Wolfe (Biometrics vol. 54 pp. 1176-1178, 1998) proposed a two-stage sampling design for testing the assumption of independent censoring, which involves further follow-up of a subset of censored subjects who were lost to follow-up. They also proposed an adjusted estimator of the survivor function for a proportional hazards model under dependent censoring. In this paper, a new estimator of the survivor function is proposed for the semi-Markov model under dependent censorship, based on the two-stage sampling data. The consistency and the asymptotic distribution of the proposed estimator are derived. The estimation procedure is illustrated with an example from a lung cancer clinical trial, and simulation results are reported for the mean squared errors of the estimators under a proportional hazards model and two different nonproportional hazards models.
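
For contrast, the conventional Kaplan-Meier estimator, which assumes independent censoring (the assumption the two-stage design is meant to test), is sketched below. This is the standard baseline that dependent-censoring methods adjust, not the semi-Markov estimator proposed in the paper; the data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    events[i] is 1 for an observed failure, 0 for a censored observation.
    Censored subjects are assumed to have the same future risk as those
    still under follow-up (independent censoring)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied observations at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed   # failures and censorings both leave the risk set
    return curve


print(kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0]))
```

If subjects lost to follow-up were actually at higher risk than those remaining, this estimator would overstate survival, which is the situation the two-stage follow-up data are used to correct.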

  7. Estimating the circuit delay of FPGA with a transfer learning method

    NASA Astrophysics Data System (ADS)

    Cui, Xiuhai; Liu, Datong; Peng, Yu; Peng, Xiyuan

    2017-10-01

    As FPGA (field-programmable gate array) functionality has grown, the FPGA has become an on-chip system platform. Because of this increased complexity, estimating FPGA delay is challenging. To address this problem, we propose a transfer learning estimation delay (TLED) method that simplifies delay estimation for FPGAs of different speed grades. FPGAs of the same family but different speed grades come from the same process and layout, so their delays are correlated. In this paper, therefore, one speed grade is chosen to provide the basic training samples. Training samples for the other speed grades can be derived from these basic samples by transfer learning. In addition, a few samples from the target FPGA are selected as training samples. A general predictive model is trained on these samples, so that a single estimation model can estimate circuit delay for FPGAs of different speed grades. The TLED framework has three phases: 1) building a basic circuit delay library of multipliers, adders, shifters, and so on, which is used to train and build the predictive model; 2) selecting, by comparing different algorithms experimentally, the random forest algorithm to train the predictive model; and 3) predicting the target circuit delay with the predictive model. Experiments were performed on the Artix-7, Kintex-7, and Virtex-7 families, each in the -1, -2, -2L, and -3 speed grades. The experiments show that the delay estimation accuracy score exceeds 92% with the TLED method. This result shows that TLED is a feasible, efficient and effective delay assessment method, especially in the high-level synthesis stage of the FPGA tool flow.
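
The premise that delays are correlated across speed grades can be illustrated with a much simpler stand-in than TLED: fit a linear map from a base grade's delays to a handful of delays measured on the target grade, then predict the remaining circuits. This is a hypothetical sketch with made-up numbers; the paper itself trains a random forest on a circuit delay library.

```python
def fit_linear_map(base, target):
    """Least-squares fit of target = a * base + b (one predictor)."""
    n = len(base)
    mean_x = sum(base) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in base)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(base, target))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical delays (ns) of four library circuits measured on both grades.
base_grade = [1.0, 2.0, 3.0, 4.0]
target_grade = [0.9, 1.7, 2.5, 3.3]
a, b = fit_linear_map(base_grade, target_grade)

# Predict a new circuit on the target grade from its base-grade delay.
print(round(a * 5.0 + b, 3))
```

The few target-grade samples play the role of the "few target FPGA samples" in the abstract: they anchor the mapping, while the bulk of the information comes from the base grade.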

  8. Small Body GN and C Research Report: G-SAMPLE - An In-Flight Dynamical Method for Identifying Sample Mass [External Release Version]

    NASA Technical Reports Server (NTRS)

    Carson, John M., III; Bayard, David S.

    2006-01-01

    G-SAMPLE is an in-flight dynamical method for use by sample collection missions to identify the presence and quantity of collected sample material. The G-SAMPLE method implements a maximum-likelihood estimator to identify the collected sample mass, based on onboard force sensor measurements, thruster firings, and a dynamics model of the spacecraft. With G-SAMPLE, sample mass identification becomes a computation rather than an extra hardware requirement; the added cost of cameras or other sensors for sample mass detection is avoided. Realistic simulation examples are provided for a spacecraft configuration with a sample collection device mounted on the end of an extended boom. In one representative example, a 1000 gram sample mass is estimated to within 110 grams (95% confidence) under realistic assumptions of thruster profile error, spacecraft parameter uncertainty, and sensor noise. For convenience to future mission design, an overall sample-mass estimation error budget is developed to approximate the effect of model uncertainty, sensor noise, data rate, and thrust profile error on the expected estimate of collected sample mass.
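
The core idea, estimating mass as a parameter from force and acceleration data, can be shown in a toy scalar model: measured force F_i = m * a_i plus independent Gaussian noise, for which the maximum-likelihood estimate is least squares through the origin. This is a hypothetical sketch; G-SAMPLE itself uses a full spacecraft dynamics model with thruster firings, boom geometry and parameter uncertainty.

```python
import math


def estimate_mass(accels, forces, noise_sd):
    """MLE of m in F_i = m * a_i + e_i, with e_i i.i.d. N(0, noise_sd^2).

    Under Gaussian noise the MLE equals least squares through the origin.
    Returns the estimate and an approximate 95% confidence half-width."""
    saa = sum(a * a for a in accels)
    m_hat = sum(a * f for a, f in zip(accels, forces)) / saa
    half_width = 1.96 * noise_sd / math.sqrt(saa)
    return m_hat, half_width


# Hypothetical accelerations (m/s^2) and measured forces (N).
m_hat, hw = estimate_mass([1.0, 2.0, 3.0], [2.1, 3.9, 6.0], noise_sd=0.1)
print(round(m_hat, 3), round(hw, 3))
```

Even in this toy, a fractional error in the assumed accelerations passes almost one-for-one into the mass estimate, which mirrors why thrust-profile knowledge matters so much for this kind of method.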

  9. Finite mixture model: A maximum likelihood estimation approach on time series data

    NASA Astrophysics Data System (ADS)

    Yen, Phoong Seuk; Ismail, Mohd Tahir; Hamzah, Firdaus Mohamad

    2014-09-01

    Recently, statisticians have emphasized fitting finite mixture models by maximum likelihood estimation because of its asymptotic properties. In addition, the estimator is consistent as the sample size increases to infinity and, under standard regularity conditions, asymptotically unbiased. Moreover, as the sample size increases, maximum likelihood estimates are asymptotically efficient, attaining the smallest variance among competing estimators. Maximum likelihood estimation is therefore adopted in this paper to fit a two-component mixture model in order to explore the relationship between rubber price and exchange rate for Malaysia, Thailand, the Philippines and Indonesia. The results indicate a negative relationship between rubber price and exchange rate for all selected countries.
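
Maximum likelihood for a finite mixture is usually computed with the expectation-maximization (EM) algorithm. The sketch below fits a two-component, one-dimensional Gaussian mixture by EM; it is a generic illustration on hypothetical data, not the paper's model of rubber prices and exchange rates, and the function names and starting values are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_component(data, mu1, mu2, sigma=1.0, weight=0.5, iters=200):
    """Fit a two-component 1-D Gaussian mixture by EM (maximum likelihood)."""
    s1 = s2 = sigma
    w = weight
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu1, s1)
            b = (1 - w) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b))
        # M-step: responsibility-weighted updates of weight, means and SDs.
        n1 = sum(resp)
        n2 = n - n1
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        # A small floor keeps a component from collapsing onto a single point.
        s1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2), 1e-6)
    return w, mu1, mu2, s1, s2


data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
w, mu1, mu2, s1, s2 = em_two_component(data, mu1=1.0, mu2=9.0)
print(round(w, 3), round(mu1, 3), round(mu2, 3))
```

Each EM iteration never decreases the likelihood, but it can converge to a local maximum, so in practice several starting values are usually tried.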

  10. Evaluating multi-level models to test occupancy state responses of Plethodontid salamanders

    USGS Publications Warehouse

    Kroll, Andrew J.; Garcia, Tiffany S.; Jones, Jay E.; Dugger, Catherine; Murden, Blake; Johnson, Josh; Peerman, Summer; Brintz, Ben; Rochelle, Michael

    2015-01-01

    Plethodontid salamanders are diverse and widely distributed taxa and play critical roles in ecosystem processes. Because salamanders use structurally complex habitats, and because only a portion of a population is available for sampling, evaluation of sampling designs and estimators is critical to provide strong inference about Plethodontid ecology and responses to conservation and management activities. We conducted a simulation study to evaluate the effectiveness of multi-scale and hierarchical single-scale occupancy models in the context of a Before-After Control-Impact (BACI) experimental design with multiple levels of sampling. We also fit the hierarchical single-scale model to empirical data collected for Oregon slender and Ensatina salamanders across two years on 66 forest stands in the Cascade Range, Oregon, USA. All models were fit within a Bayesian framework. Estimator precision in both models improved with increasing numbers of primary and secondary sampling units, underscoring the potential gains accrued when adding secondary sampling units. Both models showed evidence of estimator bias at low detection probabilities and low sample sizes; this problem was particularly acute for the multi-scale model. Our results suggested that sufficient sample sizes at both the primary and secondary sampling levels could ameliorate this issue. Empirical data indicated Oregon slender salamander occupancy was associated strongly with the amount of coarse woody debris (posterior mean = 0.74; SD = 0.24); Ensatina occupancy was not associated with amount of coarse woody debris (posterior mean = -0.01; SD = 0.29). Our simulation results indicate that either model is suitable for use in an experimental study of Plethodontid salamanders provided that sample sizes are sufficiently large. However, hierarchical single-scale and multi-scale models describe different processes and estimate different parameters.
As a result, we recommend careful consideration of study questions and objectives before collecting data and fitting models.

  11. Implementing Generalized Additive Models to Estimate the Expected Value of Sample Information in a Microsimulation Model: Results of Three Case Studies.

    PubMed

    Rabideau, Dustin J; Pei, Pamela P; Walensky, Rochelle P; Zheng, Amy; Parker, Robert A

    2018-02-01

    The expected value of sample information (EVSI) can help prioritize research, but its application is hampered by computational infeasibility, especially for complex models. We investigated an approach by Strong and colleagues to estimate EVSI by applying generalized additive models (GAM) to results generated from a probabilistic sensitivity analysis (PSA). For 3 potential HIV prevention and treatment strategies, we estimated life expectancy and lifetime costs using the Cost-effectiveness of Preventing AIDS Complications (CEPAC) model, a complex patient-level microsimulation model of HIV progression. We fitted a GAM (a flexible regression model that estimates the functional form as part of the model-fitting process) to the incremental net monetary benefits obtained from the CEPAC PSA. For each case study, we calculated the expected value of partial perfect information (EVPPI) using both the conventional nested Monte Carlo approach and the GAM approach. EVSI was calculated using the GAM approach. For all 3 case studies, the GAM approach consistently gave similar estimates of EVPPI compared with the conventional approach. The EVSI behaved as expected: it increased and converged to EVPPI for larger sample sizes. For each case study, generating the PSA results for the GAM approach required 3 to 4 days on a shared cluster, after which EVPPI and EVSI across a range of sample sizes were evaluated in minutes. The conventional approach required approximately 5 weeks for the EVPPI calculation alone. Estimating EVSI using the GAM approach with results from a PSA dramatically reduced the time required to conduct a computationally intense project, which would otherwise have been impractical. Using the GAM approach, we can efficiently provide policy makers with EVSI estimates, even for complex patient-level microsimulation models.
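
The regression idea behind the GAM approach can be sketched for a two-strategy comparison: regress the PSA's incremental net monetary benefit (INB) on the parameter of interest, then compare the expected value of choosing the better strategy at each fitted value with the value of choosing on the mean INB alone. To stay dependency-free, a crude binned-mean smoother replaces the GAM; that substitution, the function name and the toy inputs are assumptions, not the study's implementation.

```python
def evppi_binned(theta, inb, n_bins):
    """Regression-based EVPPI for a two-strategy decision.

    theta: PSA draws of the parameter of interest.
    inb:   incremental net monetary benefit of strategy B over A for each draw.
    A binned-mean smoother stands in for the GAM: each draw's fitted value is
    the mean INB of its bin after sorting by theta. Requires n_bins <= len(theta).
    """
    n = len(theta)
    order = sorted(range(n), key=lambda i: theta[i])
    fitted = [0.0] * n
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        mean_inb = sum(inb[i] for i in idx) / len(idx)
        for i in idx:
            fitted[i] = mean_inb
    # Expected value of picking the better strategy once theta is known ...
    value_with_info = sum(max(0.0, f) for f in fitted) / n
    # ... minus the value of picking on current information (mean INB).
    value_now = max(0.0, sum(inb) / n)
    return value_with_info - value_now


theta = [-2.0, -1.0, 1.0, 2.0]
inb = [-2.0, -1.0, 1.0, 2.0]          # toy case: INB depends only on theta
print(evppi_binned(theta, inb, n_bins=4), evppi_binned(theta, inb, n_bins=1))
```

The key saving is the one described above: only one set of PSA runs is needed, and the regression replaces the inner Monte Carlo loop of the nested approach.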

  12. Accounting for nonsampling error in estimates of HIV epidemic trends from antenatal clinic sentinel surveillance

    PubMed Central

    Eaton, Jeffrey W.; Bao, Le

    2017-01-01

    Objectives The aim of the study was to propose and demonstrate an approach to allow additional nonsampling uncertainty about HIV prevalence measured at antenatal clinic sentinel surveillance (ANC-SS) in model-based inferences about trends in HIV incidence and prevalence. Design Mathematical model fitted to surveillance data with Bayesian inference. Methods We introduce a variance inflation parameter σ²_infl that accounts for the uncertainty of nonsampling errors in ANC-SS prevalence and is added to the sampling error variance. Three approaches are tested for estimating σ²_infl using ANC-SS and household survey data from 40 subnational regions in nine countries in sub-Saharan Africa, as defined in UNAIDS 2016 estimates. Methods were compared using in-sample fit and out-of-sample prediction of ANC-SS data, fit to household survey prevalence data, and computational implications. Results Introducing the additional variance parameter σ²_infl increased the error variance around ANC-SS prevalence observations by a median of 2.7 times (interquartile range 1.9–3.8). Using only sampling error in ANC-SS prevalence (σ²_infl = 0), coverage of 95% prediction intervals was 69% in out-of-sample prediction tests. This increased to 90% after introducing the additional variance parameter σ²_infl. The revised probabilistic model improved model fit to household survey prevalence and increased epidemic uncertainty intervals most during the early epidemic period before 2005. Estimating σ²_infl did not increase the computational cost of model fitting. Conclusions We recommend estimating nonsampling error in ANC-SS as an additional parameter in Bayesian inference using the Estimation and Projection Package model. This approach may prove useful for incorporating other data sources, such as routine prevalence from prevention of mother-to-child transmission testing, into future epidemic estimates. PMID:28296801
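
The effect of an additive nonsampling variance on interval width can be sketched directly. For simplicity this sketch works on the prevalence scale with a binomial sampling variance; the published model may use a transformed scale, and the function name and inputs are hypothetical.

```python
import math


def prediction_half_width(prevalence, n_tested, sigma2_infl=0.0, z=1.96):
    """Half-width of an approximate 95% interval for an observed prevalence.

    Total variance = binomial sampling variance + additive nonsampling variance."""
    sampling_var = prevalence * (1 - prevalence) / n_tested
    return z * math.sqrt(sampling_var + sigma2_infl)


# Hypothetical clinic: 10% prevalence among 400 women tested.
base = prediction_half_width(0.10, 400)
# Nonsampling variance chosen so total variance is 2.7x the sampling variance.
inflated = prediction_half_width(0.10, 400, sigma2_infl=1.7 * 0.10 * 0.90 / 400)
print(round(base, 4), round(inflated, 4), round(inflated / base, 3))
```

A 2.7-fold increase in variance (the median reported above) widens intervals by a factor of sqrt(2.7), about 1.64, which is the kind of widening that can raise out-of-sample coverage toward its nominal level.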

  13. Accurate Biomass Estimation via Bayesian Adaptive Sampling

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin R.; Knuth, Kevin H.; Castle, Joseph P.; Lvov, Nikolay

    2005-01-01

    The following concepts were introduced: a) Bayesian adaptive sampling for solving biomass estimation; b) characterization of MISR Rahman model parameters conditioned on MODIS land cover; c) a rigorous non-parametric Bayesian approach to analytic mixture model determination; and d) a unique U.S. asset for science product validation and verification.

  14. [Potentials in the regionalization of health indicators using small-area estimation methods : Exemplary results based on the 2009, 2010 and 2012 GEDA studies].

    PubMed

    Kroll, Lars Eric; Schumann, Maria; Müters, Stephan; Lampert, Thomas

    2017-12-01

    Nationwide health surveys can be used to estimate regional differences in health. With traditional estimation techniques, the spatial depth of these estimates is limited by the constrained sample size. So far, without special refreshment samples, results have only been available for the more populous federal states of Germany. Regression-based small-area estimation techniques offer an alternative. These models can generate smaller-scale data but are also subject to greater statistical uncertainty because of their model assumptions. In the present article, exemplary regionalized estimates of respondents' self-rated health status based on the studies "Gesundheit in Deutschland aktuell" (GEDA studies) 2009, 2010 and 2012 are compared. The aim of the article is to analyze the range of regional estimates in order to assess the usefulness of the techniques for health reporting more adequately. The results show that the estimated prevalence is relatively stable when using different samples. Important determinants of the variation of the estimates are the achieved sample size at the district level and the type of district (cities vs. rural regions). Overall, the present study shows that small-area modeling of prevalence is associated with additional uncertainties compared to conventional estimates, which should be taken into account when interpreting the corresponding findings.
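
Regression-based small-area estimators are often written as a Fay-Herriot-style composite: a weighted average of the district's direct survey estimate and a regression prediction, with more weight on the direct estimate when the district sample is large. This minimal sketch assumes known variances and is a generic illustration, not necessarily the model used in the GEDA analyses; names and numbers are hypothetical.

```python
def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Fay-Herriot-style composite of a direct and a regression-based estimate.

    direct:       district estimate from the survey sample alone
    sampling_var: its sampling variance (large when few respondents)
    synthetic:    regression prediction from district-level covariates
    model_var:    variance of true district values around the regression
    """
    gamma = model_var / (model_var + sampling_var)   # weight on the direct estimate
    return gamma * direct + (1 - gamma) * synthetic


# Hypothetical prevalence of poor self-rated health in two districts.
small_district = composite_estimate(0.30, 0.01, 0.20, 0.01)    # few respondents
large_district = composite_estimate(0.30, 0.0001, 0.20, 0.01)  # many respondents
print(round(small_district, 3), round(large_district, 3))
```

The dependence of gamma on the sampling variance is one way to see why the achieved district sample size drives how much the regional estimates vary.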

  15. Estimating abundance

    USGS Publications Warehouse

    Sutherland, Chris; Royle, Andy

    2016-01-01

    This chapter provides a non-technical overview of ‘closed population capture–recapture’ models, a class of well-established models that are widely applied in ecology, such as removal sampling, covariate models, and distance sampling. These methods are regularly adopted for studies of reptiles, in order to estimate abundance from counts of marked individuals while accounting for imperfect detection. The chapter describes some classic closed population models for estimating abundance, with considerations for some recent extensions that provide a spatial context for the estimation of abundance, and therefore density. Finally, the chapter suggests some software for use in data analysis, such as the Windows-based program MARK, and provides an example of estimating abundance and density of reptiles using an artificial cover object survey of Slow Worms (Anguis fragilis).
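
Two of the classic closed-population estimators mentioned here have simple closed forms: the Lincoln-Petersen estimator (with Chapman's bias-corrected variant) for a two-sample mark-recapture study, and the two-pass removal estimator. The sketch below assumes a closed population and equal catchability; the counts are hypothetical.

```python
def lincoln_petersen(marked, caught, recaptured):
    """N = M * C / R for a two-sample mark-recapture study."""
    return marked * caught / recaptured


def chapman(marked, caught, recaptured):
    """Chapman's bias-corrected form; defined even when recaptured == 0."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


def two_pass_removal(c1, c2):
    """Two-pass removal estimate of N and capture probability p.

    Requires c1 > c2 and assumes p is the same on both passes."""
    p = (c1 - c2) / c1
    return c1 ** 2 / (c1 - c2), p


print(lincoln_petersen(50, 40, 10))   # 200.0
print(round(chapman(50, 40, 10), 2))
print(two_pass_removal(60, 30))       # (120.0, 0.5)
```

The equal-p assumption in the removal formula is the fragile one: if capture probability falls on the second pass, c2 shrinks, c1 - c2 grows, and N is underestimated.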

  16. Estimating abundance: Chapter 27

    USGS Publications Warehouse

    Royle, J. Andrew

    2016-01-01

    This chapter provides a non-technical overview of ‘closed population capture–recapture’ models, a class of well-established models that are widely applied in ecology, such as removal sampling, covariate models, and distance sampling. These methods are regularly adopted for studies of reptiles, in order to estimate abundance from counts of marked individuals while accounting for imperfect detection. The chapter describes some classic closed population models for estimating abundance, with considerations for some recent extensions that provide a spatial context for the estimation of abundance, and therefore density. Finally, the chapter suggests some software for use in data analysis, such as the Windows-based program MARK, and provides an example of estimating abundance and density of reptiles using an artificial cover object survey of Slow Worms (Anguis fragilis).

  17. Accounting for sampling error when inferring population synchrony from time-series data: a Bayesian state-space modelling approach with applications.

    PubMed

    Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique

    2014-01-01

    Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony as well as to underestimating the extinction risk of a metapopulation. The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we presented a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplified our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compared our results to those of a standard approach neglecting sampling variance. We showed that ignoring sampling variance can mask a synchrony pattern whatever its true value and that the common practice of averaging a few replicates of population size estimates performed poorly at reducing the bias of the classical estimator of the synchrony strength. The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data. We provided a user-friendly R-program and a tutorial example to encourage further studies that quantify the strength of population synchrony to account for uncertainty in population size estimates.

  18. Accounting for Sampling Error When Inferring Population Synchrony from Time-Series Data: A Bayesian State-Space Modelling Approach with Applications

    PubMed Central

    Santin-Janin, Hugues; Hugueny, Bernard; Aubry, Philippe; Fouchet, David; Gimenez, Olivier; Pontier, Dominique

    2014-01-01

    Background Data collected to inform time variations in natural population size are tainted by sampling error. Ignoring sampling error in population dynamics models induces bias in parameter estimators, e.g., density-dependence. In particular, when sampling errors are independent among populations, the classical estimator of the synchrony strength (zero-lag correlation) is biased downward. However, this bias is rarely taken into account in synchrony studies although it may lead to overemphasizing the role of intrinsic factors (e.g., dispersal) with respect to extrinsic factors (the Moran effect) in generating population synchrony as well as to underestimating the extinction risk of a metapopulation. Methodology/Principal findings The aim of this paper was first to illustrate the extent of the bias that can be encountered in empirical studies when sampling error is neglected. Second, we presented a state-space modelling approach that explicitly accounts for sampling error when quantifying population synchrony. Third, we exemplified our approach with datasets for which sampling variance (i) has been previously estimated, and (ii) has to be jointly estimated with population synchrony. Finally, we compared our results to those of a standard approach neglecting sampling variance. We showed that ignoring sampling variance can mask a synchrony pattern whatever its true value and that the common practice of averaging a few replicates of population size estimates performed poorly at reducing the bias of the classical estimator of the synchrony strength. Conclusion/Significance The state-space model used in this study provides a flexible way of accurately quantifying the strength of synchrony patterns from most population size data encountered in field studies, including over-dispersed count data.
We provided a user-friendly R-program and a tutorial example to encourage further studies that quantify the strength of population synchrony to account for uncertainty in population size estimates. PMID:24489839
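
The downward bias of the zero-lag correlation under independent sampling error is the classic attenuation effect: the observed correlation is the true correlation times the square root of the product of the two series' reliabilities (process variance over total variance). The sketch below shows the formula and a quick simulation check. It treats time points as independent draws with no population dynamics, a deliberate simplification; it is not the state-space model of the paper, and the function names are hypothetical.

```python
import math
import random


def attenuated_correlation(rho_true, reliability_1, reliability_2):
    """Expected zero-lag correlation when each series carries independent error.

    reliability = process variance / (process variance + sampling variance)."""
    return rho_true * math.sqrt(reliability_1 * reliability_2)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_observed_r(rho, obs_sd, n=20000, seed=1):
    """Correlation of two unit-variance true series after adding sampling noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        true_x = z1
        true_y = rho * z1 + math.sqrt(1 - rho ** 2) * z2   # corr(true_x, true_y) = rho
        xs.append(true_x + rng.gauss(0.0, obs_sd))
        ys.append(true_y + rng.gauss(0.0, obs_sd))
    return pearson_r(xs, ys)


# True synchrony 0.8; sampling variance equal to process variance (reliability 0.5).
print(attenuated_correlation(0.8, 0.5, 0.5))   # 0.4
print(round(simulate_observed_r(0.8, obs_sd=1.0), 2))
```

In this setting a strong true synchrony of 0.8 would be reported as roughly 0.4 if sampling error were ignored.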

  19. Dynamic Method for Identifying Collected Sample Mass

    NASA Technical Reports Server (NTRS)

    Carson, John

    2008-01-01

    G-Sample is designed for sample collection missions to identify the presence and quantity of sample material gathered by spacecraft equipped with end effectors. The software method uses a maximum-likelihood estimator to identify the collected sample's mass based on onboard force-sensor measurements, thruster firings, and a dynamics model of the spacecraft. This makes sample mass identification a computation rather than a process requiring additional hardware. Simulation examples of G-Sample are provided for spacecraft model configurations with a sample collection device mounted on the end of an extended boom. In the absence of thrust knowledge errors, the results indicate that G-Sample can identify the amount of collected sample mass to within 10 grams (with 95-percent confidence) by using a force sensor with a noise and quantization floor of 50 micrometers. These results hold even in the presence of realistic parametric uncertainty in actual spacecraft inertia, center-of-mass offset, and first flexibility modes. Thrust profile knowledge is shown to be a dominant sensitivity for G-Sample: thrust profile errors enter the final mass estimation error in a nearly one-to-one relationship. Thrust profiles should therefore be well characterized with onboard accelerometers prior to sample collection. An overall sample-mass estimation error budget has been developed to approximate the effect of model uncertainty, sensor noise, data rate, and thrust profile error on the expected estimate of collected sample mass.
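A much-reduced version of the estimation problem shows why thrust knowledge matters. Assuming, unlike G-Sample with its full spacecraft dynamics model, that each force reading is F_k = m·a_k plus independent Gaussian noise, with a_k the acceleration inferred from the commanded thrust, the maximum-likelihood estimate of m is the least-squares ratio Σ F_k a_k / Σ a_k². A fractional error ε in the assumed accelerations then scales the estimate by about (1 + ε), mirroring the near one-to-one sensitivity reported above. The function names, masses, and noise levels are hypothetical.

```python
import random


def ml_mass_estimate(forces, accels):
    """ML estimate of m in F_k = m * a_k + e_k with i.i.d. Gaussian e_k.

    Under Gaussian noise the likelihood is maximized by least squares:
    m_hat = sum(F_k * a_k) / sum(a_k ** 2).
    """
    return sum(f * a for f, a in zip(forces, accels)) / sum(a * a for a in accels)


def simulate(true_mass=2.0, accel_error=0.0, noise_sd=0.05, n=500, seed=7):
    """Simulate force readings; accel_error is a fractional thrust-knowledge error."""
    rng = random.Random(seed)
    assumed = [rng.uniform(0.1, 1.0) for _ in range(n)]  # accelerations we believe (m/s^2)
    actual = [a * (1.0 + accel_error) for a in assumed]   # accelerations actually produced
    forces = [true_mass * a + rng.gauss(0.0, noise_sd) for a in actual]  # sensor readings (N)
    return ml_mass_estimate(forces, assumed)


print(simulate())                  # close to 2.0
print(simulate(accel_error=0.05))  # close to 2.1: a 5 % thrust error gives about 5 % mass error
```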

  20. Estimating linear temporal trends from aggregated environmental monitoring data

    USGS Publications Warehouse

    Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.

    2017-01-01

    Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs are often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variation to be separated. We used simulated time series to compare linear trend estimates from three state-space models, a simple linear regression model, and an autoregressive model. We also compared the performance of these five models in estimating trends from a long-term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it best recovered parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job of estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregressive and state-space models when used to analyze aggregated environmental monitoring data.
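The simple linear regression favored above estimates the trend as the least-squares slope of the aggregated annual values on year, β̂₁ = Σ(t − t̄)(y_t − ȳ) / Σ(t − t̄)². A minimal sketch with hypothetical annual means; unlike the state-space models, it makes no attempt to separate sampling from process variation.

```python
def ols_trend(years, values):
    """Return (intercept, slope) of the least-squares line values ~ years."""
    n = len(years)
    t_bar = sum(years) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(years, values))
    sxx = sum((t - t_bar) ** 2 for t in years)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Hypothetical annual means (e.g., catch per unit effort averaged over sites).
years = [2000, 2001, 2002, 2003, 2004]
means = [10.0, 10.5, 11.0, 11.5, 12.0]
intercept, slope = ols_trend(years, means)
print(slope)  # 0.5 units per year
```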

  1. Maximum likelihood estimation of finite mixture model for economic data

    NASA Astrophysics Data System (ADS)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-06-01

    A finite mixture model is a mixture model with a finite number of components. Such models provide a natural representation of heterogeneity across a finite number of latent classes, and are also known as latent class models or unsupervised learning models. Fitting finite mixture models by maximum likelihood estimation has recently drawn considerable attention from statisticians, mainly because maximum likelihood is a powerful statistical method that yields consistent estimates as the sample size increases to infinity. In the present paper, maximum likelihood estimation is therefore used to fit a finite mixture model in order to explore the relationship between nonlinear economic data. Specifically, a two-component normal mixture model is fitted by maximum likelihood estimation to investigate the relationship between stock market prices and rubber prices for the sampled countries. The results indicate a negative relationship between rubber prices and stock market prices for Malaysia, Thailand, Philippines and Indonesia.
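The abstract does not say how the likelihood was maximized; the expectation-maximization (EM) algorithm is a standard choice for normal mixtures, so the sketch below assumes it. It fits a two-component univariate normal mixture to simulated data and is illustrative only: it does not reproduce the stock-price and rubber-price analysis, and `em_two_normal` is a hypothetical name.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_two_normal(data, n_iter=200):
    """Fit a two-component univariate normal mixture by EM (maximum likelihood).

    Returns (weight of component 1, (mu1, sd1), (mu2, sd2)).
    """
    xs = sorted(data)
    half = len(xs) // 2
    # Crude start: means of the lower and upper halves of the sorted data.
    mu1, mu2 = sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)
    sd1 = sd2 = (xs[-1] - xs[0]) / 4 or 1.0
    w = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in data:
            p1 = w * normal_pdf(x, mu1, sd1)
            p2 = (1 - w) * normal_pdf(x, mu2, sd2)
            total = p1 + p2
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: responsibility-weighted updates of weight, means, and SDs.
        n1 = sum(resp)
        n2 = len(data) - n1
        w = n1 / len(data)
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        sd2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
    return w, (mu1, sd1), (mu2, sd2)


rng = random.Random(0)
sample = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(700)]
print(em_two_normal(sample))
```

Each EM iteration cannot decrease the likelihood, but EM can stop at a local maximum, so in practice several starting values are usually tried.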

  2. Estimation of river and stream temperature trends under haphazard sampling

    USGS Publications Warehouse

    Gray, Brian R.; Lyubchich, Vyacheslav; Gel, Yulia R.; Rogala, James T.; Robertson, Dale M.; Wei, Xiaoqiao

    2015-01-01

    Long-term temporal trends in water temperature in rivers and streams are typically estimated under the assumption of evenly spaced space-time measurements. However, sampling times and dates associated with historical water temperature datasets and some sampling designs may be haphazard. As a result, trends in temperature may be confounded with trends in time or space of sampling which, in turn, may yield biased trend estimators and thus unreliable conclusions. We address this concern using multilevel (hierarchical) linear models, where time effects are allowed to vary randomly by day and date effects by year. We evaluate the proposed approach by Monte Carlo simulations with imbalance, sparse data and confounding by trend in time and date of sampling. Simulation results indicate unbiased trend estimators, while results from a case study of temperature data from the Illinois River, USA, conform to river thermal assumptions. We also propose a new nonparametric bootstrap procedure for inference on multilevel models that allows a relatively flexible and distribution-free quantification of uncertainties. The proposed multilevel modeling approach may be elaborated to accommodate nonlinearities within days and years when sampling times or dates typically span temperature extremes.

  3. A simple linear model for estimating ozone AOT40 at forest sites from raw passive sampling data.

    PubMed

    Ferretti, Marco; Cristofolini, Fabiana; Cristofori, Antonella; Gerosa, Giacomo; Gottardini, Elena

    2012-08-01

    A rapid, empirical method is described for estimating weekly AOT40 from ozone concentrations measured with passive samplers at forest sites. The method is based on linear regression and was developed after three years of measurements in Trentino (northern Italy). It was tested against an independent set of data from passive sampler sites across Italy. It provides good weekly estimates compared with those measured by conventional monitors (0.85 ≤ R² ≤ 0.970; 97 ≤ RMSE ≤ 302). Estimates obtained using passive sampling at forest sites are comparable to those obtained by another estimation method based on modelling hourly concentrations (R² = 0.94; 131 ≤ RMSE ≤ 351). Regression coefficients of passive sampling are similar to those obtained with conventional monitors at forest sites. Testing against an independent dataset generated by passive sampling provided similar results (0.86 ≤ R² ≤ 0.99; 65 ≤ RMSE ≤ 478). Errors tend to accumulate when weekly AOT40 estimates are summed to obtain the total AOT40 over the May–July period, and the median deviation between the two estimation methods based on passive sampling is 11%. The method proposed does not require any assumptions, complex calculation or modelling technique, and can be useful when other estimation methods are not feasible, either in principle or in practice. However, the method is not useful when estimates of hourly concentrations are of interest.
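For reference, AOT40 accumulates, over daylight hours, the amount by which each hourly ozone concentration exceeds 40 ppb (units ppb·h). The sketch below computes it from hourly values. Treating daylight as a fixed 08:00–20:00 window is an assumption (definitions vary), and `aot40` is a hypothetical helper, not the regression method described above. Summing weekly values of this kind gives the May–July total mentioned above.

```python
def aot40(hourly, start_hour=8, end_hour=20):
    """AOT40 (ppb·h) from (hour_of_day, ozone_ppb) pairs.

    Only hours with start_hour <= hour < end_hour count (an assumed daylight
    window); each contributes max(0, concentration - 40 ppb).
    """
    return sum(max(0.0, c - 40.0) for hour, c in hourly if start_hour <= hour < end_hour)


# Hypothetical day: 30 ppb at night, 50 ppb during the window, 60 ppb in the evening.
day = [(h, 30.0) for h in range(0, 8)] + [(h, 50.0) for h in range(8, 20)] + [(h, 60.0) for h in range(20, 24)]
print(aot40(day))  # 12 hours x 10 ppb = 120.0
```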

  4. A Note on Structural Equation Modeling Estimates of Reliability

    ERIC Educational Resources Information Center

    Yang, Yanyun; Green, Samuel B.

    2010-01-01

    Reliability can be estimated using structural equation modeling (SEM). Two potential problems with this approach are that estimates may be unstable with small sample sizes and biased with misspecified models. A Monte Carlo study was conducted to investigate the quality of SEM estimates of reliability by themselves and relative to coefficient…
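For context, a common SEM-based reliability estimate for a unidimensional scale is coefficient omega, computed from a one-factor solution (factor variance fixed at 1) as ω = (Σλᵢ)² / ((Σλᵢ)² + Σθᵢ), where λᵢ are the loadings and θᵢ the unique variances. The truncated abstract does not state that this is the exact estimator studied, so the sketch below, with hypothetical loadings, is an assumption.

```python
def coefficient_omega(loadings, unique_variances):
    """Reliability of an unweighted sum score under a congeneric one-factor model
    (factor variance fixed at 1, uncorrelated errors)."""
    common = sum(loadings) ** 2
    return common / (common + sum(unique_variances))


# Four hypothetical standardized items, each with loading 0.7 (unique variance 0.51).
lam = [0.7, 0.7, 0.7, 0.7]
theta = [1 - l ** 2 for l in lam]
print(round(coefficient_omega(lam, theta), 3))
```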

  5. Estimating abundance of mountain lions from unstructured spatial sampling

    USGS Publications Warehouse

    Russell, Robin E.; Royle, J. Andrew; Desimone, Richard; Schwartz, Michael K.; Edwards, Victoria L.; Pilgrim, Kristy P.; Mckelvey, Kevin S.

    2012-01-01

    Mountain lions (Puma concolor) are often difficult to monitor because of their low capture probabilities, extensive movements, and large territories. Methods for estimating the abundance of this species are needed to assess population status, determine harvest levels, evaluate the impacts of management actions on populations, and derive conservation and management strategies. Traditional mark–recapture methods do not explicitly account for differences in individual capture probabilities due to the spatial distribution of individuals in relation to survey effort (or trap locations). However, recent advances in the analysis of capture–recapture data have produced methods estimating abundance and density of animals from spatially explicit capture–recapture data that account for heterogeneity in capture probabilities due to the spatial organization of individuals and traps. We adapt recently developed spatial capture–recapture models to estimate density and abundance of mountain lions in western Montana. Volunteers and state agency personnel collected mountain lion DNA samples in portions of the Blackfoot drainage (7,908 km²) in west-central Montana using 2 methods: snow back-tracking mountain lion tracks to collect hair samples and biopsy darting treed mountain lions to obtain tissue samples. Overall, we recorded 72 individual capture events, including captures both with and without tissue sample collection, and hair samples, resulting in the identification of 50 individual mountain lions (30 females, 19 males, and 1 unknown sex individual). We estimated lion densities from 8 models containing effects of distance, sex, and survey effort on detection probability. 
Our population density estimates ranged from a minimum of 3.7 mountain lions/100 km² (95% CI 2.3–5.7) under the distance-only model (including only an effect of distance on detection probability) to 6.7 (95% CI 3.1–11.0) under the full model (including effects of distance, sex, survey effort, and distance × sex on detection probability). These numbers translate to a total estimate of 293 mountain lions (95% CI 182–451) to 529 (95% CI 245–870) within the Blackfoot drainage. Results from the distance model are similar to previous estimates of 3.6 mountain lions/100 km² for the study area; however, results from all other models indicated greater numbers of mountain lions. Our results indicate that unstructured spatial sampling combined with spatial capture–recapture analysis can be an effective method for estimating large carnivore densities.
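The drainage-wide totals follow from the densities by scaling with the 7,908 km² area, N = D × A / 100 for D in animals per 100 km². The quick check below uses only figures from the abstract; `abundance` is a hypothetical helper name.

```python
AREA_KM2 = 7908  # Blackfoot drainage area from the abstract


def abundance(density_per_100km2, area_km2=AREA_KM2):
    """Scale a density in animals per 100 km^2 to a total for the whole area."""
    return density_per_100km2 * area_km2 / 100.0


for model, d, lo, hi in [("distance only", 3.7, 2.3, 5.7), ("full", 6.7, 3.1, 11.0)]:
    print(model, round(abundance(d)), (round(abundance(lo)), round(abundance(hi))))
# distance only 293 (182, 451)
# full 530 (245, 870)
```

The distance-only figures reproduce 293 (182–451). The full-model point estimate comes out at 530 rather than the reported 529, presumably because the published density of 6.7 is itself rounded.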

  6. Small-mammal density estimation: A field comparison of grid-based vs. web-based density estimators

    USGS Publications Warehouse

    Parmenter, R.R.; Yates, Terry L.; Anderson, D.R.; Burnham, K.P.; Dunnum, J.L.; Franklin, A.B.; Friggens, M.T.; Lubow, B.C.; Miller, M.; Olson, G.S.; Parmenter, Cheryl A.; Pollard, J.; Rexstad, E.; Shenk, T.M.; Stanley, T.R.; White, Gary C.

    2003-01-01

    Statistical models for estimating absolute densities of field populations of animals have been widely used over the last century in both scientific studies and wildlife management programs. To date, two general classes of density estimation models have been developed: models that use data sets from capture–recapture or removal sampling techniques (often derived from trapping grids) from which separate estimates of population size (N̂) and effective sampling area (Â) are used to calculate density (D̂ = N̂/Â); and models applicable to sampling regimes using distance-sampling theory (typically transect lines or trapping webs) to estimate detection functions and densities directly from the distance data. However, few studies have evaluated these respective models for accuracy, precision, and bias on known field populations, and no studies have been conducted that compare the two approaches under controlled field conditions. In this study, we evaluated both classes of density estimators on known densities of enclosed rodent populations. Test data sets (n = 11) were developed using nine rodent species from capture–recapture live-trapping on both trapping grids and trapping webs in four replicate 4.2-ha enclosures on the Sevilleta National Wildlife Refuge in central New Mexico, USA. Additional “saturation” trapping efforts resulted in an enumeration of the rodent populations in each enclosure, allowing the computation of true densities. Density estimates (D̂) were calculated using program CAPTURE for the grid data sets and program DISTANCE for the web data sets, and these results were compared to the known true densities (D) to evaluate each model's relative mean square error, accuracy, precision, and bias. 
In addition, we evaluated a variety of approaches to each data set's analysis by having a group of independent expert analysts calculate their best density estimates without a priori knowledge of the true densities; this “blind” test allowed us to evaluate the influence of expertise and experience in calculating density estimates in comparison to simply using default values in programs CAPTURE and DISTANCE. While the rodent sample sizes were considerably smaller than the recommended minimum for good model results, we found that several models performed well empirically, including the web-based uniform and half-normal models in program DISTANCE, and the grid-based models Mb and Mbh in program CAPTURE (with Â adjusted by species-specific full mean maximum distance moved (MMDM) values). These models produced accurate D̂ values (with 95% confidence intervals that included the true D values) and exhibited acceptable bias but poor precision. However, in linear regression analyses comparing each model's D̂ values to the true D values over the range of observed test densities, only the web-based uniform model exhibited a regression slope near 1.0; all other models showed substantial slope deviations, indicating biased estimates at higher or lower density values. In addition, the grid-based D̂ analyses using full MMDM values for Ŵ area adjustments required a number of theoretical assumptions of uncertain validity, and we therefore viewed their empirical successes with caution. Finally, density estimates from the independent analysts were highly variable, but estimates from web-based approaches had smaller mean square errors and better achieved confidence-interval coverage of D than did grid-based approaches. Our results support the contention that web-based approaches for density estimation of small-mammal populations are both theoretically and empirically superior to grid-based approaches, even when sample size is far less than often recommended. 
In view of the increasing need for standardized environmental measures for comparisons among ecosystems and through time, analytical models based on distance sampling appear to offer accurate density estimation approaches for research studies involving small-mammal abundances.
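Grid-based density estimates divide N̂ by an effective area Â that enlarges the trapping grid by a boundary strip of width Ŵ (for example, one derived from MMDM). One common convention for a square grid of side L, assumed here rather than taken from the study, is Â = L² + 4LŴ + πŴ², a buffered square with rounded corners. The helper names and all numbers below are hypothetical.

```python
import math


def effective_area_m2(side_m, strip_m):
    """Square grid of side L buffered by a strip of width W with rounded corners:
    A = L^2 + 4*L*W + pi*W^2."""
    return side_m ** 2 + 4 * side_m * strip_m + math.pi * strip_m ** 2


def grid_density_per_ha(n_hat, side_m, strip_m):
    """D-hat = N-hat / A-hat, in animals per hectare (1 ha = 10,000 m^2)."""
    return n_hat / (effective_area_m2(side_m, strip_m) / 10_000)


# Hypothetical: N-hat = 40 animals on a 100 m x 100 m grid, W = 25 m from MMDM.
print(round(grid_density_per_ha(40, 100.0, 25.0), 2))
print(round(40 / (100.0 ** 2 / 10_000), 2))  # naive density ignoring the strip
```

The comparison with the naive figure shows how strongly D̂ depends on the choice of Ŵ, which is one reason the grid-based adjustments above rest on assumptions of uncertain validity.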

  7. Using open robust design models to estimate temporary emigration from capture-recapture data.

    PubMed

    Kendall, W L; Bjorkland, R

    2001-12-01

    Capture-recapture studies are crucial in many circumstances for estimating demographic parameters for wildlife and fish populations. Pollock's robust design, involving multiple sampling occasions per period of interest, provides several advantages over classical approaches. This includes the ability to estimate the probability of being present and available for detection, which in some situations is equivalent to breeding probability. We present a model for estimating availability for detection that relaxes two assumptions required in previous approaches. The first is that the sampled population is closed to additions and deletions across samples within a period of interest. The second is that each member of the population has the same probability of being available for detection in a given period. We apply our model to estimate survival and breeding probability in a study of hawksbill sea turtles (Eretmochelys imbricata), where previous approaches are not appropriate.

  8. Using open robust design models to estimate temporary emigration from capture-recapture data

    USGS Publications Warehouse

    Kendall, W.L.; Bjorkland, R.

    2001-01-01

    Capture-recapture studies are crucial in many circumstances for estimating demographic parameters for wildlife and fish populations. Pollock's robust design, involving multiple sampling occasions per period of interest, provides several advantages over classical approaches. This includes the ability to estimate the probability of being present and available for detection, which in some situations is equivalent to breeding probability. We present a model for estimating availability for detection that relaxes two assumptions required in previous approaches. The first is that the sampled population is closed to additions and deletions across samples within a period of interest. The second is that each member of the population has the same probability of being available for detection in a given period. We apply our model to estimate survival and breeding probability in a study of hawksbill sea turtles (Eretmochelys imbricata), where previous approaches are not appropriate.

  9. Population size and stopover duration estimation using mark–resight data and Bayesian analysis of a superpopulation model

    USGS Publications Warehouse

    Lyons, James E.; Kendall, William L.; Royle, J. Andrew; Converse, Sarah J.; Andres, Brad A.; Buchanan, Joseph B.

    2016-01-01

    We present a novel formulation of a mark–recapture–resight model that allows estimation of population size, stopover duration, and arrival and departure schedules at migration areas. Estimation is based on encounter histories of uniquely marked individuals and relative counts of marked and unmarked animals. We use a Bayesian analysis of a state–space formulation of the Jolly–Seber mark–recapture model, integrated with a binomial model for counts of unmarked animals, to derive estimates of population size and arrival and departure probabilities. We also provide a novel estimator for stopover duration that is derived from the latent state variable representing the interim between arrival and departure in the state–space model. We conduct a simulation study of field sampling protocols to understand the impact of superpopulation size, proportion marked, and number of animals sampled on bias and precision of estimates. Simulation results indicate that relative bias of estimates of the proportion of the population with marks was low for all sampling scenarios and never exceeded 2%. Our approach does not require enumeration of all unmarked animals detected or direct knowledge of the number of marked animals in the population at the time of the study. This provides flexibility and potential application in a variety of sampling situations (e.g., migratory birds, breeding seabirds, sea turtles, fish, pinnipeds, etc.). Application of the methods is demonstrated with data from a study of migratory sandpipers.

  10. Comparison of Precision of Biomass Estimates in Regional Field Sample Surveys and Airborne LiDAR-Assisted Surveys in Hedmark County, Norway

    NASA Technical Reports Server (NTRS)

    Naesset, Erik; Gobakken, Terje; Bollandsas, Ole Martin; Gregoire, Timothy G.; Nelson, Ross; Stahl, Goeran

    2013-01-01

    Airborne scanning LiDAR (Light Detection and Ranging) has emerged as a promising tool to provide auxiliary data for sample surveys aiming at estimation of above-ground tree biomass (AGB), with potential applications in REDD forest monitoring. For larger geographical regions such as counties, states or nations, it is not feasible to collect airborne LiDAR data continuously ("wall-to-wall") over the entire area of interest. Two-stage cluster survey designs have therefore been demonstrated in which LiDAR data are collected along selected individual flight-lines treated as clusters, with ground plots sampled along these LiDAR swaths. Recently, analytical AGB estimators and associated variance estimators that quantify the sampling variability have been proposed. Empirical studies employing these estimators have shown a seemingly equal or even larger uncertainty of the AGB estimates obtained with extensive use of LiDAR data to support the estimation as compared to pure field-based estimates employing estimators appropriate under simple random sampling (SRS). However, comparison of uncertainty estimates under SRS and sophisticated two-stage designs is complicated by large differences in the designs and assumptions. In this study, probability-based principles for estimation and inference were followed. We assumed designs of a field sample and a LiDAR-assisted survey of Hedmark County (HC) (27,390 km²), Norway, considered to be more comparable than those assumed in previous studies. The field sample consisted of 659 systematically distributed National Forest Inventory (NFI) plots and the airborne scanning LiDAR data were collected along 53 parallel flight-lines flown over the NFI plots. We compared AGB estimates based on the field survey only assuming SRS against corresponding estimates assuming two-phase (double) sampling with LiDAR and employing model-assisted estimators. 
We also compared AGB estimates based on the field survey only assuming two-stage sampling (the NFI plots being grouped in clusters) against corresponding estimates assuming two-stage sampling with the LiDAR and employing model-assisted estimators. For each of the two comparisons, the standard errors of the AGB estimates were consistently lower for the LiDAR-assisted designs. The overall reduction of the standard errors in the LiDAR-assisted estimation was around 40–60% compared to the pure field survey. We conclude that the previously proposed two-stage model-assisted estimators are inappropriate for surveys with unequal lengths of the LiDAR flight-lines, and new estimators are needed. Some options for design of LiDAR-assisted sample surveys under REDD are also discussed, which capitalize on the flexibility offered when the field survey is designed as an integrated part of the overall survey design as opposed to previous LiDAR-assisted sample surveys in the boreal and temperate zones which have been restricted by the current design of an existing NFI.
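A generic model-assisted estimator for two-phase sampling makes the LiDAR gain concrete: average the model predictions over all phase-one units, then correct with the mean residual (field AGB minus prediction) on the field-plot subsample. This is the textbook difference estimator under simple random sampling at both phases, not the two-stage estimators the study evaluates; the model `agb_model` and all data are hypothetical.

```python
def two_phase_difference_estimate(phase1_x, phase2_x, phase2_y, predict):
    """Model-assisted (difference) estimate of mean AGB under two-phase sampling.

    phase1_x: LiDAR metric for every phase-one unit (large sample).
    phase2_x, phase2_y: LiDAR metric and field-measured AGB on the field subsample.
    predict: fitted model mapping a LiDAR metric to predicted AGB.
    Assumes simple random sampling at both phases.
    """
    synthetic = sum(predict(x) for x in phase1_x) / len(phase1_x)
    correction = sum(y - predict(x) for x, y in zip(phase2_x, phase2_y)) / len(phase2_x)
    return synthetic + correction


def agb_model(mean_height_m):
    """Hypothetical fitted model: AGB (t/ha) = 5 + 10 * mean canopy height (m)."""
    return 5.0 + 10.0 * mean_height_m


phase1 = [8.0, 12.0, 15.0, 10.0, 5.0, 20.0]
field_x = [8.0, 15.0, 20.0]
field_y = [90.0, 150.0, 210.0]
print(two_phase_difference_estimate(phase1, field_x, field_y, agb_model))
```

The residual correction keeps the estimator approximately design-unbiased even if the model is imperfect; the precision gain comes from residuals that vary much less than AGB itself.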

  11. A Bayesian model for estimating population means using a link-tracing sampling design.

    PubMed

    St Clair, Katherine; O'Connell, Daniel

    2012-03-01

    Link-tracing sampling designs can be used to study human populations that contain "hidden" groups who tend to be linked together by a common social trait. These links can be used to increase the sampling intensity of a hidden domain by tracing links from individuals selected in an initial wave of sampling to additional domain members. Chow and Thompson (2003, Survey Methodology 29, 197-205) derived a Bayesian model to estimate the size or proportion of individuals in the hidden population for certain link-tracing designs. We propose an addition to their model that will allow for the modeling of a quantitative response. We assess properties of our model using a constructed population and a real population of at-risk individuals, both of which contain two domains of hidden and nonhidden individuals. Our results show that our model can produce good point and interval estimates of the population mean and domain means when our population assumptions are satisfied. © 2011, The International Biometric Society.

  12. Small area estimation (SAE) model: Case study of poverty in West Java Province

    NASA Astrophysics Data System (ADS)

    Suhartini, Titin; Sadik, Kusman; Indahwati

    2016-02-01

    This paper compares direct estimation with indirect estimation based on a small area estimation (SAE) model. Model selection addressed multicollinearity among the auxiliary variables in two ways: by keeping only non-multicollinear variables, or by applying principal components (PC). The parameters of interest were the area-level proportions of agricultural-venture poor households and of agricultural poor households in West Java Province. These parameters can be estimated either directly or with SAE. Direct estimation was problematic because of small sample sizes: three areas even had zero values and could not be estimated directly. The proportion of agricultural-venture poor households was 19.22%, and that of agricultural poor households was 46.79%. For agricultural-venture poor households, the best model used only non-multicollinear variables; for agricultural poor households, the best model used PC. For both proportions, the SAE estimator performed better than direct estimation. Small area estimation thus overcame the small-sample problem, yielding small-area estimates with higher accuracy and better precision than the direct estimator.
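The abstract does not name the SAE model. A widely used area-level choice is the Fay–Herriot model, whose empirical best linear unbiased predictor (EBLUP) shrinks each direct estimate yᵢ toward a regression-synthetic prediction xᵢᵀβ̂ with weight γᵢ = σᵥ²/(σᵥ² + ψᵢ), where ψᵢ is the sampling variance of yᵢ. The sketch below assumes that model and treats σᵥ², ψᵢ, and the synthetic predictions as already estimated; the function name and all values are hypothetical.

```python
def fay_herriot_eblup(direct, sampling_var, synthetic, area_var):
    """Composite (EBLUP-style) small-area estimates under a Fay-Herriot model.

    direct:       direct survey estimates y_i
    sampling_var: their sampling variances psi_i (treated as known)
    synthetic:    regression-synthetic predictions x_i' beta_hat
    area_var:     estimated variance of the area random effects sigma_v^2
    """
    estimates = []
    for y, psi, syn in zip(direct, sampling_var, synthetic):
        gamma = area_var / (area_var + psi)  # weight given to the direct estimate
        estimates.append(gamma * y + (1.0 - gamma) * syn)
    return estimates


# Hypothetical proportions for three districts; the third has a very noisy direct estimate.
direct = [0.20, 0.45, 0.05]
psi = [0.0004, 0.0010, 0.0400]
synthetic = [0.22, 0.40, 0.18]
print([round(e, 3) for e in fay_herriot_eblup(direct, psi, synthetic, area_var=0.0020)])
```

Areas with noisy direct estimates (small samples) lean on the synthetic part; an area with no usable direct estimate would receive the synthetic prediction alone.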

  13. State-Space Modeling of Dynamic Psychological Processes via the Kalman Smoother Algorithm: Rationale, Finite Sample Properties, and Applications

    ERIC Educational Resources Information Center

    Song, Hairong; Ferrer, Emilio

    2009-01-01

    This article presents a state-space modeling (SSM) technique for fitting process factor analysis models directly to raw data. Maximum likelihood parameter estimates are obtained with the Kalman smoother via the expectation-maximization algorithm. To examine the finite sample properties of the estimates in SSM when common factors are involved, a…
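The simplest state-space model of this kind is the univariate local-level model, xₜ = xₜ₋₁ + wₜ and yₜ = xₜ + vₜ. The sketch below runs a Kalman filter forward and a Rauch–Tung–Striebel smoother backward, with the variances q and r assumed known. In an EM approach like the one described above, the smoothed moments would feed an M-step that re-estimates the variances; that step is omitted here, and this is not the process factor analysis model itself. The function name is hypothetical.

```python
def kalman_rts_local_level(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter and Rauch-Tung-Striebel smoother for the local-level model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
        y_t = x_t + v_t,      v_t ~ N(0, r)
    with a vague N(m0, p0) prior on the state one step before the first observation.
    Returns (filtered means, smoothed means, smoothed variances).
    """
    n = len(y)
    m_pred, p_pred = [0.0] * n, [0.0] * n
    m_filt, p_filt = [0.0] * n, [0.0] * n
    m, p = m0, p0
    for t in range(n):
        # Predict: the random walk keeps the mean and adds process variance q.
        mp, pp = m, p + q
        # Update: blend prediction and observation with the Kalman gain.
        gain = pp / (pp + r)
        m = mp + gain * (y[t] - mp)
        p = (1.0 - gain) * pp
        m_pred[t], p_pred[t], m_filt[t], p_filt[t] = mp, pp, m, p
    # Backward pass: revise each state using everything observed after it.
    m_s, p_s = m_filt[:], p_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_s[t] = m_filt[t] + g * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + g * g * (p_s[t + 1] - p_pred[t + 1])
    return m_filt, m_s, p_s


filtered, smoothed, smoothed_var = kalman_rts_local_level([0.0, 10.0], q=1.0, r=1.0)
print(smoothed[0])  # about 3.33: the later observation pulls the first state estimate up
```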

  14. Hidden Markov model for dependent mark loss and survival estimation

    USGS Publications Warehouse

    Laake, Jeffrey L.; Johnson, Devin S.; Diefenbach, Duane R.; Ternent, Mark A.

    2014-01-01

    Mark-recapture estimators assume no loss of marks to provide unbiased estimates of population parameters. We describe a hidden Markov model (HMM) framework that integrates a mark loss model with a Cormack–Jolly–Seber model for survival estimation. Mark loss can be estimated with single-marked animals as long as a sub-sample of animals has a permanent mark. Double-marking provides an estimate of mark loss assuming independence, but dependence can be modeled with a permanently marked sub-sample. We use a log-linear approach to include covariates for mark loss and dependence, which is more flexible than existing published methods for integrated models. The HMM approach is demonstrated with a dataset of black bears (Ursus americanus) marked with two ear tags; a subset of the bears was also permanently marked with tattoos. The data were analyzed with and without the tattoo. Dropping the tattoos resulted in estimates of survival that were reduced by 0.005–0.035 due to tag loss dependence that could not be modeled. We also analyzed the data with and without the tattoo using a single tag. By not using…

  15. Increasing precision of turbidity-based suspended sediment concentration and load estimates.

    PubMed

    Jastram, John D; Zipper, Carl E; Zelazny, Lucian W; Hyer, Kenneth E

    2010-01-01

    Turbidity is an effective tool for estimating and monitoring suspended sediments in aquatic systems. Turbidity can be measured remotely in situ and at fine temporal scales as a surrogate for suspended sediment concentration (SSC), providing the opportunity for a more complete record of SSC than is possible with physical sampling approaches. However, there is variability in turbidity-based SSC estimates and in sediment loadings calculated from those estimates. This study investigated the potential to improve turbidity-based SSC, and by extension the resulting sediment loading estimates, by incorporating hydrologic variables that can be monitored remotely and continuously (typically 15-min intervals) into the SSC estimation procedure. On the Roanoke River in southwestern Virginia, hydrologic stage, turbidity, and other water-quality parameters were monitored with in situ instrumentation; suspended sediments were sampled manually during elevated turbidity events; samples were analyzed for SSC and physical properties including particle-size distribution and organic C content; and rainfall was quantified by geologic source area. The study identified physical properties of the suspended-sediment samples that contribute to SSC estimation variance and hydrologic variables that explained variability of those physical properties. Results indicated that the inclusion of any of the measured physical properties in turbidity-based SSC estimation models reduces unexplained variance. Further, the use of hydrologic variables to represent these physical properties, along with turbidity, resulted in a model, relying solely on data collected remotely and continuously, that estimated SSC with less variance than a conventional turbidity-based univariate model, allowing a more precise estimate of sediment loading. Modeling results are consistent with known mechanisms governing sediment transport in hydrologic systems.

  16. Slice sampling technique in Bayesian extreme of gold price modelling

    NASA Astrophysics Data System (ADS)

    Rostami, Mohammad; Adam, Mohd Bakri; Ibrahim, Noor Akma; Yahya, Mohamed Hisham

    2013-09-01

    In this paper, we implement a simulation study of Bayesian extreme value analysis using Markov chain Monte Carlo with the slice sampling algorithm. We compared the accuracy of slice sampling with that of other methods for a Gumbel model. This study revealed that the slice sampling algorithm gives more accurate estimates, with lower RMSE, than the other methods. Finally, we successfully employed this procedure to estimate the parameters of extreme Malaysian gold prices from 2000 to 2011.
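A univariate slice sampler with Neal's stepping-out and shrinkage procedures is sketched below. As a simplifying assumption it targets only the posterior of a Gumbel location parameter μ, with the scale fixed at 1 and a flat prior; a two-parameter fit could update μ and the scale in turn. The data are simulated, not the gold-price series, and the function names are hypothetical.

```python
import math
import random


def gumbel_loglik(mu, data, scale=1.0):
    """Gumbel (maxima) log-likelihood for location mu with known scale."""
    total = 0.0
    for x in data:
        z = (x - mu) / scale
        total += -z - math.exp(-z) - math.log(scale)
    return total


def slice_sample(logf, x0, n_samples, width=1.0, max_steps=50, rng=None):
    """Univariate slice sampler with stepping-out and shrinkage (after Neal, 2003)."""
    rng = rng or random.Random()
    x, samples = x0, []
    for _ in range(n_samples):
        # Vertical step: log of a uniform height under f(x).
        log_y = logf(x) - rng.expovariate(1.0)
        # Randomly position an initial interval of the given width around x.
        left = x - width * rng.random()
        right = left + width
        # Step out, splitting the step budget at random between the two sides.
        j = int(max_steps * rng.random())
        k = max_steps - 1 - j
        while j > 0 and logf(left) > log_y:
            left -= width
            j -= 1
        while k > 0 and logf(right) > log_y:
            right += width
            k -= 1
        # Shrinkage: sample within [left, right], narrowing toward x on rejection.
        while True:
            x_new = rng.uniform(left, right)
            if logf(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples


rng = random.Random(42)
# Simulated Gumbel(mu=2, scale=1) maxima via the inverse CDF.
data = [2.0 - math.log(-math.log(rng.random())) for _ in range(100)]
draws = slice_sample(lambda m: gumbel_loglik(m, data), x0=0.0, n_samples=1000, rng=rng)
kept = draws[100:]  # discard burn-in
print(f"posterior mean of mu: {sum(kept) / len(kept):.3f}")
```

Slice sampling needs no proposal scale to tune beyond the initial `width`, which stepping out adapts automatically; that is one practical reason to prefer it over a random-walk Metropolis sampler for models like this.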

  17. Scalable population estimates using spatial-stream-network (SSN) models, fish density surveys, and national geospatial database frameworks for streams

    Treesearch

    Daniel J. Isaak; Jay M. Ver Hoef; Erin E. Peterson; Dona L. Horan; David E. Nagel

    2017-01-01

    Population size estimates for stream fishes are important for conservation and management, but sampling costs limit the extent of most estimates to small portions of river networks that encompass 100s–10 000s of linear kilometres. However, the advent of large fish density data sets, spatial-stream-network (SSN) models that benefit from nonindependence among samples,...

  18. Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan

    2013-01-01

    The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…

  19. Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

    PubMed

    Waller, Niels G; Feuerstahler, Leah

    2017-01-01

    In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ≥ 5,000) and person parameters can be accurately estimated in smaller samples (N ≥ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
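The four-parameter model extends the three-parameter logistic model with an upper asymptote below 1. In the form P(θ) = c + (d − c)/(1 + exp(−a(θ − b))), a is the discrimination, b the difficulty, c the lower and d the upper asymptote. Whether a scaling constant such as 1.702 is included varies between software; it is omitted here by assumption, and `irf_4pm` is a hypothetical name.

```python
import math


def irf_4pm(theta, a, b, c, d):
    """Four-parameter logistic item response function.

    a: discrimination, b: difficulty, c: lower asymptote (guessing),
    d: upper asymptote (slipping); requires 0 <= c < d <= 1.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# At theta == b the probability is halfway between the asymptotes.
print(irf_4pm(0.0, a=1.5, b=0.0, c=0.1, d=0.9))  # approximately 0.5
```

Because even very able respondents answer correctly only with probability d, the upper asymptote is weakly identified unless many such respondents are observed, which is consistent with the large sample sizes reported above.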

  20. Comparing interval estimates for small sample ordinal CFA models

    PubMed Central

    Natesan, Prathiba

    2015-01-01

    Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis (CFA) models for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative, were studied. Undercoverage of confidence intervals and underestimation of standard errors were common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more often positively biased than negatively biased; that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small samples). Bayesian point estimates were also more accurate than non-Bayesian estimates.
The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading. Therefore, editors and policymakers should continue to emphasize the inclusion of interval estimates in research. PMID:26579002
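
    Once intervals have been collected across replications, coverage and the direction of non-coverage are straightforward to tabulate. The sketch below assumes each interval is a (lower, upper) pair. It counts an interval as missing "above" when its lower bound exceeds the true value, which matches the abstract's description of positively biased non-Bayesian intervals. The example intervals are made up for illustration.

```python
from typing import Iterable, Tuple


def coverage_summary(intervals: Iterable[Tuple[float, float]], true_value: float) -> dict:
    """Share of intervals that cover the true value, and how the rest miss it."""
    n = covered = above = below = 0
    for lo, hi in intervals:
        n += 1
        if lo > true_value:
            above += 1  # whole interval lies above the true value (positive bias)
        elif hi < true_value:
            below += 1  # whole interval lies below the true value (negative bias)
        else:
            covered += 1
    return {"coverage": covered / n, "miss_above": above / n, "miss_below": below / n}


# Three hypothetical 95% intervals for a factor correlation whose true value is 0.5
print(coverage_summary([(0.10, 0.60), (0.55, 0.90), (0.20, 0.40)], true_value=0.5))
```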

  2. SEM Based CARMA Time Series Modeling for Arbitrary N.

    PubMed

    Oud, Johan H L; Voelkle, Manuel C; Driver, Charles C

    2018-01-01

    This article explains in detail the state space specification and estimation of first- and higher-order autoregressive moving-average models in continuous time (CARMA) in an extended structural equation modeling (SEM) context for N = 1 as well as N > 1. To illustrate the approach, simulations are presented in which a single panel model (T = 41 time points) is estimated for a sample of N = 1,000 individuals as well as for samples of N = 100 and N = 50 individuals, followed by estimation of a separate model for each of the 100 N = 1 cases in the N = 100 sample. Furthermore, we demonstrate how to test the difference between the full panel model and each N = 1 model by means of a subject-group-reproducibility test. Finally, the proposed analyses are applied in an empirical example, in which the relationships between mood at work and mood at home are studied in a sample of N = 55 women. All analyses are carried out with ctsem, an R package for continuous-time modeling that interfaces to OpenMx.
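
    A key idea behind continuous-time models is that a single set of continuous-time parameters implies different discrete-time parameters for each observation interval. The Python sketch below shows this for the simplest scalar first-order case. It is not ctsem's R interface. It assumes a single latent process dη = a·η dt + √q dW with drift a < 0 and diffusion variance q, and the function name is hypothetical.

```python
import math


def discretize_car1(drift: float, diffusion_var: float, dt: float) -> tuple:
    """Exact discrete-time autoregression and innovation variance over an interval dt
    for the scalar process d(eta) = drift * eta dt + sqrt(diffusion_var) dW.

    Hypothetical helper for illustration; requires drift < 0 (stationary process).
    """
    if drift >= 0:
        raise ValueError("drift must be negative")
    phi = math.exp(drift * dt)  # autoregressive effect across dt
    q = diffusion_var * (math.exp(2 * drift * dt) - 1) / (2 * drift)  # innovation variance
    return phi, q


# Unequally spaced observations get different discrete-time parameters
# from the same continuous-time drift and diffusion.
for dt in (0.5, 1.0, 2.0):
    print(dt, discretize_car1(drift=-math.log(2), diffusion_var=1.0, dt=dt))
```

    With drift −ln 2, the autoregressive effect is 0.5 over one time unit and 0.25 over two, which is why panel data with irregular spacing can still be modeled with one drift parameter.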

  3. Multi-scale occupancy estimation and modelling using multiple detection methods

    USGS Publications Warehouse

    Nichols, James D.; Bailey, Larissa L.; O'Connell, Allan F.; Talancy, Neil W.; Grant, Evan H. Campbell; Gilbert, Andrew T.; Annand, Elizabeth M.; Husband, Thomas P.; Hines, James E.

    2008-01-01

    Occupancy estimation and modelling based on detection–nondetection data provide an effective way of exploring change in a species’ distribution across time and space in cases where the species is not always detected with certainty. Today, many monitoring programmes target multiple species, or life stages within a species, requiring the use of multiple detection methods. When multiple methods or devices are used at the same sample sites, animals can be detected by more than one method. We develop occupancy models for multiple detection methods that permit simultaneous use of data from all methods for inference about method-specific detection probabilities. Moreover, the approach permits estimation of occupancy at two spatial scales: the larger scale corresponds to species’ use of a sample unit, whereas the smaller scale corresponds to presence of the species at the local sample station or site. We apply the models to data collected on two different vertebrate species: striped skunks Mephitis mephitis and red salamanders Pseudotriton ruber. For striped skunks, large-scale occupancy estimates were consistent between two sampling seasons. Small-scale occupancy probabilities were slightly lower in the late winter/spring when skunks tend to conserve energy, and movements are limited to males in search of females for breeding. There was strong evidence of method-specific detection probabilities for skunks. As anticipated, large- and small-scale occupancy areas completely overlapped for red salamanders. The analyses provided weak evidence of method-specific detection probabilities for this species. Synthesis and applications. Increasingly, many studies are utilizing multiple detection methods at sampling locations. The modelling approach presented here makes efficient use of detections from multiple methods to estimate occupancy probabilities at two spatial scales and to compare detection probabilities associated with different detection methods.
The models can be viewed as another variation of Pollock's robust design and may be applicable to a wide variety of scenarios where species occur in an area but are not always near the sampled locations. The estimation approach is likely to be especially useful in multispecies conservation programmes by providing efficient estimates using multiple detection devices and by providing device-specific detection probability estimates for use in survey design.
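
    The two-scale structure can be made concrete by writing out the probability of one sample unit's detection history. In the Python sketch below, psi is large-scale use, theta is small-scale presence at a station, and p holds per-method detection probabilities. For simplicity it assumes these parameters are constant across stations, which is an assumption made here; the published models allow them to vary. The function name is hypothetical.

```python
from typing import Sequence


def multiscale_occupancy_likelihood(
    history: Sequence[Sequence[int]],  # history[k][s] = 1 if method s detected the species at station k
    psi: float,                        # Pr(species uses the sample unit): large-scale occupancy
    theta: float,                      # Pr(species present at a station | unit used): small scale
    p: Sequence[float],                # per-method detection probabilities
) -> float:
    """Probability of one sample unit's detection history, with psi, theta and p
    held constant across stations (a simplification for illustration)."""
    given_used = 1.0
    for station in history:
        # Probability of this station's record if the species is present there
        present = 1.0
        for detected, p_s in zip(station, p):
            present *= p_s if detected else 1.0 - p_s
        # Absence at the station is only compatible with an all-zero record
        absent = 0.0 if any(station) else 1.0
        given_used *= theta * present + (1.0 - theta) * absent
    # An unused unit can only produce a history with no detections at all
    unused = 0.0 if any(any(station) for station in history) else 1.0
    return psi * given_used + (1.0 - psi) * unused


# One unit, two stations, two hypothetical detection methods
print(multiscale_occupancy_likelihood([[0, 0], [1, 1]], psi=0.5, theta=0.5, p=[0.5, 0.5]))
```

    Multiplying these unit-level probabilities across units gives a likelihood that can be maximized over psi, theta, and p.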

  4. A Class of Factor Analysis Estimation Procedures with Common Asymptotic Sampling Properties

    ERIC Educational Resources Information Center

    Swain, A. J.

    1975-01-01

    Considers a class of estimation procedures for the factor model. The procedures are shown to yield estimates possessing the same asymptotic sampling properties as those from estimation by maximum likelihood or generalized least squares, both special members of the class. General expressions for the derivatives needed for Newton-Raphson…

  5. On using sample selection methods in estimating the price elasticity of firms' demand for insurance.

    PubMed

    Marquis, M Susan; Louis, Thomas A

    2002-01-01

    We evaluate a technique based on sample selection models that has been used by health economists to estimate the price elasticity of firms' demand for insurance. We demonstrate that this technique produces inflated estimates of the price elasticity. We show that alternative methods lead to valid estimates.

  6. An evaluation of methods for estimating decadal stream loads

    NASA Astrophysics Data System (ADS)

    Lee, Casey J.; Hirsch, Robert M.; Schwarz, Gregory E.; Holtschlag, David J.; Preston, Stephen D.; Crawford, Charles G.; Vecchia, Aldo V.

    2016-11-01

    Effective management of water resources requires accurate information on the mass, or load, of water-quality constituents transported from upstream watersheds to downstream receiving waters. Despite this need, no single method has been shown to consistently provide accurate load estimates among different water-quality constituents, sampling sites, and sampling regimes. We evaluate the accuracy of several load estimation methods across a broad range of sampling and environmental conditions. This analysis uses random sub-samples drawn from temporally-dense data sets of total nitrogen, total phosphorus, nitrate, and suspended-sediment concentration, and includes measurements of specific conductance, which was used as a surrogate for dissolved solids concentration. Methods considered include linear interpolation and ratio estimators, regression-based methods historically employed by the U.S. Geological Survey, and newer flexible techniques including Weighted Regressions on Time, Season, and Discharge (WRTDS) and a generalized non-linear additive model. No single method is identified to have the greatest accuracy across all constituents, sites, and sampling scenarios. Most methods provide accurate estimates of specific conductance (used as a surrogate for total dissolved solids or specific major ions) and total nitrogen; lower accuracy is observed for the estimation of nitrate, total phosphorus and suspended sediment loads. Methods that allow for flexibility in the relation between concentration and flow conditions, specifically Beale's ratio estimator and WRTDS, exhibit greater estimation accuracy and lower bias.
Evaluation of methods across simulated sampling scenarios indicates that (1) high-flow sampling is necessary to produce accurate load estimates, (2) extrapolation of sample data through time or across more extreme flow conditions reduces load estimate accuracy, and (3) WRTDS and methods that use a Kalman filter or smoothing to correct for departures between individual modeled and observed values benefit most from more frequent water-quality sampling.
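
    Beale's ratio estimator scales the mean sampled load by the ratio of the period's mean flow to the sampled mean flow, then applies a bias correction based on the load–flow covariance. The Python sketch below uses the commonly published form with 1/n in the correction terms. Some variants use a finite-population term 1/n − 1/N instead. It assumes loads and flows are already in consistent daily units, and it is not the implementation evaluated in the study. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def beale_ratio_load(
    sampled_loads: Sequence[float],  # daily loads on sampled days (concentration x flow)
    sampled_flows: Sequence[float],  # daily mean flows on the same sampled days
    all_flows: Sequence[float],      # daily mean flows for every day in the period
) -> float:
    """Total load over the period using Beale's bias-corrected ratio estimator."""
    n = len(sampled_loads)
    if n < 2 or n != len(sampled_flows):
        raise ValueError("need at least two paired load/flow samples")
    m_l, m_q = mean(sampled_loads), mean(sampled_flows)
    # Sample covariance of load with flow and sample variance of flow (n - 1 denominators)
    s_lq = sum((l - m_l) * (q - m_q) for l, q in zip(sampled_loads, sampled_flows)) / (n - 1)
    s_qq = sum((q - m_q) ** 2 for q in sampled_flows) / (n - 1)
    correction = (1 + s_lq / (n * m_l * m_q)) / (1 + s_qq / (n * m_q ** 2))
    mean_daily_load = mean(all_flows) * (m_l / m_q) * correction
    return mean_daily_load * len(all_flows)


# Hypothetical data: four sampled days out of a ten-day period
print(beale_ratio_load([12.0, 30.0, 8.0, 55.0], [3.0, 6.0, 2.0, 9.0],
                       [2.0, 3.0, 3.5, 6.0, 9.0, 7.0, 4.0, 3.0, 2.5, 2.0]))
```

    When load is exactly proportional to flow, the correction factor is 1 and the estimate reduces to the simple ratio estimator.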

  7. An evaluation of methods for estimating decadal stream loads

    USGS Publications Warehouse

    Lee, Casey; Hirsch, Robert M.; Schwarz, Gregory E.; Holtschlag, David J.; Preston, Stephen D.; Crawford, Charles G.; Vecchia, Aldo V.

    2016-01-01

    Effective management of water resources requires accurate information on the mass, or load, of water-quality constituents transported from upstream watersheds to downstream receiving waters. Despite this need, no single method has been shown to consistently provide accurate load estimates among different water-quality constituents, sampling sites, and sampling regimes. We evaluate the accuracy of several load estimation methods across a broad range of sampling and environmental conditions. This analysis uses random sub-samples drawn from temporally-dense data sets of total nitrogen, total phosphorus, nitrate, and suspended-sediment concentration, and includes measurements of specific conductance, which was used as a surrogate for dissolved solids concentration. Methods considered include linear interpolation and ratio estimators, regression-based methods historically employed by the U.S. Geological Survey, and newer flexible techniques including Weighted Regressions on Time, Season, and Discharge (WRTDS) and a generalized non-linear additive model. No single method is identified to have the greatest accuracy across all constituents, sites, and sampling scenarios. Most methods provide accurate estimates of specific conductance (used as a surrogate for total dissolved solids or specific major ions) and total nitrogen; lower accuracy is observed for the estimation of nitrate, total phosphorus and suspended sediment loads. Methods that allow for flexibility in the relation between concentration and flow conditions, specifically Beale’s ratio estimator and WRTDS, exhibit greater estimation accuracy and lower bias.
Evaluation of methods across simulated sampling scenarios indicates that (1) high-flow sampling is necessary to produce accurate load estimates, (2) extrapolation of sample data through time or across more extreme flow conditions reduces load estimate accuracy, and (3) WRTDS and methods that use a Kalman filter or smoothing to correct for departures between individual modeled and observed values benefit most from more frequent water-quality sampling.

  8. Estimation and modeling of electrofishing capture efficiency for fishes in wadeable warmwater streams

    USGS Publications Warehouse

    Price, A.; Peterson, James T.

    2010-01-01

    Stream fish managers often use fish sample data to inform management decisions affecting fish populations. Fish sample data, however, can be biased by the same factors affecting fish populations. To minimize the effect of sample biases on decision making, biologists need information on the effectiveness of fish sampling methods. We evaluated single-pass backpack electrofishing and seining combined with electrofishing by following a dual-gear, mark–recapture approach in 61 blocknetted sample units within first- to third-order streams. We also estimated fish movement out of unblocked units during sampling. Capture efficiency and fish abundances were modeled for 50 fish species by use of conditional multinomial capture–recapture models. The best-approximating models indicated that capture efficiencies were generally low and differed among species groups based on family or genus. Efficiencies of single-pass electrofishing and seining combined with electrofishing were greatest for Catostomidae and lowest for Ictaluridae. Fish body length and stream habitat characteristics (mean cross-sectional area, wood density, mean current velocity, and turbidity) also were related to capture efficiency of both methods, but the effects differed among species groups. We estimated that, on average, 23% of fish left the unblocked sample units, but net movement varied among species. Our results suggest that (1) common warmwater stream fish sampling methods have low capture efficiency and (2) failure to adjust for incomplete capture may bias estimates of fish abundance. We suggest that managers minimize bias from incomplete capture by adjusting data for site- and species-specific capture efficiency and by choosing sampling gear that provides estimates with minimal bias and variance. Furthermore, if block nets are not used, we recommend that managers adjust the data based on unconditional capture efficiency.
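
    The simplest form of "adjusting data for capture efficiency" divides a catch by an efficiency estimated from marked fish. The Python sketch below shows only that basic adjustment. It is not the conditional multinomial models with species and habitat covariates that the study fit. It assumes a closed, blocknetted unit in which marked and unmarked fish are equally catchable, and both function names are hypothetical.

```python
def capture_efficiency(marked_released: int, recaptured: int) -> float:
    """Fraction of marked fish recaptured by the gear being evaluated."""
    if marked_released <= 0 or not 0 <= recaptured <= marked_released:
        raise ValueError("invalid mark-recapture counts")
    return recaptured / marked_released


def adjusted_abundance(catch: int, efficiency: float) -> float:
    """Abundance implied by a single-pass catch once incomplete capture is accounted for."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return catch / efficiency


# Hypothetical unit: 40 fish marked, 10 recaptured, then a single-pass catch of 30
q = capture_efficiency(40, 10)
print(q, adjusted_abundance(30, q))
```

    With an efficiency of 0.25, a raw catch of 30 implies roughly 120 fish. This illustrates how much unadjusted counts can understate abundance when efficiency is low.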

  9. A Bayesian Hierarchical Model for Large-Scale Educational Surveys: An Application to the National Assessment of Educational Progress. Research Report. ETS RR-04-38

    ERIC Educational Resources Information Center

    Johnson, Matthew S.; Jenkins, Frank

    2005-01-01

    Large-scale educational assessments such as the National Assessment of Educational Progress (NAEP) sample examinees to whom an exam will be administered. In most situations the sampling design is not a simple random sample and must be accounted for in the estimating model. After reviewing the current operational estimation procedure for NAEP, this…

  10. Seasonal variation in size-dependent survival of juvenile Atlantic salmon (Salmo salar): Performance of multistate capture-mark-recapture models

    USGS Publications Warehouse

    Letcher, B.H.; Horton, G.E.

    2008-01-01

    We estimated the magnitude and shape of size-dependent survival (SDS) across multiple sampling intervals for two cohorts of stream-dwelling Atlantic salmon (Salmo salar) juveniles using multistate capture-mark-recapture (CMR) models. Simulations designed to test the effectiveness of multistate models for detecting SDS in our system indicated that error in SDS estimates was low and that both time-invariant and time-varying SDS could be detected with sample sizes of >250, average survival of >0.6, and average probability of capture of >0.6, except for cases of very strong SDS. In the field (N ≈ 750, survival 0.6-0.8 among sampling intervals, probability of capture 0.6-0.8 among sampling occasions), about one-third of the sampling intervals showed evidence of SDS, with poorer survival of larger fish during the age-2+ autumn and quadratic survival (opposite direction between cohorts) during age-1+ spring. The varying magnitude and shape of SDS among sampling intervals suggest a potential mechanism for the maintenance of the very wide observed size distributions. Estimating SDS using multistate CMR models appears complementary to established approaches, can provide estimates with low error, and can be used to detect intermittent SDS. © 2008 NRC Canada.

  11. Accuracy or precision: Implications of sample design and methodology on abundance estimation

    USGS Publications Warehouse

    Kowalewski, Lucas K.; Chizinski, Christopher J.; Powell, Larkin A.; Pope, Kevin L.; Pegg, Mark A.

    2015-01-01

    Sampling by spatially replicated counts (point counts) is an increasingly popular method of estimating the population size of organisms. Challenges exist when sampling by the point-count method: it is often impractical to sample the entire area of interest and impossible to detect every individual present. Ecologists encounter logistical limitations that force them to sample either a few large sample units or many small sample units, introducing biases to sample counts. We generated a computer environment and simulated sampling scenarios to test the role of number of samples, sample unit area, number of organisms, and distribution of organisms in the estimation of population sizes using N-mixture models. Many sample units of small area provided estimates that were consistently closer to true abundance than sample scenarios with few sample units of large area. However, sample scenarios with few sample units of large area provided more precise abundance estimates than those derived from sample scenarios with many sample units of small area. It is important to consider the accuracy and precision of abundance estimates during the sample design process, with study goals and objectives fully recognized; yet such consideration is often, and consequentially, an afterthought that occurs during the data analysis process.
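
    The N-mixture models used here treat site abundance as latent: N ~ Poisson(λ), and each repeated count is Binomial(N, p). The Python sketch below computes one site's marginal likelihood by summing over N up to a truncation point k_max. The truncation value and the function name are illustrative choices, not taken from the study, and it assumes 0 < p < 1 and λ > 0.

```python
import math
from typing import Sequence


def nmixture_site_likelihood(counts: Sequence[int], lam: float, p: float,
                             k_max: int = 200) -> float:
    """Marginal likelihood of repeated counts at one site under a binomial-Poisson
    N-mixture model, summing latent abundance N from max(counts) to k_max."""
    total = 0.0
    for n in range(max(counts), k_max + 1):
        # Poisson prior on latent abundance, on the log scale for numerical stability
        log_prior = -lam + n * math.log(lam) - math.lgamma(n + 1)
        # Each of the n individuals is detected independently on each visit
        log_obs = sum(
            math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log1p(-p)
            for y in counts
        )
        total += math.exp(log_prior + log_obs)
    return total


# Hypothetical site visited three times
print(nmixture_site_likelihood([2, 3, 1], lam=4.0, p=0.5))
```

    The product of these site likelihoods over all sample units is what an N-mixture fit maximizes with respect to λ and p.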

  12. Formulating the Rasch Differential Item Functioning Model under the Marginal Maximum Likelihood Estimation Context and Its Comparison with Mantel-Haenszel Procedure in Short Test and Small Sample Conditions

    ERIC Educational Resources Information Center

    Paek, Insu; Wilson, Mark

    2011-01-01

    This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
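
    For reference, the Mantel-Haenszel DIF statistic pools 2×2 tables (group by correct/incorrect) across matched total-score strata into a common odds ratio. It is often reported on the ETS delta scale as −2.35 ln α. The record above is truncated, so the Python sketch below shows only the standard MH computation, not the study's simulation design. The table layout is an assumption stated in the comments.

```python
import math
from typing import Sequence, Tuple

# Each stratum (one matched total-score level) is (A, B, C, D):
#   A = reference group correct, B = reference group incorrect,
#   C = focal group correct,     D = focal group incorrect.
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: Sequence[Stratum]) -> float:
    num = den = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t == 0:
            continue  # empty score level contributes nothing
        num += a * d / t
        den += b * c / t
    return num / den


def mh_delta(alpha_mh: float) -> float:
    """ETS delta-scale value; negative values indicate DIF favouring the reference group."""
    return -2.35 * math.log(alpha_mh)


alpha = mantel_haenszel_odds_ratio([(10, 5, 5, 10), (20, 10, 10, 20)])
print(alpha, mh_delta(alpha))
```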

  13. What is the extent of prokaryotic diversity?

    PubMed Central

    Curtis, Thomas P; Head, Ian M; Lunn, Mary; Woodcock, Stephen; Schloss, Patrick D; Sloan, William T

    2006-01-01

    The extent of microbial diversity is an intrinsically fascinating subject of profound practical importance. The term ‘diversity’ may allude to the number of taxa or species richness as well as their relative abundance. There is uncertainty about both, primarily because sample sizes are too small. Non-parametric diversity estimators make gross underestimates if used with small sample sizes on unevenly distributed communities. One can make richness estimates over many scales using small samples by assuming a species/taxa-abundance distribution. However, no one knows what the underlying taxa-abundance distributions are for bacterial communities. Latterly, diversity has been estimated by fitting data from gene clone libraries and extrapolating from this to taxa-abundance curves to estimate richness. However, since sample sizes are small, we cannot be sure that such samples are representative of the community from which they were drawn. It is however possible to formulate, and calibrate, models that predict the diversity of local communities and of samples drawn from that local community. The calibration of such models suggests that migration rates are small and decrease as the community gets larger. The preliminary predictions of the model are qualitatively consistent with the patterns seen in clone libraries in ‘real life’. The validation of this model is also confounded by small sample sizes. However, if such models were properly validated, they could form invaluable tools for the prediction of microbial diversity and a basis for the systematic exploration of microbial diversity on the planet. PMID:17028084
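
    A widely used non-parametric richness estimator is the bias-corrected Chao1, which extrapolates from the numbers of singletons (F1) and doubletons (F2). The abstract does not name specific estimators, so using Chao1 here is our own choice. The Python sketch below shows why small samples dominated by rare taxa leave these estimators driven almost entirely by F1 and F2.

```python
from collections import Counter
from typing import Iterable


def chao1(abundances: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-taxon counts in one sample."""
    counts = [x for x in abundances if x > 0]
    freq = Counter(counts)
    s_obs = len(counts)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)  # singletons and doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical clone-library counts per taxon
print(chao1([1, 1, 1, 2, 2, 5]))
```

    Here six observed taxa give an estimate of 7. In a very uneven community, most taxa may never be sampled at all, so no function of F1 and F2 from a small library can recover them.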

  14. POWER ANALYSIS FOR COMPLEX MEDIATIONAL DESIGNS USING MONTE CARLO METHODS

    PubMed Central

    Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

    2013-01-01

    Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation in these models has not yet been provided in the literature. We describe a general framework for power analyses for complex mediational models. The approach is based on the well-known technique of generating a large number of samples in a Monte Carlo study and estimating power as the percentage of cases in which an estimate of interest is significantly different from zero. Examples of power calculation for commonly used mediational models are provided. Power analyses for the single mediator, multiple mediators, three-path mediation, mediation with latent variables, moderated mediation, and mediation in longitudinal designs are described. Annotated sample syntax for Mplus is appended, and tabled values of required sample sizes are shown for some models. PMID:23935262
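
    The Monte Carlo logic described above is to simulate many data sets under assumed population values, test the effect in each, and report the rejection rate. The Python sketch below applies this to the single-mediator case only (the authors supply Mplus syntax). It assumes standard normal X and error terms, ordinary least squares for the a and b paths, and a Sobel z test for a·b. The Sobel test is one of several possible tests; bootstrap intervals are a common alternative. The function names are hypothetical.

```python
import math
import random


def _one_replication(n: int, a: float, b: float, c_prime: float, rng: random.Random) -> bool:
    """Simulate X -> M -> Y once and return True if the Sobel test rejects a*b = 0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = [a * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [c_prime * xi + b * mi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]
    # Centre the variables so the regressions need no explicit intercepts
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(u * u for u in xc)
    smm = sum(u * u for u in mc)
    sxm = sum(u * v for u, v in zip(xc, mc))
    sxy = sum(u * v for u, v in zip(xc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    # Path a: regression of M on X
    a_hat = sxm / sxx
    sse_m = sum((mi - a_hat * xi) ** 2 for xi, mi in zip(xc, mc))
    se_a = math.sqrt(sse_m / (n - 2) / sxx)
    # Path b: regression of Y on X and M (closed-form two-predictor OLS)
    det = sxx * smm - sxm ** 2
    b_hat = (sxx * smy - sxm * sxy) / det
    c_hat = (smm * sxy - sxm * smy) / det
    sse_y = sum((yi - c_hat * xi - b_hat * mi) ** 2 for xi, mi, yi in zip(xc, mc, yc))
    se_b = math.sqrt(sse_y / (n - 3) * sxx / det)
    # Sobel z statistic for the indirect effect a*b (normal approximation)
    z = a_hat * b_hat / math.sqrt(a_hat ** 2 * se_b ** 2 + b_hat ** 2 * se_a ** 2)
    return abs(z) > 1.959964  # two-sided test at alpha = 0.05


def mediation_power(n: int, a: float, b: float, c_prime: float = 0.0,
                    reps: int = 1000, seed: int = 1) -> float:
    """Monte Carlo power: share of simulated samples in which a*b is significant."""
    rng = random.Random(seed)
    hits = sum(_one_replication(n, a, b, c_prime, rng) for _ in range(reps))
    return hits / reps


print(mediation_power(n=100, a=0.3, b=0.3, reps=500))
```

    Rerunning this over a grid of sample sizes gives the kind of required-sample-size table the article reports, under whatever population values the analyst assumes.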

  15. Comparing population size estimators for plethodontid salamanders

    USGS Publications Warehouse

    Bailey, L.L.; Simons, T.R.; Pollock, K.H.

    2004-01-01

    Despite concern over amphibian declines, few studies estimate absolute abundances because of logistical and economic constraints and previously poor estimator performance. Two estimation approaches recommended for amphibian studies are mark-recapture and depletion (or removal) sampling. We compared abundance estimation via various mark-recapture and depletion methods, using data from a three-year study of terrestrial salamanders in Great Smoky Mountains National Park. Our results indicate that short-term closed-population, robust design, and depletion methods estimate the surface population of salamanders (i.e., those near the surface and available for capture during a given sampling occasion). In longer duration studies, temporary emigration violates assumptions of both open- and closed-population mark-recapture estimation models. However, if the temporary emigration is completely random, these models should yield unbiased estimates of the total population (superpopulation) of salamanders in the sampled area. We recommend using Pollock's robust design in mark-recapture studies because of its flexibility to incorporate variation in capture probabilities and to estimate temporary emigration probabilities.
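
    Depletion (removal) sampling relies on catches declining across passes. The classic two-pass removal estimator assumes a closed population and the same capture probability on both passes. It sets p̂ = (C1 − C2)/C1 and N̂ = C1²/(C1 − C2). The Python sketch below uses a hypothetical function name and made-up catches.

```python
def two_pass_removal(c1: int, c2: int) -> tuple:
    """Two-pass removal (depletion) estimates of abundance and capture probability.

    Assumes a closed population and equal capture probability on both passes;
    undefined when the second catch is not smaller than the first.
    """
    if c2 >= c1:
        raise ValueError("removal estimate requires c2 < c1")
    p_hat = (c1 - c2) / c1
    n_hat = c1 ** 2 / (c1 - c2)
    return n_hat, p_hat


# Hypothetical catches of 60 and 30 salamanders on successive passes
print(two_pass_removal(60, 30))
```

    If capture probability actually falls on later passes, C2 is too small. This inflates p̂ and understates N̂, which is why the equal-probability assumption matters.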

  16. A revised burial dose estimation procedure for optical dating of young and modern-age sediments

    USGS Publications Warehouse

    Arnold, L.J.; Roberts, R.G.; Galbraith, R.F.; DeLong, S.B.

    2009-01-01

    The presence of genuinely zero-age or near-zero-age grains in modern-age and very young samples poses a problem for many existing burial dose estimation procedures used in optical (optically stimulated luminescence, OSL) dating. This difficulty currently necessitates consideration of relatively simplistic and statistically inferior age models. In this study, we investigate the potential for using modified versions of the statistical age models of Galbraith et al. [Galbraith, R.F., Roberts, R.G., Laslett, G.M., Yoshida, H., Olley, J.M., 1999. Optical dating of single and multiple grains of quartz from Jinmium rock shelter, northern Australia: Part I, experimental design and statistical models. Archaeometry 41, 339-364.] to provide reliable equivalent dose (De) estimates for young and modern-age samples that display negative, zero or near-zero De estimates. For this purpose, we have revised the original versions of the central and minimum age models, which are based on log-transformed De values, so that they can be applied to un-logged De estimates and their associated absolute standard errors. The suitability of these 'un-logged' age models is tested using a series of known-age fluvial samples deposited within two arroyo systems from the American Southwest. The un-logged age models provide accurate burial doses and final OSL ages for roughly three-quarters of the total number of samples considered in this study. Sensitivity tests reveal that the un-logged versions of the central and minimum age models are capable of producing accurate burial dose estimates for modern-age and very young (<350 yr) fluvial samples that contain (i) more than 20% of well-bleached grains in their De distributions, or (ii) smaller sub-populations of well-bleached grains for which the De values are known with high precision.
Our results indicate that the original (log-transformed) versions of the central and minimum age models are still preferable for most routine dating applications, since these age models are better suited to the statistical properties of typical single-grain and multi-grain single-aliquot De datasets. However, the unique error properties of modern-age samples, combined with the problems of calculating natural logarithms of negative or zero-Gy De values, mean that the un-logged versions of the central and minimum age models currently offer the most suitable means of deriving accurate burial dose estimates for very young and modern-age samples. © 2009 Elsevier Ltd. All rights reserved.
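
    The central age model estimates a weighted central dose δ and an overdispersion σ. It iterates weights w_i = 1/(σ² + s_i²) until Σ w_i²(z_i − δ)² = Σ w_i. The Python sketch below assumes the un-logged variant keeps this same fixed-point form and simply uses raw De values with absolute standard errors, so zero and negative doses cause no logarithm problems. This reflects our reading of the approach, not the authors' code. The function name and starting value sigma0 are assumptions.

```python
import math
from typing import Sequence, Tuple


def central_age_model(de: Sequence[float], se: Sequence[float], sigma0: float = 1.0,
                      tol: float = 1e-10, max_iter: int = 1000) -> Tuple[float, float]:
    """Central dose (delta) and overdispersion (sigma) by fixed-point iteration.

    Here `de` holds un-logged equivalent doses (Gy) and `se` their absolute standard
    errors, so zero and negative doses are allowed; sigma is in Gy as well.
    """
    sigma = sigma0
    for _ in range(max_iter):
        w = [1.0 / (sigma ** 2 + s ** 2) for s in se]
        delta = sum(wi * z for wi, z in zip(w, de)) / sum(w)
        # Update sigma toward the solution of sum(w^2 (z - delta)^2) = sum(w)
        new_sigma = sigma * math.sqrt(
            sum(wi ** 2 * (z - delta) ** 2 for wi, z in zip(w, de)) / sum(w))
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    # Final weighted central dose at the converged overdispersion
    w = [1.0 / (sigma ** 2 + s ** 2) for s in se]
    return sum(wi * z for wi, z in zip(w, de)) / sum(w), sigma


# Hypothetical single-grain De values (Gy) from a very young sample, including a negative one
print(central_age_model([-0.1, 0.2, 0.05, 0.4], [0.1, 0.1, 0.1, 0.2]))
```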

  17. Efficient bootstrap estimates for tail statistics

    NASA Astrophysics Data System (ADS)

    Breivik, Øyvind; Aarnes, Ole Johan

    2017-03-01

    Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
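
    There is a simple reason a top subset can suffice. In an ordinary bootstrap resample of size n, the number of draws that land among the m largest entries is Binomial(n, m/n), and those draws are uniform over that subset. Because every top entry is at least as large as every other entry, the r-th largest value of the resample is determined by those draws whenever there are at least r of them. The Python sketch below uses that argument. It is our own illustration, and the authors' exact procedure may differ. The binomial draw uses CDF inversion, which is adequate when n·m/n is modest; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def _binomial(n: int, p: float, rng: random.Random) -> int:
    """Binomial draw by CDF inversion; adequate when n * p is modest (tens to hundreds)."""
    u = rng.random()
    k, pmf = 0, (1.0 - p) ** n
    cdf = pmf
    while u > cdf and k < n:
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k


def bootstrap_rth_largest(sample: Sequence[float], r: int, m: int, reps: int,
                          seed: int = 0) -> List[float]:
    """Bootstrap replicates of the r-th largest value, drawing only from the top m entries."""
    n = len(sample)
    if not 0 < r <= m < n:
        raise ValueError("need 0 < r <= m < len(sample)")
    rng = random.Random(seed)
    top = sorted(sample, reverse=True)[:m]
    out = []
    for _ in range(reps):
        # How many of the n resampled values would have landed in the top-m subset
        k = _binomial(n, m / n, rng)
        if k >= r:
            draws = sorted((rng.choice(top) for _ in range(k)), reverse=True)
            out.append(draws[r - 1])
        else:
            # Rare case: fall back to an ordinary full-sample resample
            full = sorted((rng.choice(sample) for _ in range(n)), reverse=True)
            out.append(full[r - 1])
    return out


# Hypothetical series of 1,000 values; bootstrap the sample maximum from the top 50
reps = bootstrap_rth_largest([float(i) for i in range(1000)], r=1, m=50, reps=1000)
print(min(reps), max(reps))
```

    Each replicate then costs about m draws instead of n. This is where the savings come from for very long gridded model records.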

  18. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
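
    The size bias follows directly from weighting each cluster size by its detection probability. The Python sketch below assumes, purely for illustration, that each individual is seen independently with probability q, so a cluster of size s is detected with probability 1 − (1 − q)^s. The hierarchical model in the paper is more general than this. The function name is hypothetical.

```python
from typing import Dict


def observed_mean_cluster_size(size_pmf: Dict[int, float], q: float) -> float:
    """Expected size of *detected* clusters when each individual is seen
    independently with probability q (an illustrative detection model)."""
    num = den = 0.0
    for s, f in size_pmf.items():
        detect = 1.0 - (1.0 - q) ** s  # chance at least one member is seen
        num += s * f * detect
        den += f * detect
    return num / den


# Hypothetical population: half the clusters are singles, half are triples (mean size 2)
print(observed_mean_cluster_size({1: 0.5, 3: 0.5}, q=0.5))
```

    The detected clusters average about 2.27 individuals against a true mean of 2. Multiplying an estimated number of clusters by the observed mean size would therefore overstate abundance.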

  19. Unbiased multi-fidelity estimate of failure probability of a free plane jet

    NASA Astrophysics Data System (ADS)

    Marques, Alexandre; Kramer, Boris; Willcox, Karen; Peherstorfer, Benjamin

    2017-11-01

    Estimating failure probability related to fluid flows is a challenge because it requires a large number of evaluations of expensive models. We address this challenge by leveraging multiple low fidelity models of the flow dynamics to create an optimal unbiased estimator. In particular, we investigate the effects of uncertain inlet conditions on the width of a free plane jet. We classify a condition as failure when the corresponding jet width is below a small threshold, such that failure is a rare event (failure probability is smaller than 0.001). We estimate failure probability by combining the frameworks of multi-fidelity importance sampling and optimal fusion of estimators. Multi-fidelity importance sampling uses a low fidelity model to explore the parameter space and create a biasing distribution. An unbiased estimate is then computed with a relatively small number of evaluations of the high fidelity model. In the presence of multiple low fidelity models, this framework offers multiple competing estimators. Optimal fusion combines all competing estimators into a single estimator with minimal variance. We show that this combined framework can significantly reduce the cost of estimating failure probabilities, and thus can have a large impact in fluid flow applications. This work was funded by DARPA.
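
    The two ingredients can be shown in a toy setting: an importance-sampling estimate of a rare failure probability, and the fusion of several unbiased estimates. In the Python sketch below, a standard normal input stands in for the uncertain inlet condition, and a hypothetical linear "width" function stands in for the flow model. A hand-picked mean-shifted normal replaces a biasing density built from a low-fidelity model. Fusion is shown only for independent estimators, where the minimum-variance combination reduces to inverse-variance weighting.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def importance_sampling_failure(limit_state: Callable[[float], float], threshold: float,
                                shift: float, n: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate P(limit_state(X) < threshold) for X ~ N(0, 1) using the biasing
    density N(shift, 1). Returns the estimate and its estimated variance."""
    rng = random.Random(seed)
    vals = []
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        # Likelihood ratio phi(x) / phi(x - shift) for unit-variance normals
        w = math.exp(-shift * x + 0.5 * shift ** 2)
        vals.append(w if limit_state(x) < threshold else 0.0)
    est = sum(vals) / n
    var = sum((v - est) ** 2 for v in vals) / (n - 1) / n
    return est, var


def fuse(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Minimum-variance linear combination of independent unbiased estimators."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total, 1.0 / total


# Hypothetical toy "jet width": failure when 1 - 0.2 x < 0.2, i.e. x > 4 (P about 3.2e-5)
width = lambda x: 1.0 - 0.2 * x
e1, v1 = importance_sampling_failure(width, 0.2, shift=4.0, n=5000, seed=1)
e2, v2 = importance_sampling_failure(width, 0.2, shift=4.0, n=5000, seed=2)
print(fuse([e1, e2], [v1, v2]))
```

    Plain Monte Carlo would need on the order of millions of model runs to see this event reliably. The shifted density concentrates samples where failures occur and reweights them to keep the estimate unbiased.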

  20. Estimating a Noncompensatory IRT Model Using Metropolis within Gibbs Sampling

    ERIC Educational Resources Information Center

    Babcock, Ben

    2011-01-01

    Relatively little research has been conducted with the noncompensatory class of multidimensional item response theory (MIRT) models. A Monte Carlo simulation study was conducted exploring the estimation of a two-parameter noncompensatory item response theory (IRT) model. The estimation method used was a Metropolis-Hastings within Gibbs algorithm…

  1. Examining Temporal Sample Scale and Model Choice with Spatial Capture-Recapture Models in the Common Leopard Panthera pardus.

    PubMed

    Goldberg, Joshua F; Tempa, Tshering; Norbu, Nawang; Hebblewhite, Mark; Mills, L Scott; Wangchuk, Tshewang R; Lukacs, Paul

    2015-01-01

    Many large carnivores have wide geographic distributions and face threats from habitat loss and fragmentation, poaching, prey depletion, and human-wildlife conflicts. Conservation requires robust techniques for estimating population densities and trends, but the elusive nature and low densities of many large carnivores make them difficult to detect. Spatial capture-recapture (SCR) models provide a means for handling imperfect detectability, while linking population estimates to individual movement patterns to provide more accurate estimates than standard approaches. Within this framework, we investigate the effect of different sample interval lengths on density estimates, using simulations and a common leopard (Panthera pardus) model system. We apply Bayesian SCR methods to 89 simulated datasets and camera-trapping data from 22 leopards captured 82 times during winter 2010-2011 in Royal Manas National Park, Bhutan. We show that sample interval length from daily, weekly, monthly or quarterly periods did not appreciably affect median abundance or density, but did influence precision. We observed the largest gains in precision when moving from quarterly to shorter intervals. We therefore recommend daily sampling intervals for monitoring rare or elusive species where practicable, but note that monthly or quarterly sample periods can have similar informative value. We further develop a novel application of Bayes factors to select models where multiple ecological factors are integrated into density estimation. Our simulations demonstrate that these methods can help identify the "true" explanatory mechanisms underlying the data. Using this method, we found strong evidence for sex-specific movement distributions in leopards, suggesting that sexual patterns of space-use influence density. This model estimated a density of 10.0 leopards/100 km² (95% credibility interval: 6.25-15.93), comparable to contemporary estimates in Asia. 
These SCR methods provide a guide to monitor and observe the effect of management interventions on leopards and other species of conservation interest.

  2. Examining Temporal Sample Scale and Model Choice with Spatial Capture-Recapture Models in the Common Leopard Panthera pardus

    PubMed Central

    Goldberg, Joshua F.; Tempa, Tshering; Norbu, Nawang; Hebblewhite, Mark; Mills, L. Scott; Wangchuk, Tshewang R.; Lukacs, Paul

    2015-01-01

    Many large carnivores have wide geographic distributions and face threats from habitat loss and fragmentation, poaching, prey depletion, and human-wildlife conflicts. Conservation requires robust techniques for estimating population densities and trends, but the elusive nature and low densities of many large carnivores make them difficult to detect. Spatial capture-recapture (SCR) models provide a means for handling imperfect detectability, while linking population estimates to individual movement patterns to provide more accurate estimates than standard approaches. Within this framework, we investigate the effect of different sample interval lengths on density estimates, using simulations and a common leopard (Panthera pardus) model system. We apply Bayesian SCR methods to 89 simulated datasets and camera-trapping data from 22 leopards captured 82 times during winter 2010–2011 in Royal Manas National Park, Bhutan. We show that sample interval length from daily, weekly, monthly or quarterly periods did not appreciably affect median abundance or density, but did influence precision. We observed the largest gains in precision when moving from quarterly to shorter intervals. We therefore recommend daily sampling intervals for monitoring rare or elusive species where practicable, but note that monthly or quarterly sample periods can have similar informative value. We further develop a novel application of Bayes factors to select models where multiple ecological factors are integrated into density estimation. Our simulations demonstrate that these methods can help identify the “true” explanatory mechanisms underlying the data. Using this method, we found strong evidence for sex-specific movement distributions in leopards, suggesting that sexual patterns of space-use influence density. This model estimated a density of 10.0 leopards/100 km² (95% credibility interval: 6.25–15.93), comparable to contemporary estimates in Asia. 
These SCR methods provide a guide to monitor and observe the effect of management interventions on leopards and other species of conservation interest. PMID:26536231

  3. Score Estimating Equations from Embedded Likelihood Functions under Accelerated Failure Time Model

    PubMed Central

    NING, JING; QIN, JING; SHEN, YU

    2014-01-01

    The semiparametric accelerated failure time (AFT) model is one of the most popular models for analyzing time-to-event outcomes. One appealing feature of the AFT model is that the observed failure time data can be transformed to independent and identically distributed random variables without covariate effects. We describe a class of estimating equations based on the score functions for the transformed data, which are derived from the full likelihood function under commonly used semiparametric models such as the proportional hazards or proportional odds models. The methods of estimating regression parameters under the AFT model can be applied to traditional right-censored survival data as well as to more complex time-to-event data subject to length-biased sampling. We establish the asymptotic properties and evaluate the small-sample performance of the proposed estimators. We illustrate the proposed methods with two examples. PMID:25663727

  4. Estimation of pyrethroid pesticide intake using regression ...

    EPA Pesticide Factsheets

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation of pesticide intakes for a defined demographic community, and (2) comparison of dietary pesticide intakes between the composite and individual samples. Extant databases were useful for assigning individual samples to composites, but they could not provide the breadth of information needed to ensure measurable levels in every composite. Composite sample measurements were found to be good predictors of pyrethroid pesticide levels in their individual sample constituents where sufficient measurements are available above the method detection limit. Statistical inference shows little evidence of differences between individual and composite measurements and suggests that regression modeling of food groups based on composite dietary samples may provide an effective tool for estimating dietary pesticide intake for a defined population. The research presented in the journal article will improve the community's ability to determine exposures through the dietary route with a less burdensome and less costly method.

  5. Variability And Uncertainty Analysis Of Contaminant Transport Model Using Fuzzy Latin Hypercube Sampling Technique

    NASA Astrophysics Data System (ADS)

    Kumar, V.; Nayagum, D.; Thornton, S.; Banwart, S.; Schuhmacher, M.; Lerner, D.

    2006-12-01

    Characterization of uncertainty associated with groundwater quality models is often of critical importance, for example when environmental models are employed in risk assessment. Insufficient data, inherent variability, and estimation errors in environmental model parameters introduce uncertainty into model predictions. However, uncertainty analysis using conventional methods such as standard Monte Carlo sampling (MCS) may not be efficient, or even suitable, for complex, computationally demanding models whose parametric variability and uncertainty differ in nature. Standard MCS and variants such as Latin Hypercube Sampling (LHS) treat variability and uncertainty as a single random entity, and the generated samples are treated as crisp, equating vagueness with randomness. Moreover, when models are used as purely predictive tools, uncertainty and variability create the need to assess the plausible range of model outputs. An improved, systematic analysis of variability and uncertainty can provide insight into the level of confidence in model estimates and can help in deciding how the various possible model estimates should be weighed. The present study introduces Fuzzy Latin Hypercube Sampling (FLHS), a hybrid approach that incorporates both cognitive and noncognitive uncertainties. Noncognitive uncertainty, such as physical randomness or statistical uncertainty due to limited information, can be described by its own probability density function (PDF), whereas cognitive uncertainty, such as estimation error, can be described by a membership function for its fuzziness and by confidence intervals obtained from α-cuts. An important property of this theory is its ability to merge the inexact data generated by the LHS approach to increase the quality of information. The FLHS technique ensures that the entire range of each variable is sampled with proper incorporation of uncertainty and variability. 
A fuzzified statistical summary of the model results produces indices of sensitivity and uncertainty that relate the effects of heterogeneity and uncertainty of input variables to model predictions. The feasibility of the method is demonstrated by assessing uncertainty propagation of parameter values in estimating the contamination level of a drinking water supply well due to transport of dissolved phenolics from a contaminated site in the UK.
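The stratification property relied on above, that the entire range of each variable is sampled, comes from ordinary Latin hypercube sampling. The sketch below shows plain, crisp LHS only; the fuzzy extension with membership functions and α-cuts is not implemented. Each of the d columns is split into n equal-probability strata, one point is drawn inside each stratum, and the strata are shuffled independently per column. The parameter range in the usage lines is hypothetical.

```python
import numpy as np


def latin_hypercube(n, d, rng):
    """Return an (n, d) Latin hypercube sample on [0, 1)^d.

    Each column contains exactly one point in each interval [k/n, (k+1)/n).
    """
    # one uniform draw inside each of the n strata, for every column
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # shuffle the strata independently per column to decouple the variables
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return u


rng = np.random.default_rng(42)
unit = latin_hypercube(10, 2, rng)
# map column 0 to a hypothetical uniform parameter range [0.1, 10]
param = 0.1 + unit[:, 0] * (10.0 - 0.1)
```

Non-uniform inputs would be obtained by passing each column through the inverse CDF of that parameter's distribution instead of the linear rescaling used here.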

  6. Estimation of density of mongooses with capture-recapture and distance sampling

    USGS Publications Warehouse

    Corn, J.L.; Conroy, M.J.

    1998-01-01

    We captured mongooses (Herpestes javanicus) in live traps arranged in trapping webs in Antigua, West Indies, and used capture-recapture and distance sampling to estimate density. Distance estimation and program DISTANCE were used to provide estimates of density from the trapping-web data. Mean density based on trapping webs was 9.5 mongooses/ha (range, 5.9-10.2/ha); estimates had coefficients of variation ranging from 29.82-31.58% (x̄ = 30.46%). Mark-recapture models were used to estimate abundance, which was converted to density using estimates of effective trap area. Tests of model assumptions provided by CAPTURE indicated pronounced heterogeneity in capture probabilities and some indication of behavioral response and variation over time. Mean estimated density was 1.80 mongooses/ha (range, 1.37-2.15/ha) with estimated coefficients of variation of 4.68-11.92% (x̄ = 7.46%). Estimates of density based on mark-recapture data depended heavily on assumptions about animal home ranges; variances of densities also may be underestimated, leading to unrealistically narrow confidence intervals. Estimates based on trap webs require fewer assumptions, and estimated variances may be a more realistic representation of sampling variation. Because trap webs are established easily and provide adequate data for estimation in a few sample occasions, the method should be efficient and reliable for estimating densities of mongooses.
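The abstract converts a mark-recapture abundance estimate to density by dividing by an effective trap area. A minimal sketch of that conversion follows, using Chapman's bias-corrected Lincoln-Petersen estimator as a stand-in. This is a substitution, not the study's method: the study fitted models from program CAPTURE, whereas Chapman's estimator assumes equal catchability, which the study's CAPTURE tests found violated. The counts and the 40 ha effective area are hypothetical.

```python
def chapman_abundance(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen abundance estimate.

    n1: animals marked in the first session
    n2: animals caught in the second session
    m2: marked animals among the n2 caught in the second session
    """
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


def density_per_ha(abundance, effective_area_ha):
    """Convert abundance to density using an effective trap area in hectares."""
    return abundance / effective_area_ha


n_hat = chapman_abundance(40, 35, 20)                # hypothetical counts
print(round(n_hat, 2), round(density_per_ha(n_hat, 40.0), 2))  # 69.29 1.73
```

Because density scales inversely with the assumed effective area, the home-range assumptions used to define that area drive the result, consistent with the abstract's caution.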

  7. Nearest neighbor density ratio estimation for large-scale applications in astronomy

    NASA Astrophysics Data System (ADS)

    Kremer, J.; Gieseke, F.; Steenstrup Pedersen, K.; Igel, C.

    2015-09-01

    In astronomical applications of machine learning, the distribution of objects used for building a model is often different from the distribution of the objects the model is later applied to. This is known as sample selection bias, which is a major challenge for statistical inference as one can no longer assume that the labeled training data are representative. To address this issue, one can re-weight the labeled training patterns to match the distribution of unlabeled data that are available already in the training phase. There are many examples in practice where this strategy yielded good results, but estimating the weights reliably from a finite sample is challenging. We consider an efficient nearest neighbor density ratio estimator that can exploit large samples to increase the accuracy of the weight estimates. To solve the problem of choosing the right neighborhood size, we propose to use cross-validation on a model selection criterion that is unbiased under covariate shift. The resulting algorithm is our method of choice for density ratio estimation when the feature space dimensionality is small and sample sizes are large. The approach is simple and, because of the model selection, robust. We empirically find that it is on a par with established kernel-based methods on relatively small regression benchmark datasets. However, when applied to large-scale photometric redshift estimation, our approach outperforms the state-of-the-art.
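A brute-force sketch of a nearest-neighbour density ratio estimate follows. It assumes one common form of the estimator: for each labeled (training) point, take the ball whose radius reaches its k-th nearest training neighbour, count the unlabeled (test) points inside it, and scale by the sample sizes, w = (n_train / n_test) * count / k. The O(n^2) distance matrices stand in for the efficient neighbour search a large-scale implementation would need, and k is fixed here rather than chosen by the cross-validated model selection the abstract proposes. The Gaussian data are hypothetical.

```python
import numpy as np


def nn_density_ratio(x_train, x_test, k):
    """Estimate w(x) = p_test(x) / p_train(x) at every training point.

    x_train: (n_train, d) labeled sample; x_test: (n_test, d) unlabeled sample.
    """
    d_tr = np.linalg.norm(x_train[:, None, :] - x_train[None, :, :], axis=-1)
    # each sorted row starts with the point itself (distance 0), so index k is
    # the distance to its k-th nearest other training point
    radius = np.sort(d_tr, axis=1)[:, k]
    d_te = np.linalg.norm(x_train[:, None, :] - x_test[None, :, :], axis=-1)
    counts = (d_te <= radius[:, None]).sum(axis=1)
    return (len(x_train) / len(x_test)) * counts / k


rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(1000, 1))  # hypothetical training distribution N(0, 1)
x_test = rng.normal(0.5, 1.0, size=(1000, 1))   # hypothetical shifted target N(0.5, 1)
w = nn_density_ratio(x_train, x_test, k=50)
```

For this shifted pair the true ratio is exp(0.5x - 0.125), so the estimated weights should rise with x.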

  8. Estimating parameters of hidden Markov models based on marked individuals: use of robust design data

    USGS Publications Warehouse

    Kendall, William L.; White, Gary C.; Hines, James E.; Langtimm, Catherine A.; Yoshizaki, Jun

    2012-01-01

    Development and use of multistate mark-recapture models, which provide estimates of parameters of Markov processes in the face of imperfect detection, have become common over the last twenty years. Recently, estimating parameters of hidden Markov models, where the state of an individual can be uncertain even when it is detected, has received attention. Previous work has shown that ignoring state uncertainty biases estimates of survival and state transition probabilities, thereby reducing the power to detect effects. Efforts to adjust for state uncertainty have included special cases and a general framework for a single sample per period of interest. We provide a flexible framework for adjusting for state uncertainty in multistate models, while utilizing multiple sampling occasions per period of interest to increase precision and remove parameter redundancy. These models also produce direct estimates of state structure for each primary period, even for the case where there is just one sampling occasion. We apply our model to expected value data, and to data from a study of Florida manatees, to provide examples of the improvement in precision due to secondary capture occasions. We also provide user-friendly software to implement these models. This general framework could also be used by practitioners to consider constrained models of particular interest, or model the relationship between within-primary period parameters (e.g., state structure) and between-primary period parameters (e.g., state transition probabilities).

  9. Estimating the Uncertainty In Diameter Growth Model Predictions and Its Effects On The Uncertainty of Annual Inventory Estimates

    Treesearch

    Ronald E. McRoberts; Veronica C. Lessard

    2001-01-01

    Uncertainty in diameter growth predictions is attributed to three general sources: measurement error or sampling variability in predictor variables, parameter covariances, and residual or unexplained variation around model expectations. Using measurement error and sampling variability distributions obtained from the literature and Monte Carlo simulation methods, the...

  10. Estimating Animal Abundance in Ground Beef Batches Assayed with Molecular Markers

    PubMed Central

    Hu, Xin-Sheng; Simila, Janika; Platz, Sindey Schueler; Moore, Stephen S.; Plastow, Graham; Meghen, Ciaran N.

    2012-01-01

    Estimating animal abundance in industrial scale batches of ground meat is important for mapping meat products through the manufacturing process and for effectively tracing the finished product during a food safety recall. The processing of ground beef involves a potentially large number of animals from diverse sources in a single product batch, which produces high heterogeneity in capture probability. In order to estimate animal abundance through DNA profiling of ground beef constituents, two parameter-based statistical models were developed for incidence data. Simulations were applied to evaluate the maximum likelihood estimate (MLE) of a joint likelihood function from multiple surveys; compared with other existing models, it performed better in the presence of high capture heterogeneity with small sample sizes, and comparably in the presence of low capture heterogeneity with a large sample size. Our model employs the full information on the pattern of the capture-recapture frequencies from multiple samples. We applied the proposed models to estimate animal abundance in six manufacturing beef batches, genotyped using 30 single nucleotide polymorphism (SNP) markers, from a large scale beef grinding facility. Results show that between 411 and 1367 animals were present in the six manufacturing beef batches. These estimates are informative as a reference for improving recall processes and tracing finished meat products back to source. PMID:22479559

  11. On estimation of linear transformation models with nested case–control sampling

    PubMed Central

    Liu, Mengling

    2011-01-01

    Nested case–control (NCC) sampling is widely used in large epidemiological cohort studies for its cost effectiveness, but its data analysis primarily relies on the Cox proportional hazards model. In this paper, we consider a family of linear transformation models for analyzing NCC data and propose an inverse selection probability weighted estimating equation method for inference. Consistency and asymptotic normality of our estimators for regression coefficients are established. We show that the asymptotic variance has a closed analytic form and can be easily estimated. Numerical studies are conducted to support the theory and an application to the Wilms’ Tumor Study is also given to illustrate the methodology. PMID:21912975

  12. Heterogeneous autoregressive model with structural break using nearest neighbor truncation volatility estimators for DAX.

    PubMed

    Chin, Wen Cheong; Lee, Min Cherng; Yap, Grace Lee Ching

    2016-01-01

    High-frequency financial data modelling has become an important research area in financial econometrics. However, possible structural breaks in volatile financial time series often cause inconsistency in volatility estimation. In this study, we propose a structural-break, heavy-tailed heterogeneous autoregressive (HAR) volatility econometric model enhanced with jump-robust estimators. Breakpoints in the volatility are detected by the Bai-Perron sequential multiple-breakpoint procedure and then captured by dummy variables. To further deal with possible abrupt jumps in volatility, the jump-robust volatility estimators are constructed using the nearest neighbor truncation approach, namely the minimum and median realized volatility. With the structural break improvements in both the models and the volatility estimators, the empirical findings show that the modified HAR model gives the best in-sample and out-of-sample forecast performance compared with the standard HAR models. Accurate volatility forecasts directly influence risk management and investment portfolio analysis.
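The two nearest-neighbour truncation estimators named above, minimum and median realized volatility (MinRV and MedRV), can be computed from a day's intraday returns as sketched below. The scaling constants are the standard ones that make each estimator consistent for integrated variance under Gaussian returns; they are stated here as an assumption rather than taken from the paper. The structural-break dummies and the HAR regression itself are not shown, and the simulated returns and jump size are hypothetical.

```python
import numpy as np


def min_rv(returns):
    """Minimum realized variance: squared minima of adjacent |r| pairs, rescaled."""
    r = np.abs(np.asarray(returns, dtype=float))
    m = r.size
    mins = np.minimum(r[:-1], r[1:])
    return np.pi / (np.pi - 2.0) * (m / (m - 1)) * np.sum(mins**2)


def med_rv(returns):
    """Median realized variance: squared rolling 3-point medians of |r|, rescaled."""
    r = np.abs(np.asarray(returns, dtype=float))
    m = r.size
    meds = np.median(np.vstack([r[:-2], r[1:-1], r[2:]]), axis=0)
    return np.pi / (6.0 - 4.0 * np.sqrt(3.0) + np.pi) * (m / (m - 2)) * np.sum(meds**2)


rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=20_000)  # integrated variance = 20_000 * 0.01**2 = 2.0
r_jump = r.copy()
r_jump[100] += 0.5                      # one large hypothetical jump
rv, rv_jump = np.sum(r**2), np.sum(r_jump**2)  # plain realized variance, for contrast
```

The single jump inflates plain realized variance by roughly 0.25, while both truncation estimators barely move, because a jump is never the minimum of its pair or the median of its triple.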

  13. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    PubMed Central

    Huang, Lei

    2015-01-01

    Conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly. To solve this problem, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Time-varying estimators of the unknown observation noise are used to estimate its mean and variance. Using the robust Kalman filter, the ARMA model parameters are estimated accurately. The developed ARMA modeling method converges rapidly and is highly accurate, so the required sample size is reduced. It can be applied to modeling gyro random noise wherever a fast and accurate ARMA modeling method is required. PMID:26437409
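The core idea, treating the model parameters as the state of a Kalman filter, can be sketched for the autoregressive part of an ARMA model. This is a plain (non-robust) Kalman filter with an assumed, known observation-noise variance, not the paper's robust filter with time-varying noise estimators, and the moving-average terms are omitted. With an identity state transition and zero process noise, the filter reduces to recursive least squares. The AR(2) coefficients used to simulate data are hypothetical.

```python
import numpy as np


def kalman_ar(y, p, obs_var=1.0, q=0.0):
    """Estimate AR(p) coefficients with a Kalman filter whose state is the coefficient vector.

    State:       a_t = a_{t-1} + w_t,   w_t ~ N(0, q I)
    Observation: y_t = h_t . a_t + v_t, h_t = [y_{t-1}, ..., y_{t-p}], v_t ~ N(0, obs_var)
    """
    a = np.zeros(p)
    P = np.eye(p) * 1e3                  # diffuse prior on the coefficients
    for t in range(p, len(y)):
        h = y[t - p:t][::-1]             # lagged values, most recent first
        P = P + q * np.eye(p)            # predict step (identity transition)
        s = h @ P @ h + obs_var          # innovation variance
        gain = P @ h / s                 # Kalman gain
        a = a + gain * (y[t] - h @ a)    # update with the innovation
        P = P - np.outer(gain, h @ P)    # covariance update
    return a


rng = np.random.default_rng(0)
n, true_a = 5000, np.array([0.6, -0.3])  # hypothetical AR(2) coefficients
y = np.zeros(n)
e = rng.normal(size=n)
for t in range(2, n):
    y[t] = true_a[0] * y[t - 1] + true_a[1] * y[t - 2] + e[t]
a_hat = kalman_ar(y, p=2)
```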

  14. A robust design mark-resight abundance estimator allowing heterogeneity in resighting probabilities

    USGS Publications Warehouse

    McClintock, B.T.; White, Gary C.; Burnham, K.P.

    2006-01-01

    This article introduces the beta-binomial estimator (BBE), a closed-population abundance mark-resight model combining the favorable qualities of maximum likelihood theory and the allowance of individual heterogeneity in sighting probability (p). The model may be parameterized for a robust sampling design consisting of multiple primary sampling occasions where closure need not be met between primary occasions. We applied the model to brown bear data from three study areas in Alaska and compared its performance to the joint hypergeometric estimator (JHE) and Bowden's estimator (BOWE). BBE estimates suggest heterogeneity levels were non-negligible and discourage the use of JHE for these data. Compared to JHE and BOWE, confidence intervals were considerably shorter for the AICc model-averaged BBE. To evaluate the properties of BBE relative to JHE and BOWE when sample sizes are small, simulations were performed with data from three primary occasions generated under both individual heterogeneity and temporal variation in p. All models remained consistent regardless of levels of variation in p. In terms of precision, the AICc model-averaged BBE showed advantages over JHE and BOWE when heterogeneity was present and mean sighting probabilities were similar between primary occasions. Based on the conditions examined, BBE is a reliable alternative to JHE or BOWE and provides a framework for further advances in mark-resight abundance estimation. © 2006 American Statistical Association and the International Biometric Society.

  15. Moments and Root-Mean-Square Error of the Bayesian MMSE Estimator of Classification Error in the Gaussian Model.

    PubMed

    Zollanvari, Amin; Dougherty, Edward R

    2014-06-01

    The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-label distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.

  16. Comparison of sampling techniques for Bayesian parameter estimation

    NASA Astrophysics Data System (ADS)

    Allison, Rupert; Dunkley, Joanna

    2014-02-01

    The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hastings sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
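For reference, the baseline method in this comparison, random-walk Metropolis-Hastings, is sketched below on a toy Gaussian posterior. The target (independent normals with means 1 and -2), the step size, and the burn-in length are hypothetical choices; the step size is an example of the tuning the abstract mentions. Nested sampling and affine-invariant ensemble samplers are not shown.

```python
import numpy as np


def log_post(theta):
    """Toy log-posterior (up to a constant): independent N(1, 0.5^2) and N(-2, 1)."""
    return -0.5 * ((theta[0] - 1.0) / 0.5) ** 2 - 0.5 * (theta[1] + 2.0) ** 2


def metropolis_hastings(log_p, x0, n_steps, step, rng):
    """Random-walk Metropolis-Hastings with an isotropic Gaussian proposal."""
    x = np.asarray(x0, dtype=float)
    lp = log_p(x)
    chain = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        proposal = x + step * rng.normal(size=x.size)  # symmetric, so no Hastings correction
        lp_prop = log_p(proposal)
        if np.log(rng.random()) < lp_prop - lp:        # accept with prob min(1, ratio)
            x, lp = proposal, lp_prop
            accepted += 1
        chain[i] = x
    return chain, accepted / n_steps


rng = np.random.default_rng(0)
chain, acc_rate = metropolis_hastings(log_post, [0.0, 0.0], 20_000, 0.8, rng)
posterior_mean = chain[2_000:].mean(axis=0)  # discard a hypothetical burn-in of 2,000 steps
```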

  17. Unscented predictive variable structure filter for satellite attitude estimation with model errors when using low precision sensors

    NASA Astrophysics Data System (ADS)

    Cao, Lu; Li, Hengnian

    2016-10-01

    In satellite attitude estimation, serious model errors are common and hinder the estimation performance of the Attitude Determination and Control System (ADCS), especially for a small satellite with low-precision sensors. To deal with this problem, a new attitude estimation algorithm, the unscented predictive variable structure filter (UPVSF), is presented. It is based on the variable structure control concept and the unscented transform (UT) sampling method. It can be implemented in real time and estimates the model errors on-line, improving the precision of state estimation. In addition, the model errors in this filter are not restricted to Gaussian noise, so it can handle various kinds of model errors or noise. The UT sampling strategy is expected to further enhance the robustness and accuracy of the UPVSF. Numerical simulations show that the proposed UPVSF is more effective and robust in dealing with model errors and low-precision sensors than the traditional unscented Kalman filter (UKF).
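The unscented transform (UT) sampling step on which the filter builds can be sketched as follows: 2n + 1 deterministically chosen sigma points are propagated through a function and reweighted to approximate the output mean and covariance. This uses the basic symmetric sigma-point set with a single scaling parameter kappa, an assumption; the variable-structure prediction and on-line model-error estimation of the UPVSF are not shown. For a linear map the transform is exact, which gives a simple check.

```python
import numpy as np


def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate (mean, cov) through f using 2n + 1 symmetric sigma points."""
    mean = np.asarray(mean, dtype=float)
    n = mean.size
    # columns of L satisfy L @ L.T = (n + kappa) * cov
    L = np.linalg.cholesky((n + kappa) * np.asarray(cov, dtype=float))
    sigma = np.vstack([mean, mean + L.T, mean - L.T])  # rows are sigma points
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)

    y = np.array([f(s) for s in sigma])
    y_mean = weights @ y
    d = y - y_mean
    y_cov = (weights[:, None] * d).T @ d
    return y_mean, y_cov


A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
m = np.array([0.5, 1.0])
P = np.array([[1.0, 0.2], [0.2, 0.5]])
y_mean, y_cov = unscented_transform(lambda x: A @ x + b, m, P)
```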

  18. Drug-drug interaction predictions with PBPK models and optimal multiresponse sampling time designs: application to midazolam and a phase I compound. Part 1: comparison of uniresponse and multiresponse designs using PopDes.

    PubMed

    Chenel, Marylore; Bouzom, François; Aarons, Leon; Ogungbenro, Kayode

    2008-12-01

    To determine the optimal sampling time design of a drug-drug interaction (DDI) study for the estimation of apparent clearances (CL/F) of two co-administered drugs (SX, a phase I compound, potentially a CYP3A4 inhibitor, and MDZ, a reference CYP3A4 substrate) without any in vivo data using physiologically based pharmacokinetic (PBPK) predictions, population PK modelling and multiresponse optimal design. PBPK models were developed with AcslXtreme using only in vitro data to simulate PK profiles of both drugs when they were co-administered. Then, using simulated data, population PK models were developed with NONMEM and optimal sampling times were determined by optimizing the determinant of the population Fisher information matrix with PopDes using either two uniresponse designs (UD) or a multiresponse design (MD) with joint sampling times for both drugs. Finally, the D-optimal sampling time designs were evaluated by simulation and re-estimation with NONMEM by computing the relative root mean squared error (RMSE) and empirical relative standard errors (RSE) of CL/F. There were four and five optimal sampling times in the UDs for SX and MDZ, respectively (nine different sampling times in total), whereas there were only five sampling times in the MD. For every design and compound, CL/F was well estimated (RSE < 20% for MDZ and <25% for SX) and expected RSEs from PopDes were in the same range as empirical RSEs. Moreover, there was no bias in CL/F estimation. Since MD required only five sampling times compared to the two UDs, D-optimal sampling times of the MD were included in a full empirical design for the proposed clinical trial. A joint paper compares the designs with real data. 
This global approach including PBPK simulations, population PK modelling and multiresponse optimal design allowed, without any in vivo data, the design of a clinical trial, using sparse sampling, capable of estimating CL/F of the CYP3A4 substrate and potential inhibitor when co-administered together.

  19. Estimating heavy metal concentrations in topsoil from vegetation reflectance spectra of Hyperion images: A case study of Yushu County, Qinghai, China.

    PubMed

    Yang, Ling Yu; Gao, Xiao Hong; Zhang, Wei; Shi, Fei Fei; He, Lin Hua; Jia, Wei

    2016-06-01

    In this study, we explored the feasibility of estimating soil heavy metal concentrations from hyperspectral satellite imagery. The concentrations of As, Pb, Zn and Cd in 48 topsoil samples collected in the field in Yushu County, in the Sanjiangyuan region, were measured in the laboratory. We then extracted 176 vegetation spectral reflectance bands and five vegetation indices for the 48 sample locations from two Hyperion images. Next, partial least squares regression (PLSR) was used to estimate soil heavy metal concentrations from each of these two independent sets of Hyperion-derived variables separately. This yielded one model relating the 176 vegetation spectral reflectance bands to the soil heavy metal concentrations (the vegetation spectral reflectance-based estimation model) and another using the five vegetation indices as independent variables (the synthetic vegetation index-based estimation model). Using RPD (the ratio of the standard deviation of the measured values of each of the four heavy metals in the validation samples to the RMSE) as the validation criterion, the RPDs for As and Pb were below 1.4 for both models, indicating that neither model could even roughly estimate As and Pb concentrations. The RPDs for Zn and Cd were 1.53 and 1.46, and 1.46 and 1.42, respectively, implying that both models could roughly estimate Zn and Cd concentrations. Based on these results, the vegetation spectral reflectance-based estimation model was selected to map the spatial distribution of Zn concentration from the Hyperion image. The estimated Zn map showed that zones with high Zn concentrations were distributed near provincial road 308, national road 214 and towns, suggesting an influence of human activities. Our study showed that Hyperion spectral reflectance was useful for estimating soil Zn and Cd concentrations.
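    The RPD criterion above divides the standard deviation of the measured validation values by the RMSE of the predictions. A minimal Python sketch of that ratio follows. It assumes the sample standard deviation (ddof=1), which the abstract does not specify, and the numbers are hypothetical rather than taken from the study. Under the thresholds used above, an RPD below 1.4 would mean the model cannot even roughly estimate the metal.

```python
import numpy as np

def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observed values divided by RMSE of predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sd = np.std(observed, ddof=1)  # sample standard deviation (ddof=1 is an assumption)
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return sd / rmse

# Hypothetical validation data (not from the study)
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 1.5, 3.5, 3.5, 5.0]
print(round(rpd(obs, pred), 4))  # 3.5355
```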

  20. Variance Estimation Using Replication Methods in Structural Equation Modeling with Complex Sample Data

    ERIC Educational Resources Information Center

    Stapleton, Laura M.

    2008-01-01

    This article discusses replication sampling variance estimation techniques that are often applied in analyses using data from complex sampling designs: jackknife repeated replication, balanced repeated replication, and bootstrapping. These techniques are used with traditional analyses such as regression, but are currently not used with structural…
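    As a rough illustration of the replication idea, the sketch below computes a delete-one jackknife variance for a statistic from a simple random sample. It is a simplified stand-in, not the stratified jackknife repeated replication, balanced repeated replication, or bootstrap variants applied to complex designs, and the helper name jackknife_variance is hypothetical. For the sample mean, the delete-one jackknife reproduces the textbook s²/n exactly, so the two printed values agree (both equal 2 here, up to rounding).

```python
import numpy as np

def jackknife_variance(data, statistic):
    """Delete-one jackknife variance of statistic(data) (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    # One replicate per observation: recompute the statistic with that observation dropped
    replicates = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)

sample = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(jackknife_variance(sample, np.mean))
print(np.var(sample, ddof=1) / len(sample))  # classical variance of the mean, s^2 / n
```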

  1. Evaluation of procedures for estimating ruminal particle turnover and diet digestibility in ruminant animals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cochran, R.C.

    1985-01-01

    Procedures used to estimate ruminal particle turnover and diet digestibility were evaluated in a series of independent experiments. Experiments 1 and 2 evaluated the influence of sampling site, mathematical model and intraruminal mixing on estimates of ruminal particle turnover in beef steers grazing crested wheatgrass or offered prairie hay ad libitum once daily, respectively. Particle turnover rate constants were estimated by intraruminal administration (via rumen cannula) of ytterbium (Yb)-labeled forage, followed by serial collection of rumen digesta or fecal samples. Rumen Yb concentrations were transformed to natural logarithms and regressed on time. In the grazing study, the influence of sampling site (rectum versus rumen) on turnover estimates depended on the model used to fit fecal marker excretion curves. In contrast, when steers were fed once daily, turnover rate constants estimated from rumen sampling were smaller (P < 0.05) than rectally derived rate constants, regardless of the fecal model used. In Experiment 3, in vitro residues subjected to acid or neutral detergent fiber extraction (IVADF and IVNDF), acid detergent fiber incubated in cellulase (ADFIC) and acid detergent lignin (ADL) were evaluated as internal markers for predicting diet digestibility. IVADF and IVNDF both predicted in vivo digestibility with variable accuracy, whereas ADL and ADFIC predicted the digestibility of all diets inaccurately.
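    The log-linear step described above (natural log of marker concentration regressed on time) can be sketched as follows. It assumes single-pool first-order kinetics, C(t) = C0·exp(-k·t), so the fitted slope is -k. The function name and the concentrations are hypothetical, and the fecal excretion models compared in the study are not reproduced.

```python
import numpy as np

def turnover_rate(times_h, concentrations):
    """Fit ln(C) = ln(C0) - k*t by least squares and return (k, C0) (hypothetical helper)."""
    t = np.asarray(times_h, dtype=float)
    log_c = np.log(np.asarray(concentrations, dtype=float))
    slope, intercept = np.polyfit(t, log_c, 1)  # degree-1 fit returns [slope, intercept]
    return -slope, np.exp(intercept)

# Hypothetical marker concentrations following C = 50 * exp(-0.04 t), t in hours
times = np.array([0, 6, 12, 24, 36, 48], dtype=float)
conc = 50.0 * np.exp(-0.04 * times)
k, c0 = turnover_rate(times, conc)
print(round(k, 4), round(c0, 2))  # 0.04 50.0
```

    Under this assumed model, 1/k (here 25 h) would be the mean retention time in the pool.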

  2. Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling

    NASA Astrophysics Data System (ADS)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-07-01

    What is the "best" model? The answer to this question lies in part in the eyes of the beholder; nevertheless, a good model must blend rigorous theory with redeeming qualities such as parsimony and quality of fit. Model selection is used to make inferences, via weighted averaging, from a set of $K$ candidate models, $M_k$, $k = 1, \ldots, K$, and to help identify which model is most supported by the observed data, $\tilde{Y} = (\tilde{y}_1, \ldots, \tilde{y}_n)$. Here, we introduce a new and robust estimator of the model evidence, $p(\tilde{Y} \mid M_k)$, which acts as the normalizing constant in the denominator of Bayes' theorem and provides a single quantitative measure of relative support for each hypothesis that integrates model accuracy, uncertainty, and complexity. However, $p(\tilde{Y} \mid M_k)$ is analytically intractable for most practical modeling problems. Our method, coined GAussian Mixture importancE (GAME) sampling, uses bridge sampling of a mixture distribution fitted to samples of the posterior model parameter distribution derived from MCMC simulation. We benchmark the accuracy and reliability of GAME sampling by application to a diverse set of multivariate target distributions (up to 100 dimensions) with known values of $p(\tilde{Y} \mid M_k)$ and to hypothesis testing using numerical modeling of the rainfall-runoff transformation of the Leaf River watershed in Mississippi, USA. These case studies demonstrate that GAME sampling provides robust and unbiased estimates of the evidence at a relatively small computational cost, outperforming commonly used estimators. The GAME sampler is implemented in the MATLAB package of DREAM and considerably simplifies scientific inquiry through hypothesis testing and model selection.
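    GAME fits a Gaussian mixture to MCMC posterior draws and then applies bridge sampling. The sketch below swaps in a simpler technique, plain importance sampling with a single Gaussian proposal, on a hypothetical toy model (y_i ~ N(θ, 1), θ ~ N(0, 1)) whose evidence has a closed form, so the estimate can be checked. It only illustrates estimating the evidence from weighted draws. It is not the GAME algorithm or its DREAM implementation, and the function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def log_evidence_is(y, n_draws=20000, seed=0):
    """Importance-sampling estimate of log p(Y | M) for y_i ~ N(theta, 1), theta ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Gaussian proposal centred on the (here analytic) posterior, widened for safer tails
    post_mean = y.sum() / (1 + n)
    post_sd = np.sqrt(1.0 / (1 + n))
    q_sd = 1.5 * post_sd
    theta = rng.normal(post_mean, q_sd, size=n_draws)
    log_lik = log_normal_pdf(y[None, :], theta[:, None], 1.0).sum(axis=1)
    log_w = log_lik + log_normal_pdf(theta, 0.0, 1.0) - log_normal_pdf(theta, post_mean, q_sd)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))  # stable log of the mean importance weight

def log_evidence_exact(y):
    """Closed form for the toy model: y ~ N(0, I + 11^T)."""
    n = len(y)
    quad = np.sum(y**2) - y.sum() ** 2 / (1 + n)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n) - 0.5 * quad

y = np.array([0.3, -0.1, 0.8, 0.5, 0.2])
print(log_evidence_is(y), log_evidence_exact(y))
```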

  3. Comparison of sampling designs for estimating deforestation from landsat TM and MODIS imagery: a case study in Mato Grosso, Brazil.

    PubMed

    Zhu, Shanyou; Zhang, Hailong; Liu, Ronggao; Cao, Yun; Zhang, Guixin

    2014-01-01

    Sampling designs are commonly used to estimate deforestation over large areas, but different sampling strategies need to be compared. Using PRODES deforestation data as a reference, deforestation in the state of Mato Grosso in Brazil from 2005 to 2006 is evaluated using Landsat imagery and a nearly synchronous MODIS dataset. The MODIS-derived deforestation is used to assist in sampling and extrapolation. Three sampling designs are compared according to the estimated deforestation of the entire study area based on simple extrapolation and linear regression models. The results show that stratified sampling, with strata constructed and samples allocated using the MODIS-derived deforestation hotspots, provided more precise estimates than simple random and systematic sampling. Moreover, the relationship between the MODIS-derived and TM-derived deforestation provides a precise estimate of the total deforestation area as well as the distribution of deforestation in each block.
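    A stratified estimate of total deforested area, of the kind compared above, can be sketched as below. The strata (for example, blocks flagged or not flagged as MODIS deforestation hotspots), the block counts, and the sampled areas are all hypothetical. The variance formula is the standard one for stratified simple random sampling without replacement; the study's exact estimator may differ.

```python
import numpy as np

def stratified_total(strata):
    """Estimate a population total and its variance under stratified random sampling.

    strata: list of (N_h, sample_values), where N_h is the number of blocks in stratum h
    and sample_values are the measured values (e.g. deforested area) in the sampled blocks.
    """
    total, variance = 0.0, 0.0
    for N_h, values in strata:
        y = np.asarray(values, dtype=float)
        n_h = len(y)
        total += N_h * y.mean()
        # Variance of the estimated stratum total, with finite-population correction
        variance += N_h**2 * (1 - n_h / N_h) * y.var(ddof=1) / n_h
    return total, variance

# Hypothetical strata: 20 "hotspot" blocks and 80 other blocks (areas in km^2)
strata = [(20, [10.0, 12.0, 14.0]), (80, [0.0, 1.0, 0.0, 1.0])]
est_total, est_var = stratified_total(strata)
print(est_total, est_var ** 0.5)
```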

  4. Comparison of Sampling Designs for Estimating Deforestation from Landsat TM and MODIS Imagery: A Case Study in Mato Grosso, Brazil

    PubMed Central

    Zhu, Shanyou; Zhang, Hailong; Liu, Ronggao; Cao, Yun; Zhang, Guixin

    2014-01-01

    Sampling designs are commonly used to estimate deforestation over large areas, but different sampling strategies need to be compared. Using PRODES deforestation data as a reference, deforestation in the state of Mato Grosso in Brazil from 2005 to 2006 is evaluated using Landsat imagery and a nearly synchronous MODIS dataset. The MODIS-derived deforestation is used to assist in sampling and extrapolation. Three sampling designs are compared according to the estimated deforestation of the entire study area based on simple extrapolation and linear regression models. The results show that stratified sampling, with strata constructed and samples allocated using the MODIS-derived deforestation hotspots, provided more precise estimates than simple random and systematic sampling. Moreover, the relationship between the MODIS-derived and TM-derived deforestation provides a precise estimate of the total deforestation area as well as the distribution of deforestation in each block. PMID:25258742
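    The point that the MODIS-to-TM relationship yields a precise total can be illustrated with the standard survey-sampling regression estimator. Here y is TM-derived deforestation in the sampled blocks and x is MODIS-derived deforestation, assumed known for all N blocks. This is a sketch under those assumptions with hypothetical numbers, not the paper's fitted model.

```python
import numpy as np

def regression_estimate_total(x_sample, y_sample, x_population_total, n_population):
    """Regression estimator of a population total of y, using an auxiliary x known for every unit."""
    x = np.asarray(x_sample, dtype=float)
    y = np.asarray(y_sample, dtype=float)
    slope = np.polyfit(x, y, 1)[0]  # least-squares slope of y on x in the sample
    x_mean_pop = x_population_total / n_population
    # Shift the sample mean of y by slope times the gap between population and sample x means
    y_mean_reg = y.mean() + slope * (x_mean_pop - x.mean())
    return n_population * y_mean_reg

# Hypothetical: 4 sampled blocks out of 10; the MODIS total over all 10 blocks is 30 km^2
print(regression_estimate_total([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], 30.0, 10))
```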

  5. A class of Box-Cox transformation models for recurrent event data.

    PubMed

    Sun, Liuquan; Tong, Xingwei; Zhou, Xian

    2011-04-01

    In this article, we propose a class of Box-Cox transformation models for recurrent event data, which includes the proportional means models as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For inference on the proposed models, we apply a profile pseudo-partial likelihood method to estimate the model parameters via estimating equation approaches, establish the large-sample properties of the estimators, and examine their performance in moderate-sized samples through simulation studies. In addition, some graphical and numerical procedures are presented for model checking. The method is illustrated with multiple-infection data from a clinical study on chronic granulomatous disease (CGD).
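    For reference, the usual one-parameter Box-Cox family is sketched below. How the paper embeds the transformation in its mean-function model is not reproduced here. Note only that the log case (λ = 0) is the natural scale for a proportional means model, μ(t | Z) = μ0(t)·exp(β′Z), since log μ is then additive in the covariates.

```python
import math

def box_cox(x, lam):
    """One-parameter Box-Cox transform for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox transform requires x > 0")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x**lam - 1.0) / lam

print(box_cox(4.0, 0.5))  # 2.0
print(box_cox(4.0, 0.0))  # natural log of 4
```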

  6. A log-linear model approach to estimation of population size using the line-transect sampling method

    USGS Publications Warehouse

    Anderson, D.R.; Burnham, K.P.; Crain, B.R.

    1978-01-01

    The technique of estimating wildlife population size and density using the belt or line-transect sampling method has been used in many past projects, such as the estimation of the density of waterfowl nesting sites in marshes, and is currently being used in areas such as the assessment of Pacific porpoise stocks in regions of tuna fishing activity. A mathematical framework for line-transect methodology has emerged only in the last 5 years. In the present article, we extend this mathematical framework to a line-transect estimator based on a log-linear model approach.
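    The log-linear estimator itself is not reproduced here. As a simpler, swapped-in illustration of how a line-transect density estimate is formed, the sketch below uses the standard relation D = n·f(0)/(2L) with a half-normal detection function fitted by maximum likelihood. It assumes certain detection on the line (g(0) = 1), no truncation, and perpendicular distances and transect length in the same units; the data are hypothetical.

```python
import math

def half_normal_density(perp_distances, transect_length):
    """Line-transect density with a half-normal detection function (hypothetical helper).

    Assumes g(0) = 1 and no truncation; returns animals per unit area in the squared
    units of the inputs (e.g. distances and length in metres give animals per m^2).
    """
    n = len(perp_distances)
    sigma2 = sum(x * x for x in perp_distances) / n  # MLE of the half-normal scale sigma^2
    f0 = math.sqrt(2.0 / (math.pi * sigma2))         # density of detection distances at x = 0
    return n * f0 / (2.0 * transect_length)          # D = n f(0) / (2L)

# Hypothetical: 4 detections, each 1 m from a 100 m transect
print(half_normal_density([1.0, 1.0, 1.0, 1.0], 100.0))
```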

  7. Linear models for airborne-laser-scanning-based operational forest inventory with small field sample size and highly correlated LiDAR data

    USGS Publications Warehouse

    Junttila, Virpi; Kauranne, Tuomo; Finley, Andrew O.; Bradford, John B.

    2015-01-01

    Modern operational forest inventory often uses remotely sensed data that cover the whole inventory area to produce spatially explicit estimates of forest properties through statistical models. The data obtained by airborne light detection and ranging (LiDAR) correlate well with many forest inventory variables, such as tree height, timber volume, and biomass. To construct an accurate model over thousands of hectares, LiDAR data must be supplemented with several hundred field sample measurements of forest inventory variables. This can be costly and time consuming. Different LiDAR-data-based and spatial-data-based sampling designs can reduce the number of field sample plots needed. However, problems arising from the features of the LiDAR data, such as a large number of predictors compared with the sample size (overfitting) or a strong correlation among predictors (multicollinearity), may decrease the accuracy and precision of the estimates and predictions. To overcome these problems, a Bayesian linear model with the singular value decomposition of predictors, combined with regularization, is proposed. The model performance in predicting different forest inventory variables is verified in ten inventory areas on two continents, where the number of field sample plots is reduced using different sampling designs. The results show that, with an appropriate field plot selection strategy and the proposed linear model, the total relative error of the predicted forest inventory variables, counting the error due to both the model noise variance and the model’s lack of fit, is only 5%–15% larger with 50 field sample plots than that of a linear model estimated with several hundred field sample plots.
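    One way to see why the singular value decomposition helps with multicollinearity is ridge regression written in SVD form. With an independent Gaussian prior on the coefficients, the ridge solution is also the posterior mean. The sketch below assumes that simple prior and a fixed penalty lam, with hypothetical data; it is not the paper's Bayesian model or its regularization scheme.

```python
import numpy as np

def ridge_svd(X, y, lam):
    """Ridge (Gaussian-prior posterior mean) coefficients computed through the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each singular direction is shrunk by s / (s^2 + lam): nearly collinear directions
    # (small s) are damped instead of blowing up as in ordinary least squares.
    shrink = s / (s**2 + lam)
    return Vt.T @ (shrink * (U.T @ y))

# Hypothetical predictors with two almost identical columns
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X[:, 1] = X[:, 0] + 1e-6 * rng.normal(size=50)
y = X[:, 0] + rng.normal(scale=0.1, size=50)
beta = ridge_svd(X, y, lam=0.5)
print(beta[:2])  # the effect is shared between the two collinear columns
```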

  8. Sampling design for groundwater solute transport: Tests of methods and analysis of Cape Cod tracer test data

    USGS Publications Warehouse

    Knopman, Debra S.; Voss, Clifford I.; Garabedian, Stephen P.

    1991-01-01

    Tests of a one-dimensional sampling design methodology on measurements of bromide concentration collected during the natural-gradient tracer test conducted by the U.S. Geological Survey on Cape Cod, Massachusetts, demonstrate its efficacy for field studies of solute transport in groundwater and the utility of one-dimensional analysis. The methodology was applied to the design of sparse two-dimensional networks of fully screened wells typical of those often used in engineering practice. In one-dimensional analysis, designs consist of the downstream distances to rows of wells oriented perpendicular to the groundwater flow direction and the timing of sampling to be carried out on each row. The power of a sampling design is measured by its effectiveness in simultaneously meeting the objectives of model discrimination, parameter estimation, and cost minimization. One-dimensional models of solute transport, differing in the processes affecting the solute and in assumptions about the structure of the flow field, were considered for describing tracer cloud migration. When each model was fitted by nonlinear regression, additive and multiplicative error forms were allowed for the residuals, which consist of both random and model errors. The one-dimensional single-layer model of a nonreactive solute with multiplicative error was judged to be the best of those tested. Results show the efficacy of the methodology in designing sparse but powerful sampling networks. Designs that sample five rows of wells at five or fewer times in any given row performed as well for model discrimination as the full set of samples taken up to eight times in a given row from as many as 89 rows. Also, designs for parameter estimation judged to be good by the methodology were as effective in reducing the variance of parameter estimates as arbitrary designs with many more samples. Results further showed that estimates of velocity and longitudinal dispersivity in one-dimensional models based on data from only five rows of fully screened wells, each sampled five or fewer times, were practically equivalent to values determined from moments analysis of the complete three-dimensional set of 29,285 samples taken during 16 sampling times.
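    Velocity and dispersion can be estimated from spatial moments, as in the moments analysis mentioned above. The sketch below assumes one-dimensional advection-dispersion with constant velocity v and dispersion coefficient D, so the plume centre moves as v·t and its variance grows as 2·D·t, giving the longitudinal dispersivity α_L = D/v. The profiles are synthetic and the function names hypothetical; the Cape Cod analysis used the full three-dimensional data set rather than this 1-D shortcut.

```python
import numpy as np

def spatial_moments(x, c):
    """Centre of mass and variance of a 1-D concentration profile c(x) on a uniform grid."""
    m0 = np.sum(c)
    mean = np.sum(x * c) / m0
    var = np.sum((x - mean) ** 2 * c) / m0
    return mean, var

def velocity_and_dispersivity(x, c1, t1, c2, t2):
    """v from the drift of the centre of mass, D from the growth of the variance, alpha_L = D / v."""
    mean1, var1 = spatial_moments(x, c1)
    mean2, var2 = spatial_moments(x, c2)
    v = (mean2 - mean1) / (t2 - t1)
    D = (var2 - var1) / (2.0 * (t2 - t1))
    return v, D / v

def gaussian_pulse(x, t, v, D):
    """Synthetic 1-D pulse solution of the advection-dispersion equation (unit mass)."""
    return np.exp(-(x - v * t) ** 2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

# Synthetic plume with v = 0.4 m/d and D = 0.2 m^2/d, i.e. alpha_L = 0.5 m
x = np.linspace(-50.0, 200.0, 5001)
v_est, alpha_est = velocity_and_dispersivity(
    x, gaussian_pulse(x, 50.0, 0.4, 0.2), 50.0, gaussian_pulse(x, 200.0, 0.4, 0.2), 200.0)
print(v_est, alpha_est)
```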

  9. Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.

    PubMed

    Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R

    2018-05-26

    Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes the data are representative of the desired population, and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined, or if unequal selection probabilities are used, the design may be non-ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations, we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design. Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occupancy estimates for some bat species. Using simulations, we show that the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study, designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases spatial extents for statistical inferences. Our results suggest that ignoring the mechanism by which locations were selected for data collection (i.e., the sampling design) could result in erroneous model-based conclusions. Therefore, to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information, in addition to the data themselves, must be available to analysts. Details for constructing the weights used in estimation and code for implementation are provided. This article is protected by copyright. All rights reserved.
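    The weighting idea can be sketched for a single-season occupancy model: each site contributes its usual likelihood, multiplied by a survey weight such as its inverse inclusion probability. The function names, the grid-search maximizer, and the simulated data below are hypothetical. The sketch gives point estimates only, not the design-based standard errors the authors discuss, and it is not their published code.

```python
import numpy as np

def weighted_occupancy_loglik(psi, p, detections, n_surveys, weights):
    """Design-weighted (pseudo-)log-likelihood of a single-season occupancy model.

    detections[i]: number of the n_surveys visits with a detection at site i.
    weights[i]: survey weight for site i, e.g. 1 / inclusion probability (assumed known).
    The binomial coefficient does not depend on (psi, p) and is omitted.
    """
    d = np.asarray(detections)
    w = np.asarray(weights, dtype=float)
    site_lik = psi * p**d * (1 - p) ** (n_surveys - d)
    site_lik = site_lik + (1 - psi) * (d == 0)  # sites never detected may be unoccupied
    return np.sum(w * np.log(site_lik))

def fit_by_grid(detections, n_surveys, weights):
    """Crude maximizer over a 0.01-spaced grid of (psi, p) (hypothetical helper)."""
    grid = np.linspace(0.01, 0.99, 99)
    best = max((weighted_occupancy_loglik(a, b, detections, n_surveys, weights), a, b)
               for a in grid for b in grid)
    return best[1], best[2]

# Simulated data (hypothetical): 2000 sites, 4 visits, psi = 0.6, p = 0.4, equal weights
rng = np.random.default_rng(42)
occupied = rng.random(2000) < 0.6
dets = rng.binomial(4, 0.4, size=2000) * occupied
psi_hat, p_hat = fit_by_grid(dets, 4, np.ones(2000))
print(psi_hat, p_hat)
```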

  10. A Simple Model to Identify Risk of Sarcopenia and Physical Disability in HIV-Infected Patients.

    PubMed

    Farinatti, Paulo; Paes, Lorena; Harris, Elizabeth A; Lopes, Gabriella O; Borges, Juliana P

    2017-09-01

    Farinatti, P, Paes, L, Harris, EA, Lopes, GO, and Borges, JP. A simple model to identify risk of sarcopenia and physical disability in HIV-infected patients. J Strength Cond Res 31(9): 2542-2551, 2017. Early detection of sarcopenia might help prevent muscle loss and disability in HIV-infected patients. This study proposed a model for estimating appendicular skeletal muscle mass (ASM) in order to calculate indices identifying "sarcopenia" (SA) and "risk for disability due to sarcopenia" (RSA) in patients with HIV. An equation to estimate ASM was developed in 56 patients (47.2 ± 6.9 years), with a cross-validation sample of 24 patients (48.1 ± 6.6 years). Model validity was determined by calculating, in both samples: (a) concordance between actual and estimated ASM; (b) correlations between actual/estimated ASM and peak torque (PT) and total work (TW) during isokinetic knee extension/flexion; (c) agreement in classifying patients with SA and RSA. The predictive equation was ASM (kg) = 7.77 × sex (F = 0, M = 1) + 0.26 × arm circumference (cm) + 0.38 × thigh circumference (cm) + 0.03 × body mass index (kg·m⁻²) - 8.94 (R² = 0.74; R²adj = 0.72; SEE = 3.13 kg). Agreement between actual and estimated ASM was confirmed in the validation (t = 0.081, p = 0.94; R = 0.86, p < 0.0001) and cross-validation (t = 0.12, p = 0.92; R = 0.87, p < 0.0001) samples. Regression characteristics in the cross-validation sample (R²adj = 0.80; SEE = 3.65) and PRESS statistics (R²(PRESS) = 0.69; SEE(PRESS) = 3.35) were compatible with the original model. Percent agreement for the classification of SA and RSA from indices calculated using actual and estimated ASM was 87.5% and 77.2% (gamma correlations 0.72-1.0; p < 0.04) in the validation sample, and 95.8% and 75.0% (gamma correlations 0.98-0.97; p < 0.001) in the cross-validation sample, respectively. Correlations between actual/estimated ASM and PT (range 0.50-0.73, p ≤ 0.05) and TW (range 0.59-0.74, p ≤ 0.05) were similar in both samples. In conclusion, our model correctly estimated ASM for determining indices that identify SA and RSA in HIV-infected patients.
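    The reported predictive equation transcribes directly into code. The function name and inputs below are hypothetical, and the sketch omits the SA and RSA index cutoffs, which the abstract does not give. With SEE = 3.13 kg, individual estimates carry substantial error.

```python
def estimate_asm_kg(sex_male, arm_circ_cm, thigh_circ_cm, bmi):
    """Appendicular skeletal muscle mass (kg) from the predictive equation reported above.

    sex_male: 1 for male, 0 for female; bmi in kg/m^2.
    """
    return (7.77 * sex_male + 0.26 * arm_circ_cm + 0.38 * thigh_circ_cm
            + 0.03 * bmi - 8.94)

# Hypothetical patient: male, arm 30 cm, thigh 50 cm, BMI 25 kg/m^2
print(round(estimate_asm_kg(1, 30.0, 50.0, 25.0), 2))  # 26.38
```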

  11. Spatially explicit dynamic N-mixture models

    USGS Publications Warehouse

    Zhao, Qing; Royle, Andy; Boomer, G. Scott

    2017-01-01

    Knowledge of demographic parameters such as survival, reproduction, emigration, and immigration is essential to understand metapopulation dynamics. Traditionally the estimation of these demographic parameters requires intensive data from marked animals. The development of dynamic N-mixture models makes it possible to estimate demographic parameters from count data of unmarked animals, but the original dynamic N-mixture model does not distinguish emigration and immigration from survival and reproduction, limiting its ability to explain important metapopulation processes such as movement among local populations. In this study we developed a spatially explicit dynamic N-mixture model that estimates survival, reproduction, emigration, local population size, and detection probability from count data under the assumption that movement only occurs among adjacent habitat patches. Simulation studies showed that the inference of our model depends on detection probability, local population size, and the implementation of robust sampling design. Our model provides reliable estimates of survival, reproduction, and emigration when detection probability is high, regardless of local population size or the type of sampling design. When detection probability is low, however, our model only provides reliable estimates of survival, reproduction, and emigration when local population size is moderate to high and robust sampling design is used. A sensitivity analysis showed that our model is robust against the violation of the assumption that movement only occurs among adjacent habitat patches, suggesting wide applications of this model. Our model can be used to improve our understanding of metapopulation dynamics based on count data that are relatively easy to collect in many systems.
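    The dynamic and spatial models above build on the basic single-season N-mixture likelihood, in which a site's latent abundance N is Poisson and repeated counts are binomial given N. The sketch below computes that basic likelihood for one site by summing over N up to an assumed cap n_max. It has no survival, reproduction, or emigration terms, is not the authors' model, and the function name is hypothetical.

```python
import math

def nmixture_site_likelihood(counts, lam, p, n_max=200):
    """Likelihood of repeated counts at one site under a basic (static) N-mixture model.

    N ~ Poisson(lam); each count y_t ~ Binomial(N, p); N is summed out from max(counts) to n_max.
    """
    total = 0.0
    for N in range(max(counts), n_max + 1):
        prior = math.exp(-lam + N * math.log(lam) - math.lgamma(N + 1))  # Poisson(N; lam)
        obs = 1.0
        for y in counts:
            obs *= math.comb(N, y) * p**y * (1 - p) ** (N - y)  # Binomial(y; N, p)
        total += prior * obs
    return total

# Hypothetical site with three repeated counts
print(nmixture_site_likelihood([3, 5, 4], lam=8.0, p=0.5))
```

    With a single visit, the count is marginally Poisson with mean lam·p, which gives a quick check of the summation.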

  12. Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    PubMed

    Taylor, Sandra L; Leiserowitz, Gary S; Kim, Kyoungmi

    2013-12-01

    Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Data generated by mass spectrometry commonly contain many missing values, which arise when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes that all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation performance of a mixture model with those of an AFT model. Based on simulated data, we found that the AFT model had greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, whereas the mixture model gave unbiased estimates except when all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.
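    The two views of missing values can be written as likelihoods. The sketch below assumes normally distributed log-intensities with no covariates. A missing value either comes from a point mass (compound absent, probability pi0) or from a normal component censored at the detection limit. Setting pi0 = 0 gives a censored-normal likelihood of the kind an AFT model with a detection limit implies. The paper's exact parameterization may differ, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_loglik(values, detection_limit, pi0, mu, sigma):
    """Log-likelihood of log-intensities under a point-mass plus left-censored normal mixture.

    values: observed log-intensities, with None marking a missing value.
    pi0: probability that the compound is absent; pi0 = 0 gives a censored-normal model.
    """
    ll = 0.0
    p_below = normal_cdf((detection_limit - mu) / sigma)
    for v in values:
        if v is None:
            # Missing: either truly absent, or present but below the detection limit
            ll += math.log(pi0 + (1 - pi0) * p_below)
        else:
            z = (v - mu) / sigma
            ll += math.log(1 - pi0) - 0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return ll

# Hypothetical sample: two detected values and one missing value
print(mixture_loglik([1.2, None, 0.8], detection_limit=0.5, pi0=0.2, mu=1.0, sigma=0.5))
```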

  13. A Comparison of the Spatial Linear Model to Nearest Neighbor (k-NN) Methods for Forestry Applications

    Treesearch

    Jay M. Ver Hoef; Hailemariam Temesgen; Sergio Gómez

    2013-01-01

    Forest surveys provide critical information for many diverse interests. Data are often collected from samples, and from these samples, maps of resources and estimates of areal totals or averages are required. In this paper, two approaches for mapping and estimating totals, the spatial linear model (SLM) and k-NN (k-Nearest Neighbor), are compared, theoretically,...
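    A basic k-NN predictor, as typically used in forest mapping, is sketched below. Each target unit receives the mean response of its k nearest reference plots in the space of auxiliary variables, and summing predictions over all units gives a total. This unweighted Euclidean version is an assumption (many applications weight neighbours by distance), the function name is hypothetical, and the spatial linear model is not shown.

```python
import numpy as np

def knn_predict(X_ref, y_ref, X_target, k=3):
    """Predict each target unit as the mean response of its k nearest reference plots
    (Euclidean distance in the auxiliary-variable space)."""
    X_ref = np.asarray(X_ref, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    preds = []
    for x in np.atleast_2d(np.asarray(X_target, dtype=float)):
        dist = np.linalg.norm(X_ref - x, axis=1)
        nearest = np.argsort(dist)[:k]
        preds.append(y_ref[nearest].mean())
    return np.array(preds)

# Hypothetical reference plots (one auxiliary variable) and two target pixels
pred = knn_predict([[0.0], [1.0], [2.0], [10.0]], [1.0, 2.0, 3.0, 100.0], [[1.0], [9.0]], k=3)
print(pred, pred.sum())  # per-pixel predictions and their total
```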

  14. Estimation of Logistic Regression Models in Small Samples. A Simulation Study Using a Weakly Informative Default Prior Distribution

    ERIC Educational Resources Information Center

    Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel

    2012-01-01

    In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…
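    A common way to apply a weakly informative prior in small-sample logistic regression is maximum a posteriori (MAP) estimation. The sketch below swaps in independent normal priors on the slopes (scale 2.5, an assumption; the abstract does not name the prior) and fits by Newton-Raphson. On a tiny, perfectly separated data set, where classical maximum likelihood estimates diverge, it still returns finite coefficients. The function name and data are hypothetical.

```python
import numpy as np

def logistic_map(X, y, prior_sd=2.5, n_iter=50):
    """MAP logistic regression with N(0, prior_sd^2) priors on the slopes, fitted by Newton-Raphson.
    The first column of X is the intercept and gets a nearly flat prior."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prec = np.full(X.shape[1], 1.0 / prior_sd**2)  # prior precision per coefficient
    prec[0] = 1e-6
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - prob) - prec * beta  # gradient of the log-posterior
        hess = X.T @ (X * (prob * (1 - prob))[:, None]) + np.diag(prec)  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Perfectly separated toy data: the ML slope would diverge, the MAP slope stays finite
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(logistic_map(X, y))
```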

  15. A three stage sampling model for remote sensing applications

    NASA Technical Reports Server (NTRS)

    Eisgruber, L. M.

    1972-01-01

    A conceptual model and an empirical application of the relationship between the manner of selecting observations and its effect on the precision of estimates from remote sensing are reported. This three-stage sampling scheme considers flightlines, segments within flightlines, and units within these segments. The error of estimate depends on the number of observations in each of the stages.
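    For a balanced design with large populations at every stage, the variance of the overall mean takes the standard nested form below, which makes explicit how the error depends on the number of observations at each stage. The variance components and sample sizes in the usage example are hypothetical, and the report's own formulas may include finite-population corrections.

```python
def three_stage_variance(var_components, sizes):
    """Variance of the overall sample mean for a balanced three-stage design.

    var_components = (s1, s2, s3): variance among flightlines, among segments within
    flightlines, and among units within segments.
    sizes = (n1, n2, n3): flightlines sampled, segments per flightline, units per segment.
    Finite-population corrections are ignored (large populations assumed).
    """
    s1, s2, s3 = var_components
    n1, n2, n3 = sizes
    return s1 / n1 + s2 / (n1 * n2) + s3 / (n1 * n2 * n3)

# Two hypothetical allocations of 24 units in total
print(three_stage_variance((4.0, 2.0, 1.0), (2, 3, 4)))  # 2 flightlines x 3 segments x 4 units
print(three_stage_variance((4.0, 2.0, 1.0), (4, 3, 2)))  # 4 flightlines x 3 segments x 2 units
```

    With these assumed components, spreading the same 24 units over more flightlines cuts the variance from about 2.38 to about 1.21, because more first-stage units shrink every term, while more units per segment shrink only the last.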

  16. Time Delay Embedding Increases Estimation Precision of Models of Intraindividual Variability

    ERIC Educational Resources Information Center

    von Oertzen, Timo; Boker, Steven M.

    2010-01-01

    This paper investigates the precision of parameters estimated from local samples of time dependent functions. We find that "time delay embedding," i.e., structuring data prior to analysis by constructing a data matrix of overlapping samples, increases the precision of parameter estimates and in turn statistical power compared to standard…
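    The embedding step described above can be sketched as follows: each row of the data matrix is a window of consecutive observations, and successive rows overlap. The function name and the embedding dimension are illustrative choices, not taken from the paper.

```python
import numpy as np

def delay_embed(series, dim):
    """Time-delay embedding: each row holds `dim` consecutive observations,
    and consecutive rows overlap by dim - 1 observations."""
    x = np.asarray(series)
    n_rows = len(x) - dim + 1
    return np.array([x[i:i + dim] for i in range(n_rows)])

print(delay_embed([1, 2, 3, 4, 5], 3))
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]]
```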

  17. Spatial design and strength of spatial signal: Effects on covariance estimation

    USGS Publications Warehouse

    Irvine, Kathryn M.; Gitelman, Alix I.; Hoeting, Jennifer A.

    2007-01-01

    In a spatial regression context, scientists are often interested in a physical interpretation of components of the parametric covariance function. For example, spatial covariance parameter estimates in ecological settings have been interpreted to describe spatial heterogeneity or “patchiness” in a landscape that cannot be explained by measured covariates. In this article, we investigate the influence of the strength of spatial dependence on maximum likelihood (ML) and restricted maximum likelihood (REML) estimates of covariance parameters in an exponential-with-nugget model, and we also examine these influences under different sampling designs—specifically, lattice designs and more realistic random and cluster designs—at differing intensities of sampling (n=144 and 361). We find that neither ML nor REML estimates perform well when the range parameter and/or the nugget-to-sill ratio is large—ML tends to underestimate the autocorrelation function and REML produces highly variable estimates of the autocorrelation function. The best estimates of both the covariance parameters and the autocorrelation function come under the cluster sampling design and large sample sizes. As a motivating example, we consider a spatial model for stream sulfate concentration.
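    The exponential-with-nugget covariance can be written as below, together with the implied autocorrelation function and nugget-to-sill ratio. The parameterization is an assumption: some authors scale distance as exp(-3h/φ) so that φ is a practical range, so parameter values are not comparable across conventions. Function names are hypothetical.

```python
import numpy as np

def exp_nugget_cov(h, partial_sill, range_param, nugget):
    """Exponential-with-nugget covariance: nugget + partial_sill at h = 0,
    partial_sill * exp(-h / range_param) for h > 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, nugget + partial_sill, partial_sill * np.exp(-h / range_param))

def autocorrelation(h, partial_sill, range_param, nugget):
    """Covariance divided by the sill (nugget + partial sill)."""
    return exp_nugget_cov(h, partial_sill, range_param, nugget) / (nugget + partial_sill)

def nugget_to_sill(partial_sill, nugget):
    return nugget / (nugget + partial_sill)

# Hypothetical parameters: partial sill 3, range 10, nugget 1 (nugget-to-sill ratio 0.25)
print(autocorrelation(np.array([0.0, 5.0, 10.0]), 3.0, 10.0, 1.0), nugget_to_sill(3.0, 1.0))
```

    A large nugget-to-sill ratio flattens the autocorrelation function just off the origin, which is one of the conditions under which the abstract reports poor ML and REML performance.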

  18. Effects of model complexity and priors on estimation using sequential importance sampling/resampling for species conservation

    USGS Publications Warehouse

    Dunham, Kylee; Grand, James B.

    2016-01-01

    We examined the effects of complexity and priors on the accuracy of models used to estimate ecological and observational processes, and to make predictions regarding population size and structure. State-space models are useful for estimating complex, unobservable population processes and making predictions about future populations based on limited data. To better understand the utility of state-space models in evaluating population dynamics, we used them in a Bayesian framework and compared the accuracy of models with differing complexity, with and without informative priors, using sequential importance sampling/resampling (SISR). Count data were simulated for 25 years using known parameters and a known observation process for each model. We used kernel smoothing to reduce the effect of particle depletion, which is common when estimating both states and parameters with SISR. Models using informative priors estimated parameter values and population size with greater accuracy than their non-informative counterparts. While the estimates of population size and trend did not suffer greatly in models using non-informative priors, the algorithm was unable to accurately estimate demographic parameters. This model framework provides reasonable estimates of population size when little to no information is available; however, when information on some vital rates is available, SISR can be used to obtain more precise estimates of population size and process. Incorporating model complexity such as that required by structured populations with stage-specific vital rates affects precision and accuracy when estimating latent population variables and predicting population dynamics. These results are important to consider when designing monitoring programs and conservation efforts requiring management of specific population segments.
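    A bootstrap particle filter is the simplest form of sequential importance sampling/resampling. The sketch below filters population size for a hypothetical state-space model (stochastic exponential growth with Poisson counts) with all parameters known. The study additionally estimated parameters and used kernel smoothing against particle depletion, neither of which is shown, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_filter(counts, growth, proc_sd, n0_mean, n_particles=5000, seed=0):
    """Sequential importance sampling/resampling (bootstrap particle filter).

    Hypothetical model: N_t = N_{t-1} * growth * exp(eps_t), eps_t ~ N(0, proc_sd^2);
    observed count y_t ~ Poisson(N_t). Returns the filtered mean of N_t for each year.
    """
    rng = np.random.default_rng(seed)
    particles = rng.poisson(n0_mean, size=n_particles).astype(float) + 1.0
    means = []
    for y in counts:
        # Propagate each particle through the process model
        particles = particles * growth * np.exp(rng.normal(0.0, proc_sd, n_particles))
        # Importance weights from the Poisson observation model (log scale, constant dropped)
        log_w = y * np.log(particles) - particles
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Multinomial resampling to counter weight degeneracy
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return np.array(means)

# Simulate 25 years of hypothetical counts, then filter them
rng_sim = np.random.default_rng(1)
N, truth = 100.0, []
for _ in range(25):
    N = N * 1.02 * np.exp(rng_sim.normal(0.0, 0.05))
    truth.append(N)
truth = np.array(truth)
counts = rng_sim.poisson(truth)
est = bootstrap_filter(counts, growth=1.02, proc_sd=0.05, n0_mean=100)
```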

  19. Inference about density and temporary emigration in unmarked populations

    USGS Publications Warehouse

    Chandler, Richard B.; Royle, J. Andrew; King, David I.

    2011-01-01

    Few species are distributed uniformly in space, and populations of mobile organisms are rarely closed with respect to movement, yet many models of density rely upon these assumptions. We present a hierarchical model allowing inference about the density of unmarked populations subject to temporary emigration and imperfect detection. The model can be fit to data collected using a variety of standard survey methods such as repeated point counts in which removal sampling, double-observer sampling, or distance sampling is used during each count. Simulation studies demonstrated that parameter estimators are unbiased when temporary emigration is either "completely random" or is determined by the size and location of home ranges relative to survey points. We also applied the model to repeated removal sampling data collected on Chestnut-sided Warblers (Dendroica pensylvanica) in the White Mountain National Forest, USA. The density estimate from our model, 1.09 birds/ha, was similar to an estimate of 1.11 birds/ha produced by an intensive spot-mapping effort. Our model is also applicable when processes other than temporary emigration affect the probability of being available for detection, such as in studies using cue counts. Functions to implement the model have been added to the R package unmarked.
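    Under removal sampling, each individual is counted only in the interval in which it is first detected. Assuming a constant per-interval detection probability p and an availability probability phi (which is where temporary emigration enters), the multinomial cell probabilities can be sketched as below. The hierarchical abundance layer of the model and the unmarked implementation are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def removal_cell_probs(p, n_intervals, phi=1.0):
    """Multinomial cell probabilities for removal sampling with availability phi.

    Cell j (j = 1..J): available and first detected in interval j, phi * p * (1 - p)**(j - 1).
    Final cell: never counted (unavailable, or available but missed in all J intervals).
    """
    j = np.arange(1, n_intervals + 1)
    detected = phi * p * (1 - p) ** (j - 1)
    return np.append(detected, 1.0 - detected.sum())

print(removal_cell_probs(0.5, 3))           # fully available population
print(removal_cell_probs(0.5, 3, phi=0.8))  # 20% temporarily unavailable
```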

  20. Population Pharmacokinetics and Optimal Sampling Strategy for Model-Based Precision Dosing of Melphalan in Patients Undergoing Hematopoietic Stem Cell Transplantation.

    PubMed

    Mizuno, Kana; Dong, Min; Fukuda, Tsuyoshi; Chandra, Sharat; Mehta, Parinda A; McConnell, Scott; Anaissie, Elias J; Vinks, Alexander A

    2018-05-01

    High-dose melphalan is an important component of conditioning regimens for patients undergoing hematopoietic stem cell transplantation. The current dosing strategy based on body surface area results in a high incidence of oral mucositis and gastrointestinal and liver toxicity. Pharmacokinetically guided dosing will individualize exposure and help minimize overexposure-related toxicity. The purpose of this study was to develop a population pharmacokinetic model and optimal sampling strategy. A population pharmacokinetic model was developed with NONMEM using 98 observations collected from 15 adult patients given the standard dose of 140 or 200 mg/m² by intravenous infusion. The determinant-optimal sampling strategy was explored with PopED software. Individual area under the curve estimates were generated by Bayesian estimation using full and the proposed sparse sampling data. The predictive performance of the optimal sampling strategy was evaluated based on bias and precision estimates. The feasibility of the optimal sampling strategy was tested using pharmacokinetic data from five pediatric patients. A two-compartment model best described the data. The final model included body weight and creatinine clearance as predictors of clearance. The determinant-optimal sampling strategies (and windows) were identified at 0.08 (0.08-0.19), 0.61 (0.33-0.90), 2.0 (1.3-2.7), and 4.0 (3.6-4.0) h post-infusion. An excellent correlation was observed between area under the curve estimates obtained with the full and the proposed four-sample strategy (R² = 0.98; p < 0.01) with a mean bias of -2.2% and precision of 9.4%. A similar relationship was observed in children (R² = 0.99; p < 0.01). The developed pharmacokinetic model-based sparse sampling strategy promises to achieve the target area under the curve as part of precision dosing.
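    Bias and precision of sparse-sample AUC estimates are often summarized as the mean relative prediction error and the root mean squared relative error against the full-data AUC. The abstract does not give its exact definitions, so the formulas below are an assumption, and the function name and AUC values are hypothetical.

```python
import numpy as np

def bias_precision(auc_pred, auc_ref):
    """Mean relative prediction error (bias, %) and root mean squared relative error (precision, %)."""
    pred = np.asarray(auc_pred, dtype=float)
    ref = np.asarray(auc_ref, dtype=float)
    rel_err = (pred - ref) / ref
    return 100 * rel_err.mean(), 100 * np.sqrt(np.mean(rel_err**2))

# Hypothetical sparse-sample AUCs versus full-profile AUCs
bias, precision = bias_precision([90.0, 220.0], [100.0, 200.0])
print(bias, precision)  # errors of -10% and +10% cancel in the bias but not in the precision
```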

  1. Estimating juvenile Chinook salmon (Oncorhynchus tshawytscha) abundance from beach seine data collected in the Sacramento–San Joaquin Delta and San Francisco Bay, California

    USGS Publications Warehouse

    Perry, Russell W.; Kirsch, Joseph E.; Hendrix, A. Noble

    2016-06-17

    Resource managers rely on abundance or density metrics derived from beach seine surveys to make vital decisions that affect fish population dynamics and assemblage structure. However, abundance and density metrics may be biased by imperfect capture and lack of geographic closure during sampling. Currently, there is considerable uncertainty about the capture efficiency of juvenile Chinook salmon (Oncorhynchus tshawytscha) by beach seines. Heterogeneity in capture can occur through unrealistic assumptions of closure and from variation in the probability of capture caused by environmental conditions. We evaluated the assumptions of closure and the influence of environmental conditions on capture efficiency and abundance estimates of Chinook salmon from beach seining within the Sacramento–San Joaquin Delta and the San Francisco Bay. Beach seine capture efficiency was measured using a stratified random sampling design combined with open and closed replicate depletion sampling. A total of 56 samples were collected during the spring of 2014. To assess variability in capture probability and the absolute abundance of juvenile Chinook salmon, beach seine capture efficiency data were fitted to the paired depletion design using modified N-mixture models. These models allowed us to explicitly test the closure assumption and estimate environmental effects on the probability of capture. We determined that our updated method allowing for lack of closure between depletion samples drastically outperformed traditional data analysis that assumes closure among replicate samples. The best-fit model (lowest-valued Akaike Information Criterion model) included the probability of fish being available for capture (relaxed closure assumption), capture probability modeled as a function of water velocity and percent coverage of fine sediment, and abundance modeled as a function of sample area, temperature, and water velocity. 
Given that beach seining is a ubiquitous sampling technique for many species, our improved sampling design and analysis could provide significant improvements in density and abundance estimation.
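
For contrast with the relaxed-closure N-mixture approach described above, the traditional closed-population depletion estimate can be sketched for the simplest two-pass case. This is a minimal sketch of the classical two-pass removal estimator, not the authors' modified N-mixture model. It assumes closure and the same capture probability on both passes. The function name `two_pass_removal` and the catch numbers are hypothetical.

```python
def two_pass_removal(c1, c2):
    """Classical two-pass removal (depletion) estimate of abundance.

    Assumes a closed population and the same capture probability p on
    both passes, so expected catches are N*p and N*(1 - p)*p.
    """
    if c2 >= c1:
        # Catch did not decline, so the estimator is undefined.
        raise ValueError("two-pass removal requires c1 > c2")
    p_hat = (c1 - c2) / c1          # estimated capture probability
    n_hat = c1 * c1 / (c1 - c2)     # estimated abundance, c1 / p_hat
    return n_hat, p_hat

# Hypothetical catches: 60 fish on pass 1, 20 on pass 2.
n_hat, p_hat = two_pass_removal(60, 20)
print(f"N ~ {n_hat:.1f}, p ~ {p_hat:.2f}")  # N ~ 90.0, p ~ 0.67
```

If fish become harder to catch on the second pass, c2 falls below what the constant-p model expects. As a result, p_hat is overestimated and n_hat underestimated. Fish moving into or out of the seined area between passes likewise violates the closure built into this formula.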

  2. Noninvasive estimation of assist pressure for direct mechanical ventricular actuation

    NASA Astrophysics Data System (ADS)

    An, Dawei; Yang, Ming; Gu, Xiaotong; Meng, Fan; Yang, Tianyue; Lin, Shujing

    2018-02-01

    Direct mechanical ventricular actuation is an effective way to restore ventricular function without blood contact. Because of energy loss within the driveline of the direct cardiac compression device, an accurate value of the assist pressure acting on the heart surface must be obtained. To avoid myocardial trauma induced by invasive sensors, a noninvasive estimation method was developed, and an experimental device was designed to measure the sample data used to fit the estimation models. Judged by goodness of fit, both numerically and graphically, the polynomial model performed best among the four candidate models. To verify the noninvasive estimation, a simplified lumped-parameter model was used to calculate the pre-support and post-support left ventricular pressures. Furthermore, when the driving pressure was adjusted beyond the range of the sample data, the estimated assist pressure had a similar waveform and the post-support left ventricular pressure approached that of a healthy adult heart, indicating that the noninvasive estimation method generalizes well.

  3. A Note on Sample Size and Solution Propriety for Confirmatory Factor Analytic Models

    ERIC Educational Resources Information Center

    Jackson, Dennis L.; Voth, Jennifer; Frey, Marc P.

    2013-01-01

    Determining an appropriate sample size for use in latent variable modeling techniques has presented ongoing challenges to researchers. In particular, small sample sizes are known to present concerns over sampling error for the variances and covariances on which model estimation is based, as well as for fit indexes and convergence failures. The…

  4. Fitting N-mixture models to count data with unmodeled heterogeneity: Bias, diagnostics, and alternative approaches

    USGS Publications Warehouse

    Duarte, Adam; Adams, Michael J.; Peterson, James T.

    2018-01-01

    Monitoring animal populations is central to wildlife and fisheries management, and the use of N-mixture models toward these efforts has markedly increased in recent years. Nevertheless, relatively little work has evaluated estimator performance when basic assumptions are violated. Moreover, diagnostics to identify when bias in parameter estimates from N-mixture models is likely are largely unexplored. We simulated count data sets using 837 combinations of detection probability, number of sample units, number of survey occasions, and type and extent of heterogeneity in abundance or detectability. We fit Poisson N-mixture models to these data, quantified the bias associated with each combination, and evaluated whether the parametric bootstrap goodness-of-fit (GOF) test can be used to indicate bias in parameter estimates. We also explored whether assumption violations can be diagnosed prior to fitting N-mixture models. In doing so, we propose a new model diagnostic, which we term the quasi-coefficient of variation (QCV). N-mixture models performed well when assumptions were met and detection probabilities were moderate (i.e., ≥0.3), and the performance of the estimator improved with increasing survey occasions and sample units. However, the magnitude of bias in estimated mean abundance with even slight amounts of unmodeled heterogeneity was substantial. The parametric bootstrap GOF test did not perform well as a diagnostic for bias in parameter estimates when detectability and sample sizes were low. The results indicate the QCV is useful to diagnose potential bias and that potential bias associated with unidirectional trends in abundance or detectability can be diagnosed using Poisson regression. This study represents the most thorough assessment to date of assumption violations and diagnostics when fitting N-mixture models using the most commonly implemented error distribution. 
Unbiased estimates of population state variables are needed to properly inform management decision making. Therefore, we also discuss alternative approaches to yield unbiased estimates of population state variables using similar data types, and we stress that there is no substitute for an effective sample design that is grounded upon well-defined management objectives.
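
A simulation of the kind described above can be sketched as follows. At each sample unit, latent abundance is drawn as N_i ~ Poisson(lambda), and the count on each survey occasion as y_ij ~ Binomial(N_i, p_i). An optional site-level detection heterogeneity term on the logit scale adds variation that a standard constant-p Poisson N-mixture model would not account for. This is a minimal illustration. The logit-normal form of the heterogeneity is an assumption chosen for convenience, and the function names are hypothetical. It does not reproduce the authors' simulation design or their QCV diagnostic.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_nmix(n_sites, n_visits, lam, p_median, sd_logit=0.0, seed=1):
    """Return (true_N, counts) per site under a binomial N-mixture process.

    sd_logit > 0 adds site-level detection heterogeneity on the logit
    scale, i.e. variation that a constant-p model leaves unmodeled.
    """
    rng = random.Random(seed)
    base = math.log(p_median / (1.0 - p_median))
    sites = []
    for _ in range(n_sites):
        n_true = rpois(lam, rng)
        p_site = 1.0 / (1.0 + math.exp(-(base + rng.gauss(0.0, sd_logit))))
        # Each of the n_true animals is detected independently on each visit.
        counts = [sum(rng.random() < p_site for _ in range(n_true))
                  for _ in range(n_visits)]
        sites.append((n_true, counts))
    return sites

# Hypothetical scenario: 50 sites, 3 visits, lambda = 5, median p = 0.3.
data = simulate_nmix(50, 3, 5.0, 0.3, sd_logit=1.0)
```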

  5. Spatial-temporal models for improved county-level annual estimates

    Treesearch

    Francis Roesch

    2009-01-01

    The consumers of data derived from extensive forest inventories often seek annual estimates at a finer spatial scale than that which the inventory was designed to provide. This paper discusses a few model-based and model-assisted estimators to consider for county level attributes that can be applied when the sample would otherwise be inadequate for producing low-...

  6. A method to estimate the fractional fat volume within a ROI of a breast biopsy for WAXS applications: Animal tissue evaluation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Robert Y., E-mail: rx-tang@laurentian.ca; McDonald, Nancy, E-mail: mcdnancye@gmail.com; Laamanen, Curtis, E-mail: cx-laamanen@laurentian.ca

    Purpose: To develop a method to estimate the mean fractional volume of fat ($\bar{\nu}_{\text{fat}}$) within a region of interest (ROI) of a tissue sample for wide-angle x-ray scatter (WAXS) applications. A scatter signal from the ROI was obtained, and use of $\bar{\nu}_{\text{fat}}$ in a WAXS fat subtraction model provided a way to estimate the differential linear scattering coefficient $\mu_s$ of the remaining fatless tissue. Methods: The efficacy of the method was tested using animal tissue from a local butcher shop. Formalin-fixed samples, 5 mm in diameter and 4 mm thick, were prepared. The two main tissue types were fat and meat (fibrous). Pure as well as composite samples consisting of a mixture of the two tissue types were analyzed. For the latter samples, values of $\nu_{\text{fat}}$ for the tissue columns of interest were extracted from corresponding pixels in CCD digital x-ray images using a calibration curve. The means $\bar{\nu}_{\text{fat}}$ were then calculated for use in a WAXS fat subtraction model. For the WAXS measurements, the samples were interrogated with a 2.7 mm diameter 50 kV beam, and the 6° scattered photons were detected with a CdTe detector subtending a solid angle of $7.75 \times 10^{-5}$ sr. Using the scatter spectrum, an estimate of the incident spectrum, and a scatter model, $\mu_s$ was determined for the tissue in the ROI. For the composite samples, a WAXS fat subtraction model was used to estimate the $\mu_s$ of the fibrous tissue in the ROI. This signal was compared to the $\mu_s$ of fibrous tissue obtained using a pure fibrous sample. Results: For chicken and beef composites, $\bar{\nu}_{\text{fat}} = 0.33 \pm 0.05$ and $0.32 \pm 0.05$, respectively. The subtraction of these fat components from the WAXS composite signals provided estimates of $\mu_s$ for chicken and beef fibrous tissue. The differences between these estimates and the $\mu_s$ of fibrous tissue obtained with a pure sample were calculated as a function of the momentum transfer x. 
A t-test showed that the mean of the differences was not significantly different from zero, thereby validating the methods. Conclusions: The methodology to estimate $\bar{\nu}_{\text{fat}}$ in an ROI of a tissue sample via CCD x-ray imaging was quantitatively accurate. The WAXS fat subtraction model allowed the $\mu_s$ of fibrous tissue to be obtained from an ROI that contained some fat. The fat estimation method coupled with the WAXS models can be used to compare the $\mu_s$ coefficients of fibroglandular and cancerous breast tissue.

  7. On-line estimation of error covariance parameters for atmospheric data assimilation

    NASA Technical Reports Server (NTRS)

    Dee, Dick P.

    1995-01-01

    A simple scheme is presented for on-line estimation of covariance parameters in statistical data assimilation systems. The scheme is based on a maximum-likelihood approach in which estimates are produced on the basis of a single batch of simultaneous observations. Single-sample covariance estimation is reasonable as long as the number of available observations exceeds the number of tunable parameters by two or three orders of magnitude. Not much is known at present about model error associated with actual forecast systems. Our scheme can be used to estimate some important statistical model error parameters such as regionally averaged variances or characteristic correlation length scales. The advantage of the single-sample approach is that it does not rely on any assumptions about the temporal behavior of the covariance parameters: time-dependent parameter estimates can be continuously adjusted on the basis of current observations. This is of practical importance since it is likely to be the case that both model error and observation error strongly depend on the actual state of the atmosphere. The single-sample estimation scheme can be incorporated into any four-dimensional statistical data assimilation system that involves explicit calculation of forecast error covariances, including optimal interpolation (OI) and the simplified Kalman filter (SKF). The computational cost of the scheme is high but not prohibitive; on-line estimation of one or two covariance parameters in each analysis box of an operational boxed-OI system is currently feasible. A number of numerical experiments performed with an adaptive SKF and an adaptive version of OI, using a linear two-dimensional shallow-water model and artificially generated model error, are described. The performance of the nonadaptive versions of these methods turns out to depend rather strongly on correct specification of model error parameters. 
These parameters are estimated under a variety of conditions, including uniformly distributed model error and time-dependent model error statistics.
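
The maximum-likelihood idea can be illustrated in a drastically simplified scalar setting. This is an illustration only. Dee's full scheme handles correlated errors and parameters such as correlation length scales. Here, the innovations d_i = y_i - H x_f,i are assumed to be independent and zero-mean Gaussian with variance sigma_f^2 + sigma_o^2, where the forecast-error variance sigma_f^2 is known. Under that assumption, the log-likelihood is maximized where sigma_f^2 + sigma_o^2 equals the mean of d_i^2, so a single batch of observations yields an estimate of sigma_o^2 directly. The function name is hypothetical.

```python
def ml_obs_error_variance(innovations, sigma_f2):
    """ML estimate of observation-error variance from one batch of innovations.

    Assumes independent innovations d_i ~ N(0, sigma_f2 + sigma_o2) with a
    known forecast-error variance sigma_f2. The Gaussian log-likelihood is
    maximized where the total variance equals mean(d_i**2); the estimate is
    clipped at zero because a variance cannot be negative.
    """
    if not innovations:
        raise ValueError("need at least one innovation")
    total_var = sum(d * d for d in innovations) / len(innovations)
    return max(0.0, total_var - sigma_f2)
```

Because the estimate uses only the current batch, it can be recomputed every analysis cycle, in the spirit of the time-dependent estimates described above.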

  8. A theoretical signal processing framework for linear diffusion MRI: Implications for parameter estimation and experiment design.

    PubMed

    Varadarajan, Divya; Haldar, Justin P

    2017-11-01

    The data measured in diffusion MRI can be modeled as the Fourier transform of the Ensemble Average Propagator (EAP), a probability distribution that summarizes the molecular diffusion behavior of the spins within each voxel. This Fourier relationship is potentially advantageous because of the extensive theory that has been developed to characterize the sampling requirements, accuracy, and stability of linear Fourier reconstruction methods. However, existing diffusion MRI data sampling and signal estimation methods have largely been developed and tuned without the benefit of such theory, instead relying on approximations, intuition, and extensive empirical evaluation. This paper aims to address this discrepancy by introducing a novel theoretical signal processing framework for diffusion MRI. The new framework can be used to characterize arbitrary linear diffusion estimation methods with arbitrary q-space sampling, and can be used to theoretically evaluate and compare the accuracy, resolution, and noise-resilience of different data acquisition and parameter estimation techniques. The framework is based on the EAP, and makes very limited modeling assumptions. As a result, the approach can even provide new insight into the behavior of model-based linear diffusion estimation methods in contexts where the modeling assumptions are inaccurate. The practical usefulness of the proposed framework is illustrated using both simulated and real diffusion MRI data in applications such as choosing between different parameter estimation methods and choosing between different q-space sampling schemes. Copyright © 2017 Elsevier Inc. All rights reserved.
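
The Fourier relationship between the EAP and the measured signal can be checked numerically in one dimension. This sketch assumes free (Gaussian) diffusion. For such diffusion, the 1-D propagator has variance 2*D*tau and the signal is E(q) = exp(-4*pi^2*q^2*D*tau). The parameter values and function names are hypothetical, and the code is not part of the paper's framework.

```python
import math

def gaussian_eap(r, D, tau):
    """1-D free-diffusion propagator: zero-mean Gaussian with variance 2*D*tau."""
    var = 2.0 * D * tau
    return math.exp(-r * r / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def signal_from_eap(q, D, tau, r_max, n=4001):
    """E(q) = integral of P(r) exp(-i 2 pi q r) dr, via a uniform Riemann sum.

    P is even, so the imaginary (sine) part vanishes and only cos remains.
    """
    h = 2.0 * r_max / (n - 1)
    total = 0.0
    for k in range(n):
        r = -r_max + k * h
        total += gaussian_eap(r, D, tau) * math.cos(2.0 * math.pi * q * r)
    return total * h

def analytic_signal(q, D, tau):
    return math.exp(-4.0 * math.pi ** 2 * q * q * D * tau)

# Hypothetical values: D in mm^2/s, tau in s, q in 1/mm, r in mm.
D, tau = 2e-3, 0.05
for q in (0.0, 10.0, 25.0):
    print(q, signal_from_eap(q, D, tau, r_max=0.1), analytic_signal(q, D, tau))
```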

  9. Disentangling sampling and ecological explanations underlying species-area relationships

    USGS Publications Warehouse

    Cam, E.; Nichols, J.D.; Hines, J.E.; Sauer, J.R.; Alpizar-Jara, R.; Flather, C.H.

    2002-01-01

    We used a probabilistic approach to address the influence of sampling artifacts on the form of species-area relationships (SARs). We developed a model in which the increase in observed species richness is a function of sampling effort exclusively. We assumed that effort depends on area sampled, and we generated species-area curves under that model. These curves can look realistic. We then generated SARs from avian data, comparing SARs based on counts with those based on richness estimates. We used an approach to estimation of species richness that accounts for species detection probability and, hence, for variation in sampling effort. The slopes of SARs based on counts are steeper than those of curves based on estimates of richness, indicating that the former partly reflect failure to account for species detection probability. SARs based on estimates reflect ecological processes exclusively, not sampling processes. This approach permits investigation of ecologically relevant hypotheses. The slope of SARs is not influenced by the slope of the relationship between habitat diversity and area. In situations in which not all of the species are detected during sampling sessions, approaches to estimation of species richness integrating species detection probability should be used to investigate the rate of increase in species richness with area.
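
As an example of a richness estimate that accounts for imperfect detection, consider the first-order jackknife estimator for incidence data, S_jack1 = S_obs + f1 * (k - 1) / k. Here k is the number of replicate samples and f1 is the number of species detected in exactly one of them. The abstract does not name its estimator, so treat this as one member of that class, not the authors' exact method. The function name and species data are hypothetical.

```python
from collections import Counter

def jackknife1_richness(samples):
    """First-order jackknife richness from k replicate samples (sets of species)."""
    k = len(samples)
    if k < 2:
        raise ValueError("need at least two replicate samples")
    incidence = Counter(sp for sample in samples for sp in set(sample))
    s_obs = len(incidence)                                 # species seen at least once
    f1 = sum(1 for n in incidence.values() if n == 1)      # seen in exactly one sample
    return s_obs + f1 * (k - 1) / k

# Hypothetical point counts at one site: three replicate samples.
samples = [{"wren", "robin"}, {"wren", "jay"}, {"wren"}]
print(jackknife1_richness(samples))  # 3 + 2 * 2/3 = 4.33...
```

The correction grows when many species are seen only once, which is when raw counts are most likely to understate richness. That situation is typical of small or lightly sampled areas.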

  10. A New Monte Carlo Method for Estimating Marginal Likelihoods.

    PubMed

    Wang, Yu-Bo; Chen, Ming-Hui; Kuo, Lynn; Lewis, Paul O

    2018-06-01

    Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte Carlo sample. This class can be thought of as a generalization of the harmonic mean and inflated density ratio estimators using a partition weighted kernel (likelihood times prior). We show that our estimator is consistent and has better theoretical properties than the harmonic mean and inflated density ratio estimators. In addition, we provide guidelines on choosing optimal weights. Simulation studies were conducted to examine the empirical performance of the proposed estimator. We further demonstrate the desirable features of the proposed estimator with two real data sets: one is from a prostate cancer study using an ordinal probit regression model with latent variables; the other is for the power prior construction from two Eastern Cooperative Oncology Group phase III clinical trials using the cure rate survival model with similar objectives.

  11. Using counts to simultaneously estimate abundance and detection probabilities in a salamander community

    USGS Publications Warehouse

    Dodd, C.K.; Dorazio, R.M.

    2004-01-01

    A critical variable in both ecological and conservation field studies is determining how many individuals of a species are present within a defined sampling area. Labor intensive techniques such as capture-mark-recapture and removal sampling may provide estimates of abundance, but there are many logistical constraints to their widespread application. Many studies on terrestrial and aquatic salamanders use counts as an index of abundance, assuming that detection remains constant while sampling. If this constancy is violated, determination of detection probabilities is critical to the accurate estimation of abundance. Recently, a model was developed that provides a statistical approach that allows abundance and detection to be estimated simultaneously from spatially and temporally replicated counts. We adapted this model to estimate these parameters for salamanders sampled over a six-year period in area-constrained plots in Great Smoky Mountains National Park. Estimates of salamander abundance varied among years, but annual changes in abundance did not vary uniformly among species. Except for one species, abundance estimates were not correlated with site covariates (elevation/soil and water pH, conductivity, air and water temperature). The uncertainty in the estimates was so large as to make correlations ineffectual in predicting which covariates might influence abundance. Detection probabilities also varied among species and sometimes among years for the six species examined. We found such a high degree of variation in our counts and in estimates of detection among species, sites, and years as to cast doubt upon the appropriateness of using count data to monitor population trends using a small number of area-constrained survey plots. Still, the model provided reasonable estimates of abundance that could make it useful in estimating population size from count surveys.
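
A model for replicated counts of this kind is commonly written as a binomial N-mixture. At site i, N_i ~ Poisson(lambda), and the replicate counts are y_it ~ Binomial(N_i, p), with N_i summed out of the likelihood. The sketch below evaluates that log-likelihood with a truncated sum over N. The upper bound `n_max` is an assumption. It then fits lambda and p by a coarse grid search. Real analyses would use a numerical optimizer and covariates. The function names and counts are hypothetical.

```python
import math

def poisson_pmf(n, lam):
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def binom_pmf(y, n, p):
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y) if y <= n else 0.0

def nmix_loglik(counts, lam, p, n_max=200):
    """Log-likelihood of replicated counts under a Poisson-binomial N-mixture.

    counts: one list of repeat counts per site. The latent abundance N is
    summed from max(counts) up to n_max (truncation assumed negligible).
    """
    total = 0.0
    for site in counts:
        site_lik = 0.0
        for n in range(max(site), n_max + 1):
            term = poisson_pmf(n, lam)
            for y in site:
                term *= binom_pmf(y, n, p)   # counts independent given N
            site_lik += term
        total += math.log(site_lik)
    return total

def grid_fit(counts, lams, ps, n_max=100):
    """Return the (lam, p) pair on the grid with the highest log-likelihood."""
    return max(((lam, p) for lam in lams for p in ps),
               key=lambda lp: nmix_loglik(counts, lp[0], lp[1], n_max))

# Hypothetical counts: 4 plots, each surveyed 3 times.
counts = [[3, 5, 4], [0, 1, 1], [7, 6, 8], [2, 2, 3]]
lam_hat, p_hat = grid_fit(counts, range(1, 21), [0.05 + 0.1 * i for i in range(10)])
```

With one site and one survey, the likelihood reduces to a Poisson(lambda * p) probability for the count. This is a handy check on the implementation.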

  12. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates caused by changes in rain statistics resulting from 1) evolution of the official algorithms used to process the data and 2) differences from other remote sensing systems, such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.

  13. Bias-Corrected Estimation of Noncentrality Parameters of Covariance Structure Models

    ERIC Educational Resources Information Center

    Raykov, Tenko

    2005-01-01

    A bias-corrected estimator of noncentrality parameters of covariance structure models is discussed. The approach represents an application of the bootstrap methodology for purposes of bias correction, and utilizes the relation between average of resample conventional noncentrality parameter estimates and their sample counterpart. The…
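
The bootstrap bias correction described above treats the gap between the average of the resample estimates and the full-sample estimate as an estimate of bias. This gives theta_bc = theta_hat - (mean(theta*) - theta_hat) = 2 * theta_hat - mean(theta*). The sketch applies that generic rule to an arbitrary statistic. For a noncentrality parameter, `stat` would be the conventional estimate, commonly max(0, T - df), obtained by refitting the model to each resample, which is not shown here. The example uses the plug-in variance instead, because its downward bias is known. The function name is hypothetical.

```python
import random
import statistics

def bootstrap_bias_corrected(data, stat, n_boot=2000, seed=0):
    """Bias-corrected estimate 2*stat(data) - mean(stat(resample))."""
    rng = random.Random(seed)
    theta_hat = stat(data)
    boot_mean = statistics.fmean(
        stat(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    bias = boot_mean - theta_hat          # bootstrap estimate of the bias
    return theta_hat - bias

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(statistics.pvariance(data))                              # plug-in: 8.25
print(bootstrap_bias_corrected(data, statistics.pvariance))    # pulled upward
print(statistics.variance(data))                               # unbiased: 9.1666...
```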

  14. Connections between survey calibration estimators and semiparametric models for incomplete data

    PubMed Central

    Lumley, Thomas; Shaw, Pamela A.; Dai, James Y.

    2012-01-01

    Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the ‘estimated weights’ paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models. PMID:23833390
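
The Horvitz–Thompson estimator and the simplest calibration adjustment can be sketched as follows. Horvitz–Thompson weights each sampled value by d_i = 1/pi_i. Linear (chi-square distance) calibration with a single auxiliary variable x then rescales the weights to w_i = d_i * (1 + lam * x_i), choosing lam so that the sum of w_i * x_i equals a known population total X. This is an illustrative special case, not the paper's semiparametric development. Raking would use a multiplicative adjustment instead. The names and data are hypothetical.

```python
def horvitz_thompson_total(y, pi):
    """HT estimate of a population total: sum of y_i / pi_i over the sample."""
    return sum(yi / p for yi, p in zip(y, pi))

def linear_calibration_weights(pi, x, x_total):
    """Weights w_i = d_i * (1 + lam * x_i) with sum(w_i * x_i) == x_total."""
    d = [1.0 / p for p in pi]                        # design weights
    dx = sum(di * xi for di, xi in zip(d, x))        # HT estimate of the x total
    dxx = sum(di * xi * xi for di, xi in zip(d, x))
    lam = (x_total - dx) / dxx                       # solves the calibration equation
    return [di * (1.0 + lam * xi) for di, xi in zip(d, x)]

# Hypothetical sample of three units with known inclusion probabilities.
y, x, pi = [10.0, 20.0, 30.0], [1.0, 2.0, 3.0], [0.5, 0.25, 0.5]
w = linear_calibration_weights(pi, x, x_total=12.0)
print(horvitz_thompson_total(y, pi))                  # 160.0
print(sum(wi * yi for wi, yi in zip(w, y)))           # calibrated total, about 120
```

In this toy sample, the HT estimate of the x total is 16 while the known total is 12, so calibration shrinks the weights. When y is strongly related to x, this kind of adjustment typically reduces variance without requiring a model for y.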

  15. A generalized mixed effects model of abundance for mark-resight data when sampling is without replacement

    USGS Publications Warehouse

    McClintock, B.T.; White, Gary C.; Burnham, K.P.; Pryde, M.A.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.

    2009-01-01

    In recent years, the mark-resight method for estimating abundance when the number of marked individuals is known has become increasingly popular. By using field-readable bands that may be resighted from a distance, these techniques can be applied to many species, and are particularly useful for relatively small, closed populations. However, due to the different assumptions and general rigidity of the available estimators, researchers must often commit to a particular model without rigorous quantitative justification for model selection based on the data. Here we introduce a nonlinear logit-normal mixed effects model addressing this need for a more generalized framework. Similar to models available for mark-recapture studies, the estimator allows a wide variety of sampling conditions to be parameterized efficiently under a robust sampling design. Resighting rates may be modeled simply or with more complexity by including fixed temporal and random individual heterogeneity effects. Using information theory, the model(s) best supported by the data may be selected from the candidate models proposed. Under this generalized framework, we hope the uncertainty associated with mark-resight model selection will be reduced substantially. We compare our model to other mark-resight abundance estimators when applied to mainland New Zealand robin (Petroica australis) data recently collected in Eglinton Valley, Fiordland National Park and summarize its performance in simulation experiments.

  16. Identifiability and Performance Analysis of Output Over-sampling Approach to Direct Closed-loop Identification

    NASA Astrophysics Data System (ADS)

    Sun, Lianming; Sano, Akira

    An output over-sampling-based closed-loop identification algorithm is investigated in this paper. Some intrinsic properties of the continuous stochastic noise and of the plant input and output in the over-sampling approach are analyzed and used to demonstrate identifiability in the over-sampling approach and to evaluate its identification performance. Furthermore, the selection of plant model order, the asymptotic variance of the estimated parameters, and the asymptotic variance of the frequency response of the estimated model are also explored. The results show that the over-sampling approach can guarantee identifiability and greatly improve the performance of closed-loop identification.

  17. Geostatistical estimation of forest biomass in interior Alaska combining Landsat-derived tree cover, sampled airborne lidar and field observations

    NASA Astrophysics Data System (ADS)

    Babcock, Chad; Finley, Andrew O.; Andersen, Hans-Erik; Pattison, Robert; Cook, Bruce D.; Morton, Douglas C.; Alonzo, Michael; Nelson, Ross; Gregoire, Timothy; Ene, Liviu; Gobakken, Terje; Næsset, Erik

    2018-06-01

    The goal of this research was to develop and examine the performance of a geostatistical coregionalization modeling approach for combining field inventory measurements, strip samples of airborne lidar and Landsat-based remote sensing data products to predict aboveground biomass (AGB) in interior Alaska's Tanana Valley. The proposed modeling strategy facilitates pixel-level mapping of AGB density predictions across the entire spatial domain. Additionally, the coregionalization framework allows for statistically sound estimation of total AGB for arbitrary areal units within the study area, a key advance to support diverse management objectives in interior Alaska. This research focuses on appropriate characterization of prediction uncertainty in the form of posterior predictive coverage intervals and standard deviations. Using the framework detailed here, it is possible to quantify estimation uncertainty for any spatial extent, ranging from pixel-level predictions of AGB density to estimates of AGB stocks for the full domain. The lidar-informed coregionalization models consistently outperformed their counterpart lidar-free models in terms of point-level predictive performance and total AGB precision. Additionally, the inclusion of Landsat-derived forest cover as a covariate further improved estimation precision in regions with lower lidar sampling intensity. Our findings also demonstrate that model-based approaches that do not explicitly account for residual spatial dependence can grossly underestimate uncertainty, resulting in falsely precise estimates of AGB. On the other hand, in a geostatistical setting, residual spatial structure can be modeled within a Bayesian hierarchical framework to obtain statistically defensible assessments of uncertainty for AGB estimates.

  18. Using Data-Dependent Priors to Mitigate Small Sample Bias in Latent Growth Models: A Discussion and Illustration Using Mplus

    ERIC Educational Resources Information Center

    McNeish, Daniel M.

    2016-01-01

    Mixed-effects models (MEMs) and latent growth models (LGMs) are often considered interchangeable save the discipline-specific nomenclature. Software implementations of these models, however, are not interchangeable, particularly with small sample sizes. Restricted maximum likelihood estimation that mitigates small sample bias in MEMs has not been…

  19. A New Approach of Juvenile Age Estimation using Measurements of the Ilium and Multivariate Adaptive Regression Splines (MARS) Models for Better Age Prediction.

    PubMed

    Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal

    2017-01-01

    Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models integrate localized nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.

  20. Estimating population size for Capercaillie (Tetrao urogallus L.) with spatial capture-recapture models based on genotypes from one field sample

    USGS Publications Warehouse

    Mollet, Pierre; Kery, Marc; Gardner, Beth; Pasinelli, Gilberto; Royle, Andy

    2015-01-01

    We conducted a survey of an endangered and cryptic forest grouse, the capercaillie Tetrao urogallus, based on droppings collected on two sampling occasions in eight forest fragments in central Switzerland in early spring 2009. We used genetic analyses to sex and individually identify birds. We estimated sex-dependent detection probabilities and population size using a modern spatial capture-recapture (SCR) model for the data from pooled surveys. A total of 127 capercaillie genotypes were identified (77 males, 46 females, and 4 of unknown sex). The SCR model yielded a total population size estimate (posterior mean) of 137.3 capercaillies (posterior sd 4.2, 95% CRI 130–147). The observed sex ratio was skewed towards males (0.63). The posterior mean of the sex ratio under the SCR model was 0.58 (posterior sd 0.02, 95% CRI 0.54–0.61), suggesting a male-biased sex ratio in our study area. A subsampling simulation study indicated that a reduced sampling effort representing 75% of the actual detections would still yield practically acceptable estimates of total size and sex ratio in our population. Hence, field work and financial effort could be reduced without compromising accuracy when the SCR model is used to estimate key population parameters of cryptic species.

  1. Estimating the "impact" of out-of-home placement on child well-being: approaching the problem of selection bias.

    PubMed

    Berger, Lawrence M; Bruch, Sarah K; Johnson, Elizabeth I; James, Sigrid; Rubin, David

    2009-01-01

    This study used data on 2,453 children aged 4-17 from the National Survey of Child and Adolescent Well-Being and 5 analytic methods that adjust for selection factors to estimate the impact of out-of-home placement on children's cognitive skills and behavior problems. Methods included ordinary least squares (OLS) regressions and residualized change, simple change, difference-in-difference, and fixed effects models. Models were estimated using the full sample and a matched sample generated by propensity scoring. Although results from the unmatched OLS and residualized change models suggested that out-of-home placement is associated with increased child behavior problems, estimates from models that more rigorously adjust for selection bias indicated that placement has little effect on children's cognitive skills or behavior problems.
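
Of the five methods listed, the difference-in-difference estimate is the easiest to show compactly. It takes the change over time in the placed group minus the change in the comparison group, which removes stable group-level differences that would otherwise confound a simple comparison. This is the unadjusted two-group, two-period form with hypothetical scores. It is not the study's regression specification, and it omits propensity-score matching.

```python
from statistics import fmean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """(change in treated group) - (change in comparison group)."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change

# Hypothetical behavior-problem scores before and after placement.
effect = diff_in_diff([10, 12], [15, 17], [10, 10], [12, 12])
print(effect)  # (16 - 11) - (12 - 10) = 3.0
```

The estimate isolates the placement effect only if both groups would have changed in parallel without placement. Estimating it on a propensity-score-matched sample is one way to make that assumption more plausible.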

  2. Effect of non-Poisson samples on turbulence spectra from laser velocimetry

    NASA Technical Reports Server (NTRS)

    Sree, Dave; Kjelgaard, Scott O.; Sellers, William L., III

    1994-01-01

    Spectral analysis of laser velocimetry (LV) data plays an important role in characterizing a turbulent flow and in estimating the associated turbulence scales, which can be helpful in validating theoretical and numerical turbulence models. The determination of turbulence scales is critically dependent on the accuracy of the spectral estimates. Spectral estimates from 'individual realization' laser velocimetry data are typically based on the assumption of a Poisson sampling process. This Note demonstrates that the sampling distribution must be considered before spectral estimates are used to infer turbulence scales.

  3. Use of spatial capture-recapture modeling and DNA data to estimate densities of elusive animals

    USGS Publications Warehouse

    Kery, Marc; Gardner, Beth; Stoeckle, Tabea; Weber, Darius; Royle, J. Andrew

    2011-01-01

    Assessment of abundance, survival, recruitment rates, and density (i.e., population assessment) is especially challenging for elusive species most in need of protection (e.g., rare carnivores). Individual identification methods, such as DNA sampling, provide ways of studying such species efficiently and noninvasively. Additionally, statistical methods that correct for undetected animals and account for locations where animals are captured are available to efficiently estimate density and other demographic parameters. We collected hair samples of European wildcat (Felis silvestris) from cheek-rub lure sticks, extracted DNA from the samples, and identified each animal's genotype. To estimate the density of wildcats, we used Bayesian inference in a spatial capture-recapture model. We used WinBUGS to fit a model that accounted for differences in detection probability among individuals and seasons and between two lure arrays. We detected 21 individual wildcats (including possible hybrids) 47 times. Wildcat density was estimated at 0.29/km2 (SE 0.06), and 95% of the activity of wildcats was estimated to occur within 1.83 km from their home-range center. Lures located systematically were associated with a greater number of detections than lures placed in a cell on the basis of expert opinion. Detection probability of individual cats was greatest in late March. Our model is a generalized linear mixed model; hence, it can be easily extended, for instance, to incorporate trap- and individual-level covariates. We believe that the combined use of noninvasive sampling techniques and spatial capture-recapture models will improve population assessments, especially for rare and elusive animals.
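
Spatial capture-recapture models commonly link detection to the distance d between a trap (here, a lure) and an animal's home-range center with a half-normal function, p(d) = p0 * exp(-d^2 / (2 * sigma^2)). Suppose activity around the center is further assumed to be circular bivariate normal with scale sigma. Then the radius containing a fraction c of activity is sigma * sqrt(-2 * ln(1 - c)), which connects sigma to a statement like the 95% activity radius reported above. The abstract does not state the detection function used, so both forms are assumptions here. The function names and the p0 value are hypothetical.

```python
import math

def halfnormal_detection(d, p0, sigma):
    """Detection probability at distance d from the activity center."""
    return p0 * math.exp(-d * d / (2.0 * sigma * sigma))

def radius_for_coverage(sigma, coverage=0.95):
    """Radius enclosing `coverage` of a circular bivariate normal with scale sigma."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - coverage))

def sigma_from_radius(radius, coverage=0.95):
    """Inverse of radius_for_coverage."""
    return radius / math.sqrt(-2.0 * math.log(1.0 - coverage))

# Under these assumptions, the sigma (km) implied by a 1.83 km 95% radius,
# and detection at 1 km for a hypothetical baseline p0 = 0.2:
sigma = sigma_from_radius(1.83)
print(round(sigma, 3), round(halfnormal_detection(1.0, 0.2, sigma), 3))
```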

  4. Per-pixel bias-variance decomposition of continuous errors in data-driven geospatial modeling: A case study in environmental remote sensing

    NASA Astrophysics Data System (ADS)

    Gao, Jing; Burt, James E.

    2017-12-01

    This study investigates the usefulness of a per-pixel bias-variance error decomposition (BVD) for understanding and improving spatially explicit data-driven models of continuous variables in environmental remote sensing (ERS). BVD is a model evaluation method that originated in machine learning and has not been examined for ERS applications. Using as a showcase a regression tree model that maps land imperviousness (0-100%) from Landsat images, we show that BVD can reveal sources of estimation error, map how these sources vary across space, reveal the effects of various model characteristics on estimation accuracy, and enable in-depth comparison of different error metrics. Specifically, BVD bias maps can help analysts identify and delineate model spatial non-stationarity; BVD variance maps can indicate the potential effects of ensemble methods (e.g., bagging) and inform efficient allocation of training samples: training samples should capture the full complexity of the modeled process, and more samples should be allocated to regions with more complex underlying processes rather than to regions covering larger areas. By examining, for both absolute and squared errors (i.e., the absolute or squared value of the difference between observation and estimate), the relationships between model characteristics and their effects on estimation accuracy revealed by BVD, we found that the two error metrics embody different diagnostic emphases, can lead to different conclusions about the same model, and may suggest different solutions for performance improvement. We emphasize BVD's strength in revealing the connection between model characteristics and estimation accuracy, as understanding this relationship enables analysts to steer performance effectively through model adjustments.
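    For squared error, the per-pixel decomposition is exact. If one pixel is predicted by K models trained on different training samples (for example, bootstrap replicates, which is an assumed setup here), the mean squared error over those models equals the squared bias of the mean prediction plus the variance of the predictions. The sketch uses hypothetical imperviousness values and covers only the squared-error case.

```python
def bvd_squared(predictions, observed):
    """Bias-variance decomposition of squared error at one pixel.

    predictions: estimates for the pixel from models trained on different
    training samples; observed: the reference value for the pixel.
    Returns (mean squared error, bias**2, variance). The first equals the sum
    of the other two because the variance uses the 1/K (population) form.
    """
    k = len(predictions)
    mean_pred = sum(predictions) / k
    bias_sq = (mean_pred - observed) ** 2
    variance = sum((p - mean_pred) ** 2 for p in predictions) / k
    mse = sum((p - observed) ** 2 for p in predictions) / k
    return mse, bias_sq, variance


# Hypothetical imperviousness estimates (%) for two pixels from five models.
pixels = {
    "pixel_a": ([40.0, 42.0, 38.0, 41.0, 39.0], 30.0),  # consistent but offset
    "pixel_b": ([55.0, 70.0, 45.0, 60.0, 50.0], 56.0),  # centered but unstable
}
for name, (preds, obs) in pixels.items():
    print(name, bvd_squared(preds, obs))
```

Mapping the second and third components pixel by pixel gives the bias and variance maps described above. A bias-dominated pixel like `pixel_a` points to model structure, while a variance-dominated pixel like `pixel_b` is where bagging or more training samples may help.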

  5. [Stature estimation for Sichuan Han nationality female based on X-ray technology with measurement of lumbar vertebrae].

    PubMed

    Qing, Si-han; Chang, Yun-feng; Dong, Xiao-ai; Li, Yuan; Chen, Xiao-gang; Shu, Yong-kang; Deng, Zhen-hua

    2013-10-01

    To establish mathematical models for estimating the stature of Sichuan Han females from X-ray measurements of the lumbar vertebrae, and so provide essential data for forensic anthropology research. The 206 Sichuan Han female subjects were divided into three groups by age: group A (206 subjects) comprised all ages, group B (116 subjects) comprised those aged 20-45 years, and group C comprised the 90 subjects over 45 years. The lumbar vertebrae of all subjects were examined by computed radiography (CR), recording the anterior border, posterior border and central heights of the five vertebral bodies L1-L5 (x1-x15), the total central height of the lumbar spine (x16), and each subject's actual stature. Linear regression analysis of these parameters was used to establish the stature estimation models. Sixty-two trained subjects were tested to verify the accuracy of the models. Hypothesis tests of the linear regression equations showed that the established models were statistically significant (P<0.05). The standard errors of the equations were 2.982-5.004 cm, the correlation coefficients were 0.370-0.779, and the multiple correlation coefficients were 0.533-0.834. Return (back-substitution) tests of the equations with the highest correlation coefficient and multiple correlation coefficient in each group showed that the most accurate was the group A multiple regression equation, y = 100.33 + 1.489 x3 - 0.548 x6 + 0.772 x9 + 0.058 x12 + 0.645 x15, with accuracies of 80.6% within ±1SE and 100% within ±2SE. The established mathematical models could be applied to stature estimation for Sichuan Han females.
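    The group A equation can be applied directly, and the reported accuracy check amounts to asking whether the estimate falls within ±1 or ±2 standard errors of the measured stature. In this sketch the input values, their unit (the abstract does not state it; millimetres are assumed here because they give plausible statures) and the standard error (a hypothetical value inside the reported 2.982-5.004 cm range) are all assumptions.

```python
def estimate_stature(x3, x6, x9, x12, x15):
    """Group A multiple regression equation from the abstract (stature in cm).

    The abstract does not state the unit of the vertebral measurements; they
    must match the unit used when the equation was fitted.
    """
    return 100.33 + 1.489 * x3 - 0.548 * x6 + 0.772 * x9 + 0.058 * x12 + 0.645 * x15


def within_se(estimate, true_height, se, k=1):
    """True if the estimate lies within k standard errors of the measured stature."""
    return abs(estimate - true_height) <= k * se


# Hypothetical inputs (assumed to be in mm) and a hypothetical SE of 3.0 cm.
est = estimate_stature(25, 25, 25, 25, 25)
print(round(est, 2), within_se(est, 158.0, se=3.0), within_se(est, 158.0, se=3.0, k=2))
```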

  6. Improving inference for aerial surveys of bears: The importance of assumptions and the cost of unnecessary complexity.

    PubMed

    Schmidt, Joshua H; Wilson, Tammy L; Thompson, William L; Reynolds, Joel H

    2017-07-01

    Obtaining useful estimates of wildlife abundance or density requires thoughtful attention to potential sources of bias and imprecision, and it is widely understood that addressing incomplete detection is critical to appropriate inference. When the underlying assumptions of sampling approaches are violated, both increased bias and reduced precision of the population estimator may result. Bear (Ursus spp.) populations can be difficult to sample and are often monitored using mark-recapture distance sampling (MRDS) methods, although obtaining adequate sample sizes can be cost prohibitive. With the goal of improving inference, we examined the underlying methodological assumptions and estimator efficiency of three datasets collected under an MRDS protocol designed specifically for bears. We analyzed these data using MRDS, conventional distance sampling (CDS), and open-distance sampling approaches to evaluate the apparent bias-precision tradeoff relative to the assumptions inherent under each approach. We also evaluated the incorporation of informative priors on detection parameters within a Bayesian context. We found that the CDS estimator had low apparent bias and was more efficient than the more complex MRDS estimator. When combined with informative priors on the detection process, precision was increased by >50% compared to the MRDS approach with little apparent bias. In addition, open-distance sampling models revealed a serious violation of the assumption that all bears were available to be sampled. Inference is directly related to the underlying assumptions of the survey design and the analytical tools employed. We show that for aerial surveys of bears, avoidance of unnecessary model complexity, use of prior information, and the application of open population models can be used to greatly improve estimator performance and simplify field protocols. 
Although we focused on distance sampling-based aerial surveys for bears, the general concepts we addressed apply to a variety of wildlife survey contexts.
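    For reference, a conventional distance sampling (CDS) line-transect estimate can be sketched as follows. The sketch assumes a half-normal detection function, no truncation, and certain detection on the line (g(0) = 1), which is exactly the assumption MRDS relaxes with double observers. Real aerial surveys usually need left truncation for the area hidden under the aircraft, which is omitted here. The distances are hypothetical.

```python
import math


def cds_halfnormal_density(distances, line_length):
    """Line-transect density under conventional distance sampling (CDS).

    Assumes g(x) = exp(-x**2 / (2 * sigma**2)), no truncation and g(0) = 1.
    distances: perpendicular distances of detected groups; line_length: total
    transect length in the same unit. Returns (sigma, effective strip
    half-width, groups per unit area).
    """
    n = len(distances)
    sigma = math.sqrt(sum(x * x for x in distances) / n)  # half-normal MLE of sigma
    esw = sigma * math.sqrt(math.pi / 2.0)                 # integral of g(x) from 0 to infinity
    density = n / (2.0 * line_length * esw)                # strips on both sides of the line
    return sigma, esw, density


# Hypothetical perpendicular distances (km) from 100 km of flown transect.
dists = [0.05, 0.12, 0.30, 0.08, 0.45, 0.22, 0.15, 0.60, 0.33, 0.10]
print(cds_halfnormal_density(dists, 100.0))
```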

  7. Stochastic Residual-Error Analysis For Estimating Hydrologic Model Predictive Uncertainty

    EPA Science Inventory

    A hybrid time series-nonparametric sampling approach, referred to herein as semiparametric, is presented for the estimation of model predictive uncertainty. The methodology is a two-step procedure whereby a distributed hydrologic model is first calibrated, then followed by brute ...

  8. Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas

    PubMed Central

    Dong, Jihong; Dai, Wenting; Xu, Jiren; Li, Songnian

    2016-01-01

    This study examined surface soils in the Liuxin mining area of Xuzhou and related heavy metal content to spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The results are as follows: (1) the estimates of the spectral inversion models established with MLR, GRNN and SMO-SVM are satisfactory; even the MLR model, which provides the worst estimates, has an R² above 0.46. This suggests that the stress-sensitive bands of heavy metal pollution contain sufficient effective spectral information; (2) the GRNN model simulates data from small samples more effectively than the MLR model, and the R² values between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of the spectral estimates from the SMO-SVM model are clearly better than those of the GRNN and MLR models. Among the five heavy metals, the SMO-SVM estimate is best for cadmium (Cd), with an R² of 0.8628; (4) when the optimal model is used to invert the Cd content of wheat planted on mine reclamation soil, the R² and RMSE between the measured and estimated values are 0.6683 and 0.0489, respectively. This result suggests that using the SMO-SVM model to estimate heavy metal contents in wheat samples is feasible. PMID:27367708
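    The abstract reports R² and RMSE between measured and estimated contents but does not say which R² definition was used. The sketch computes both common versions, the coefficient of determination (1 - SS_res/SS_tot) and the squared Pearson correlation, because they can disagree sharply when estimates are biased. The test values are hypothetical.

```python
import math


def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured))


def r2_determination(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def r2_pearson(measured, estimated):
    """Squared Pearson correlation: insensitive to constant offsets and scaling."""
    n = len(measured)
    mx, my = sum(measured) / n, sum(estimated) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(measured, estimated))
    sxx = sum((a - mx) ** 2 for a in measured)
    syy = sum((b - my) ** 2 for b in estimated)
    return sxy * sxy / (sxx * syy)


# Estimates that are perfectly correlated with the measurements but offset by +1.
measured, estimated = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
print(rmse(measured, estimated), r2_determination(measured, estimated), r2_pearson(measured, estimated))
```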

  9. Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas.

    PubMed

    Dong, Jihong; Dai, Wenting; Xu, Jiren; Li, Songnian

    2016-06-28

    This study examined surface soils in the Liuxin mining area of Xuzhou and related heavy metal content to spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The results are as follows: (1) the estimates of the spectral inversion models established with MLR, GRNN and SMO-SVM are satisfactory; even the MLR model, which provides the worst estimates, has an R² above 0.46. This suggests that the stress-sensitive bands of heavy metal pollution contain sufficient effective spectral information; (2) the GRNN model simulates data from small samples more effectively than the MLR model, and the R² values between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of the spectral estimates from the SMO-SVM model are clearly better than those of the GRNN and MLR models. Among the five heavy metals, the SMO-SVM estimate is best for cadmium (Cd), with an R² of 0.8628; (4) when the optimal model is used to invert the Cd content of wheat planted on mine reclamation soil, the R² and RMSE between the measured and estimated values are 0.6683 and 0.0489, respectively. This result suggests that using the SMO-SVM model to estimate heavy metal contents in wheat samples is feasible.

  10. Cost-effective sampling of ¹³⁷Cs-derived net soil redistribution: part 1--estimating the spatial mean across scales of variation.

    PubMed

    Li, Y; Chappell, A; Nyamdavaa, B; Yu, H; Davaasuren, D; Zoljargal, K

    2015-03-01

    The ¹³⁷Cs technique for estimating net time-integrated soil redistribution is valuable for understanding the factors controlling soil redistribution by all processes. The literature on this technique is dominated by studies of individual fields and describes its typically time-consuming nature. We contend that the community making these studies has inappropriately assumed that many ¹³⁷Cs measurements are required and hence that estimates of net soil redistribution can only be made at the field scale. Here, we aim to help future studies of ¹³⁷Cs-derived net soil redistribution apply their often limited resources across scales of variation (field, catchment, region, etc.) without compromising the quality of the estimates at any scale. We describe a hybrid (design-based and model-based) stratified random sampling design with composites to estimate the sampling variance, and a cost model for fieldwork and laboratory measurements. Geostatistical mapping of net (1954-2012) soil redistribution as a case study on the Chinese Loess Plateau is compared with estimates for several other sampling designs popular in the literature. We demonstrate the cost-effectiveness of the hybrid design for spatial estimation of net soil redistribution. To demonstrate the limitations of current sampling approaches in cutting across scales of variation, we extrapolate our estimate of net soil redistribution across the region and show that, for the same resources, estimates from many fields could have been provided, which would elucidate the causes of differences within and between regional estimates. We recommend that future studies carefully evaluate the sampling design, considering the opportunity to investigate ¹³⁷Cs-derived net soil redistribution across scales of variation. Copyright © 2014 Elsevier Ltd. All rights reserved.
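    The design-based part of a stratified random sampling design rests on the standard stratified estimator: the mean is the area-weighted sum of stratum means, and its variance is the sum of W_h²·s_h²/n_h. The sketch shows only that textbook estimator, ignoring the finite-population correction. How composites and the geostatistical (model-based) component enter the paper's hybrid variance is not given in the abstract, and the values are hypothetical.

```python
import math


def stratified_mean(strata):
    """Stratified random sampling estimate of a spatial mean.

    strata: list of (W_h, measurements) where W_h is the stratum's area
    fraction (weights sum to 1). Returns (mean, standard error), ignoring the
    finite-population correction.
    """
    mean, var = 0.0, 0.0
    for weight, values in strata:
        n_h = len(values)
        if n_h < 2:
            raise ValueError("each stratum needs at least two measurements to estimate variance")
        ybar = sum(values) / n_h
        s2 = sum((v - ybar) ** 2 for v in values) / (n_h - 1)
        mean += weight * ybar
        var += weight ** 2 * s2 / n_h
    return mean, math.sqrt(var)


# Hypothetical net redistribution values (negative = loss) in three strata.
strata = [(0.5, [-12.0, -8.0, -10.0]), (0.3, [2.0, 5.0, 3.5]), (0.2, [-20.0, -25.0, -18.0])]
print(stratified_mean(strata))
```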

  11. Estimating the encounter rate variance in distance sampling

    USGS Publications Warehouse

    Fewster, R.M.; Buckland, S.T.; Burnham, K.P.; Borchers, D.L.; Jupp, P.E.; Laake, J.L.; Thomas, L.

    2009-01-01

    The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias. © 2008, The International Biometric Society.
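    One widely used length-weighted, design-based estimator of the encounter rate variance treats the K lines as a simple random sample: var(n/L) = K/(L²(K-1))·Σ l_k²(n_k/l_k - n/L)². The sketch implements that single formula for illustration. It is not presented as the specific estimator the paper recommends, and poststratification is not shown. With equal line lengths it reduces to the sample variance of per-line rates divided by K.

```python
def encounter_rate_variance(counts, lengths):
    """Encounter rate n/L and a length-weighted, design-based variance estimate.

    var(n/L) = K / (L**2 * (K - 1)) * sum_k l_k**2 * (n_k / l_k - n / L)**2
    counts: detections n_k on each of K lines; lengths: line lengths l_k.
    Treats the lines as a simple random sample of lines.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two transects")
    n, total_length = sum(counts), sum(lengths)
    rate = n / total_length
    ss = sum(l * l * (c / l - rate) ** 2 for c, l in zip(counts, lengths))
    return rate, k / (total_length ** 2 * (k - 1)) * ss


# Hypothetical survey: detections and lengths (km) on five lines.
print(encounter_rate_variance([3, 7, 2, 5, 4], [2.0, 3.5, 1.5, 2.5, 2.0]))
```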

  12. A method for the estimation of the significance of cross-correlations in unevenly sampled red-noise time series

    NASA Astrophysics Data System (ADS)

    Max-Moerbeck, W.; Richards, J. L.; Hovatta, T.; Pavlidou, V.; Pearson, T. J.; Readhead, A. C. S.

    2014-11-01

    We present a practical implementation of a Monte Carlo method to estimate the significance of cross-correlations in unevenly sampled time series of data, whose statistical properties are modelled with a simple power-law power spectral density. This implementation builds on published methods; we introduce a number of improvements in the normalization of the cross-correlation function estimate and a bootstrap method for estimating the significance of the cross-correlations. A closely related matter is the estimation of a model for the light curves, which is critical for the significance estimates. We present a graphical and quantitative demonstration that uses simulations to show how common it is to get high cross-correlations for unrelated light curves with steep power spectral densities. This demonstration highlights the dangers of interpreting them as signs of a physical connection. We show that by using interpolation and the Hanning sampling window function we are able to reduce the effects of red-noise leakage and to recover steep simple power-law power spectral densities. We also introduce the use of a Neyman construction for the estimation of the errors in the power-law index of the power spectral density. This method provides a consistent way to estimate the significance of cross-correlations in unevenly sampled time series of data.
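    The warning about spurious high cross-correlations for steep power spectra can be reproduced with a small Monte Carlo experiment. The sketch below is a simplified stand-in, not the paper's implementation. It uses independent Gaussian random walks (power spectrum roughly ∝ f⁻²) in place of general power-law simulations, observes both series at the same random (uneven) times, and uses the zero-lag Pearson correlation rather than a full cross-correlation function. White noise serves as a contrast.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_walk(n, rng):
    """Gaussian random walk: red noise whose power spectrum falls roughly as f**-2."""
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def white_noise(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def chance_fraction(threshold, generator, n_points=100, n_trials=500, seed=0):
    """Fraction of independent series pairs, observed at shared random (uneven)
    times, whose zero-lag |correlation| reaches `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        length = 4 * n_points
        times = sorted(rng.sample(range(length), n_points))
        a, b = generator(length, rng), generator(length, rng)
        r = pearson([a[i] for i in times], [b[i] for i in times])
        if abs(r) >= threshold:
            hits += 1
    return hits / n_trials


print("red noise:  ", chance_fraction(0.5, random_walk))
print("white noise:", chance_fraction(0.5, white_noise))
```

The same loop, with simulated series matched to each light curve's fitted power spectrum, yields an empirical null distribution against which an observed correlation can be judged.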

  13. Sample Size Determination for Regression Models Using Monte Carlo Methods in R

    ERIC Educational Resources Information Center

    Beaujean, A. Alexander

    2014-01-01

    A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
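    The Monte Carlo idea behind simulation-based sample size determination is to simulate many data sets from an assumed model at a candidate sample size, fit the regression, and record how often the effect is detected. The article works in R. The Python sketch below is an independent illustration for a single-predictor linear regression, with an assumed slope and error standard deviation. It uses a normal critical value of 1.96 as an approximation to the exact t critical value.

```python
import math
import random


def slope_t(x, y):
    """t statistic for the ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    return b1 / math.sqrt(rss / (n - 2) / sxx)


def power(n, slope, sigma=1.0, n_sims=500, crit=1.96, seed=0):
    """Monte Carlo power: share of simulated data sets whose slope test rejects H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [slope * a + rng.gauss(0.0, sigma) for a in x]
        if abs(slope_t(x, y)) > crit:
            hits += 1
    return hits / n_sims


def smallest_n(slope, target=0.8, candidates=range(10, 201, 10), **kwargs):
    """First candidate sample size whose simulated power reaches the target."""
    for n in candidates:
        if power(n, slope, **kwargs) >= target:
            return n
    return None


print(smallest_n(0.5))
```

Because the data are simulated rather than derived from a formula, the same loop can accommodate skewed errors, missing data or extra predictors. That flexibility is the motivation the abstract gives for the Monte Carlo approach.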

  14. A Comparison of Normal and Elliptical Estimation Methods in Structural Equation Models.

    ERIC Educational Resources Information Center

    Schumacker, Randall E.; Cheevatanarak, Suchittra

    Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean square error of approximation values using normal and elliptical estimation methods. Three research conditions were imposed on the simulated data: sample size, population contamination percent, and kurtosis. A Bentler-Weeks structural model established the…

  15. Red-shouldered hawk occupancy surveys in central Minnesota, USA

    USGS Publications Warehouse

    Henneman, C.; McLeod, M.A.; Andersen, D.E.

    2007-01-01

    Forest-dwelling raptors are often difficult to detect because many species occur at low density or are secretive. Broadcasting conspecific vocalizations can increase the probability of detecting forest-dwelling raptors and has been shown to be an effective method for locating raptors and assessing their relative abundance. Recent advances in statistical techniques based on presence-absence data use probabilistic arguments to derive probability of detection when it is <1 and to provide a model and likelihood-based method for estimating proportion of sites occupied. We used these maximum-likelihood models with data from red-shouldered hawk (Buteo lineatus) call-broadcast surveys conducted in central Minnesota, USA, in 1994-1995 and 2004-2005. Our objectives were to obtain estimates of occupancy and detection probability 1) over multiple sampling seasons (yr), 2) incorporating within-season time-specific detection probabilities, 3) with call type and breeding stage included as covariates in models of probability of detection, and 4) with different sampling strategies. We visited individual survey locations 2-9 times per year, and estimates of both probability of detection (range = 0.28-0.54) and site occupancy (range = 0.81-0.97) varied among years. Detection probability was affected by inclusion of a within-season time-specific covariate, call type, and breeding stage. In 2004 and 2005 we used survey results to assess the effect that number of sample locations, double sampling, and discontinued sampling had on parameter estimates. We found that estimates of probability of detection and proportion of sites occupied were similar across different sampling strategies, and we suggest ways to reduce sampling effort in a monitoring program.
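    The likelihood-based occupancy approach can be sketched with the basic single-season model. Here a site with d detections in k visits contributes ψ·p^d·(1-p)^(k-d) if d > 0, and ψ·(1-p)^k + (1-ψ) if it was never detected. The sketch assumes constant ψ and p, unlike the study's models with time-specific detection and covariates, and fits by grid search on simulated data. It also shows how per-visit detection translates into the probability of detecting an occupied site at least once.

```python
import math
import random
from collections import Counter


def occupancy_loglik(psi, p, det_counts, k):
    """Single-season occupancy log-likelihood with constant psi and p.

    det_counts maps the number of detections d (0..k) to the number of sites.
    Sites with d > 0 are occupied; all-zero histories mix 'occupied but
    missed' with 'unoccupied'.
    """
    ll = 0.0
    for d, n_sites in det_counts.items():
        if d > 0:
            lik = psi * p ** d * (1.0 - p) ** (k - d)
        else:
            lik = psi * (1.0 - p) ** k + (1.0 - psi)
        ll += n_sites * math.log(lik)
    return ll


def fit_occupancy(histories, step=0.01):
    """Maximum-likelihood (psi, p) by grid search over (0, 1) x (0, 1)."""
    k = len(histories[0])
    det_counts = Counter(sum(h) for h in histories)
    grid = [i * step for i in range(1, int(round(1.0 / step)))]
    _, psi, p = max((occupancy_loglik(s, q, det_counts, k), s, q) for s in grid for q in grid)
    return psi, p


def cumulative_detection(p, k):
    """Probability of detecting the species at least once in k visits to an occupied site."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical simulation: 500 sites, 5 visits each, true psi = 0.8 and p = 0.4.
rng = random.Random(42)
histories = []
for _ in range(500):
    occupied = rng.random() < 0.8
    histories.append([int(occupied and rng.random() < 0.4) for _ in range(5)])
psi_hat, p_hat = fit_occupancy(histories)
print(psi_hat, p_hat, cumulative_detection(0.28, 4))
```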

  16. A comparison of abundance estimates from extended batch-marking and Jolly–Seber-type experiments

    PubMed Central

    Cowen, Laura L E; Besbeas, Panagiotis; Morgan, Byron J T; Schwarz, Carl J

    2014-01-01

    Little attention has been paid to the use of multi-sample batch-marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, Huggins et al. (2010) recently presented a pseudo-likelihood for a multi-sample batch-marking study in which they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson-type estimator. We have developed and maximized the likelihood for batch-marking studies. We use data simulated from a Jolly–Seber-type study and convert them to what would have been obtained from an extended batch-marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch-marking model to determine the efficiency of collecting and analyzing batch-marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains in precision are made when using unique identifiers and employing the CMAS model; however, the likelihood typically had lower mean square error than the pseudo-likelihood method of Huggins et al. (2010). When designing a batch-marking study, researchers can be confident of obtaining unbiased abundance estimators. Furthermore, they can design studies to reduce mean square error by manipulating capture probabilities and sample size. PMID:24558576
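    A Horvitz-Thompson-type abundance estimator inflates each captured animal by the inverse of its estimated capture probability, N̂ = Σ 1/p̂_i. For a single occasion with a common estimate p̂_t, this reduces to n_t/p̂_t. The sketch shows only this generic form with hypothetical numbers. How Huggins et al. (2010) obtain p̂ from batch-marking data is not reproduced here.

```python
def horvitz_thompson_abundance(capture_probs):
    """Horvitz-Thompson-type abundance estimate: each captured animal i, caught
    with estimated probability p_i, stands for 1 / p_i animals."""
    if any(not 0.0 < p <= 1.0 for p in capture_probs):
        raise ValueError("capture probabilities must lie in (0, 1]")
    return sum(1.0 / p for p in capture_probs)


# Hypothetical occasion: 40 animals caught with an estimated capture probability of 0.5.
print(horvitz_thompson_abundance([0.5] * 40))
```

Because the estimate divides by p̂, errors in low capture probabilities are strongly amplified. This is one reason design choices that raise capture probabilities, as the abstract notes, reduce mean square error.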

  17. On the importance of incorporating sampling weights in ...

    EPA Pesticide Factsheets

    Occupancy models are used extensively to assess wildlife-habitat associations and to predict species distributions across large geographic regions. Occupancy models were developed as a tool to properly account for imperfect detection of a species. Current guidelines on survey design requirements for occupancy models focus on the number of sample units and the pattern of revisits to a sample unit within a season. We focus on the sampling design or how the sample units are selected in geographic space (e.g., stratified, simple random, unequal probability, etc). In a probability design, each sample unit has a sample weight which quantifies the number of sample units it represents in the finite (oftentimes areal) sampling frame. We demonstrate the importance of including sampling weights in occupancy model estimation when the design is not a simple random sample or equal probability design. We assume a finite areal sampling frame as proposed for a national bat monitoring program. We compare several unequal and equal probability designs and varying sampling intensity within a simulation study. We found the traditional single season occupancy model produced biased estimates of occupancy and lower confidence interval coverage rates compared to occupancy models that accounted for the sampling design. We also discuss how our findings inform the analyses proposed for the nascent North American Bat Monitoring Program and other collaborative synthesis efforts that propose h
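    The role of sampling weights can be seen even before detection is modeled. Each sampled unit represents 1/π_i frame units, where π_i is its inclusion probability, so an unequal-probability design that oversamples one stratum biases an unweighted proportion. The sketch assumes occupancy states are known (no imperfect detection) to isolate the weighting effect, and uses a Hájek ratio estimator on hypothetical data. The paper's occupancy models incorporate the weights into the occupancy likelihood itself, which is not shown here.

```python
def weighted_proportion(occupied, inclusion_probs):
    """Hajek (ratio) estimator of the proportion of frame units occupied.

    Each sampled unit is weighted by 1 / inclusion probability, i.e. the number
    of frame units it represents. occupied: 0/1 states (assumed known here).
    """
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * z for w, z in zip(weights, occupied)) / sum(weights)


# Hypothetical design: 10 units from a stratum sampled with probability 0.2
# (8 occupied) and 10 units from a stratum sampled with probability 0.05 (2 occupied).
occupied = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
probs = [0.2] * 10 + [0.05] * 10
print(sum(occupied) / len(occupied))         # unweighted: 0.5
print(weighted_proportion(occupied, probs))  # design-weighted
```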

  18. Modeling abundance effects in distance sampling

    USGS Publications Warehouse

    Royle, J. Andrew; Dawson, D.K.; Bates, S.

    2004-01-01

    Distance-sampling methods are commonly used in studies of animal populations to estimate population density. A common objective of such studies is to evaluate the relationship between abundance or density and covariates that describe animal habitat or other environmental influences. However, little attention has been focused on methods of modeling abundance covariate effects in conventional distance-sampling models. In this paper we propose a distance-sampling model that accommodates covariate effects on abundance. The model is based on specification of the distance-sampling likelihood at the level of the sample unit in terms of local abundance (for each sampling unit). This model is augmented with a Poisson regression model for local abundance that is parameterized in terms of available covariates. Maximum-likelihood estimation of detection and density parameters is based on the integrated likelihood, wherein local abundance is removed from the likelihood by integration. We provide an example using avian point-transect data of Ovenbirds (Seiurus aurocapillus) collected using a distance-sampling protocol and two measures of habitat structure (understory cover and basal area of overstory trees). The model yields a sensible description (positive effect of understory cover, negative effect of basal area) of the relationship between habitat and Ovenbird density that can be used to evaluate the effects of habitat management on Ovenbird populations.
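    Integrating local abundance out of the likelihood can be checked numerically in a collapsed version of the model. If local abundance is N ~ Poisson(λ) with log λ linear in covariates, and each animal is detected with overall probability p, then summing Binomial(y; N, p)·Poisson(N; λ) over N gives Poisson(y; λp). The sketch pools the distance classes into one total count and uses hypothetical coefficients, so it illustrates the integration step rather than the paper's full distance-class likelihood.

```python
import math


def poisson_pmf(k, mu):
    # Computed in log space so large k does not overflow a float.
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def integrated_count_prob(y, lam, p, n_max=200):
    """P(y detections at a unit) with local abundance N summed out:
    sum over N of Binomial(y; N, p) * Poisson(N; lam). n_max must be far above lam."""
    return sum(binom_pmf(y, n, p) * poisson_pmf(n, lam) for n in range(y, n_max + 1))


def local_abundance_rate(beta0, betas, covariates):
    """Poisson regression mean for local abundance: exp(beta0 + sum_j beta_j * x_j)."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, covariates)))


# Hypothetical unit: two habitat covariates, overall detection probability 0.4.
lam = local_abundance_rate(0.5, [0.8, -0.6], [1.2, 0.5])
print(integrated_count_prob(3, lam, 0.4), poisson_pmf(3, lam * 0.4))
```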

  19. HIV Model Parameter Estimates from Interruption Trial Data including Drug Efficacy and Reservoir Dynamics

    PubMed Central

    Luo, Rutao; Piovoso, Michael J.; Martinez-Picado, Javier; Zurakowski, Ryan

    2012-01-01

    Mathematical models based on ordinary differential equations (ODE) have had significant impact on understanding HIV disease dynamics and optimizing patient treatment. A model that characterizes the essential disease dynamics can be used for prediction only if the model parameters are identifiable from clinical data. Most previous parameter identification studies for HIV have used sparsely sampled data from the decay phase following the introduction of therapy. In this paper, model parameters are identified from frequently sampled viral-load data taken from ten patients enrolled in the previously published AutoVac HAART interruption study, providing between 69 and 114 viral load measurements from 3–5 phases of viral decay and rebound for each patient. This dataset is considerably larger than those used in previously published parameter estimation studies. Furthermore, the measurements come from two separate experimental conditions, which allows for the direct estimation of drug efficacy and reservoir contribution rates, two parameters that cannot be identified from decay-phase data alone. A Markov-Chain Monte-Carlo method is used to estimate the model parameter values, with initial estimates obtained using nonlinear least-squares methods. The posterior distributions of the parameter estimates are reported and compared for all patients. PMID:22815727
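    The workflow described (least-squares initial estimates followed by Markov chain Monte Carlo) can be sketched on a deliberately simple stand-in model. The stand-in is a single decay phase with log10 V(t) = a - δt, known Gaussian noise and flat priors. It is not the paper's ODE model, and the data, step size and burn-in are hypothetical. It only shows how a random-walk Metropolis sampler started at the least-squares fit yields a posterior sample for a decay rate.

```python
import math
import random


def least_squares_start(times, log_loads):
    """Ordinary least-squares fit of log_load = a - delta * t (initial estimates)."""
    n = len(times)
    mt, my = sum(times) / n, sum(log_loads) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(times, log_loads))
             / sum((t - mt) ** 2 for t in times))
    return -slope, my - slope * mt


def log_posterior(delta, a, times, log_loads, noise_sd):
    """Gaussian errors on log10 viral load; flat priors, with delta restricted to > 0."""
    if delta <= 0.0:
        return -math.inf
    sse = sum((y - (a - delta * t)) ** 2 for t, y in zip(times, log_loads))
    return -sse / (2.0 * noise_sd ** 2)


def metropolis(times, log_loads, noise_sd=0.2, n_iter=5000, step=0.01, seed=0):
    """Random-walk Metropolis sampler for (delta, a), started at the least-squares fit."""
    rng = random.Random(seed)
    delta, a = least_squares_start(times, log_loads)
    current = log_posterior(delta, a, times, log_loads, noise_sd)
    samples = []
    for _ in range(n_iter):
        d_new, a_new = delta + rng.gauss(0.0, step), a + rng.gauss(0.0, step)
        proposed = log_posterior(d_new, a_new, times, log_loads, noise_sd)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposed - current:
            delta, a, current = d_new, a_new, proposed
        samples.append(delta)
    return samples


# Hypothetical decay-phase data: daily log10 loads, true decay 0.15 per day.
rng = random.Random(7)
times = list(range(21))
log_loads = [5.0 - 0.15 * t + rng.gauss(0.0, 0.2) for t in times]
samples = metropolis(times, log_loads)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 3))
```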

  20. Assessing tiger population dynamics using photographic capture-recapture sampling

    USGS Publications Warehouse

    Karanth, K.U.; Nichols, J.D.; Kumar, N.S.; Hines, J.E.

    2006-01-01

    Although wide-ranging, elusive, large carnivore species, such as the tiger, are of scientific and conservation interest, rigorous inferences about their population dynamics are scarce because of methodological problems of sampling populations at the required spatial and temporal scales. We report the application of a rigorous, noninvasive method for assessing tiger population dynamics to test model-based predictions about population viability. We obtained photographic capture histories for 74 individual tigers during a nine-year study involving 5725 trap-nights of effort. These data were modeled under a likelihood-based, "robust design" capture-recapture analytic framework. We explicitly modeled and estimated ecological parameters such as time-specific abundance, density, survival, recruitment, temporary emigration, and transience, using models that incorporated effects of factors such as individual heterogeneity, trap-response, and time on probabilities of photo-capturing tigers. The model estimated a random temporary emigration parameter of γ″ = γ′ = 0.10 ± 0.069 (values are estimated mean ± SE). When scaled to an annual basis, tiger survival rates were estimated at S = 0.77 ± 0.051, and the estimated probability that a newly caught animal was a transient was τ = 0.18 ± 0.11. During the period when the sampled area was of constant size, the estimated population size N(t) varied from 17 ± 1.7 to 31 ± 2.1 tigers, with a geometric mean rate of annual population change estimated as λ = 1.03 ± 0.020, representing a 3% annual increase. The estimated recruitment of new animals, B(t), varied from 0 ± 3.0 to 14 ± 2.9 tigers. Population density estimates, D, ranged from 7.33 ± 0.8 tigers/100 km² to 21.73 ± 1.7 tigers/100 km² during the study. Thus, despite substantial annual losses and temporal variation in recruitment, the tiger density remained at relatively high levels in Nagarahole. 
Our results are consistent with the hypothesis that protected wild tiger populations can remain healthy despite heavy mortalities because of their inherently high reproductive potential. The ability to model the entire photographic capture history data set and incorporate reduced-parameter models led to estimates of mean annual population change that were sufficiently precise to be useful. This efficient, noninvasive sampling approach can be used to rigorously investigate the population dynamics of tigers and other elusive, rare, wide-ranging animal species in which individuals can be identified from photographs or other means.
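    A geometric mean rate of annual population change is the geometric mean of the year-to-year ratios N(t+1)/N(t), which telescopes to (N_last/N_first)^(1/(T-1)). A value of 1.03 corresponds to a 3% annual increase. The sketch applies this arithmetic to a hypothetical abundance series. In the study λ was estimated within the capture-recapture model rather than computed from point estimates, so this only illustrates the quantity being reported.

```python
import math


def geometric_mean_lambda(abundances):
    """Geometric mean of annual rates N(t+1) / N(t); equals (N_last / N_first) ** (1 / (T - 1))."""
    ratios = [b / a for a, b in zip(abundances, abundances[1:])]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percent_change(lam):
    """Annual percentage change implied by a rate of change lambda."""
    return 100.0 * (lam - 1.0)


# Hypothetical abundance series (not the study's estimates).
series = [20.0, 24.0, 22.0, 27.0, 25.0]
lam = geometric_mean_lambda(series)
print(round(lam, 4), round(percent_change(lam), 2))
```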

  1. Assessing tiger population dynamics using photographic capture-recapture sampling.

    PubMed

    Karanth, K Ullas; Nichols, James D; Kumar, N Samba; Hines, James E

    2006-11-01

    Although wide-ranging, elusive, large carnivore species, such as the tiger, are of scientific and conservation interest, rigorous inferences about their population dynamics are scarce because of methodological problems of sampling populations at the required spatial and temporal scales. We report the application of a rigorous, noninvasive method for assessing tiger population dynamics to test model-based predictions about population viability. We obtained photographic capture histories for 74 individual tigers during a nine-year study involving 5725 trap-nights of effort. These data were modeled under a likelihood-based, "robust design" capture-recapture analytic framework. We explicitly modeled and estimated ecological parameters such as time-specific abundance, density, survival, recruitment, temporary emigration, and transience, using models that incorporated effects of factors such as individual heterogeneity, trap-response, and time on probabilities of photo-capturing tigers. The model estimated a random temporary emigration parameter of gamma" = gamma' = 0.10 +/- 0.069 (values are estimated mean +/- SE). When scaled to an annual basis, tiger survival rates were estimated at S = 0.77 +/- 0.051, and the estimated probability that a newly caught animal was a transient was tau = 0.18 +/- 0.11. During the period when the sampled area was of constant size, the estimated population size N(t) varied from 17 +/- 1.7 to 31 +/- 2.1 tigers, with a geometric mean rate of annual population change estimated as lambda = 1.03 +/- 0.020, representing a 3% annual increase. The estimated recruitment of new animals, B(t), varied from 0 +/- 3.0 to 14 +/- 2.9 tigers. Population density estimates, D, ranged from 7.33 +/- 0.8 tigers/100 km2 to 21.73 +/- 1.7 tigers/100 km2 during the study. Thus, despite substantial annual losses and temporal variation in recruitment, the tiger density remained at relatively high levels in Nagarahole. 
Our results are consistent with the hypothesis that protected wild tiger populations can remain healthy despite heavy mortalities because of their inherently high reproductive potential. The ability to model the entire photographic capture history data set and incorporate reduced-parameter models led to estimates of mean annual population change that were sufficiently precise to be useful. This efficient, noninvasive sampling approach can be used to rigorously investigate the population dynamics of tigers and other elusive, rare, wide-ranging animal species in which individuals can be identified from photographs or other means.

  2. Application of the Zero-Order Reaction Rate Model and Transition State Theory to predict porous Ti6Al4V bending strength.

    PubMed

    Reig, L; Amigó, V; Busquets, D; Calero, J A; Ortiz, J L

    2012-08-01

    Porous Ti6Al4V samples were produced by microsphere sintering. The Zero-Order Reaction Rate Model and Transition State Theory were used to model the sintering process and to estimate the bending strength of the porous samples developed. The evolution of the surface area during the sintering process was used to obtain sintering parameters (sintering constant, activation energy, frequency factor, constant of activation and Gibbs energy of activation). These were then correlated with the bending strength in order to obtain a simple model with which to estimate the evolution of the bending strength of the samples when the sintering temperature and time are modified: σY = P + B·[ln(T·t) - ΔGa/(R·T)]. Although the sintering parameters were obtained only for the microsphere sizes analysed here, the strength of intermediate sizes could easily be estimated following this model. Copyright © 2012 Elsevier B.V. All rights reserved.
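    The strength model can be evaluated directly once its constants are known. The sketch reads the reported expression as σY = P + B·[ln(T·t) - ΔGa/(R·T)], with T the sintering temperature in kelvin, t the sintering time, ΔGa the Gibbs energy of activation and R the molar gas constant. That reading of the garbled original is an assumption, and P, B, ΔGa and the time unit used below are hypothetical values, not the paper's fitted parameters.

```python
import math

R = 8.314  # molar gas constant, J/(mol K)


def bending_strength(temp_k, time, delta_g_a, p, b):
    """sigma_Y = P + B * [ln(T * t) - dG_a / (R * T)].

    temp_k: sintering temperature (K); time: sintering time; delta_g_a: Gibbs
    energy of activation (J/mol); p, b: fitted constants (hypothetical here).
    The time unit must match the one used when P and B were fitted.
    """
    return p + b * (math.log(temp_k * time) - delta_g_a / (R * temp_k))


# Hypothetical constants: compare two sintering times at the same temperature.
for hours in (2.0, 4.0):
    print(hours, round(bending_strength(1473.0, hours * 3600.0, 3.0e5, 50.0, 20.0), 1))
```

With B > 0, doubling the sintering time raises σY by B·ln 2 at any fixed temperature, which is the kind of time-temperature trade-off the model is meant to estimate.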

  3. Using Patient Health Questionnaire-9 item parameters of a common metric resulted in similar depression scores compared to independent item response theory model reestimation.

    PubMed

    Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix

    2016-03-01

    To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item-response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org) and offers a long-term perspective to improve the comparability of patient-reported outcome measures. Copyright © 2016 Elsevier Inc. All rights reserved.
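    In the generalized partial credit model, the probability of responding in category k of an item is proportional to exp(Σ_{v≤k} a(θ - b_v)), with an empty sum for the lowest category. The sketch computes these category probabilities and the expected item score for a hypothetical four-category (0-3) item, the response format of PHQ-9 items. The slope and thresholds are made up, some software adds a scaling constant such as 1.7, and linking steps such as Stocking-Lord are not shown.

```python
import math


def gpcm_probs(theta, a, thresholds):
    """Generalized partial credit model category probabilities for one item.

    P(X = k | theta) is proportional to exp(sum_{v=1..k} a * (theta - b_v)),
    where the empty sum for k = 0 is 0; thresholds holds b_1..b_m.
    """
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + a * (theta - b))
    top = max(cumulative)  # subtract the maximum for numerical stability
    expd = [math.exp(c - top) for c in cumulative]
    total = sum(expd)
    return [e / total for e in expd]


def expected_score(theta, a, thresholds):
    """Expected item score (0..m) at latent depression level theta."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, thresholds)))


# Hypothetical 0-3 item (four categories), like a PHQ-9 item.
item_a, item_b = 1.5, [-0.5, 0.5, 1.5]
for theta in (-2.0, 0.0, 2.0):
    probs = [round(p, 3) for p in gpcm_probs(theta, item_a, item_b)]
    print(theta, probs, round(expected_score(theta, item_a, item_b), 2))
```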

  4. Period Estimation for Sparsely-sampled Quasi-periodic Light Curves Applied to Miras

    NASA Astrophysics Data System (ADS)

    He, Shiyuan; Yuan, Wenlong; Huang, Jianhua Z.; Long, James; Macri, Lucas M.

    2016-12-01

    We develop a nonlinear semi-parametric Gaussian process model to estimate periods of Miras with sparsely sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in the model with respect to a suitable prior distribution obtained from earlier studies. Since the likelihood is highly multimodal in the period, we implement a hybrid method that applies the quasi-Newton algorithm for Gaussian process parameters and searches the period/frequency parameter space over a dense grid. A large-scale, high-fidelity simulation is conducted to mimic the sampling quality of Mira light curves obtained by the M33 Synoptic Stellar Survey. The simulated data set is publicly available and can serve as a testbed for future evaluation of different period estimation methods. The semi-parametric model outperforms an existing algorithm on this simulated test data set as measured by period recovery rate and quality of the resulting period-luminosity relations.
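
    The dense-grid search over period can be illustrated with a deliberately simplified stand-in for the paper's model: here the Gaussian-process term is replaced by independent noise, so each trial period reduces to an ordinary least-squares fit of a constant plus one sine/cosine pair (a least-squares periodogram). This sketches the grid-search idea only, not the authors' semi-parametric method; the simulated times, period and amplitude are hypothetical.

```python
import numpy as np


def grid_search_period(t, y, periods):
    """Return (best_period, rss) minimizing the residual sum of squares of
    y ~ c + alpha*sin(2*pi*t/P) + beta*cos(2*pi*t/P) over the candidate periods P."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best_period, best_rss = None, np.inf
    for period in periods:
        omega = 2.0 * np.pi / period
        design = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if rss < best_rss:
            best_period, best_rss = float(period), rss
    return best_period, best_rss


# Sparse, irregular sampling of a hypothetical 300-day sinusoid (noise-free for clarity).
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 2000.0, size=40))
y_obs = 15.0 + 1.5 * np.sin(2.0 * np.pi * t_obs / 300.0 + 0.7)
period_grid = np.arange(100.0, 1000.0, 0.5)
best, _ = grid_search_period(t_obs, y_obs, period_grid)
print(best)
```

    Because the objective is highly multimodal in the period, a local optimizer started from a poor guess can lock onto the wrong peak; evaluating a fine grid avoids that at the cost of many cheap fits.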

  5. Tensor-guided fitting of subduction slab depths

    USGS Publications Warehouse

    Bazargani, Farhad; Hayes, Gavin P.

    2013-01-01

    Geophysical measurements are often acquired at scattered locations in space. Therefore, interpolating or fitting the sparsely sampled data as a uniform function of space (a procedure commonly known as gridding) is a ubiquitous problem in geophysics. Most gridding methods require a model of spatial correlation for the data. This spatial correlation model can often be inferred from some sort of secondary information, which may also be sparsely sampled in space. In this paper, we present a new method to model the geometry of a subducting slab in which we use a data‐fitting approach to address the problem. Earthquakes and active‐source seismic surveys provide estimates of depths of subducting slabs but only at scattered locations. In addition to estimates of depths from earthquake locations, focal mechanisms of subduction zone earthquakes also provide estimates of the strikes of the subducting slab on which they occur. We use these spatially sparse strike samples and the Earth’s curved surface geometry to infer a model for spatial correlation that guides a blended neighbor interpolation of slab depths. We then modify the interpolation method to account for the uncertainties associated with the depth estimates.

  6. Eastern Baltic region vs. Western Europe: modelling age related changes in the pubic symphysis and the auricular surface.

    PubMed

    Jatautis, Šarūnas; Jankauskas, Rimantas

    2018-02-01

    Objectives. The present study addresses the following two main questions: a) Is the pattern of skeletal ageing observed in well-known western European reference collections applicable to modern eastern Baltic populations, or are population-specific standards needed? b) What are the consequences for estimating the age-at-death distribution in the target population when differences in the estimates from reference data are not taken into account? Materials and methods. The dataset consists of a modern Lithuanian osteological reference collection (LORC), which is the only collection of this type in the eastern Baltic countries (n = 381), and two major western European reference collections, Coimbra (n = 264) and Spitalfields (n = 239). The age-related changes were evaluated using the scoring systems of Suchey-Brooks (Brooks & Suchey 1990) and Lovejoy et al. (1985), and were modelled via regression models for multinomial responses. A controlled experiment based on simulations and the Rostock Manifesto estimation protocol (Wood et al. 2002) was then carried out to assess the effect of using estimates from different reference samples and different regression models on estimates of the age-at-death distribution in the hypothetical target population. Results. The following key results were obtained in this study. a) The morphological alterations in the pubic symphysis were much faster among women than among men at comparable ages in all three reference samples. In contrast, we found no strong evidence in any of the reference samples that sex is an important factor in explaining the rate of change in the auricular surface. b) The rate of ageing in the pubic symphysis seems to be similar across the three reference samples, but there is little evidence of a similar pattern in the auricular surface. That is, the estimated rate of age-related changes in the auricular surface was much faster in the LORC and the Coimbra samples than in the Spitalfields sample. 
c) The results of simulations showed that the differences in the estimates from the reference data result in noticeably different age-at-death distributions in the target population. Thus, some degree of bias may be expected if estimates from the western European reference data are used to infer ages at death in the eastern Baltic region based on the changes in the auricular surface. d) Moreover, the bias is expected to be more pronounced if the fitted regression model improperly describes the reference data. Conclusions. Differences in the timing of age-related changes in skeletal traits are to be expected among European reference samples, and cannot be ignored when seeking to reliably estimate an age-at-death distribution in the target population. This form of bias should be taken into consideration in further studies of skeletal samples from the eastern Baltic region.

  7. Taking the Missing Propensity Into Account When Estimating Competence Scores

    PubMed Central

    Pohl, Steffi; Carstensen, Claus H.

    2014-01-01

    When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically made when using these models: (1) The missing propensity is unidimensional and (2) the missing propensity and the ability are bivariate normally distributed. These assumptions may, however, be violated in real data sets and could, thus, pose a threat to the validity of this approach. The present study focuses on modeling competencies in various domains, using data from a school sample (N = 15,396) and an adult sample (N = 7,256) from the National Educational Panel Study. Our interest was to investigate whether violations of unidimensionality and the normal distribution assumption severely affect the performance of the model-based approach in terms of differences in ability estimates. We propose a model with a competence dimension, a unidimensional missing propensity and a distributional assumption more flexible than a multivariate normal. Using this model for ability estimation results in different ability estimates compared with a model ignoring missing responses. Implications for ability estimation in large-scale assessments are discussed. PMID:29795844

  8. And the first one now will later be last: Time-reversal in cormack-jolly-seber models

    USGS Publications Warehouse

    Nichols, James D.

    2016-01-01

    The models of Cormack, Jolly and Seber (CJS) are remarkable in providing a rich set of inferences about population survival, recruitment, abundance and even sampling probabilities from a seemingly limited data source: a matrix of 1's and 0's reflecting animal captures and recaptures at multiple sampling occasions. Survival and sampling probabilities are estimated directly in CJS models, whereas estimators for recruitment and abundance were initially obtained as derived quantities. Various investigators have noted that just as standard modeling provides direct inferences about survival, reversing the time order of capture history data permits direct modeling and inference about recruitment. Here we review the development of reverse-time modeling efforts, emphasizing the kinds of inferences and questions to which they seem well suited.
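
    Reversing the time order of a capture-history matrix is a purely mechanical step. The sketch below assumes histories are stored as strings of '0'/'1' with one character per sampling occasion, a representation chosen here for illustration. It shows that the first capture in reversed time is the last capture in forward time, which is why a model that conditions on first capture and runs forward through reversed data speaks to entry into the population (recruitment) rather than exit.

```python
from typing import List, Optional


def reverse_histories(histories: List[str]) -> List[str]:
    """Reverse the occasion order of each capture history, e.g. '01101' -> '10110'."""
    return [h[::-1] for h in histories]


def first_capture(history: str) -> Optional[int]:
    """0-based index of the first capture ('1'), or None if the animal was never caught."""
    index = history.find("1")
    return None if index == -1 else index


forward = ["01101", "11000", "00011"]
backward = reverse_histories(forward)
for f, b in zip(forward, backward):
    k = first_capture(b)
    # The first reversed capture maps back to the last forward occasion.
    last_forward = len(f) - 1 - k
    print(f, b, last_forward)
```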

  9. New segregation analysis of panic disorder

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vieland, V.J.; Fyer, A.J.; Chapman, T.

    1996-04-09

    We performed simple segregation analyses of panic disorder using 126 families of probands with DSM-III-R panic disorder who were ascertained for a family study of anxiety disorders at an anxiety disorders research clinic. We present parameter estimates for dominant, recessive, and arbitrary single major locus models without sex effects, as well as for a nongenetic transmission model, and compare these models to each other and to models obtained by other investigators. We rejected the nongenetic transmission model when comparing it to the recessive model. Consistent with some previous reports, we find comparable support for dominant and recessive models, and in both cases estimate nonzero phenocopy rates. The effect of restricting the analysis to families of probands without any lifetime history of comorbid major depression (MDD) was also examined. No notable differences in parameter estimates were found in that subsample, although the power of that analysis was low. Consistency between the findings in our sample and in another independently collected sample suggests the possibility of pooling such samples in the future in order to achieve the necessary power for more complex analyses. 32 refs., 4 tabs.

  10. Estimating open population site occupancy from presence-absence data lacking the robust design.

    PubMed

    Dail, D; Madsen, L

    2013-03-01

    Many animal monitoring studies seek to estimate the proportion of a study area occupied by a target population. The study area is divided into spatially distinct sites where the detected presence or absence of the population is recorded, and this is repeated in time for multiple seasons. However, when occupied sites are detected with probability p < 1, the lack of a detection does not imply lack of occupancy. MacKenzie et al. (2003, Ecology 84, 2200-2207) developed a multiseason model for estimating seasonal site occupancy (ψt) while accounting for unknown p. Their model performs well when observations are collected according to the robust design, where multiple sampling occasions occur during each season; the repeated sampling aids in the estimation of p. However, their model does not perform as well when the robust design is lacking. In this paper, we propose an alternative likelihood model that yields improved seasonal estimates of p and ψt in the absence of the robust design. We construct the marginal likelihood of the observed data by conditioning on, and summing out, the latent number of occupied sites during each season. A simulation study shows that in cases without the robust design, the proposed model estimates p with less bias than the MacKenzie et al. model and hence improves the estimates of ψt. We apply both models to a data set consisting of repeated presence-absence observations of American robins (Turdus migratorius) with yearly survey periods. The two models are compared to a third estimator available when the repeated counts (from the same study) are considered, with the proposed model yielding estimates of ψt closest to estimates from the point count model. Copyright © 2013, The International Biometric Society.
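
    Why the robust design matters can be seen in the standard single-season occupancy likelihood, used here as a stand-in; it is neither the multiseason model of MacKenzie et al. (2003) nor the marginal likelihood proposed in this paper. With only one survey per season, ψ and p enter the likelihood only through their product ψp, so they cannot be separated; repeated surveys within a season break that tie. The parameter values and detection histories below are hypothetical.

```python
import math
from typing import Sequence


def site_likelihood(history: Sequence[int], psi: float, p: float) -> float:
    """Probability of one site's 0/1 detection history over J surveys in one season.

    Occupied with probability psi: each survey detects with probability p.
    Unoccupied: only the all-zero history is possible.
    """
    detections = sum(history)
    surveys = len(history)
    occupied = psi * p ** detections * (1.0 - p) ** (surveys - detections)
    unoccupied = (1.0 - psi) if detections == 0 else 0.0
    return occupied + unoccupied


def log_likelihood(histories, psi: float, p: float) -> float:
    """Sum of log site likelihoods, assuming independent sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


# One survey per season: (psi, p) = (0.5, 0.4) and (0.8, 0.25) share psi * p = 0.2,
# so they give identical likelihoods for any single-survey data.
single = [[1], [0], [0], [1], [0]]
print(log_likelihood(single, 0.5, 0.4), log_likelihood(single, 0.8, 0.25))

# Three surveys per season: the same two parameter pairs are now distinguishable.
repeated = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
print(log_likelihood(repeated, 0.5, 0.4), log_likelihood(repeated, 0.8, 0.25))
```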

  11. A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

    PubMed Central

    Jacob, Benjamin G; Griffith, Daniel A; Muturi, Ephantus J; Caamano, Erick X; Githure, John I; Novak, Robert J

    2009-01-01

    Background Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking of error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitat clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extend a normal regression analysis previously considered in the literature. Methods Field and remote-sampled data were collected from July 2006 to December 2007 in the Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations and distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e., negative binomial regression). 
The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means was defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analysis was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion An autocorrelation error covariance matrix and a spatial filter analysis can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity. PMID:19772590
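
    Global Moran's I, the autocorrelation index the authors computed in SAS, has a short closed form: I = (n/W)·Σi Σj wij zi zj / Σi zi², where zi are deviations from the mean and W is the sum of all spatial weights. The numpy sketch below is a generic illustration, not the authors' SAS/GIS workflow; the four-site transect and binary adjacency weights are hypothetical.

```python
import numpy as np


def morans_i(values, weights) -> float:
    """Global Moran's I for values at n sites and an n x n spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape != (x.size, x.size):
        raise ValueError("weights must be an n x n matrix")
    z = x - x.mean()
    numerator = z @ w @ z  # sum_i sum_j w_ij z_i z_j
    denominator = z @ z  # sum_i z_i^2
    return float(x.size / w.sum() * numerator / denominator)


# Four habitats along a transect; neighbours share an edge (binary, symmetric weights).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
print(morans_i([1, 1, 5, 5], adjacency))  # clustered values: positive I
print(morans_i([1, 5, 1, 5], adjacency))  # alternating values: negative I
```

    Under no spatial autocorrelation the expected value of I is −1/(n − 1), so values well above that indicate clustering of similar larval counts.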

  12. TH-EF-BRA-08: A Novel Technique for Estimating Volumetric Cine MRI (VC-MRI) From Multi-Slice Sparsely Sampled Cine Images Using Motion Modeling and Free Form Deformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, W; Yin, F; Wang, C

    Purpose: To develop a technique to estimate on-board VC-MRI using multi-slice sparsely-sampled cine images, patient prior 4D-MRI, motion modeling and free-form deformation for real-time 3D target verification of lung radiotherapy. Methods: A previous method has been developed to generate on-board VC-MRI by deforming prior MRI images based on a motion model (MM) extracted from prior 4D-MRI and a single-slice on-board 2D-cine image. In this study, free-form deformation (FD) was introduced to correct for errors in the MM when large anatomical changes exist. Multiple-slice sparsely-sampled on-board 2D-cine images located within the target are used to improve both the estimation accuracy and temporal resolution of VC-MRI. The on-board 2D-cine MRIs are acquired at 20–30 frames/s by sampling only 10% of the k-space on a Cartesian grid, with 85% of that taken at the central k-space. The method was evaluated using XCAT (computerized patient model) simulation of lung cancer patients with various anatomical and respiratory changes from prior 4D-MRI to onboard volume. The accuracy was evaluated using Volume-Percent-Difference (VPD) and Center-of-Mass-Shift (COMS) of the estimated tumor volume. Effects of region-of-interest (ROI) selection, 2D-cine slice orientation, slice number and slice location on the estimation accuracy were evaluated. Results: VC-MRI estimated using 10 sparsely-sampled sagittal 2D-cine MRIs achieved VPD/COMS of 9.07±3.54%/0.45±0.53mm among all scenarios based on estimation with ROI-MM-ROI-FD. The FD optimization improved estimation significantly for scenarios with anatomical changes. Using ROI-FD achieved better estimation than global-FD. Changing the multi-slice orientation to axial, coronal, and axial/sagittal orthogonal reduced the accuracy of VC-MRI to VPD/COMS of 19.47±15.74%/1.57±2.54mm, 20.70±9.97%/2.34±0.92mm, and 16.02±13.79%/0.60±0.82mm, respectively. 
Reducing the number of cines to 8 enhanced temporal resolution of VC-MRI by 25% while maintaining the estimation accuracy. Estimation using slices sampled uniformly through the tumor achieved better accuracy than slices sampled non-uniformly. Conclusions: Preliminary studies showed that it is feasible to generate VC-MRI from multi-slice sparsely-sampled 2D-cine images for real-time 3D-target verification. This work was supported by the National Institutes of Health under Grant No. R01-CA184173 and a research grant from Varian Medical Systems.
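
    The two accuracy metrics can be computed from binary tumor masks. The abstract does not define them, so the sketch below assumes a common convention: VPD is the number of voxels where the estimated and reference tumors disagree (their symmetric difference) as a percentage of the reference volume, and COMS is the Euclidean distance between the two mask centroids in millimetres. Both definitions, the voxel spacing and the toy masks are assumptions for illustration.

```python
import numpy as np


def volume_percent_difference(estimated, reference) -> float:
    """Assumed VPD: |estimated XOR reference| / |reference| * 100."""
    est = np.asarray(estimated, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    mismatched = np.logical_xor(est, ref).sum()
    return float(100.0 * mismatched / ref.sum())


def center_of_mass_shift(estimated, reference, spacing_mm=(1.0, 1.0, 1.0)) -> float:
    """Euclidean distance (mm) between the voxel centroids of two binary masks."""
    spacing = np.asarray(spacing_mm, dtype=float)
    c_est = np.argwhere(np.asarray(estimated, dtype=bool)).mean(axis=0) * spacing
    c_ref = np.argwhere(np.asarray(reference, dtype=bool)).mean(axis=0) * spacing
    return float(np.linalg.norm(c_est - c_ref))


# Toy 4x4x4-voxel "tumor" and an estimate shifted by one voxel along the first axis.
reference = np.zeros((10, 10, 10), dtype=bool)
reference[2:6, 2:6, 2:6] = True
estimated = np.zeros_like(reference)
estimated[3:7, 2:6, 2:6] = True
print(volume_percent_difference(estimated, reference))  # 32 mismatched of 64 voxels -> 50.0
print(center_of_mass_shift(estimated, reference, spacing_mm=(2.0, 1.0, 1.0)))  # 2.0 mm
```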

  13. Respondent-Driven Sampling as Markov Chain Monte Carlo

    PubMed Central

    Goel, Sharad; Salganik, Matthew J.

    2013-01-01

    Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present respondent-driven sampling as Markov chain Monte Carlo (MCMC) importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating respondent-driven sampling studies. PMID:19572381
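
    Viewing recruitment as a random walk on the social network explains the weighting used in RDS estimators: if the walk visits people roughly in proportion to their degree, reweighting each respondent by 1/degree corrects for that over-representation, which is the importance-sampling view. The sketch below implements the widely used Volz–Heckathorn ("RDS II") prevalence estimator under that assumption; it illustrates the idea and is not code from the paper. The data are hypothetical.

```python
from typing import Sequence


def rds_ii_prevalence(infected: Sequence[bool], degree: Sequence[float]) -> float:
    """Inverse-degree weighted prevalence: sum(y_i / d_i) / sum(1 / d_i)."""
    if len(infected) != len(degree) or not degree:
        raise ValueError("need one positive degree per respondent")
    if any(d <= 0 for d in degree):
        raise ValueError("reported degrees must be positive")
    weighted_cases = sum(1.0 / d for y, d in zip(infected, degree) if y)
    total_weight = sum(1.0 / d for d in degree)
    return weighted_cases / total_weight


# Hypothetical sample: infected respondents report larger networks.
infected = [True, True, False, False, False, True]
degree = [20, 15, 4, 5, 3, 30]
naive = sum(infected) / len(infected)
print(naive, rds_ii_prevalence(infected, degree))
```

    Because the infected respondents in this toy sample report larger networks, the walk over-represents them, and the weighted estimate falls below the naive sample proportion.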

  14. Estimation of mortality for stage-structured zooplankton populations: What is to be done?

    NASA Astrophysics Data System (ADS)

    Ohman, Mark D.

    2012-05-01

    Estimation of zooplankton mortality rates in field populations is a challenging task that some contend is inherently intractable. This paper examines several of the objections that are commonly raised to efforts to estimate mortality. We find that there are circumstances in the field where it is possible to sequentially sample the same population and to resolve biologically caused mortality, albeit with error. Precision can be improved with sampling directed by knowledge of the physical structure of the water column, combined with adequate sample replication. Intercalibration of sampling methods can make it possible to sample across the life history in a quantitative manner. Rates of development can be constrained by laboratory-based estimates of stage durations from temperature- and food-dependent functions, mesocosm studies of molting rates, or approximation of development rates from growth rates, combined with the vertical distributions of organisms in relation to food and temperature gradients. Careful design of field studies guided by the assumptions of specific estimation models can lead to satisfactory mortality estimates, but model uncertainty also needs to be quantified. We highlight additional issues requiring attention to further advance the field, including the need for linked cooperative studies of the rates and causes of mortality of co-occurring holozooplankton and ichthyoplankton.
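
    In the simplest case the abstract alludes to, repeated sampling of the same population, a constant instantaneous mortality rate m implies N(t) = N0·e^(−m·t), so m is minus the slope of ln N against time. The sketch below fits that slope by least squares. It assumes a closed cohort (no recruitment, immigration or advection), uses hypothetical densities, and does not address the stage-duration issues the abstract discusses.

```python
import math
from typing import Sequence


def instantaneous_mortality(times: Sequence[float], abundances: Sequence[float]) -> float:
    """Least-squares estimate of m in N(t) = N0 * exp(-m t), i.e. minus the slope of ln N on t."""
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(n <= 0 for n in abundances):
        raise ValueError("abundances must be positive to take logs")
    logs = [math.log(n) for n in abundances]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    if sxx == 0:
        raise ValueError("sampling times must differ")
    return -sxy / sxx


# Hypothetical densities (individuals per m^3) from four samplings two days apart.
days = [0.0, 2.0, 4.0, 6.0]
density = [820.0, 690.0, 540.0, 450.0]
print(instantaneous_mortality(days, density))  # per day
```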

  15. Compatible estimators of the components of change for a rotating panel forest inventory design

    Treesearch

    Francis A. Roesch

    2007-01-01

    This article presents two approaches for estimating the components of forest change utilizing data from a rotating panel sample design. One approach uses a variant of the exponentially weighted moving average estimator and the other approach uses mixed estimation. Three general transition models were each combined with a single compatibility model for the mixed...
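
    The abstract is cut off, but the exponentially weighted moving average it names is easy to illustrate. In a rotating panel design each year's panel yields its own estimate, and an EWMA blends the newest panel estimate with the running value. The sketch shows only the generic EWMA recursion; the weight λ and the panel estimates are hypothetical, and the variant used in the article may differ.

```python
from typing import Sequence


def ewma(panel_estimates: Sequence[float], lam: float) -> float:
    """Generic EWMA: s_1 = x_1, then s_t = lam * x_t + (1 - lam) * s_{t-1}.

    Larger lam tracks recent panels more closely; smaller lam smooths more.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    if not panel_estimates:
        raise ValueError("need at least one panel estimate")
    smoothed = panel_estimates[0]
    for x in panel_estimates[1:]:
        smoothed = lam * x + (1.0 - lam) * smoothed
    return smoothed


# Hypothetical annual growth estimates (m^3/ha) from five successive panels.
growth = [2.1, 2.4, 1.9, 2.6, 2.3]
print(ewma(growth, lam=0.5), ewma(growth, lam=0.2))
```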

  16. On the choice of statistical models for estimating occurrence and extinction from animal surveys

    USGS Publications Warehouse

    Dorazio, R.M.

    2007-01-01

    In surveys of natural animal populations the number of animals that are present and available to be detected at a sample location is often low, resulting in few or no detections. Low detection frequencies are especially common in surveys of imperiled species; however, the choice of sampling method and protocol also may influence the size of the population that is vulnerable to detection. In these circumstances, probabilities of animal occurrence and extinction will generally be estimated more accurately if the models used in data analysis account for differences in abundance among sample locations and for the dependence between site-specific abundance and detection. Simulation experiments are used to illustrate conditions wherein these types of models can be expected to outperform alternative estimators of population site occupancy and extinction. © 2007 by the Ecological Society of America.
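
    One well-known way to link site abundance and detection is the Royle–Nichols formulation, used here as an illustration rather than as the paper's exact model: each of the N animals at a site is detected independently with probability r, so the site-level detection probability is 1 − (1 − r)^N, and N varies among sites as Poisson(λ). Occupancy is then ψ = 1 − e^(−λ). The sketch computes the marginal probability of a detection history by summing over N up to an assumed upper bound; λ and r are hypothetical.

```python
import math
from typing import Sequence


def rn_history_prob(history: Sequence[int], lam: float, r: float, n_max: int = 200) -> float:
    """Marginal probability of a 0/1 detection history under a Royle-Nichols-type model.

    Sums over site abundance N = 0..n_max with N ~ Poisson(lam); given N, each
    survey detects the site with probability 1 - (1 - r)**N.
    """
    detections = sum(history)
    surveys = len(history)
    total = 0.0
    pois = math.exp(-lam)  # P(N = 0)
    for n in range(n_max + 1):
        if n > 0:
            pois *= lam / n  # P(N = n) from P(N = n - 1)
        p_site = 1.0 - (1.0 - r) ** n
        total += pois * p_site ** detections * (1.0 - p_site) ** (surveys - detections)
    return total


lam, r = 0.8, 0.3  # hypothetical mean abundance and per-animal detection probability
print("occupancy:", 1.0 - math.exp(-lam))
print("P(never detected in 3 surveys):", rn_history_prob([0, 0, 0], lam, r))
```

    Sites holding few animals are the ones most likely to be missed, which is why ignoring this abundance–detection link can bias occupancy estimates when abundance is low.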

  17. Species richness and occupancy estimation in communities subject to temporary emigration

    USGS Publications Warehouse

    Kery, M.; Royle, J. Andrew; Plattner, M.; Dorazio, R.M.

    2009-01-01

    Species richness is the most common biodiversity metric, although typically some species remain unobserved. Therefore, estimates of species richness and related quantities should account for imperfect detectability. Community dynamics can often be represented as a superposition of species-specific phenologies (e.g., in taxa with well-defined flight [insects], activity [rodents], or vegetation periods [plants]). We develop a model for such predictably open communities wherein species richness is expressed as the sum over observed and unobserved species of estimated species-specific and site-specific occurrence indicators and where seasonal occurrence is modeled as a species-specific function of time. Our model is a multispecies extension of a multistate model with one unobservable state and represents a parsimonious way of dealing with a widespread form of 'temporary emigration.' For illustration we use Swiss butterfly monitoring data collected under a robust design (RD); species were recorded on 13 transects during two secondary periods within ≤ 7 primary sampling periods. We compare estimates with those under a variation of the model applied to standard data, where secondary samples are pooled. The latter model yielded unrealistically high estimates of total community size of 274 species. In contrast, estimates were similar under models applied to RD data with constant (122) or seasonally varying (126) detectability for each species, but the former was more parsimonious and therefore used for inference. Per transect, 6-44 (mean 21.1) species were detected. Species richness estimates averaged 29.3; therefore only 71% (range 32-92%) of all species present were ever detected. In any primary period, 0.4-5.6 species present were overlooked. Detectability varied by species and averaged 0.88 per primary sampling period. Our modeling framework is extremely flexible; extensions such as covariates for the occurrence or detectability of individual species are easy. 
It should be useful for communities with a predictable form of temporary emigration where rigorous estimation of community metrics has proved challenging so far.

  18. Estimating the Size of a Large Network and its Communities from a Random Sample

    PubMed Central

    Chen, Lin; Karbasi, Amin; Crawford, Forrest W.

    2017-01-01

    Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios. PMID:28867924
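
    PULSE itself is not specified in the abstract, but the kind of information it uses can be illustrated with a much cruder method-of-moments estimator, offered here purely as a hypothetical baseline. If W is a uniform random sample of n of the N vertices, each sampled vertex i of total degree d_i has on average d_i(n − 1)/(N − 1) of its neighbours inside W. Equating observed and expected within-sample degrees gives N̂ = 1 + (n − 1)·Σ d_i / Σ w_i, where w_i is the degree of i in G(W). Unlike PULSE, this ignores the block structure.

```python
from typing import Sequence


def moment_network_size(total_degree: Sequence[int], within_sample_degree: Sequence[int]) -> float:
    """Hypothetical baseline: N_hat = 1 + (n - 1) * sum(d_i) / sum(w_i).

    Assumes the n sampled vertices are a uniform random subset of the N vertices.
    """
    n = len(total_degree)
    if n < 2 or n != len(within_sample_degree):
        raise ValueError("need matching degree lists for at least two sampled vertices")
    observed_inside = sum(within_sample_degree)
    if observed_inside == 0:
        raise ValueError("no edges inside the sample; estimator undefined")
    return 1.0 + (n - 1) * sum(total_degree) / observed_inside


# Hypothetical sample of 5 vertices: reported total degrees and degrees within G(W).
print(moment_network_size([12, 8, 15, 10, 5], [1, 0, 2, 1, 0]))
```

    When the whole graph is sampled (W = V), every w_i equals d_i and the estimator returns N exactly, a quick sanity check on the formula.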

  19. Estimating the Size of a Large Network and its Communities from a Random Sample.

    PubMed

    Chen, Lin; Karbasi, Amin; Crawford, Forrest W

    2016-01-01

    Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios.

  20. Minimum variance geographic sampling

    NASA Technical Reports Server (NTRS)

    Terrell, G. R. (Principal Investigator)

    1980-01-01

    Resource inventories require samples with geographical scatter, sometimes not as widely spaced as would be hoped. A simple model of correlation over distances is used to create a minimum variance unbiased estimate of population means. The fitting procedure is illustrated with data used to estimate Missouri corn acreage.
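
    Under a correlation model of this kind, the minimum-variance unbiased linear estimate of a mean is the generalized least squares (GLS) mean, which downweights clustered samples. The sketch assumes an exponential covariance, C(d) = s·exp(−d/a), with known sill s and range a; the covariance family, its parameters and the coordinates are assumptions for illustration and not necessarily the model used in the report.

```python
import numpy as np


def gls_mean(coords, values, sill: float = 1.0, range_km: float = 10.0):
    """GLS estimate of a common mean under exponential spatial covariance.

    Returns (mean, variance_of_mean, weights); weights = Sigma^-1 1 / (1' Sigma^-1 1).
    """
    xy = np.asarray(coords, dtype=float)
    y = np.asarray(values, dtype=float)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    cov = sill * np.exp(-dist / range_km)
    sigma_inv_ones = np.linalg.solve(cov, np.ones_like(y))  # Sigma^-1 1
    precision = sigma_inv_ones.sum()  # 1' Sigma^-1 1
    weights = sigma_inv_ones / precision
    return float(weights @ y), float(1.0 / precision), weights


# Two nearly co-located fields and one distant field (km); hypothetical acreage shares.
coords = [[0.0, 0.0], [0.5, 0.0], [60.0, 0.0]]
values = [0.30, 0.32, 0.50]
mean, var, w = gls_mean(coords, values, sill=0.01, range_km=10.0)
print(mean, np.mean(values), w)
```

    The two close fields carry nearly the same information, so each receives about half the weight of the isolated field, pulling the estimate toward the distant observation.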

  1. How many dinosaur species were there? Fossil bias and true richness estimated using a Poisson sampling model

    PubMed Central

    Starrfelt, Jostein; Liow, Lee Hsiang

    2016-01-01

    The fossil record is a rich source of information about biological diversity in the past. However, the fossil record is not only incomplete but also has inherent biases due to geological, physical, chemical and biological factors. Our knowledge of past life is also biased because of differences in academic and amateur interests and sampling efforts. As a result, not all individuals or species that lived in the past are equally likely to be discovered at any point in time or space. To reconstruct temporal dynamics of diversity using the fossil record, biased sampling must be explicitly taken into account. Here, we introduce an approach that uses the variation in the number of times each species is observed in the fossil record to estimate both sampling bias and true richness. We term our technique TRiPS (True Richness estimated using a Poisson Sampling model) and explore its robustness to violation of its assumptions via simulations. We then venture to estimate sampling bias and absolute species richness of dinosaurs in the geological stages of the Mesozoic. Using TRiPS, we estimate that 1936 (1543–2468) species of dinosaurs roamed the Earth during the Mesozoic. We also present improved estimates of species richness trajectories of the three major dinosaur clades: the sauropodomorphs, ornithischians and theropods, casting doubt on the Jurassic–Cretaceous extinction event and demonstrating that all dinosaur groups are subject to considerable sampling bias throughout the Mesozoic. PMID:26977060
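
    The core idea, using how often each observed species is found to infer how many were never found, can be sketched with a zero-truncated Poisson model. This is a simplified illustration in the spirit of TRiPS, not the authors' implementation: it assumes every species is sampled at the same Poisson rate λ over one interval, estimates λ by maximum likelihood from the counts of the observed species (all at least 1), converts it to a detection probability p = 1 − e^(−λ), and divides observed richness by p. The occurrence counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def truncated_poisson_rate(counts: Sequence[int]) -> float:
    """MLE of lambda for zero-truncated Poisson counts: solves mean = lam / (1 - exp(-lam))."""
    if not counts or min(counts) < 1:
        raise ValueError("counts must be positive occurrence counts of observed species")
    xbar = sum(counts) / len(counts)
    if xbar <= 1.0:
        raise ValueError("all species are singletons; the rate is not identifiable")

    def truncated_mean(lam: float) -> float:
        return lam / -math.expm1(-lam)  # lam / (1 - exp(-lam)), increasing in lam

    lo, hi = 1e-12, xbar  # truncated_mean(xbar) > xbar, so the root lies in (lo, hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def richness_estimate(counts: Sequence[int]) -> Tuple[float, float, float]:
    """Return (lambda_hat, detection probability, estimated true richness)."""
    lam = truncated_poisson_rate(counts)
    p_detect = -math.expm1(-lam)  # 1 - exp(-lam)
    return lam, p_detect, len(counts) / p_detect


# Hypothetical numbers of occurrences for 10 species observed in one stage.
print(richness_estimate([1, 1, 1, 1, 2, 1, 3, 1, 2, 1]))
```

    Refinements such as interval-specific treatment and the confidence interval quoted in the abstract (1543–2468) are omitted here.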

  2. How many dinosaur species were there? Fossil bias and true richness estimated using a Poisson sampling model.

    PubMed

    Starrfelt, Jostein; Liow, Lee Hsiang

    2016-04-05

    The fossil record is a rich source of information about biological diversity in the past. However, the fossil record is not only incomplete but also has inherent biases due to geological, physical, chemical and biological factors. Our knowledge of past life is also biased because of differences in academic and amateur interests and sampling efforts. As a result, not all individuals or species that lived in the past are equally likely to be discovered at any point in time or space. To reconstruct temporal dynamics of diversity using the fossil record, biased sampling must be explicitly taken into account. Here, we introduce an approach that uses the variation in the number of times each species is observed in the fossil record to estimate both sampling bias and true richness. We term our technique TRiPS (True Richness estimated using a Poisson Sampling model) and explore its robustness to violation of its assumptions via simulations. We then venture to estimate sampling bias and absolute species richness of dinosaurs in the geological stages of the Mesozoic. Using TRiPS, we estimate that 1936 (1543-2468) species of dinosaurs roamed the Earth during the Mesozoic. We also present improved estimates of species richness trajectories of the three major dinosaur clades: the sauropodomorphs, ornithischians and theropods, casting doubt on the Jurassic-Cretaceous extinction event and demonstrating that all dinosaur groups are subject to considerable sampling bias throughout the Mesozoic. © 2016 The Authors.

  3. Improving removal-based estimates of abundance by sampling a population of spatially distinct subpopulations

    USGS Publications Warehouse

    Dorazio, R.M.; Jelks, H.L.; Jordan, F.

    2005-01-01

     A statistical modeling framework is described for estimating the abundances of spatially distinct subpopulations of animals surveyed using removal sampling. To illustrate this framework, hierarchical models are developed using the Poisson and negative-binomial distributions to model variation in abundance among subpopulations and using the beta distribution to model variation in capture probabilities. These models are fitted to the removal counts observed in a survey of a federally endangered fish species. The resulting estimates of abundance have similar or better precision than those computed using the conventional approach of analyzing the removal counts of each subpopulation separately. Extension of the hierarchical models to include spatial covariates of abundance is straightforward and may be used to identify important features of an animal's habitat or to predict the abundance of animals at unsampled locations.
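The hierarchical models improve on the conventional approach of analysing each subpopulation separately. That baseline can be as simple as the classical two-pass removal estimator, sketched below. It assumes a closed population and equal capture probability on both passes. It is not the hierarchical Poisson/negative-binomial model described above, and the function name is hypothetical.

```python
def two_pass_removal(c1, c2):
    """Classical two-pass removal estimates of abundance N and capture probability p.

    c1, c2: numbers of animals removed on the first and second passes.
    Assumes a closed population and the same capture probability on both passes.
    """
    if c1 <= c2:
        # Without depletion between passes the estimator is undefined
        raise ValueError("first-pass catch must exceed second-pass catch")
    n_hat = c1 ** 2 / (c1 - c2)
    p_hat = (c1 - c2) / c1
    return n_hat, p_hat


# Each subpopulation analysed on its own, as in the conventional approach
for site, (c1, c2) in {"site A": (40, 16), "site B": (12, 9)}.items():
    n_hat, p_hat = two_pass_removal(c1, c2)
    print(f"{site}: N ~ {n_hat:.1f}, p ~ {p_hat:.2f}")
```

Site B shows how separate analyses can become imprecise. A small drop in catch between passes gives a low p and a large, sensitive N. Pooling information across subpopulations is what the hierarchical models exploit.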

  4. A Model Based Approach to Sample Size Estimation in Recent Onset Type 1 Diabetes

    PubMed Central

    Bundy, Brian; Krischer, Jeffrey P.

    2016-01-01

    The area under the C-peptide curve following a 2-hour mixed meal tolerance test was modelled from baseline to 12 months after enrollment in 481 individuals enrolled in 5 prior TrialNet studies of recent-onset type 1 diabetes, to produce estimates of its rate of loss and variance. Age at diagnosis and baseline C-peptide were found to be significant predictors, and adjusting for these in an ANCOVA resulted in estimates with lower variance. Using these results as planning parameters for new studies results in a nearly 50% reduction in the target sample size. The modelling also produces an expected C-peptide that can be used in Observed vs. Expected calculations to estimate the presumption of benefit in ongoing trials. PMID:26991448
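In the usual two-arm sample-size formula, the required n scales linearly with the outcome variance, so a lower-variance ANCOVA estimate directly shrinks the trial. The sketch below uses the standard normal-approximation formula. It assumes that baseline covariates explain a fraction rho2 of the outcome variance. It is not the TrialNet planning model, and the numbers are illustrative only.

```python
import math
from statistics import NormalDist


def n_per_arm(sigma2, delta, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-sided two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / delta ** 2


def ancova_variance(sigma2, rho2):
    """Residual variance after adjusting for covariates that explain a fraction rho2 of it."""
    return sigma2 * (1 - rho2)


unadjusted = n_per_arm(sigma2=1.0, delta=0.5)
adjusted = n_per_arm(sigma2=ancova_variance(1.0, rho2=0.5), delta=0.5)
print(math.ceil(unadjusted), math.ceil(adjusted))
```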

  5. Estimating taxonomic diversity, extinction rates, and speciation rates from fossil data using capture-recapture models

    USGS Publications Warehouse

    Nichols, J.D.; Pollock, K.H.

    1983-01-01

    Capture-recapture models can be used to estimate parameters of interest from paleobiological data when encounter probabilities are unknown and variable over time. These models also permit estimation of sampling variances, and goodness-of-fit tests are available for assessing the fit of data to most models. The authors describe capture-recapture models which should be useful in paleobiological analyses and discuss the assumptions which underlie them. They illustrate these models with examples and discuss aspects of study design.
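The simplest member of the capture-recapture family is the two-sample, closed-population case. This is offered only as an illustration; the paper's estimates of extinction and speciation rates involve multiple time periods. Chapman's bias-corrected Lincoln-Petersen estimator and Seber's approximate variance fit in a few lines. Reading "individuals" as taxa and "samples" as collections or intervals is an interpretive assumption here.

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected two-sample (Lincoln-Petersen) abundance estimate.

    n1: individuals caught and marked in sample 1
    n2: individuals caught in sample 2
    m2: individuals in sample 2 that were already marked
    Returns the estimate and Seber's approximate variance.
    """
    if m2 > min(n1, n2):
        raise ValueError("recaptures cannot exceed either sample size")
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = (n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2) / ((m2 + 1) ** 2 * (m2 + 2))
    return n_hat, var


n_hat, var = chapman_estimate(50, 40, 9)
print(f"N ~ {n_hat:.1f}, SE ~ {var ** 0.5:.1f}")
```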

  6. Counting Cats: Spatially Explicit Population Estimates of Cheetah (Acinonyx jubatus) Using Unstructured Sampling Data

    PubMed Central

    Broekhuis, Femke; Gopalaswamy, Arjun M.

    2016-01-01

    Many ecological theories and species conservation programmes rely on accurate estimates of population density. Accurate density estimation, especially for species facing rapid declines, requires the application of rigorous field and analytical methods. However, obtaining accurate density estimates of carnivores can be challenging as carnivores naturally exist at relatively low densities and are often elusive and wide-ranging. In this study, we employ an unstructured spatial sampling field design along with a Bayesian sex-specific spatially explicit capture-recapture (SECR) analysis, to provide the first rigorous population density estimates of cheetahs (Acinonyx jubatus) in the Maasai Mara, Kenya. We estimate adult cheetah density to be between 1.28 ± 0.315 and 1.34 ± 0.337 individuals/100 km² across four candidate models specified in our analysis. Our spatially explicit approach revealed ‘hotspots’ of cheetah density, highlighting that cheetah are distributed heterogeneously across the landscape. The SECR models incorporated a movement range parameter which indicated that male cheetah moved four times as much as females, possibly because female movement was restricted by their reproductive status and/or the spatial distribution of prey. We show that SECR can be used for spatially unstructured data to successfully characterise the spatial distribution of a low density species and also estimate population density when sample size is small. Our sampling and modelling framework will help determine spatial and temporal variation in cheetah densities, providing a foundation for their conservation and management. Based on our results we encourage other researchers to adopt a similar approach in estimating densities of individually recognisable species. PMID:27135614
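SECR models link each individual's unobserved activity centre to its detections through a detection function of distance. A common choice is the half-normal form, whose scale parameter sigma governs how far an animal ranges. Three things below are illustrative assumptions, not the authors' exact model:

- treating sigma as the "movement range parameter";
- the fixed detector layout, which simplifies the unstructured design;
- the parameter names.

```python
import math


def half_normal_detection(distance, g0, sigma):
    """Per-occasion detection probability at a distance from an activity centre (half-normal)."""
    return g0 * math.exp(-distance ** 2 / (2 * sigma ** 2))


def expected_detections(centre, detectors, g0, sigma, occasions):
    """Expected detections of one individual, summed over detectors and occasions."""
    cx, cy = centre
    total = 0.0
    for dx, dy in detectors:
        d = math.hypot(dx - cx, dy - cy)
        total += occasions * half_normal_detection(d, g0, sigma)
    return total


# A wider-ranging individual (larger sigma) is detected more often at distant detectors
detectors = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
for sigma in (1.0, 2.0):
    print(sigma, round(expected_detections((0.0, 0.0), detectors, 0.2, sigma, 10), 3))
```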

  7. Counting Cats: Spatially Explicit Population Estimates of Cheetah (Acinonyx jubatus) Using Unstructured Sampling Data.

    PubMed

    Broekhuis, Femke; Gopalaswamy, Arjun M

    2016-01-01

    Many ecological theories and species conservation programmes rely on accurate estimates of population density. Accurate density estimation, especially for species facing rapid declines, requires the application of rigorous field and analytical methods. However, obtaining accurate density estimates of carnivores can be challenging as carnivores naturally exist at relatively low densities and are often elusive and wide-ranging. In this study, we employ an unstructured spatial sampling field design along with a Bayesian sex-specific spatially explicit capture-recapture (SECR) analysis, to provide the first rigorous population density estimates of cheetahs (Acinonyx jubatus) in the Maasai Mara, Kenya. We estimate adult cheetah density to be between 1.28 ± 0.315 and 1.34 ± 0.337 individuals/100 km² across four candidate models specified in our analysis. Our spatially explicit approach revealed 'hotspots' of cheetah density, highlighting that cheetah are distributed heterogeneously across the landscape. The SECR models incorporated a movement range parameter which indicated that male cheetah moved four times as much as females, possibly because female movement was restricted by their reproductive status and/or the spatial distribution of prey. We show that SECR can be used for spatially unstructured data to successfully characterise the spatial distribution of a low density species and also estimate population density when sample size is small. Our sampling and modelling framework will help determine spatial and temporal variation in cheetah densities, providing a foundation for their conservation and management. Based on our results we encourage other researchers to adopt a similar approach in estimating densities of individually recognisable species.

  8. New non-randomised model to assess the prevalence of discriminating behaviour: a pilot study on mephedrone

    PubMed Central

    2011-01-01

    Background An advantage of randomised response and non-randomised models investigating sensitive issues arises from the characteristic that individual answers about discriminating behaviour cannot be linked to the individuals. This study proposed a new fuzzy response model coined 'Single Sample Count' (SSC) to estimate prevalence of discriminating or embarrassing behaviour in epidemiologic studies. Methods The SSC was tested and compared to the established Forced Response (FR) model estimating Mephedrone use. Estimates from both SSC and FR were then corroborated with qualitative hair screening data. Volunteers (n = 318, mean age = 22.69 ± 5.87, 59.1% male) in a rural area in north Wales and a metropolitan area in England completed a questionnaire containing the SSC and FR in alternating order, and four questions canvassing opinions and beliefs regarding Mephedrone. Hair samples were screened for Mephedrone using a qualitative Liquid Chromatography-Mass Spectrometry method. Results The SSC algorithm improves upon the existing item count techniques by utilizing known population distributions and embeds the sensitive question among four unrelated innocuous questions with binomial distribution. Respondents are asked only to indicate how many of the questions are true for them, without revealing which ones. The two probability models yielded similar estimates, with the FR estimate between 2.6% and 15.0% and the new SSC estimate between 0% and 10%. The six positive hair samples indicated that the prevalence rate in the sample was at least 4%. The close proximity of these estimates provides evidence to support the validity of the new SSC model. Using simulations, the recommended sample sizes were calculated as a function of statistical power and expected prevalence rate. Conclusion The main advantages of the SSC over other indirect methods are: simple administration, completion and calculation, maximum use of the data and good face validity for all respondents. 
Owing to the key feature that respondents are not required to answer the sensitive question directly, coupled with the absence of forced response or obvious self-protective response strategy, the SSC has the potential to cut across self-protective barriers more effectively than other estimation models. This elegantly simple, quick and effective method can be successfully employed in public health research investigating compromising behaviours. PMID:21812979
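An item-count design like the SSC rests on the known distribution of its innocuous items. The expected contribution of those items is subtracted from the mean reported count, and the excess is attributed to the sensitive question. The sketch below makes two illustrative assumptions, not necessarily the published SSC specification: four innocuous yes/no items, each true with known probability 0.5, and an estimate clipped to [0, 1].

```python
from statistics import mean


def ssc_prevalence(reported_counts, n_innocuous=4, p_innocuous=0.5):
    """Moment estimate of the prevalence of the sensitive behaviour.

    Each respondent reports only how many statements are true. If the innocuous
    statements are independent with known probability p_innocuous, their expected
    contribution is n_innocuous * p_innocuous; the excess is attributed to the
    sensitive statement.
    """
    for c in reported_counts:
        if not 0 <= c <= n_innocuous + 1:
            raise ValueError(f"count {c} outside 0..{n_innocuous + 1}")
    raw = mean(reported_counts) - n_innocuous * p_innocuous
    return min(max(raw, 0.0), 1.0)  # clip: a prevalence cannot leave [0, 1]


counts = [2, 2, 3, 1, 2, 2, 3, 2, 2, 2]  # hypothetical responses
print(ssc_prevalence(counts))
```

The estimate is meaningful only at the group level. Its variance includes the binomial variance of the innocuous items, which is why sample-size calculations against expected prevalence matter for this design.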

  9. Limited sampling strategy models for estimating the AUC of gliclazide in Chinese healthy volunteers.

    PubMed

    Huang, Ji-Han; Wang, Kun; Huang, Xiao-Hui; He, Ying-Chun; Li, Lu-Jin; Sheng, Yu-Cheng; Yang, Juan; Zheng, Qing-Shan

    2013-06-01

    The aim of this work is to reduce the cost of the sampling required to estimate the area under the gliclazide plasma concentration versus time curve within 60 h (AUC0-60t). Limited sampling strategy (LSS) models were established and validated by multiple regression using 4 or fewer gliclazide concentration values. Absolute prediction error (APE), root mean square error (RMSE) and visual prediction checks were used as criteria. The results of jackknife validation showed that 10 (25.0%) of the 40 LSS models based on the regression analysis were not within an APE of 15% using one concentration-time point. 90.2, 91.5 and 92.4% of the 40 LSS models were capable of prediction using 2, 3 and 4 points, respectively. Limited sampling strategies were developed and validated for estimating AUC0-60t of gliclazide. This study indicates that, with an 80 mg dosage regimen, the LSS model enabled accurate predictions of AUC0-60t. This study shows that 12, 6, 4 and 2 h after administration are the key sampling times. The combination of (12, 2 h), (12, 8, 2 h) or (12, 8, 4, 2 h) can be chosen as sampling times for predicting AUC0-60t in practical application, according to requirements.
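A limited sampling strategy of this kind is a multiple regression of the full AUC on a few concentrations. The sketch below covers the main steps:

- fitting the regression with NumPy least squares;
- computing a reference AUC by the linear trapezoidal rule;
- scoring predictions by absolute prediction error (APE) under leave-one-out (jackknife) refitting.

The data are synthetic. The two predictors simply echo the (12, 2 h) combination above, and the function names are hypothetical.

```python
import numpy as np


def trapezoid_auc(times, conc):
    """Linear trapezoidal AUC over the sampled time course."""
    t = np.asarray(times, dtype=float)
    c = np.asarray(conc, dtype=float)
    return float(np.sum((t[1:] - t[:-1]) * (c[1:] + c[:-1]) / 2))


def design(conc_at_points):
    """Intercept column plus one column per sampled concentration."""
    c = np.asarray(conc_at_points, dtype=float)
    return np.column_stack([np.ones(len(c)), c])


def fit_lss(conc_at_points, auc):
    """Least-squares fit of AUC = b0 + b1*C(t1) + b2*C(t2) + ..."""
    coef, *_ = np.linalg.lstsq(design(conc_at_points), np.asarray(auc, dtype=float), rcond=None)
    return coef


def ape_percent(predicted, observed):
    """Absolute prediction error as a percentage of the observed value."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    return 100 * np.abs(predicted - observed) / observed


def jackknife_ape(conc_at_points, auc):
    """Leave-one-out APE: refit without each subject, then predict that subject."""
    conc = np.asarray(conc_at_points, dtype=float)
    y = np.asarray(auc, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_lss(conc[keep], y[keep])
        pred = (design(conc[i:i + 1]) @ coef)[0]
        errors.append(ape_percent(pred, y[i]))
    return np.array(errors)


# Synthetic subjects: columns are C(12 h) and C(2 h)
conc = [[2, 1], [1, 2], [4, 3], [3, 4], [6, 5]]
auc = [3 + 2 * a + 5 * b for a, b in conc]
print(fit_lss(conc, auc), jackknife_ape(conc, auc).max())
```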

  10. Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

    PubMed Central

    Chatterjee, Nilanjan; Chen, Yi-Hau; Maas, Paige; Carroll, Raymond J.

    2016-01-01

    Information from various public and private data sources of extremely large sample sizes is now increasingly available for research purposes. Statistical methods are needed for utilizing information from such big data sources while analyzing data from individual studies that may collect more detailed information required for addressing specific hypotheses of interest. In this article, we consider the problem of building regression models based on individual-level data from an “internal” study while utilizing summary-level information, such as information on parameters for reduced models, from an “external” big data source. We identify a set of very general constraints that link internal and external models. These constraints are used to develop a framework for semiparametric maximum likelihood inference that allows the distribution of covariates to be estimated using either the internal sample or an external reference sample. We develop extensions for handling complex stratified sampling designs, such as case-control sampling, for the internal study. Asymptotic theory and variance estimators are developed for each case. We use simulation studies and a real data application to assess the performance of the proposed methods in contrast to the generalized regression (GR) calibration methodology that is popular in the sample survey literature. PMID:27570323

  11. Optimal time points sampling in pathway modelling.

    PubMed

    Hu, Shiyan

    2004-01-01

    Modelling cellular dynamics based on experimental data is at the heart of systems biology. Considerable progress has been made in dynamic pathway modelling as well as in the related parameter estimation. However, few studies consider the issue of optimal sampling time selection for parameter estimation. Time course experiments in molecular biology rarely produce large and accurate data sets, and the experiments involved are usually time consuming and expensive. Therefore, estimating model parameters from only a few available samples is of significant practical value. For signal transduction, the sampling intervals are usually not evenly distributed and are based on heuristics. In this paper, we investigate an approach to guide the selection of time points in an optimal way to minimize the variance of parameter estimates. In the method, we first formulate the problem as a nonlinear constrained optimization problem by maximum likelihood estimation. We then modify and apply a quantum-inspired evolutionary algorithm, which combines the advantages of both quantum computing and evolutionary computing, to solve the optimization problem. The new algorithm does not suffer from the difficulty of selecting good initial values or from getting stuck in local optima, problems that usually accompany conventional numerical optimization techniques. The simulation results indicate the soundness of the new method.
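Choosing time points to minimize the variance of parameter estimates is an optimal-design problem. The sketch below is an illustration only: it swaps in exhaustive grid search for the paper's quantum-inspired evolutionary algorithm. It picks the two sampling times that maximize the determinant of the Fisher information (D-optimality) for a hypothetical decay model y = A·exp(−k·t) with Gaussian noise. For this toy model the optimum is t = 0 and t = 1/k, which makes the result easy to check.

```python
import itertools
import math


def fisher_det_two_points(t1, t2, amplitude, rate):
    """Determinant of the Fisher information (up to a noise-variance factor) for
    y = amplitude * exp(-rate * t) observed at two time points.

    Each row of the sensitivity matrix J is [dy/dA, dy/dk] at one time point;
    for a square J, det(J^T J) = det(J)^2.
    """
    e1, e2 = math.exp(-rate * t1), math.exp(-rate * t2)
    det_j = e1 * (-amplitude * t2 * e2) - (-amplitude * t1 * e1) * e2
    return det_j ** 2


def best_two_times(candidates, amplitude, rate):
    """Exhaustive search over pairs of candidate times for the D-optimal design."""
    return max(
        itertools.combinations(candidates, 2),
        key=lambda pair: fisher_det_two_points(*pair, amplitude, rate),
    )


grid = [i / 10 for i in range(101)]  # candidate times 0.0, 0.1, ..., 10.0
print(best_two_times(grid, amplitude=1.0, rate=0.5))
```

The number of candidate combinations grows combinatorially with the number of time points and parameters. That growth is the practical reason a heuristic optimizer becomes attractive for realistic pathway models.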

  12. Annual survival of Snail Kites in Florida: Radio telemetry versus capture-resighting data

    USGS Publications Warehouse

    Bennetts, R.E.; Dreitz, V.J.; Kitchens, W.M.; Hines, J.E.; Nichols, J.D.

    1999-01-01

    We estimated annual survival of Snail Kites (Rostrhamus sociabilis) in Florida using the Kaplan-Meier estimator with data from 271 radio-tagged birds over a three-year period and capture-recapture (resighting) models with data from 1,319 banded birds over a six-year period. We tested the hypothesis that survival differed among three age classes using both data sources. We tested additional hypotheses about spatial and temporal variation using a combination of data from radio telemetry and single- and multistrata capture-recapture models. Results from these data sets were similar in their indications of the sources of variation in survival, but they differed in some parameter estimates. Both data sources indicated that survival was higher for adults than for juveniles, but they did not support delineation of a subadult age class. Our data also indicated that survival differed among years and regions for juveniles but not for adults. Estimates of juvenile survival using radio telemetry data were higher than estimates using capture-recapture models for two of three years (1992 and 1993). Ancillary evidence based on censored birds indicated that some mortality of radio-tagged juveniles went undetected during those years, resulting in biased estimates. Thus, we have greater confidence in our estimates of juvenile survival using capture-recapture models. Precision of estimates reflected the number of parameters estimated and was surprisingly similar between radio telemetry and single-stratum capture-recapture models, given the substantial differences in sample sizes. Not having to estimate resighting probability likely offsets, to some degree, the smaller sample sizes from our radio telemetry data. Precision of capture-recapture models was lower using multistrata models where region-specific parameters were estimated than using single-stratum models, where spatial variation in parameters was not taken into account.
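The Kaplan-Meier estimator multiplies together, at each time a death occurs, the fraction of at-risk animals that survive. Censored birds, for example those whose signal was lost, leave the risk set without counting as deaths. The sketch below processes deaths before censorings at tied times, the usual convention. The estimator assumes censoring is unrelated to fate. If censored juveniles had in fact died, survival would be biased upward, which fits the concern raised above.

```python
from collections import Counter


def kaplan_meier(times, died):
    """Kaplan-Meier survival curve.

    times: follow-up time for each animal (death or censoring time)
    died:  True if the animal died at that time, False if it was censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, d in zip(times, died) if d)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Animals censored at t still count as at risk for deaths at t
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical radio-tag histories (months): deaths at 1, 2, 3; censored at 2 and 4
print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```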

  13. Comparison of blood flow models and acquisitions for quantitative myocardial perfusion estimation from dynamic CT

    NASA Astrophysics Data System (ADS)

    Bindschadler, Michael; Modgil, Dimple; Branch, Kelley R.; La Riviere, Patrick J.; Alessio, Adam M.

    2014-04-01

    Myocardial blood flow (MBF) can be estimated from dynamic contrast enhanced (DCE) cardiac CT acquisitions, leading to quantitative assessment of regional perfusion. The need for low radiation dose and the lack of consensus on MBF estimation methods motivate this study to refine the selection of acquisition protocols and models for CT-derived MBF. DCE cardiac CT acquisitions were simulated for a range of flow states (MBF = 0.5, 1, 2, 3 ml/(min·g), cardiac output = 3, 5, 8 L/min). Patient kinetics were generated by a mathematical model of iodine exchange incorporating numerous physiological features including heterogeneous microvascular flow, permeability and capillary contrast gradients. CT acquisitions were simulated for multiple realizations of realistic x-ray flux levels. CT acquisitions that reduce radiation exposure were implemented by varying both temporal sampling (1, 2, and 3 s sampling intervals) and tube currents (140, 70, and 25 mAs). For all acquisitions, we compared three quantitative MBF estimation methods (a two-compartment model, an axially-distributed model, and the adiabatic approximation to the tissue homogeneous model) and a qualitative slope-based method. In total, over 11 000 time attenuation curves were used to evaluate MBF estimation in multiple patient and imaging scenarios. After iodine-based beam hardening correction, the slope method consistently underestimated flow, by 47.5% on average, and the quantitative models provided estimates with less than 6.5% average bias and with variance that increased as dose was reduced. The three quantitative models performed equally well, offering estimates with essentially identical root mean squared error (RMSE) for matched acquisitions. MBF estimates using the qualitative slope method were inferior in terms of bias and RMSE compared to the quantitative methods. MBF estimate error was equal at matched dose reductions for all quantitative methods and range of techniques evaluated. 
This suggests that there is no particular advantage of one quantitative estimation method over another, nor of performing dose reduction via tube current reduction rather than temporal sampling reduction. These data are important for optimizing implementation of cardiac dynamic CT in clinical practice and in prospective CT MBF trials.
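The qualitative slope method is commonly computed as the maximum upslope of the myocardial time-attenuation curve divided by the peak of the arterial input function (AIF). A toy one-compartment simulation shows one way such a ratio can underestimate flow: it returns the true flow only if no contrast has washed out of the tissue by the time of steepest upslope. The compartment models compared above represent washout explicitly. The simulation, its parameters and the function names are hypothetical, not the study's kinetic model.

```python
def upslope_flow(tissue, aif, dt):
    """Slope-method flow estimate: maximum tissue upslope / peak arterial enhancement."""
    slopes = [(b - a) / dt for a, b in zip(tissue, tissue[1:])]
    return max(slopes) / max(aif)


def one_compartment_tissue(aif, flow, washout_rate, dt):
    """Euler simulation of dC/dt = flow * C_a(t) - washout_rate * C(t), starting from C = 0."""
    c, curve = 0.0, [0.0]
    for ca in aif[:-1]:
        c += dt * (flow * ca - washout_rate * c)
        curve.append(c)
    return curve


aif = [0.0, 1.0, 4.0, 9.0, 5.0, 2.0, 1.0, 0.5]  # synthetic arterial curve
for washout in (0.0, 0.3):
    tissue = one_compartment_tissue(aif, flow=0.02, washout_rate=washout, dt=1.0)
    print(washout, upslope_flow(tissue, aif, dt=1.0))
```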

  14. Analysis of Terrestrial Conditions and Dynamics

    NASA Technical Reports Server (NTRS)

    Goward, S. N.

    1985-01-01

    An ecological model is developed to estimate annual net primary productivity of vegetation in twelve major North American biomes. Three models are adapted and combined, each addressing a different factor known to govern primary productivity, i.e., photosynthesis, respiration, and moisture availability. Measures of intercepted photosynthetically active radiation (IPAR) for input to the photosynthesis model are derived from spectral vegetation index data. Normalized Difference Vegetation Index (NDVI) data are produced from NOAA-7 Advanced Very High Resolution Radiometer (AVHRR) observations for April 1982 through March 1983. NDVI values are sampled from within the biomes at locations for which climatological data are available. Monthly estimates of Net Primary Productivity (NPP) for each sample location are generated and summed over the twelve month period. These monthly estimates are averaged to produce a single annual estimated NPP value for each biome. Comparison of estimated NPP values with figures reported in the literature produces a correlation coefficient of 0.85.
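The NDVI inputs come from red and near-infrared reflectances through the standard formula NDVI = (NIR − Red) / (NIR + Red). The minimal sketch below also includes a helper that mirrors the "summed over the twelve month period" step. The helper's name is hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance is zero; NDVI is undefined")
    return (nir - red) / total


def annual_total(monthly_values):
    """Sum twelve monthly estimates (e.g. monthly NPP) into one annual value."""
    if len(monthly_values) != 12:
        raise ValueError("expected twelve monthly values")
    return sum(monthly_values)


print(ndvi(0.5, 0.1), ndvi(0.1, 0.5))  # dense vegetation vs. bare or wet surface
```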

  15. Bayesian hierarchical models for smoothing in two-phase studies, with application to small area estimation.

    PubMed

    Ross, Michelle; Wakefield, Jon

    2015-10-01

    Two-phase study designs are appealing since they allow for the oversampling of rare sub-populations, which improves efficiency. In this paper we describe a Bayesian hierarchical model for the analysis of two-phase data. Such a model is particularly appealing in a spatial setting in which random effects are introduced to model between-area variability. In such a situation, one may be interested in estimating regression coefficients or, in the context of small area estimation, in reconstructing the population totals by strata. The efficiency gains of the two-phase sampling scheme are compared to standard approaches using 2011 birth data from the research triangle area of North Carolina. We show that the proposed method can overcome small sample difficulties and improve on existing techniques. We conclude that the two-phase design is an attractive approach for small area estimation.

  16. What’s Driving Uncertainty? The Model or the Model Parameters (What’s Driving Uncertainty? The influences of model and model parameters in data analysis)

    DOE PAGES

    Anderson-Cook, Christine Michaela

    2017-03-01

    One of the substantial improvements to the practice of data analysis in recent decades is the change from reporting just a point estimate for a parameter or characteristic to also including a summary of uncertainty for that estimate. Understanding the precision of the estimate for the quantity of interest provides better understanding of what to expect and how well we are able to predict future behavior from the process. For example, when we report a sample average as an estimate of the population mean, it is good practice to also provide a confidence interval (or credible interval, if you are doing a Bayesian analysis) to accompany that summary. This helps to calibrate what ranges of values are reasonable given the variability observed in the sample and the amount of data that were included in producing the summary.
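Reporting a sample average together with an interval takes only a few lines. The sketch below uses the large-sample normal approximation, mean ± z·s/√n. For small samples a Student-t quantile would be the more usual choice. The data are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(sample, level=0.95):
    """Sample mean with a normal-approximation confidence interval."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate variability")
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, (m - z * se, m + z * se)


estimate, (low, high) = mean_with_ci([9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2])
print(f"{estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```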

  17. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  18. The Beta-Geometric Model Applied to Fecundability in a Sample of Married Women

    NASA Astrophysics Data System (ADS)

    Adekanmbi, D. B.; Bamiduro, T. A.

    2006-10-01

    The time required to achieve pregnancy among married couples, termed fecundability, has been proposed to follow a beta-geometric distribution. The accuracy of the method used to estimate the parameters of the model affects the goodness of fit of the model. In this study, the parameters of the model are estimated using the method of moments and the Newton-Raphson estimation procedure. The goodness of fit of the model was assessed using estimates from the two methods of estimation, as well as the asymptotic relative efficiency of the estimates. A noticeable improvement in the fit of the model to the data on time to conception was observed when the parameters were estimated by the Newton-Raphson procedure, thereby yielding reasonable expectations of fecundability for the married female population in the country.
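Under the beta-geometric model, each couple's per-cycle probability of conception is drawn from a Beta(a, b) distribution. Given that probability, the number of cycles to conception is geometric. The sketch computes the resulting probabilities with a simple recursion, and the log-likelihood that a Newton-Raphson routine would maximize. The parameter values and function names are illustrative assumptions.

```python
import math


def beta_geometric_pmf(t_max, a, b):
    """P(T = t) for t = 1..t_max, where T is the cycle in which conception occurs.

    Uses P(1) = a / (a + b) and P(t + 1) = P(t) * (b + t - 1) / (a + b + t).
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    probs = [a / (a + b)]
    for t in range(1, t_max):
        probs.append(probs[-1] * (b + t - 1) / (a + b + t))
    return probs


def log_likelihood(cycle_counts, a, b):
    """Log-likelihood of observed cycles-to-conception (each >= 1) under the model."""
    pmf = beta_geometric_pmf(max(cycle_counts), a, b)
    return sum(math.log(pmf[t - 1]) for t in cycle_counts)


print(beta_geometric_pmf(4, a=2.0, b=3.0))
```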

  19. Effects of lidar pulse density and sample size on a model-assisted approach to estimate forest inventory variables

    Treesearch

    Jacob Strunk; Hailemariam Temesgen; Hans-Erik Andersen; James P. Flewelling; Lisa Madsen

    2012-01-01

    Using lidar in an area-based model-assisted approach to forest inventory has the potential to increase estimation precision for some forest inventory variables. This study documents the bias and precision of a model-assisted (regression estimation) approach to forest inventory with lidar-derived auxiliary variables relative to lidar pulse density and the number of...
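In a model-assisted (regression) estimator, a model fitted to lidar metrics predicts the inventory variable for every population unit. The field-plot sample is then used only to correct the mean of those predictions for model error. A minimal sketch of the difference form under simple random sampling follows. The names are hypothetical, and the study's estimator and variance details may differ.

```python
from statistics import mean


def model_assisted_mean(pop_predictions, sample_observed, sample_predicted):
    """Difference-form model-assisted estimate of a population mean under SRS.

    pop_predictions:  model predictions for every unit (e.g. every lidar grid cell)
    sample_observed:  field-measured values on the sampled plots
    sample_predicted: model predictions for those same plots
    """
    if len(sample_observed) != len(sample_predicted):
        raise ValueError("observed and predicted samples must align")
    residuals = [y - yhat for y, yhat in zip(sample_observed, sample_predicted)]
    return mean(pop_predictions) + mean(residuals)
```

Precision then depends mainly on the residual variance rather than the raw variance of the variable. This is the sense in which better lidar-derived predictors can increase estimation precision.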

  20. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.

    PubMed

    Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu

    2018-05-30

    One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate for multiple statistical tests and the widely distributed read counts and dispersions of different genes. To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-seq data. Datasets from previous, similar experiments such as the Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in R and can be installed from the Bioconductor website. A user-friendly graphical web interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/ . RnaSeqSampleSize provides a convenient and powerful way to estimate power and sample size for an RNA-seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.
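A normal approximation for negative-binomial counts gives the single-gene calculation that earlier methods relied on. It is not RnaSeqSampleSize's method, which works from whole distributions of counts and dispersions and controls the FDR. Take a gene with mean count μ and dispersion φ, so that the variance is μ + φμ². The delta-method variance of a log mean is then about (1/μ + φ)/n, which leads to n = 2(z(1 − α/2) + z(1 − β))²(1/μ + φ)/(ln Δ)² samples per group for a fold change Δ. The sketch below, with a hypothetical function name, shows how widely that answer varies across genes.

```python
import math
from statistics import NormalDist


def rnaseq_n_per_group(mean_count, dispersion, fold_change, alpha=0.05, power=0.8):
    """Approximate samples per group to detect a fold change in one gene.

    Assumes negative-binomial counts with variance mu + dispersion * mu**2 and a
    Wald-type test on the log fold change (normal approximation).
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n = 2 * z_sum ** 2 * (1 / mean_count + dispersion) / math.log(fold_change) ** 2
    return math.ceil(n)


# Low-count, high-dispersion genes need far more samples than well-expressed, stable ones
for mu, phi in [(5, 0.4), (100, 0.1), (1000, 0.05)]:
    print(mu, phi, rnaseq_n_per_group(mu, phi, fold_change=2.0))
```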

  1. Evaluation of the Webler-Brown model for estimating tetrachloroethylene exposure from vinyl-lined asbestos-cement pipes

    PubMed Central

    Spence, Lisa A; Aschengrau, Ann; Gallagher, Lisa E; Webster, Thomas F; Heeren, Timothy C; Ozonoff, David M

    2008-01-01

    Background From May 1968 through March 1980, vinyl-lined asbestos-cement (VL/AC) water distribution pipes were installed in New England to avoid taste and odor problems associated with asbestos-cement pipes. The vinyl resin was applied to the inner pipe surface in a solution of tetrachloroethylene (perchloroethylene, PCE). Substantial amounts of PCE remained in the liner and subsequently leached into public drinking water supplies. Methods Once aware of the leaching problem and prior to remediation (April-November 1980), Massachusetts regulators collected drinking water samples from VL/AC pipes to determine the extent and severity of the PCE contamination. This study compares newly obtained historical records of PCE concentrations in water samples (n = 88) with concentrations estimated using an exposure model employed in epidemiologic studies on the cancer risk associated with PCE-contaminated drinking water. The exposure model was developed by Webler and Brown to estimate the mass of PCE delivered to subjects' residences. Results The mean and median measured PCE concentrations in the water samples were 66 and 0.5 μg/L, respectively, and the range extended from non-detectable to 2432 μg/L. The model-generated concentration estimates and water sample concentrations were moderately correlated (Spearman rank correlation coefficient = 0.48, p < 0.0001). Correlations were higher in samples taken at taps and spigots vs. hydrants (ρ = 0.84 vs. 0.34), in areas with simple vs. complex geometry (ρ = 0.51 vs. 0.38), and near pipes installed in 1973–1976 vs. other years (ρ = 0.56 vs. 0.42 for 1968–1972 and 0.37 for 1977–1980). Overall, 24% of the variance in measured PCE concentrations was explained by the model-generated concentration estimates (p < 0.0001). Almost half of the water samples had undetectable concentrations of PCE. 
Undetectable levels were more common in areas with the earliest installed VL/AC pipes, at the beginning and middle of VL/AC pipes, at hydrants, and in complex pipe configurations. Conclusion PCE concentration estimates generated using the Webler-Brown model were moderately correlated with measured water concentrations. The present analysis suggests that the exposure assessment process used in prior epidemiological studies could be improved with more accurate characterization of water flow. This study illustrates one method of validating an exposure model in an epidemiological study when historical measurements are not available. PMID:18518975
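Spearman's ρ, used above to compare modelled and measured concentrations, is the Pearson correlation of the ranks. The minimal sketch below gives tied values the average of their ranks. Ties matter here because almost half the measurements were non-detects. Coding non-detects as a shared lowest value is an assumption for illustration.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical modelled vs. measured values, with non-detects coded as 0
print(spearman([0.2, 1.5, 3.0, 8.0, 12.0], [0, 0, 0.5, 4.0, 2.0]))
```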

  2. Evaluation of the Webler-Brown model for estimating tetrachloroethylene exposure from vinyl-lined asbestos-cement pipes.

    PubMed

    Spence, Lisa A; Aschengrau, Ann; Gallagher, Lisa E; Webster, Thomas F; Heeren, Timothy C; Ozonoff, David M

    2008-06-02

    From May 1968 through March 1980, vinyl-lined asbestos-cement (VL/AC) water distribution pipes were installed in New England to avoid taste and odor problems associated with asbestos-cement pipes. The vinyl resin was applied to the inner pipe surface in a solution of tetrachloroethylene (perchloroethylene, PCE). Substantial amounts of PCE remained in the liner and subsequently leached into public drinking water supplies. Once aware of the leaching problem and prior to remediation (April-November 1980), Massachusetts regulators collected drinking water samples from VL/AC pipes to determine the extent and severity of the PCE contamination. This study compares newly obtained historical records of PCE concentrations in water samples (n = 88) with concentrations estimated using an exposure model employed in epidemiologic studies on the cancer risk associated with PCE-contaminated drinking water. The exposure model was developed by Webler and Brown to estimate the mass of PCE delivered to subjects' residences. The mean and median measured PCE concentrations in the water samples were 66 and 0.5 μg/L, respectively, and the range extended from non-detectable to 2432 μg/L. The model-generated concentration estimates and water sample concentrations were moderately correlated (Spearman rank correlation coefficient = 0.48, p < 0.0001). Correlations were higher in samples taken at taps and spigots vs. hydrants (ρ = 0.84 vs. 0.34), in areas with simple vs. complex geometry (ρ = 0.51 vs. 0.38), and near pipes installed in 1973-1976 vs. other years (ρ = 0.56 vs. 0.42 for 1968-1972 and 0.37 for 1977-1980). Overall, 24% of the variance in measured PCE concentrations was explained by the model-generated concentration estimates (p < 0.0001). Almost half of the water samples had undetectable concentrations of PCE. 
Undetectable levels were more common in areas with the earliest installed VL/AC pipes, at the beginning and middle of VL/AC pipes, at hydrants, and in complex pipe configurations. PCE concentration estimates generated using the Webler-Brown model were moderately correlated with measured water concentrations. The present analysis suggests that the exposure assessment process used in prior epidemiological studies could be improved with more accurate characterization of water flow. This study illustrates one method of validating an exposure model in an epidemiological study when historical measurements are not available.

  3. Some Empirical Evidence for Latent Trait Model Selection.

    ERIC Educational Resources Information Center

    Hutten, Leah R.

    The results of this study suggest that for purposes of estimating ability by latent trait methods, the Rasch model compares favorably with the three-parameter logistic model. Using estimated parameters to make predictions about 25 actual number-correct score distributions with samples of 1,000 cases each, those predicted by the Rasch model fit the…

  4. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the consistency and $n^{1/2-d}$-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimensional regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models, where covariates are unobservable and observations are possibly non-sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above-mentioned model assumptions.
We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby establishing the ℓ1-consistency of the proposed estimator. We also establish model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
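    The record describes the errors only as "a long memory moving average process". One standard example, assumed here for illustration, is the fractionally integrated ARFIMA(0, d, 0) process, whose MA(∞) weights decay like j^(d-1) and so are not summable when 0 < d < 1/2. The sketch below builds those weights by recursion, cross-checks them against the closed Gamma-function form, and simulates errors by truncating the infinite sum at an assumed number of lags; it is not the dissertation's own simulation design.

```python
import math
import random


def long_memory_weights(d, n_terms):
    """MA(infinity) weights psi_j of an ARFIMA(0, d, 0) process.

    Uses the recursion psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j.
    """
    psi = [1.0]
    for j in range(1, n_terms):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return psi


def weight_closed_form(d, j):
    """Closed form psi_j = Gamma(j + d) / (Gamma(d) * Gamma(j + 1)), for checking."""
    return math.gamma(j + d) / (math.gamma(d) * math.gamma(j + 1))


def long_memory_errors(n, d, n_terms=1000, seed=0):
    """Simulate n errors X_t = sum_j psi_j * eps_{t-j}, truncated at n_terms lags."""
    rng = random.Random(seed)
    psi = long_memory_weights(d, n_terms)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + n_terms)]
    return [sum(psi[j] * eps[t - j] for j in range(n_terms))
            for t in range(n_terms, n + n_terms)]
```

    Such errors could then be added to a sparse linear signal to reproduce, in spirit, a Lasso simulation under long memory; the truncation length is a trade-off between fidelity to the slowly decaying weights and run time.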

  5. Model-assisted estimation of forest resources with generalized additive models

    Treesearch

    Jean D. Opsomer; F. Jay Breidt; Gretchen G. Moisen; Goran Kauermann

    2007-01-01

    Multiphase surveys are often conducted in forest inventories, with the goal of estimating forested area and tree characteristics over large regions. This article describes how design-based estimation of such quantities, based on information gathered during ground visits of sampled plots, can be made more precise by incorporating auxiliary information available from...

  6. Normalization Regression Estimation With Application to a Nonorthogonal, Nonrecursive Model of School Learning.

    ERIC Educational Resources Information Center

    Bulcock, J. W.; And Others

    Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…

  7. What to Do about Zero Frequency Cells when Estimating Polychoric Correlations

    ERIC Educational Resources Information Center

    Savalei, Victoria

    2011-01-01

    Categorical structural equation modeling (SEM) methods that fit the model to estimated polychoric correlations have become popular in the social sciences. When population thresholds are high in absolute value, contingency tables in small samples are likely to contain zero frequency cells. Such cells make the estimation of the polychoric…

  8. A Bayesian hierarchical model for mortality data from cluster-sampling household surveys in humanitarian crises.

    PubMed

    Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko

    2018-05-31

    The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with a cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as a default in the absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. Through the hierarchical structure of the model, all observations in the datasets contribute to the cluster-level estimates. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones even when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
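    The idea of reading an exceedance probability straight off a posterior can be shown with a deliberately simplified, pooled (non-hierarchical) Poisson-Gamma model: deaths ~ Poisson(rate × person-days), with a Gamma prior on the rate. The sketch drops the cluster-level mixing distribution of the paper, and the prior values and the threshold of 1 death per 10,000 people per day (the commonly cited emergency threshold) are assumptions for illustration.

```python
import random


def cdr_posterior(deaths, person_days, a=0.001, b=0.001):
    """Gamma(shape, rate) posterior for a pooled daily death rate per person.

    Pooled simplification: deaths ~ Poisson(rate * person_days), rate ~ Gamma(a, b)
    with b in person-days. a and b are hypothetical vague-prior values.
    """
    return a + sum(deaths), b + sum(person_days)


def prob_cdr_exceeds(shape, rate, threshold_per_10k=1.0, draws=20_000, seed=1):
    """Monte Carlo estimate of P(CDR > threshold), CDR expressed per 10,000 per day."""
    rng = random.Random(seed)
    threshold = threshold_per_10k / 10_000
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    hits = sum(rng.gammavariate(shape, 1.0 / rate) > threshold for _ in range(draws))
    return hits / draws
```

    In the hierarchical version described above, each cluster has its own rate drawn from a Gamma mixing distribution, and the exceedance probability is computed from the posterior of that distribution's mean rather than from a single pooled rate.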

  9. On the use of secondary capture-recapture samples to estimate temporary emigration and breeding proportions

    USGS Publications Warehouse

    Kendall, W.L.; Nichols, J.D.; North, P.M.

    1995-01-01

    The use of the Cormack-Jolly-Seber model under a standard sampling scheme of one sample per time period, when the Jolly-Seber assumption that all emigration is permanent does not hold, leads to the confounding of temporary emigration probabilities with capture probabilities. This biases the estimates of capture probability when temporary emigration is a completely random process, and biases the estimates of both capture and survival probabilities when temporary emigration is Markovian or shows a temporary trap response. The use of secondary capture samples over a shorter interval within each period, during which the population is assumed to be closed (Pollock's robust design), provides a second source of information on capture probabilities. This resolves the confounding, so temporary emigration probabilities can be estimated. This can be done in an ad hoc fashion for completely random temporary emigration and, to some extent, in the temporary trap response case, but modelling the complete sampling process provides more flexibility and permits direct estimation of variances. For the case of Markovian temporary emigration, a full likelihood is required.
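    Under completely random temporary emigration, the open-model capture probability estimates the product (1 - γ)p*, where γ is the temporary emigration probability and p* is the probability that an animal present in the study area is caught at least once during the closed secondary samples. A minimal sketch of the resulting ad hoc estimator follows, assuming the point estimates are already available; it gives no variances and does not cover the Markovian case, which needs the full likelihood.

```python
def prob_caught_at_least_once(secondary_ps):
    """p* = 1 - prod(1 - p_k): chance an available animal is caught in >= 1 secondary sample."""
    miss = 1.0
    for p in secondary_ps:
        miss *= 1.0 - p
    return 1.0 - miss


def temporary_emigration(p_cjs, secondary_ps):
    """Ad hoc estimate gamma = 1 - p_CJS / p*, assuming completely random temporary emigration.

    p_cjs: capture probability from the open (CJS) model, which confounds availability and capture.
    secondary_ps: per-sample capture probabilities estimated from the closed secondary samples.
    """
    p_star = prob_caught_at_least_once(secondary_ps)
    if not 0.0 < p_cjs <= p_star:
        raise ValueError("expected 0 < p_cjs <= p*")
    return 1.0 - p_cjs / p_star
```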

  10. Statistical inference for the additive hazards model under outcome-dependent sampling.

    PubMed

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study designs, and proper inference procedures for data from such designs, are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating its relative efficiency against a simple random sampling design, and we derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and that the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk associated with radon exposure.

  11. Statistical inference for the additive hazards model under outcome-dependent sampling

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo

    2015-01-01

    Cost-effective study designs, and proper inference procedures for data from such designs, are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating its relative efficiency against a simple random sampling design, and we derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and that the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk associated with radon exposure. PMID:26379363

  12. Measuring selected PPCPs in wastewater to estimate the population in different cities in China.

    PubMed

    Gao, Jianfa; O'Brien, Jake; Du, Peng; Li, Xiqing; Ort, Christoph; Mueller, Jochen F; Thai, Phong K

    2016-10-15

    Sampling and analysis of wastewater from municipal wastewater treatment plants (WWTPs) has become a useful tool for understanding exposure to chemicals. Both wastewater-based studies and the management and planning of the catchment require information on the catchment population at the time of monitoring. Recently, a model has been developed and calibrated using selected pharmaceutical and personal care products (PPCPs) measured in influent wastewater for estimating population in different catchments in Australia. The present study aimed at evaluating the feasibility of utilizing this population estimation approach in China. Twenty-four-hour composite influent samples were collected from 31 WWTPs in 17 cities with catchment sizes from 200,000 to 3,450,000 people, representing all seven regions of China. The samples were analyzed for 19 PPCPs using liquid chromatography coupled to tandem mass spectrometry in direct injection mode. Eight chemicals were detected in more than 50% of the samples. Significant positive correlations were found between individual PPCP mass loads and population estimates provided by WWTP operators. Using the PPCP mass load modeling approach calibrated with WWTP operator data, we estimated the population size of each catchment in good agreement with WWTP operator values (between 50% and 200% of the operator value for all sites, and between 75% and 125% for 23 of the 31 sites). Overall, despite much lower detection and relatively high heterogeneity in PPCP consumption across China, the model provided a good estimate of the population contributing to a given wastewater sample. Wastewater analysis could also provide an objective measure of PPCP consumption in China. Copyright © 2016 Elsevier B.V. All rights reserved.
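    The mass-load step behind this approach is unit arithmetic: concentration times daily flow gives a daily load, and dividing by a per-capita load gives a population. The sketch below assumes a single PPCP and a single calibrated per-capita load, both hypothetical; the study's calibrated model combines several PPCPs and is not reproduced here. The agreement function mirrors the percentage bands quoted above.

```python
def mass_load_mg_per_day(conc_ng_per_l, flow_l_per_day):
    """Daily mass load of a PPCP: concentration (ng/L) x flow (L/day), converted to mg/day."""
    return conc_ng_per_l * flow_l_per_day * 1e-6


def population_estimate(load_mg_per_day, per_capita_mg_per_day):
    """Population implied by a load, given a hypothetical calibrated per-capita load."""
    return load_mg_per_day / per_capita_mg_per_day


def agreement_pct(estimate, operator_value):
    """Estimate as a percentage of the WWTP operator's population figure."""
    return 100.0 * estimate / operator_value
```

    For example, 1000 ng/L at 2 × 10^8 L/day is 2 × 10^5 mg/day; at an assumed 0.5 mg/day per person that implies 400,000 people, which is 80% of an operator figure of 500,000 and so would fall inside the 75-125% band.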

  13. Statistical methods for efficient design of community surveys of response to noise: Random coefficients regression models

    NASA Technical Reports Server (NTRS)

    Tomberlin, T. J.

    1985-01-01

    Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed here provide a basis for the sample design decisions such studies require. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas or, in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time of day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also determine the optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
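    The cost-based allocation can be illustrated with the textbook two-stage result for a single estimated quantity: with between-area variance s_b², within-area variance s_w², m areas, n residents per area, and cost C = c1·m + c2·m·n, the variance s_b²/m + s_w²/(m·n) is minimized for a fixed budget at n* = sqrt(c1·s_w² / (c2·s_b²)). The random-coefficients formulation in this report generalizes this idea to regression effects, so the sketch is an analogue, not the report's method; all function names and inputs are hypothetical.

```python
import math


def optimal_residents_per_area(var_between, var_within, cost_per_area, cost_per_resident):
    """Classic two-stage optimum: n* = sqrt((c1 * s_w^2) / (c2 * s_b^2))."""
    return math.sqrt((cost_per_area * var_within) / (cost_per_resident * var_between))


def areas_affordable(budget, cost_per_area, cost_per_resident, n_per_area):
    """Number of study areas m allowed by the cost function C = c1*m + c2*m*n."""
    return budget / (cost_per_area + cost_per_resident * n_per_area)


def variance_of_estimate(var_between, var_within, m, n):
    """Variance of a two-stage estimate: s_b^2 / m + s_w^2 / (m * n)."""
    return var_between / m + var_within / (m * n)
```

    With s_b² = 1, s_w² = 4, c1 = 100, c2 = 1, the optimum is 20 residents per area; a budget of 12,000 then buys 100 areas, and any other n under the same budget gives a larger variance.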

  14. Trap configuration and spacing influences parameter estimates in spatial capture-recapture models

    USGS Publications Warehouse

    Sun, Catherine C.; Fuller, Angela K.; Royle, J. Andrew

    2014-01-01

    An increasing number of studies employ spatial capture-recapture models to estimate population size, but there has been limited research on how different spatial sampling designs and trap configurations influence parameter estimators. Spatial capture-recapture models provide an advantage over non-spatial models by explicitly accounting for heterogeneous detection probabilities among individuals that arise due to the spatial organization of individuals relative to sampling devices. We simulated black bear (Ursus americanus) populations and spatial capture-recapture data to evaluate the influence of trap configuration and trap spacing on estimates of population size and a spatial scale parameter, sigma, that relates to home range size. We varied detection probability and home range size, and considered three trap configurations common to large-mammal mark-recapture studies: regular spacing, clustered, and a temporal sequence of different cluster configurations (i.e., trap relocation). We explored trap spacing and number of traps per cluster by varying the number of traps. The clustered arrangement performed well when detection rates were low and is easier to implement in the field than the sequential trap arrangement. However, performance differences between trap configurations diminished as home range size increased. Our simulations suggest it is important to consider trap spacing relative to home range sizes, with traps ideally spaced no more than twice the spatial scale parameter. While spatial capture-recapture models can accommodate different sampling designs and still estimate parameters with accuracy and precision, our simulations demonstrate that aspects of sampling design, namely trap configuration and spacing, must take into account study area size, ranges of individual movement, and home range sizes in the study population.
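    Why spacing should be judged relative to sigma can be seen from the half-normal detection function commonly used in spatial capture-recapture, p0·exp(-d²/(2σ²)), where d is the distance from an individual's activity center to a trap. The sketch assumes that detection function, a square trap grid, and independent occasions; p0, grid size, and occasion count are hypothetical values, not the study's simulation settings.

```python
import math


def halfnormal_detection(p0, sigma, center, trap):
    """Per-occasion detection probability p0 * exp(-d^2 / (2 sigma^2)) at distance d."""
    d2 = (center[0] - trap[0]) ** 2 + (center[1] - trap[1]) ** 2
    return p0 * math.exp(-d2 / (2.0 * sigma ** 2))


def square_grid(n_side, spacing):
    """Regularly spaced n_side x n_side trap array with the given spacing."""
    return [(i * spacing, j * spacing) for i in range(n_side) for j in range(n_side)]


def prob_detected(p0, sigma, center, traps, occasions):
    """Probability an individual is detected at least once over all traps and occasions."""
    miss = 1.0
    for trap in traps:
        miss *= (1.0 - halfnormal_detection(p0, sigma, center, trap)) ** occasions
    return 1.0 - miss
```

    Comparing an animal centered between traps on a grid spaced at 2σ with one on a grid spaced at 4σ shows the wider spacing sharply lowering its chance of being detected at all, which is the mechanism behind the "no more than twice sigma" guideline above.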

  15. Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence

    USGS Publications Warehouse

    MacKenzie, D.I.; Nichols, J.D.; Royle, J. Andrew; Pollock, K.H.; Bailey, L.L.; Hines, J.E.

    2006-01-01

    This is the first book to examine the latest methods in analyzing data from presence/absence surveys. Using four classes of models (single-species, single-season; single-species, multiple-season; multiple-species, single-season; and multiple-species, multiple-season), the authors discuss the practical sampling situation, present a likelihood-based model enabling direct estimation of the occupancy-related parameters while allowing for imperfect detectability, and make recommendations for designing studies using these models. It provides authoritative insights into the latest in estimation modeling; discusses multiple models that lay the groundwork for future study designs; addresses critical issues of imperfect detectability and its effects on estimation; and explores in detail the role of probability in estimation.
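    For the simplest of the four model classes (single-species, single-season), the standard likelihood for one site's detection history mixes an occupied branch, ψ·∏ p^y (1-p)^(1-y), with an unoccupied branch that contributes (1-ψ) only when the site was never detected. A minimal sketch, assuming constant detection probability across surveys and independent sites; the function names are illustrative.

```python
import math


def history_likelihood(history, psi, p):
    """Single-season likelihood of one site's 0/1 detection history.

    psi: occupancy probability; p: per-survey detection probability (constant here).
    """
    detect_part = 1.0
    for y in history:
        detect_part *= p if y else (1.0 - p)
    likelihood = psi * detect_part
    if not any(history):
        # An all-zero history can also arise from an unoccupied site.
        likelihood += 1.0 - psi
    return likelihood


def log_likelihood(histories, psi, p):
    """Sum of log-likelihoods over independent sites."""
    return sum(math.log(history_likelihood(h, psi, p)) for h in histories)
```

    Maximizing `log_likelihood` over ψ and p (for example, with a numerical optimizer) gives estimates of occupancy that account for imperfect detection, rather than the naive proportion of sites with at least one detection.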

  16. Alternative Models for Small Samples in Psychological Research: Applying Linear Mixed Effects Models and Generalized Estimating Equations to Repeated Measures Data

    ERIC Educational Resources Information Center

    Muth, Chelsea; Bales, Karen L.; Hinde, Katie; Maninger, Nicole; Mendoza, Sally P.; Ferrer, Emilio

    2016-01-01

    Unavoidable sample size issues beset psychological research that involves scarce populations or costly laboratory procedures. When incorporating longitudinal designs, these samples are further reduced by traditional modeling techniques, which perform listwise deletion for any instance of missing data. Moreover, these techniques are limited in their…

  17. An Indirect System Identification Technique for Stable Estimation of Continuous-Time Parameters of the Vestibulo-Ocular Reflex (VOR)

    NASA Technical Reports Server (NTRS)

    Kukreja, Sunil L.; Wallin, Ragnar; Boyle, Richard D.

    2013-01-01

    The vestibulo-ocular reflex (VOR) is a well-known dual mode bifurcating system that consists of slow and fast modes associated with nystagmus and saccade, respectively. Estimation of the continuous-time parameters of nystagmus and saccade models is known to be sensitive to estimation methodology, noise, and sampling rate. Stable and accurate estimation of these parameters is critical for accurate disease modelling, clinical diagnosis, robotic control strategies, mission planning for space exploration, and pilot safety. This paper presents a novel indirect system identification method for the estimation of continuous-time parameters of VOR employing standardised least-squares with dual sampling rates in a sparse structure. This approach permits the stable and simultaneous estimation of both nystagmus and saccade data. The efficacy of this approach is demonstrated via simulation of a continuous-time model of VOR with typical parameters found in clinical studies and in the presence of output additive noise.

  18. Effects of Calibration Sample Size and Item Bank Size on Ability Estimation in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Sahin, Alper; Weiss, David J.

    2015-01-01

    This study aimed to investigate the effects of calibration sample size and item bank size on examinee ability estimation in computerized adaptive testing (CAT). For this purpose, a 500-item bank pre-calibrated using the three-parameter logistic model with 10,000 examinees was simulated. Calibration samples of varying sizes (150, 250, 350, 500,…

  19. Optimizing Estimated Loss Reduction for Active Sampling in Rank Learning

    DTIC Science & Technology

    2008-01-01

    active learning framework for SVM-based and boosting-based rank learning. Our approach suggests sampling based on maximizing the estimated loss differential over unlabeled data. Experimental results on two benchmark corpora show that the proposed model substantially reduces the labeling effort, and achieves superior performance rapidly with as much as 30% relative improvement over the margin-based sampling

  20. Multiobjective sampling design for parameter estimation and model discrimination in groundwater solute transport

    USGS Publications Warehouse

    Knopman, Debra S.; Voss, Clifford I.

    1989-01-01

    Sampling design for site characterization studies of solute transport in porous media is formulated as a multiobjective problem. Optimal design of a sampling network is a sequential process in which the next phase of sampling is designed on the basis of all available physical knowledge of the system. Three objectives are considered: model discrimination, parameter estimation, and cost minimization. For the first two objectives, physically based measures of the value of information obtained from a set of observations are specified. In model discrimination, value of information of an observation point is measured in terms of the difference in solute concentration predicted by hypothesized models of transport. Points of greatest difference in predictions can contribute the most information to the discriminatory power of a sampling design. Sensitivity of solute concentration to a change in a parameter contributes information on the relative variance of a parameter estimate. Inclusion of points in a sampling design with high sensitivities to parameters tends to reduce variance in parameter estimates. Cost minimization accounts for both the capital cost of well installation and the operating costs of collection and analysis of field samples. Sensitivities, discrimination information, and well installation and sampling costs are used to form coefficients in the multiobjective problem in which the decision variables are binary (zero/one), each corresponding to the selection of an observation point in time and space. The solution to the multiobjective problem is a noninferior set of designs. To gain insight into effective design strategies, a one-dimensional solute transport problem is hypothesized. Then, an approximation of the noninferior set is found by enumerating 120 designs and evaluating objective functions for each of the designs. Trade-offs between pairs of objectives are demonstrated among the models. 
The value of an objective function for a given design is shown to correspond to the ability of a design to actually meet an objective.
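    Because the decision variables are binary and the objectives are built from per-point coefficients, a small problem can be solved by enumerating designs, as the abstract describes. The sketch assumes each objective is a sum of hypothetical per-point values (cost to minimize; discrimination and parameter sensitivity to maximize) and a fixed number k of observation points per design. A design is noninferior if no other design is at least as good on every objective and strictly better on at least one.

```python
from itertools import combinations

# Hypothetical candidate observation points: (cost, discrimination, sensitivity).
CANDIDATES = {
    "A": (3.0, 0.9, 0.2),
    "B": (1.0, 0.4, 0.5),
    "C": (2.0, 0.1, 0.9),
    "D": (2.5, 0.3, 0.3),
}


def design_objectives(points):
    """Objective vector for a design, with every objective expressed as smaller-is-better."""
    cost = sum(CANDIDATES[p][0] for p in points)
    disc = sum(CANDIDATES[p][1] for p in points)
    sens = sum(CANDIDATES[p][2] for p in points)
    return (cost, -disc, -sens)


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def noninferior_set(k):
    """Enumerate all k-point designs and keep those no other design dominates."""
    designs = {pts: design_objectives(pts) for pts in combinations(sorted(CANDIDATES), k)}
    return sorted(pts for pts, obj in designs.items()
                  if not any(dominates(other, obj) for other in designs.values()))
```

    With these made-up values, designs A+D and C+D drop out (each is dominated by A+B and B+C, respectively), leaving four noninferior designs that trade cost against the two information measures.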

  1. Validation of a Sampling Method to Collect Exposure Data for Central-Line-Associated Bloodstream Infections.

    PubMed

    Hammami, Naïma; Mertens, Karl; Overholser, Rosanna; Goetghebeur, Els; Catry, Boudewijn; Lambert, Marie-Laurence

    2016-05-01

    Surveillance of central-line-associated bloodstream infections requires the labor-intensive counting of central-line days (CLDs). This workload could be reduced by sampling. Our objective was to evaluate the accuracy of various sampling strategies in the estimation of CLDs in intensive care units (ICUs) and to establish a set of rules to identify optimal sampling strategies depending on ICU characteristics. We analyzed existing data collected according to the European protocol for patient-based surveillance of ICU-acquired infections in Belgium between 2004 and 2012. CLD data were reported by 56 ICUs in 39 hospitals during 364 trimesters. We compared estimated CLD data obtained from weekly and monthly sampling schemes with the observed exhaustive CLD data over the trimester by assessing the CLD percentage error, i.e., (observed CLDs - estimated CLDs)/observed CLDs. We identified predictors of improved accuracy using linear mixed models. When sampling once per week or 3 times per month, 80% of ICU trimesters had a CLD percentage error within 10%. When sampling twice per week, more than 90% of ICU trimesters did. Sampling on Tuesdays provided the best estimates. In the linear mixed model, the observed CLD count was the best predictor of a smaller percentage error. The following sampling strategies provided an estimate within 10% of the actual CLD for 97% of the ICU trimesters with 90% confidence: 3 times per month in an ICU with >650 CLDs per trimester, or each Tuesday in an ICU with >480 CLDs per trimester. Sampling of CLDs provides an acceptable alternative to daily collection of CLD data.
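    The weekly sampling scheme and its percentage error can be sketched as follows. This assumes the trimester total is estimated by scaling the mean count on the sampled weekday up to the full period, which is one plausible estimator; the abstract does not spell out the scaling. The daily counts in the usage example are hypothetical.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbers Monday as 0


def estimate_total_from_weekday(daily_counts, start: date, weekday=TUESDAY):
    """Scale the mean count on one sampled weekday up to the whole period."""
    sampled = [count for offset, count in enumerate(daily_counts)
               if (start + timedelta(days=offset)).weekday() == weekday]
    return sum(sampled) / len(sampled) * len(daily_counts)


def percentage_error(observed, estimated):
    """CLD percentage error: 100 * (observed - estimated) / observed."""
    return 100.0 * (observed - estimated) / observed
```

    For a 91-day period with a constant 5 CLDs per day, Tuesday-only sampling recovers the total exactly (455). If Tuesdays run unusually high, the estimate overshoots and the percentage error turns negative, which is why the choice of sampling day matters.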

  2. Modeling the uncertainty of estimating forest carbon stocks in China

    NASA Astrophysics Data System (ADS)

    Yue, T. X.; Wang, Y. F.; Du, Z. P.; Zhao, M. W.; Zhang, L. L.; Zhao, N.; Lu, M.; Larocque, G. R.; Wilson, J. P.

    2015-12-01

    Earth surface systems are controlled by a combination of global and local factors and cannot be understood without accounting for both components. The system dynamics cannot be recovered from the global or local controls alone. Ground forest inventory is able to accurately estimate forest carbon stocks at sample plots, but these sample plots are too sparse to support the spatial simulation of carbon stocks with the required accuracy. Satellite observation is an important source of global information for the simulation of carbon stocks. Satellite remote sensing can supply spatially continuous information on forest carbon stocks, which is impossible from ground-based investigations, but this information carries considerable uncertainty. In this paper, we validated the Lund-Potsdam-Jena dynamic global vegetation model (LPJ), the Kriging method for spatial interpolation of ground sample plots, and a satellite-observation-based approach, as well as an approach for fusing the ground sample plots with satellite observations and an assimilation method for incorporating the ground sample plots into LPJ. The validation results indicated that both the data fusion and data assimilation approaches reduced the uncertainty of estimating carbon stocks. Data fusion had the lowest uncertainty, using an existing method for high accuracy surface modeling to fuse the ground sample plots with the satellite observations (HASM-SOA). The estimates produced with HASM-SOA were 26.1% and 28.4% more accurate than the satellite-based approach and spatial interpolation of the sample plots, respectively. Using the preferred HASM-SOA method, forest carbon stocks in China were estimated at 7.08 Pg for the period from 2004 to 2008, an increase of 2.24 Pg from 1984 to 2008.

  3. Hierarchical spatial models of abundance and occurrence from imperfect survey data

    USGS Publications Warehouse

    Royle, J. Andrew; Kery, M.; Gautier, R.; Schmid, Hans

    2007-01-01

    Many estimation and inference problems arising from large-scale animal surveys are focused on developing an understanding of patterns in abundance or occurrence of a species based on spatially referenced count data. One fundamental challenge, then, is that it is generally not feasible to completely enumerate ('census') all individuals present in each sample unit. This observation bias may consist of several components, including spatial coverage bias (not all individuals in the population are exposed to sampling) and detection bias (exposed individuals may go undetected). Thus, observations are biased for the state variable (abundance, occupancy) that is the object of inference. Moreover, data are often sparse for most observation locations, requiring consideration of methods for spatially aggregating or otherwise combining sparse data among sample units. The development of methods that unify spatial statistical models with models accommodating non-detection is necessary to resolve important spatial inference problems based on animal survey data. In this paper, we develop a novel hierarchical spatial model for estimation of abundance and occurrence from survey data wherein detection is imperfect. Our application is focused on spatial inference problems in the Swiss Survey of Common Breeding Birds. The observation model for the survey data is specified conditional on the unknown quadrat population size, N(s). We augment the observation model with a spatial process model for N(s), describing the spatial variation in abundance of the species. The model includes explicit sources of variation in habitat structure (forest, elevation) and latent variation in the form of a correlated spatial process. This provides a model-based framework for combining the spatially referenced samples while at the same time yielding a unified treatment of estimation problems involving both abundance and occurrence.
We provide a Bayesian framework for analysis and prediction based on the integrated likelihood, and we use the model to obtain estimates of abundance and occurrence maps for the European Jay (Garrulus glandarius), a widespread, elusive, forest bird. The naive national abundance estimate ignoring imperfect detection and incomplete quadrat coverage was 77 766 territories. Accounting for imperfect detection added approximately 18 000 territories, and adjusting for coverage bias added another 131 000 territories to yield a fully corrected estimate of the national total of about 227 000 territories. This is approximately three times as high as previous estimates that assume every territory is detected in each quadrat.
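    An observation model "conditional on the unknown quadrat population size" can be illustrated with the non-spatial binomial-Poisson (N-mixture) form: repeated counts y ~ Binomial(N, p) given N, with N ~ Poisson(λ). The sketch marginalizes N by summing up to an assumed upper bound and omits the spatial process, habitat covariates, and coverage component of the paper.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability, computed on the log scale to avoid overflow for large n."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binom_pmf(y, n, p):
    """Binomial probability of y detections out of n individuals."""
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)


def site_likelihood(counts, lam, p, upper=100):
    """Marginal likelihood of repeated counts at one site, summing over latent N."""
    total = 0.0
    for n in range(max(counts), upper + 1):
        term = poisson_pmf(n, lam)
        for y in counts:
            term *= binom_pmf(y, n, p)
        total += term
    return total
```

    Under this model the expected count is λp rather than λ, so treating raw counts as abundance understates it by the factor p; the correction for imperfect detection reported above reflects the same logic at national scale.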

  4. Estimation of Dynamic Discrete Choice Models by Maximum Likelihood and the Simulated Method of Moments

    PubMed Central

    Eisenhauer, Philipp; Heckman, James J.; Mosso, Stefano

    2015-01-01

    We compare the performance of maximum likelihood (ML) and simulated method of moments (SMM) estimation for dynamic discrete choice models. We construct and estimate a simplified dynamic structural model of education that captures some basic features of educational choices in the United States in the 1980s and early 1990s. We use estimates from our model to simulate a synthetic dataset and assess the ability of ML and SMM to recover the model parameters on this sample. We investigate the performance of alternative tuning parameters for SMM. PMID:26494926

  5. Pairing field methods to improve inference in wildlife surveys while accommodating detection covariance

    USGS Publications Warehouse

    Clare, John; McKinney, Shawn T.; DePue, John E.; Loftin, Cynthia S.

    2017-01-01

    It is common to use multiple field sampling methods when implementing wildlife surveys to compare method efficacy or cost efficiency, integrate distinct pieces of information provided by separate methods, or evaluate method-specific biases and misclassification error. Existing models that combine information from multiple field methods or sampling devices permit rigorous comparison of method-specific detection parameters, enable estimation of additional parameters such as false-positive detection probability, and improve occurrence or abundance estimates, but with the assumption that the separate sampling methods produce detections independently of one another. This assumption is tenuous if methods are paired or deployed in close proximity simultaneously, a common practice that reduces the additional effort required to implement multiple methods and reduces the risk that differences between method-specific detection parameters are confounded by other environmental factors. We develop occupancy and spatial capture–recapture models that permit covariance between the detections produced by different methods, use simulation to compare estimator performance of the new models to models assuming independence, and provide an empirical application based on American marten (Martes americana) surveys using paired remote cameras, hair catches, and snow tracking. Simulation results indicate that existing models that assume that methods independently detect organisms produce biased parameter estimates and substantially understate estimate uncertainty when this assumption is violated, while our reformulated models are robust to either methodological independence or covariance. Empirical results suggested that remote cameras and snow tracking had comparable probability of detecting present martens, but that snow tracking also produced false-positive marten detections that could substantially bias distribution estimates if not corrected for.
Remote cameras detected marten individuals more readily than passive hair catches. Inability to photographically distinguish individual sex did not appear to induce negative bias in camera density estimates; instead, hair catches appeared to produce detection competition between individuals that may have been a source of negative bias. Our model reformulations broaden the range of circumstances in which analyses incorporating multiple sources of information can be robustly used, and our empirical results demonstrate that using multiple field methods can enhance inferences regarding ecological parameters of interest and improve understanding of how reliably survey methods sample these parameters.
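    The independence assumption that the authors relax can be seen in a two-method detection table. The sketch below assumes a single occupied site, hypothetical marginal detection probabilities, and a hypothetical correlation `rho` between the two Bernoulli detection outcomes. It shows that positive covariance, as might arise with paired devices, lowers the probability that at least one method detects the animal below the value implied by independence. It is a generic illustration, not the authors' occupancy or spatial capture–recapture likelihood.

```python
import math

def joint_detection(p1, p2, rho):
    """Joint detection probabilities for two paired methods at one occupied site.

    p1, p2: marginal detection probabilities of method 1 and method 2.
    rho:    hypothetical correlation between the two Bernoulli outcomes
            (0 reproduces the usual independence assumption).
    Returns (P(both detect), P(at least one detects)).
    """
    cov = rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    p_both = p1 * p2 + cov
    # Every cell of the 2x2 joint table must be a valid probability.
    cells = [p_both, p1 - p_both, p2 - p_both, 1 - p1 - p2 + p_both]
    if min(cells) < -1e-12:
        raise ValueError("rho is infeasible for these marginal probabilities")
    return p_both, p1 + p2 - p_both

_, naive = joint_detection(0.4, 0.3, 0.0)    # independence assumed
_, paired = joint_detection(0.4, 0.3, 0.5)   # positively correlated paired devices
print(round(naive, 3), round(paired, 3))
```

    A model that assumes independence would credit the paired design with the larger "at least one" probability, which is one route to the understated uncertainty the abstract reports.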

  6. Integrating resource selection into spatial capture-recapture models for large carnivores

    USGS Publications Warehouse

    Proffitt, Kelly M.; Goldberg, Joshua; Hebblewhite, Mark; Russell, Robin E.; Jimenez, Ben; Robinson, Hugh S.; Pilgrim, Kristine; Schwartz, Michael K.

    2015-01-01

    Wildlife managers need reliable methods to estimate large carnivore densities and population trends; yet large carnivores are elusive, difficult to detect, and occur at low densities, making traditional approaches intractable. Recent advances in spatial capture-recapture (SCR) models have provided new approaches for monitoring trends in wildlife abundance, and these methods are particularly applicable to large carnivores. We applied SCR models in a Bayesian framework to estimate mountain lion densities in the Bitterroot Mountains of west central Montana. We incorporated an existing resource selection function (RSF) as a density covariate to account for heterogeneity in habitat use across the study area and included data collected from harvested lions. We identified individuals through DNA samples collected by (1) biopsy darting mountain lions detected in systematic surveys of the study area, (2) opportunistically collecting hair and scat samples, and (3) sampling all harvested mountain lions. We included 80 DNA samples collected from 62 individuals in the analysis. Including information on predicted habitat use as a covariate on the distribution of activity centers reduced the median estimated density by 44%, the standard deviation by 7%, and the width of 95% credible intervals by 10% as compared to standard SCR models. Within the two management units of interest, we estimated a median mountain lion density of 4.5 mountain lions/100 km² (95% CI = 2.9, 7.7) and 5.2 mountain lions/100 km² (95% CI = 3.4, 9.1). Including harvested individuals (dead recovery) did not create a significant bias in the detection process by introducing individuals that could not be detected after removal. However, the dead recovery component of the model did have a substantial effect on results by increasing sample size.
The ability to account for heterogeneity in habitat use provides a useful extension to SCR models, and will enhance the ability of wildlife managers to reliably and economically estimate density of wildlife populations, particularly large carnivores.
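    Using an RSF as a density covariate means activity centers are no longer spread uniformly over the study area. The sketch below assumes a discretized state space of equal-area pixels, a log-linear density surface D(s) ∝ exp(beta1 · RSF(s)), and hypothetical RSF values and coefficient. It converts RSF values into the probability that an activity center falls in each pixel; the intercept cancels once total abundance is fixed. The authors' Bayesian SCR implementation is not reproduced.

```python
import math

def activity_center_probs(rsf, beta1):
    """P(activity center lies in pixel g) under a log-linear density surface.

    rsf:   RSF value for each equal-area pixel of the state space (hypothetical)
    beta1: coefficient linking the RSF to density (hypothetical)
    """
    w = [math.exp(beta1 * r) for r in rsf]
    total = sum(w)
    return [x / total for x in w]

def expected_centers(rsf, beta1, n_total):
    """Expected number of activity centers per pixel for total abundance n_total."""
    return [n_total * p for p in activity_center_probs(rsf, beta1)]

rsf = [0.1, 0.5, 0.9, 0.3]   # hypothetical RSF values for four pixels
print(expected_centers(rsf, beta1=2.0, n_total=60))
```

    With beta1 = 0 the surface is uniform, which corresponds to the standard SCR model the abstract compares against.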

  7. Critical Loads of Acid Deposition for Wilderness Lakes in the Sierra Nevada (California) Estimated by the Steady-State Water Chemistry Model

    Treesearch

    Glenn D. Shaw; Ricardo Cisneros; Donald Schweizer; James O. Sickman; Mark E. Fenn

    2014-01-01

    Major ion chemistry (2000-2009) from 208 lakes (342 sample dates and 600 samples) in class I and II wilderness areas of the Sierra Nevada was used in the Steady-State Water Chemistry (SSWC) model to estimate critical loads for acid deposition and investigate the current vulnerability of high elevation lakes to acid deposition. The majority of the lakes were dilute (...

  8. Black bear density in Glacier National Park, Montana

    USGS Publications Warehouse

    Stetz, Jeff B.; Kendall, Katherine C.; Macleod, Amy C.

    2013-01-01

    We report the first abundance and density estimates for American black bears (Ursus americanus) in Glacier National Park (NP), Montana, USA. We used data from 2 independent and concurrent noninvasive genetic sampling methods (hair traps and bear rubs) collected during 2004 to generate individual black bear encounter histories for use in closed population mark–recapture models. We improved the precision of our abundance estimate by using noninvasive genetic detection events to develop individual-level covariates of sampling effort within the full and one-half mean maximum distance moved (MMDM) from each bear’s estimated activity center to explain capture probability heterogeneity and inform our estimate of the effective sampling area. Models including the one-half MMDM covariate received overwhelming Akaike’s Information Criterion support, suggesting that buffering our study area by this distance would be more appropriate than no buffer or the full MMDM buffer for estimating the effectively sampled area and thereby density. Our model-averaged super-population abundance estimate was 603 (95% CI = 522–684) black bears for Glacier NP. Our black bear density estimate (11.4 bears/100 km², 95% CI = 9.9–13.0) was consistent with published estimates for populations that are sympatric with grizzly bears (U. arctos) and without access to spawning salmonids. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.
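    MMDM is conventionally computed from individuals detected at two or more locations: take each such individual's largest distance between detections, then average. The sketch below assumes planar (x, y) detection coordinates and hypothetical data. The buffered area itself, and the individual effort covariates the authors built from it, are not reproduced; a density conversion for a given effective area is included for completeness.

```python
import math
from itertools import combinations

def mmdm(detections):
    """Mean maximum distance moved.

    detections: dict mapping individual ID -> list of (x, y) detection locations.
    Individuals detected at fewer than two locations do not contribute.
    """
    maxima = []
    for locs in detections.values():
        if len(locs) < 2:
            continue
        maxima.append(max(math.dist(a, b) for a, b in combinations(locs, 2)))
    if not maxima:
        raise ValueError("need at least one individual with >= 2 detections")
    return sum(maxima) / len(maxima)

def density_per_100km2(n_hat, area_km2):
    """Convert an abundance estimate to density for an effectively sampled area."""
    return 100.0 * n_hat / area_km2

bears = {"A": [(0, 0), (3, 4)], "B": [(0, 0), (1, 0), (0, 2)], "C": [(5, 5)]}
half_mmdm = mmdm(bears) / 2   # buffer width favored by model selection in the abstract
```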

  9. A scenario tree model for the Canadian Notifiable Avian Influenza Surveillance System and its application to estimation of probability of freedom and sample size determination.

    PubMed

    Christensen, Jette; Stryhn, Henrik; Vallières, André; El Allaki, Farouk

    2011-05-01

    In 2008, Canada designed and implemented the Canadian Notifiable Avian Influenza Surveillance System (CanNAISS) with six surveillance activities in a phased-in approach. CanNAISS was a surveillance system because it had more than one surveillance activity or component in 2008: passive surveillance; pre-slaughter surveillance; and voluntary enhanced notifiable avian influenza surveillance. Our objectives were to give a short overview of two active surveillance components in CanNAISS and to describe the CanNAISS scenario tree model and its application to estimating the probability of populations being free of NAI virus infection and to sample size determination. Our data from the pre-slaughter surveillance component included diagnostic test results from 6296 serum samples representing 601 commercial chicken and turkey farms collected from 25 August 2008 to 29 January 2009. In addition, we included data from a sub-population of farms with high biosecurity standards: 36,164 samples from 55 farms sampled repeatedly over the 24-month study period from January 2007 to December 2008. All submissions were negative for Notifiable Avian Influenza (NAI) virus infection. We developed the CanNAISS scenario tree model to estimate the surveillance component sensitivity and the probability of a population being free of NAI at prevalences of 0.01 at the farm level and 0.3 within farms. We propose that a general model, such as the CanNAISS scenario tree model, may have a broader application than more detailed models that require disease-specific input parameters, such as relative risk estimates. Crown Copyright © 2011. Published by Elsevier B.V. All rights reserved.
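    Scenario tree models combine test sensitivity and design prevalences into a surveillance component sensitivity (SSe), which then updates the probability of freedom after all-negative results. The sketch below is a two-level tree (farm and bird) using the 0.01 and 0.3 prevalences quoted in the abstract. It assumes binomial sampling from large populations, a hypothetical test sensitivity, and no risk-based branches, so it is not the CanNAISS model itself.

```python
import math

def herd_sensitivity(within_prev, n_tested, test_se):
    """P(>= 1 positive test) in an infected farm at the within-farm design prevalence."""
    return 1.0 - (1.0 - within_prev * test_se) ** n_tested

def component_sensitivity(farm_prev, n_farms, herd_se):
    """P(detecting infection) if farms are infected at the farm-level design prevalence."""
    return 1.0 - (1.0 - farm_prev * herd_se) ** n_farms

def prob_freedom(prior_inf, sse):
    """Posterior P(free) after all-negative results.

    prior_inf: prior P(population infected at or above the design prevalence).
    """
    return (1.0 - prior_inf) / (1.0 - prior_inf * sse)

def farms_needed(farm_prev, herd_se, target_sse):
    """Smallest number of farms whose component sensitivity reaches target_sse."""
    n = 1
    while component_sensitivity(farm_prev, n, herd_se) < target_sse:
        n += 1
    return n

se_h = herd_sensitivity(0.3, 10, 0.9)        # 10 birds per farm, hypothetical Se = 0.9
sse = component_sensitivity(0.01, 300, se_h)  # 300 farms (hypothetical)
print(prob_freedom(0.5, sse), farms_needed(0.01, se_h, 0.95))
```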

  10. Maximum likelihood estimation for Cox's regression model under nested case-control sampling.

    PubMed

    Scheike, Thomas H; Juul, Anders

    2004-04-01

    Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM-aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease where controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information additional to the relative risk estimates of covariates.
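    Nested case-control sampling draws, for each case, a small number of controls from the risk set at that case's event time. The sketch below builds these sampled risk sets. It assumes right-censored follow-up times, simple random sampling of controls without the age and gender matching used in the motivating study, and hypothetical data. The authors' EM-based MLE is not reproduced.

```python
import random

def sample_risk_sets(times, is_case, m, seed=0):
    """Nested case-control sampling.

    times:   follow-up time for each cohort member (event or censoring time)
    is_case: True if the member had the event at times[i]
    m:       number of controls to draw per case
    Returns a list of (case_index, [control_indices]).
    """
    rng = random.Random(seed)
    sampled = []
    for i, (t_i, case) in enumerate(zip(times, is_case)):
        if not case:
            continue
        # Members still under observation at the case's event time, excluding the case.
        # Future cases are eligible as controls, as in standard risk-set sampling.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i and j != i]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        sampled.append((i, controls))
    return sampled

times = [5, 3, 8, 2, 7, 4]
is_case = [True, False, True, True, False, False]
sets = sample_risk_sets(times, is_case, m=2)
```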

  11. Partitioning Detectability Components in Populations Subject to Within-Season Temporary Emigration Using Binomial Mixture Models

    PubMed Central

    O’Donnell, Katherine M.; Thompson, Frank R.; Semlitsch, Raymond D.

    2015-01-01

    Detectability of individual animals is highly variable and nearly always < 1; imperfect detection must be accounted for to reliably estimate population sizes and trends. Hierarchical models can simultaneously estimate abundance and effective detection probability, but there are several different mechanisms that cause variation in detectability. Neglecting temporary emigration can lead to biased population estimates because availability and conditional detection probability are confounded. In this study, we extend previous hierarchical binomial mixture models to account for multiple sources of variation in detectability. The state process of the hierarchical model describes ecological mechanisms that generate spatial and temporal patterns in abundance, while the observation model accounts for the imperfect nature of counting individuals due to temporary emigration and false absences. We illustrate our model’s potential advantages, including the allowance of temporary emigration between sampling periods, with a case study of southern red-backed salamanders Plethodon serratus. We fit our model and a standard binomial mixture model to counts of terrestrial salamanders surveyed at 40 sites during 3–5 surveys each spring and fall 2010–2012. Our models generated similar parameter estimates to standard binomial mixture models. Aspect was the best predictor of salamander abundance in our case study; abundance increased as aspect became more northeasterly. Increased time-since-rainfall strongly decreased salamander surface activity (i.e. availability for sampling), while higher amounts of woody cover objects and rocks increased conditional detection probability (i.e. probability of capture, given an animal is exposed to sampling). By explicitly accounting for both components of detectability, we increased congruence between our statistical modeling and our ecological understanding of the system. 
We stress the importance of choosing survey locations and protocols that maximize species availability and conditional detection probability to increase population parameter estimate reliability. PMID:25775182
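    The separation of availability from conditional detection can be made concrete by simulating a three-level binomial mixture. The sketch below uses a generic structure in the spirit of the model described, not the authors' exact specification: site abundance M ~ Poisson(lambda), available individuals per survey A ~ Binomial(M, phi), and counts y ~ Binomial(A, p). All parameter values are hypothetical.

```python
import numpy as np

def simulate_counts(n_sites, n_surveys, lam, phi, p, seed=0):
    """Simulate repeated counts under a three-level binomial mixture model.

    M[i]    ~ Poisson(lam)          individuals associated with site i
    A[i, j] ~ Binomial(M[i], phi)   individuals available (e.g. surface-active) on survey j
    y[i, j] ~ Binomial(A[i, j], p)  individuals counted, given availability
    """
    rng = np.random.default_rng(seed)
    M = rng.poisson(lam, size=n_sites)
    A = rng.binomial(M[:, None], phi, size=(n_sites, n_surveys))
    y = rng.binomial(A, p)
    return M, A, y

M, A, y = simulate_counts(n_sites=5000, n_surveys=4, lam=10.0, phi=0.5, p=0.6)
print(y.mean())   # should be near lam * phi * p = 3.0
```

    In this simplified version, only the product phi * p enters the marginal distribution of a single count. That is the confounding the abstract mentions. Separating the two components requires additional structure, such as distinct covariates for each level.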

  12. Development of a modeling approach to estimate indoor-to-outdoor sulfur ratios and predict indoor PM2.5 and black carbon concentrations for Eastern Massachusetts households

    PubMed Central

    Tang, Chia Hsi; Garshick, Eric; Grady, Stephanie; Coull, Brent; Schwartz, Joel; Koutrakis, Petros

    2018-01-01

    The effects of indoor air pollution on human health have drawn increasing attention among the scientific community as individuals spend most of their time indoors. However, indoor air sampling is labor-intensive and costly, which limits the ability to study the adverse health effects related to indoor air pollutants. To overcome this challenge, many researchers have attempted to predict indoor exposures based on outdoor pollutant concentrations, home characteristics, and weather parameters. Typically, these models require knowledge of the infiltration factor, which indicates the fraction of ambient particles that penetrates indoors. For estimating indoor fine particulate matter (PM2.5) exposure, a common approach is to use the indoor-to-outdoor sulfur ratio (S_indoor/S_outdoor) as a proxy for the infiltration factor. The objective of this study was to develop a robust model that estimates S_indoor/S_outdoor for individual households and can be incorporated into models to predict indoor PM2.5 and black carbon (BC) concentrations. Overall, our model adequately estimated S_indoor/S_outdoor, with an out-of-sample by home-season R² of 0.89. Estimated S_indoor/S_outdoor reflected behaviors that influence particle infiltration, including window opening, use of forced air heating, and air purifier use. Sulfur ratio-adjusted models predicted indoor PM2.5 and BC with high precision, with out-of-sample R² values of 0.79 and 0.76, respectively. PMID:29064481
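    Treating the sulfur ratio as the infiltration factor gives a one-line estimate of the ambient-origin part of indoor PM2.5. The sketch below assumes measured indoor and outdoor sulfur and outdoor PM2.5, with hypothetical values. It ignores indoor-generated particles and does not reproduce the authors' model for predicting the ratio from home characteristics.

```python
def sulfur_ratio(s_indoor, s_outdoor):
    """Indoor-to-outdoor sulfur ratio, used as a proxy for the infiltration factor."""
    if s_outdoor <= 0:
        raise ValueError("outdoor sulfur must be positive")
    return s_indoor / s_outdoor

def ambient_indoor_pm25(pm25_outdoor, s_indoor, s_outdoor):
    """Indoor PM2.5 of ambient origin = infiltration factor x outdoor PM2.5."""
    return sulfur_ratio(s_indoor, s_outdoor) * pm25_outdoor

# Hypothetical home: sulfur 0.6 indoors vs 1.0 outdoors, 10 ug/m3 outdoor PM2.5.
print(ambient_indoor_pm25(10.0, 0.6, 1.0))
```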

  13. Respondent-driven sampling as Markov chain Monte Carlo.

    PubMed

    Goel, Sharad; Salganik, Matthew J

    2009-07-30

    Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present RDS as Markov chain Monte Carlo importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which the sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating RDS studies.
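    The Markov chain view has a familiar consequence. A random walk on a connected, undirected network visits nodes in proportion to their degree at stationarity, so reweighting each respondent by the inverse of reported degree is a form of importance sampling. The sketch below shows the inverse-degree weighted (often called RDS-II or Volz–Heckathorn) prevalence estimator with hypothetical data. It does not model the recruitment bottlenecks or multiple-recruitment effects the paper analyzes.

```python
def rds_ii_prevalence(degrees, infected):
    """Inverse-degree weighted prevalence estimate from RDS data.

    degrees:  reported network degree of each respondent (must be > 0)
    infected: 1 if the respondent is infected, else 0
    """
    num = den = 0.0
    for d, y in zip(degrees, infected):
        if d <= 0:
            raise ValueError("degrees must be positive")
        num += y / d
        den += 1.0 / d
    return num / den

est = rds_ii_prevalence([2, 4, 4, 10], [1, 0, 1, 0])
print(est)   # high-degree respondents are down-weighted relative to the raw 0.5
```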

  14. Approximation techniques for parameter estimation and feedback control for distributed models of large flexible structures

    NASA Technical Reports Server (NTRS)

    Banks, H. T.; Rosen, I. G.

    1984-01-01

    Approximation ideas are discussed that can be used in parameter estimation and feedback control for Euler-Bernoulli models of elastic systems. Focusing on parameter estimation problems, ways by which one can obtain convergence results for cubic spline based schemes for hybrid models involving an elastic cantilevered beam with tip mass and base acceleration are outlined. Sample numerical findings are also presented.

  15. Hierarchical models for estimating density from DNA mark-recapture studies

    USGS Publications Warehouse

    Gardner, B.; Royle, J. Andrew; Wegan, M.T.

    2009-01-01

    Genetic sampling is increasingly used as a tool by wildlife biologists and managers to estimate abundance and density of species. Typically, DNA is used to identify individuals captured in an array of traps (e.g., baited hair snares) from which individual encounter histories are derived. Standard methods for estimating the size of a closed population can be applied to such data. However, due to the movement of individuals on and off the trapping array during sampling, the area over which individuals are exposed to trapping is unknown, and so obtaining unbiased estimates of density has proved difficult. We propose a hierarchical spatial capture-recapture model which contains explicit models for the spatial point process governing the distribution of individuals and their exposure to (via movement) and detection by traps. Detection probability is modeled as a function of each individual's distance to the trap. We applied this model to a black bear (Ursus americanus) study conducted in 2006 using a hair-snare trap array in the Adirondack region of New York, USA. We estimated the density of bears to be 0.159 bears/km², which is lower than the estimated density (0.410 bears/km²) based on standard closed population techniques. A Bayesian analysis of the model is fully implemented in the software program WinBUGS.
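    Detection that depends on distance to the trap is often written with a half-normal function, p_ij = p0 · exp(−d_ij² / (2σ²)), where d_ij is the distance from individual i's activity center to trap j. The sketch below assumes that half-normal form, hypothetical values of p0 and σ, and independent binomial encounters over K occasions. The authors' WinBUGS model, including the point process for activity centers, is not reproduced.

```python
import numpy as np

def halfnormal_detection(centers, traps, p0, sigma):
    """Detection probability for every (individual, trap) pair.

    centers: (n, 2) activity-center coordinates
    traps:   (J, 2) trap coordinates
    """
    centers = np.asarray(centers, float)
    traps = np.asarray(traps, float)
    d2 = ((centers[:, None, :] - traps[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    return p0 * np.exp(-d2 / (2.0 * sigma ** 2))

def simulate_encounters(p, n_occasions, seed=0):
    """Encounter counts y[i, j] ~ Binomial(n_occasions, p[i, j])."""
    rng = np.random.default_rng(seed)
    return rng.binomial(n_occasions, p)

centers = [[0.0, 0.0], [2.0, 1.0]]               # hypothetical activity centers (km)
traps = [[0.0, 0.0], [1.0, 0.0], [3.0, 1.0]]     # hypothetical hair-snare locations (km)
p = halfnormal_detection(centers, traps, p0=0.3, sigma=1.0)
y = simulate_encounters(p, n_occasions=8)
```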

  16. A model-based approach to sample size estimation in recent onset type 1 diabetes.

    PubMed

    Bundy, Brian N; Krischer, Jeffrey P

    2016-11-01

    The C-peptide area under the curve following a 2-h mixed meal tolerance test, measured from baseline to 12 months after enrolment in 498 individuals enrolled in five prior TrialNet studies of recent onset type 1 diabetes, was modelled to produce estimates of its rate of loss and variance. Age at diagnosis and baseline C-peptide were found to be significant predictors, and adjusting for these in an ANCOVA resulted in estimates with lower variance. Using these results as planning parameters for new studies yields a nearly 50% reduction in the target sample size. The modelling also produces an expected C-peptide value that can be used in observed versus expected calculations to estimate the presumption of benefit in ongoing trials. Copyright © 2016 John Wiley & Sons, Ltd.
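    The link between lower variance and smaller sample size follows from the usual two-arm formula. In that formula, ANCOVA replaces the outcome variance σ² with the residual variance σ²(1 − R²), where R² is the share explained by baseline covariates. The sketch below uses that textbook approximation with a hypothetical effect size and standard deviation, not the TrialNet estimates. Under this formula, a reduction of nearly 50% would correspond to covariates explaining roughly half the outcome variance.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, r2=0.0, alpha=0.05, power=0.8):
    """Per-arm sample size for comparing two means.

    delta: difference in means to detect
    sigma: outcome standard deviation
    r2:    proportion of outcome variance explained by baseline covariates
           in an ANCOVA (0 gives the unadjusted two-sample formula)
    """
    z = NormalDist().inv_cdf
    resid_var = sigma ** 2 * (1.0 - r2)
    return ceil(2 * resid_var * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)

print(n_per_arm(0.5, 1.0), n_per_arm(0.5, 1.0, r2=0.5))
```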

  17. A simulation study on Bayesian Ridge regression models for several collinearity levels

    NASA Astrophysics Data System (ADS)

    Efendi, Achmad; Effrihan

    2017-12-01

    When data are analyzed with a multiple regression model in the presence of collinearity, one or more predictor variables are usually omitted from the model. However, there are sometimes reasons, for instance medical or economic ones, why all the predictors are important and should be included in the model. Ridge regression is commonly used in research to cope with collinearity. In this approach, weights on the predictor variables are used in estimating the parameters. The estimation can then follow the concept of likelihood. Alternatively, a Bayesian version of the estimation is now available. This estimation method is less popular than the likelihood method because of some difficulties, such as computation. Nevertheless, with recent improvements in computational methodology, this caveat should no longer be a problem. This paper discusses a simulation process for evaluating the characteristics of Bayesian ridge regression parameter estimates. There are several simulation settings based on a variety of collinearity levels and sample sizes. The results show that the Bayesian method performs better for relatively small sample sizes, and that for other settings it performs similarly to the likelihood method.
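    Hold the noise variance σ² and the penalty λ fixed. A Gaussian prior β ~ N(0, (σ²/λ) I) then gives a Gaussian posterior whose mean equals the classical ridge estimate (X'X + λI)⁻¹X'y. The sketch below works under those assumptions, with hypothetical, nearly collinear simulated data. A fully Bayesian ridge regression would also place priors on σ² and λ and would typically be fit by MCMC, which is not shown.

```python
import numpy as np

def bayes_ridge_posterior(X, y, lam, sigma2):
    """Posterior of beta for y ~ N(X beta, sigma2 I) with prior beta ~ N(0, (sigma2/lam) I).

    Returns (posterior mean, posterior covariance); the mean is the ridge estimate.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    mean = np.linalg.solve(A, X.T @ y)
    cov = sigma2 * np.linalg.inv(A)
    return mean, cov

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

mean_flat, cov_flat = bayes_ridge_posterior(X, y, lam=0.0, sigma2=1.0)    # equals OLS
mean_ridge, cov_ridge = bayes_ridge_posterior(X, y, lam=5.0, sigma2=1.0)  # shrunk, tighter
```

    The prior shrinks the two nearly interchangeable coefficients toward each other and toward zero, which is how ridge regression keeps both collinear predictors in the model.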

  18. Augmented switching linear dynamical system model for gas concentration estimation with MOX sensors in an open sampling system.

    PubMed

    Di Lello, Enrico; Trincavelli, Marco; Bruyninckx, Herman; De Laet, Tinne

    2014-07-11

    In this paper, we introduce a Bayesian time series model approach for gas concentration estimation using Metal Oxide (MOX) sensors in an Open Sampling System (OSS). Our approach focuses on compensating for the slow response of MOX sensors while concurrently solving the problem of estimating the gas concentration in OSS. The proposed Augmented Switching Linear System model allows all the sources of uncertainty arising at each step of the problem to be included in a single coherent probabilistic formulation. In particular, the problem of detecting on-line the current sensor dynamical regime and estimating the underlying gas concentration under environmental disturbances and noisy measurements is formulated and solved as a statistical inference problem. Our model improves on the state of the art, in which system modeling approaches had already been introduced but provided only indirect relative measures proportional to the gas concentration, and the problem of modeling uncertainty was ignored. Our approach is validated experimentally, and its performance in terms of speed and quality of gas concentration estimation is compared with that obtained using a photo-ionization detector.
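    The slow-response problem can be illustrated with a first-order sensor model whose response rate switches between a fast rise and a slow decay regime. The sketch below runs a Kalman filter on the augmented state [concentration, sensor response] to recover the concentration from lagged readings. It assumes hypothetical rate constants and noise level, a random-walk model for the concentration, and a hard regime choice taken from the current estimate. That regime choice is a heuristic, not the Bayesian regime inference of the authors' Augmented Switching Linear System.

```python
import numpy as np

A_RISE, A_DECAY = 0.2, 0.05   # hypothetical per-step response rates (dt / tau)
NOISE_SD = 0.002              # hypothetical measurement noise

def simulate_mox(c, seed=0):
    """Noisy first-order sensor response to a concentration sequence c."""
    rng = np.random.default_rng(seed)
    r = np.zeros(len(c))
    for t in range(1, len(c)):
        a = A_RISE if c[t - 1] >= r[t - 1] else A_DECAY
        r[t] = r[t - 1] + a * (c[t - 1] - r[t - 1])
    return r + rng.normal(0.0, NOISE_SD, size=len(c))

def estimate_concentration(y, q_c=0.05 ** 2, q_r=1e-8):
    """Kalman filter on the augmented state [c, r] with a hard regime switch."""
    r_var = NOISE_SD ** 2
    x = np.array([0.0, y[0]])        # state estimate: [concentration, sensor response]
    P = np.diag([1.0, r_var])        # state covariance
    H = np.array([[0.0, 1.0]])       # only the sensor response is observed
    Q = np.diag([q_c, q_r])          # random-walk concentration, near-deterministic sensor
    c_hat = np.zeros(len(y))
    for t in range(1, len(y)):
        # Regime from the current estimate (heuristic stand-in for regime inference).
        a = A_RISE if x[0] >= x[1] else A_DECAY
        F = np.array([[1.0, 0.0], [a, 1.0 - a]])
        x = F @ x                    # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + r_var      # innovation variance, shape (1, 1)
        K = P @ H.T / S              # Kalman gain, shape (2, 1)
        x = x + K[:, 0] * (y[t] - x[1])
        P = (np.eye(2) - K @ H) @ P
        c_hat[t] = x[0]
    return c_hat

c_true = np.r_[np.zeros(20), np.ones(40), np.zeros(40)]   # step up, then step down
y = simulate_mox(c_true)
c_hat = estimate_concentration(y)
```

    On a synthetic step input like this, the filtered estimate is expected to track the step changes more closely than the raw, lagged reading, particularly during the slow decay.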

  19. Augmented Switching Linear Dynamical System Model for Gas Concentration Estimation with MOX Sensors in an Open Sampling System

    PubMed Central

    Di Lello, Enrico; Trincavelli, Marco; Bruyninckx, Herman; De Laet, Tinne

    2014-01-01

    In this paper, we introduce a Bayesian time series model approach for gas concentration estimation using Metal Oxide (MOX) sensors in an Open Sampling System (OSS). Our approach focuses on compensating for the slow response of MOX sensors while concurrently solving the problem of estimating the gas concentration in OSS. The proposed Augmented Switching Linear System model allows all the sources of uncertainty arising at each step of the problem to be included in a single coherent probabilistic formulation. In particular, the problem of detecting on-line the current sensor dynamical regime and estimating the underlying gas concentration under environmental disturbances and noisy measurements is formulated and solved as a statistical inference problem. Our model improves on the state of the art, in which system modeling approaches had already been introduced but provided only indirect relative measures proportional to the gas concentration, and the problem of modeling uncertainty was ignored. Our approach is validated experimentally, and its performance in terms of speed and quality of gas concentration estimation is compared with that obtained using a photo-ionization detector. PMID:25019637

  20. Hankin and Reeves' approach to estimating fish abundance in small streams: Limitations and alternatives

    USGS Publications Warehouse

    Thompson, W.L.

    2003-01-01

    Hankin and Reeves' (1988) approach to estimating fish abundance in small streams has been applied in stream fish studies across North America. However, their population estimator relies on two key assumptions: (1) removal estimates are equal to the true numbers of fish, and (2) removal estimates are highly correlated with snorkel counts within a subset of sampled stream units. Violations of these assumptions may produce suspect results. To determine possible sources of the assumption violations, I used data on the abundance of steelhead Oncorhynchus mykiss from Hankin and Reeves (1988) in a simulation composed of 50,000 repeated, stratified systematic random samples from a spatially clustered distribution. The simulation was used to investigate effects of a range of removal estimates, from 75% to 100% of true fish abundance, on overall stream fish population estimates. The effects of various categories of removal-estimates-to-snorkel-count correlation levels (r = 0.75–1.0) on fish population estimates were also explored. Simulation results indicated that Hankin and Reeves' approach may produce poor results unless removal estimates exceed at least 85% of the true number of fish within sampled units and unless correlations between removal estimates and snorkel counts are at least 0.90. A potential modification to Hankin and Reeves' approach is the inclusion of environmental covariates that affect detection rates of fish into the removal model or other mark-recapture model. A potential alternative approach is to use snorkeling combined with line transect sampling to estimate fish densities within stream units. As with any method of population estimation, a pilot study should be conducted to evaluate its usefulness, which requires a known (or nearly so) population of fish to serve as a benchmark for evaluating bias and precision of estimators.
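    Hankin and Reeves' estimator calibrates snorkel counts against removal estimates in a subsample of units. The sketch below assumes a single stratum, a plain ratio estimator, and hypothetical counts; the full stratified two-stage design is not reproduced. It shows how the calibration works and why assumption (1) matters: if removal estimates capture only a fraction of the true fish, the total is biased by the same fraction.

```python
def ratio_calibrated_total(snorkel_all, calib_pairs):
    """Double-sampling ratio estimator of total abundance.

    snorkel_all: snorkel counts for every sampled unit
    calib_pairs: (snorkel_count, removal_estimate) for the calibration subsample
    """
    s = sum(c for c, _ in calib_pairs)
    r = sum(n for _, n in calib_pairs)
    if s == 0:
        raise ValueError("calibration snorkel counts sum to zero")
    return (r / s) * sum(snorkel_all)

snorkel = [8, 16, 4, 12]                          # hypothetical snorkel counts
exact = ratio_calibrated_total(snorkel, [(8, 10), (16, 20)])
low = ratio_calibrated_total(snorkel, [(8, 8.5), (16, 17)])  # removal at 85% of truth
print(exact, low)
```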

  1. Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.; Thompson, Vanessa M.

    2011-01-01

    A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…

  2. Estimation of the Young's modulus of the human pars tensa using in-situ pressurization and inverse finite-element analysis.

    PubMed

    Rohani, S Alireza; Ghomashchi, Soroush; Agrawal, Sumit K; Ladak, Hanif M

    2017-03-01

    Finite-element models of the tympanic membrane are sensitive to the Young's modulus of the pars tensa. The aim of this work is to estimate the Young's modulus of the human tympanic membrane under an experimental paradigm different from those currently used. These additional values could potentially be used by the auditory biomechanics community for building consensus. The Young's modulus of the human pars tensa was estimated through inverse finite-element modelling of an in-situ pressurization experiment. The experiments were performed on three specimens with a custom-built pressurization unit at a quasi-static pressure of 500 Pa. The shape of each tympanic membrane before and after pressurization was recorded using a Fourier transform profilometer. The samples were also imaged using micro-computed tomography to create sample-specific finite-element models. For each sample, the Young's modulus was then estimated by numerically optimizing its value in the finite-element model so that simulated pressurized shapes matched the experimental data. The estimated Young's modulus values were 2.2 MPa, 2.4 MPa and 2.0 MPa, and are similar to estimates obtained using in-situ single-point indentation testing. The estimates were obtained under the assumptions that the pars tensa is linearly elastic, uniform, and isotropic with a thickness of 110 μm, and the estimates are limited to quasi-static loading. Estimates of pars tensa Young's modulus are sensitive to its thickness and inclusion of the manubrial fold. However, they do not appear to be sensitive to optimization initialization, height measurement error, pars flaccida Young's modulus, and tympanic membrane element type (shell versus solid). Copyright © 2017 Elsevier B.V. All rights reserved.
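    The inverse approach, adjusting E until simulated shapes match measured ones, can be sketched as a one-parameter search. The sketch below replaces the specimen-specific finite-element model with a toy closed-form forward model. It assumes a pressurized circular membrane whose apex deflection scales as (p·a / (E·t))^(1/3), with a hypothetical radius and shape constant, a parabolic deflection profile, and noise-free synthetic measurements. E is recovered with a golden-section search. The 500 Pa pressure and 110 μm thickness come from the abstract; everything else is hypothetical.

```python
import math

P_PA = 500.0      # quasi-static pressure (from the abstract)
T_M = 110e-6      # pars tensa thickness, 110 um (from the abstract)
A_M = 4e-3        # hypothetical membrane radius
C = 0.66          # hypothetical shape constant of the toy model

def profile(e_mpa, radii):
    """Toy forward model: parabolic deflection profile with apex ~ a*(p a / (E t))^(1/3)."""
    w0 = C * A_M * (P_PA * A_M / (e_mpa * 1e6 * T_M)) ** (1.0 / 3.0)
    return [w0 * (1.0 - (r / A_M) ** 2) for r in radii]

def misfit(e_mpa, radii, observed):
    """Sum of squared differences between simulated and observed deflections."""
    return sum((m - o) ** 2 for m, o in zip(profile(e_mpa, radii), observed))

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c, d = b - g * (b - a), a + g * (b - a)
    return 0.5 * (a + b)

radii = [0.0, 1e-3, 2e-3, 3e-3]
observed = profile(2.2, radii)   # synthetic "measurements" generated at E = 2.2 MPa
e_hat = golden_section_min(lambda e: misfit(e, radii, observed), 0.1, 50.0)
```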

  3. Improved variance estimation of classification performance via reduction of bias caused by small sample size.

    PubMed

    Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders

    2006-03-13

    Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust, with good generalization performance on new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval for the error rate using repeated design and test sets selected from the available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, different methods for small-sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RSS), are also expected to result in heavily biased estimates, which in turn translate into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small-sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed, indicating that the method in its present form cannot be directly applied to small data sets.
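    The small-test-set bias can be illustrated with the law of total variance. The observed variance of test error rates across design sets equals the true between-design variance plus the average binomial noise e(1 − e)/n of a test set of size n. The sketch below assumes an independent test set for every design (the "completely novel samples" case in the abstract) and hypothetical error rates, and subtracts an unbiased estimate of that noise term. This is a standard moment correction, not necessarily the exact expression derived by the authors or their RIDT algorithm.

```python
import numpy as np

def corrected_between_design_variance(err_hat, n_test):
    """Remove binomial test-set noise from the variance of observed error rates.

    err_hat: error rates measured on independent test sets of size n_test,
             one per design (training) set.
    Returns (naive variance, bias-corrected variance).
    """
    err_hat = np.asarray(err_hat, float)
    naive = err_hat.var(ddof=1)
    # E[e_hat (1 - e_hat)] = e (1 - e) (1 - 1/n), so e (1 - e) / n is
    # estimated without bias by e_hat (1 - e_hat) / (n - 1).
    noise = np.mean(err_hat * (1.0 - err_hat) / (n_test - 1))
    return naive, naive - noise

rng = np.random.default_rng(0)
true_err = rng.uniform(0.2, 0.3, size=100_000)   # hypothetical per-design error rates
err_hat = rng.binomial(20, true_err) / 20        # each measured on 20 new test samples
naive, corrected = corrected_between_design_variance(err_hat, n_test=20)
```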

  4. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    NASA Astrophysics Data System (ADS)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions: the prior and the posterior distribution. The posterior distribution is influenced by the choice of prior distribution. Jeffreys’ prior is a kind of non-informative prior distribution, used when information about the parameters is not available. The non-informative Jeffreys’ prior is combined with the sample information, resulting in the posterior distribution, which is used to estimate the parameters. The purpose of this research is to estimate the parameters of the multivariate regression model using the Bayesian method with a non-informative Jeffreys’ prior distribution. Based on the results and discussion, the estimates of β and Σ were obtained from the expected values of the random variables of the marginal posterior distribution functions. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart, respectively. However, calculating the expected values involves integrals that are difficult to evaluate. Therefore, an approach is needed that generates random samples according to the posterior distribution characteristics of each parameter, using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
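    The sketch below is a minimal Gibbs sampler for this model. It assumes the Jeffreys prior p(B, Σ) ∝ |Σ|^(−(m+1)/2) on the k × m coefficient matrix B (β in the abstract) and the m × m covariance Σ. It alternates two standard full conditionals: Σ | B, Y is inverse Wishart with n degrees of freedom and scale (Y − XB)'(Y − XB), and B | Σ, Y is matrix normal centered on the least-squares estimate. Only NumPy is used; the inverse-Wishart draw is built by inverting a Wishart matrix formed from Gaussian outer products. The data and chain settings are hypothetical.

```python
import numpy as np

def gibbs_mvreg_jeffreys(X, Y, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for Y = X B + E, rows of E ~ N(0, Sigma), Jeffreys prior.

    Returns posterior means of B (k x m) and Sigma (m x m).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    m = Y.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                 # least-squares estimate
    L_x = np.linalg.cholesky(XtX_inv)
    B = B_hat.copy()
    draws_B, draws_S = [], []
    for it in range(n_iter):
        # Sigma | B, Y ~ Inverse-Wishart(df=n, scale=E'E) with E = Y - X B.
        E = Y - X @ B
        L_s_inv = np.linalg.cholesky(np.linalg.inv(E.T @ E))
        Z = rng.standard_normal((n, m)) @ L_s_inv.T   # rows ~ N(0, (E'E)^{-1})
        Sigma = np.linalg.inv(Z.T @ Z)                 # inverse of a Wishart draw
        # B | Sigma, Y ~ MatrixNormal(B_hat, (X'X)^{-1}, Sigma).
        L_sig = np.linalg.cholesky(Sigma)
        B = B_hat + L_x @ rng.standard_normal((k, m)) @ L_sig.T
        if it >= burn:
            draws_B.append(B)
            draws_S.append(Sigma)
    return np.mean(draws_B, axis=0), np.mean(draws_S, axis=0)

rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
B_true = np.array([[1.0, -1.0], [2.0, 0.5], [0.0, 1.5]])
Sigma_true = np.array([[1.0, 0.5], [0.5, 1.0]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)
B_post, S_post = gibbs_mvreg_jeffreys(X, Y)
```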

  5. Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2017-01-01

    In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…

  6. A statistical evaluation of non-ergodic variogram estimators

    USGS Publications Warehouse

    Curriero, F.C.; Hohn, M.E.; Liebhold, A.M.; Lele, S.R.

    2002-01-01

    Geostatistics is a set of statistical techniques that is increasingly used to characterize spatial dependence in spatially referenced ecological data. A common feature of geostatistics is predicting values at unsampled locations from nearby samples using the kriging algorithm. Modeling spatial dependence in sampled data is necessary before kriging and is usually accomplished with the variogram and its traditional estimator. Other types of estimators, known as non-ergodic estimators, have been used in ecological applications. Non-ergodic estimators were originally suggested as a method of choice when sampled data are preferentially located and exhibit a skewed frequency distribution. Preferentially located samples can occur, for example, when areas with high values are sampled more intensely than other areas. In earlier studies, the visual appearance of variograms from traditional and non-ergodic estimators was compared. Here we evaluate the estimators' relative performance in prediction. We also show algebraically that a non-ergodic version of the variogram is equivalent to the traditional variogram estimator. Simulations, designed to investigate the effects of data skewness and preferential sampling on variogram estimation and kriging, showed the traditional variogram estimator outperforms the non-ergodic estimators under these conditions. We also analyzed data on carabid beetle abundance, which exhibited large-scale spatial variability (trend) and a skewed frequency distribution. Detrending data followed by robust estimation of the residual variogram is demonstrated to be a successful alternative to the non-ergodic approach.
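    The traditional (Matheron) estimator averages half the squared differences of all pairs separated by roughly lag h: γ̂(h) = (1 / 2N(h)) Σ (z_i − z_j)². The sketch below assumes a fixed distance tolerance around each lag and hypothetical data. Non-ergodic and robust estimators are not shown.

```python
import numpy as np

def empirical_variogram(coords, z, lags, tol):
    """Classical (Matheron) semivariogram estimate at each lag.

    coords: (n,) or (n, d) sample locations
    z:      (n,) observed values
    lags:   lag distances h at which to estimate gamma(h)
    tol:    pairs with |distance - h| <= tol are assigned to lag h
    Returns (gamma, pair_counts); gamma is NaN for lags with no pairs.
    """
    coords = np.asarray(coords, float)
    if coords.ndim == 1:
        coords = coords[:, None]
    z = np.asarray(z, float)
    i, j = np.triu_indices(len(z), k=1)             # every unordered pair once
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    gamma = np.full(len(lags), np.nan)
    counts = np.zeros(len(lags), dtype=int)
    for k, h in enumerate(lags):
        mask = np.abs(d - h) <= tol
        counts[k] = mask.sum()
        if counts[k]:
            gamma[k] = sq[mask].sum() / (2 * counts[k])
    return gamma, counts

gamma, counts = empirical_variogram([0, 1, 2, 3], [1, 2, 4, 7], lags=[1, 2, 3], tol=0.5)
```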

  7. Estimating the price elasticity of beer: meta-analysis of data with heterogeneity, dependence, and publication bias.

    PubMed

    Nelson, Jon P

    2014-01-01

    Precise estimates of price elasticities are important for alcohol tax policy. Using meta-analysis, this paper corrects average beer elasticities for heterogeneity, dependence, and publication selection bias. A sample of 191 estimates is obtained from 114 primary studies. Simple and weighted means are reported. Dependence is addressed by restricting the number of estimates per study, using author-restricted samples, and including author-specific variables. Publication bias is addressed using the funnel graph, trim-and-fill, and Egger's intercept model. Heterogeneity and selection bias are examined jointly in meta-regressions containing moderator variables for econometric methodology, primary data, and precision of estimates. Results for fixed- and random-effects regressions are reported. Country-specific effects and sample time periods are unimportant, but several methodology variables help explain the dispersion of estimates. In models that correct for selection bias and heterogeneity, the average beer price elasticity is about -0.20, which is less elastic by 50% compared to values commonly used in alcohol tax policy simulations. Copyright © 2013 Elsevier B.V. All rights reserved.
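    Two of the tools named in this abstract can be sketched in a few lines: an inverse-variance (fixed-effect) weighted mean, and Egger's regression of each standardized estimate (estimate / SE) on its precision (1 / SE), whose intercept is used as a test for funnel-plot asymmetry. This is an illustrative sketch with hypothetical function names, not the paper's meta-regression with moderators, trim-and-fill, or random effects.

```python
import math


def fixed_effect_mean(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) mean and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, math.sqrt(1.0 / total)


def egger_intercept(estimates, std_errors):
    """Egger's regression: OLS of e/se on 1/se; returns (intercept, slope).

    An intercept far from zero suggests funnel-plot asymmetry (possible
    publication selection); the slope is often read as a precision-adjusted
    effect estimate.
    """
    y = [e / se for e, se in zip(estimates, std_errors)]
    x = [1.0 / se for se in std_errors]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope
```

With identical estimates at every precision (no asymmetry), the intercept is zero and the slope equals the common estimate.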

  8. PERIOD ESTIMATION FOR SPARSELY SAMPLED QUASI-PERIODIC LIGHT CURVES APPLIED TO MIRAS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Shiyuan; Huang, Jianhua Z.; Long, James

    2016-12-01

    We develop a nonlinear semi-parametric Gaussian process model to estimate periods of Miras with sparsely sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in the model with respect to a suitable prior distribution obtained from earlier studies. Since the likelihood is highly multimodal in the period, we implement a hybrid method that applies the quasi-Newton algorithm to the Gaussian process parameters and searches the period/frequency parameter space over a dense grid. A large-scale, high-fidelity simulation is conducted to mimic the sampling quality of Mira light curves obtained by the M33 Synoptic Stellar Survey. The simulated data set is publicly available and can serve as a testbed for future evaluation of different period estimation methods. The semi-parametric model outperforms an existing algorithm on this simulated test data set as measured by period recovery rate and quality of the resulting period–luminosity relations.
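    The study's model combines a sinusoidal basis with a Gaussian process and searches the period on a dense frequency grid. The sketch below keeps only the sinusoid-plus-grid-search idea: for each trial frequency f it fits y = a + b cos(2πft) + c sin(2πft) by least squares and keeps the frequency with the smallest residual sum of squares. It drops the Gaussian process term, the prior on nuisance parameters, and the quasi-Newton step, so it is a simplified stand-in, and all names are hypothetical.

```python
import math


def _solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def sinusoid_rss(t, y, freq):
    """Residual sum of squares of the least-squares fit a + b*cos + c*sin at freq."""
    basis = [(1.0, math.cos(2 * math.pi * freq * ti), math.sin(2 * math.pi * freq * ti))
             for ti in t]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(3)]
    coef = _solve3(ata, aty)
    return sum((yi - sum(c * r for c, r in zip(coef, row))) ** 2
               for row, yi in zip(basis, y))


def grid_search_period(t, y, f_min, f_max, n_grid):
    """Return the period (1/f) with the smallest RSS over an evenly spaced frequency grid."""
    step = (f_max - f_min) / (n_grid - 1)
    best_f = min((f_min + k * step for k in range(n_grid)),
                 key=lambda f: sinusoid_rss(t, y, f))
    return 1.0 / best_f
```

A dense grid is used because, as the abstract notes, the objective is highly multimodal in frequency, so a local optimizer started at the wrong place would settle on a spurious peak.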

  9. Mathematical estimation of the level of microbial contamination on spacecraft surfaces by volumetric air sampling

    NASA Technical Reports Server (NTRS)

    Oxborrow, G. S.; Roark, A. L.; Fields, N. D.; Puleo, J. R.

    1974-01-01

    Microbiological sampling methods presently used for enumeration of microorganisms on spacecraft surfaces require contact with easily damaged components. Estimation of viable particles on surfaces using air sampling methods in conjunction with a mathematical model would be desirable. Parameters necessary for the mathematical model are the effect of angled surfaces on viable particle collection and the number of viable cells per viable particle. Deposition of viable particles on angled surfaces closely followed a cosine function, and the number of viable cells per viable particle was consistent with a Poisson distribution. Other parameters considered by the mathematical model included deposition rate and fractional removal per unit time. A close nonlinear correlation between volumetric air sampling and airborne fallout on surfaces was established with all fallout data points falling within the 95% confidence limits as determined by the mathematical model.
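    A hedged sketch of the two model components the abstract reports: deposition on a tilted surface scaled by the cosine of the tilt angle, and a Poisson count of viable cells per viable particle. The assumptions here (tilt measured from horizontal, a known horizontal fallout density, the function names) are illustrative and are not taken from the report, which also includes deposition-rate and fractional-removal terms not modelled below.

```python
import math


def expected_cells_on_surface(horizontal_fallout, tilt_deg, mean_cells_per_particle):
    """Expected viable cells per unit area on a surface tilted tilt_deg from horizontal.

    horizontal_fallout: viable particles per unit area on a horizontal surface
    (assumed known, e.g. inferred from volumetric air sampling).
    Deposition is scaled by cos(tilt); each particle carries
    mean_cells_per_particle viable cells on average.
    """
    particles = horizontal_fallout * math.cos(math.radians(tilt_deg))
    return particles * mean_cells_per_particle


def poisson_pmf(k, mean):
    """P(K = k) for a Poisson count of cells with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)
```

At a 60-degree tilt the cosine factor halves the deposited particle density relative to a horizontal surface.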

  10. Capture-Recapture Estimators in Epidemiology with Applications to Pertussis and Pneumococcal Invasive Disease Surveillance

    PubMed Central

    Braeye, Toon; Verheagen, Jan; Mignon, Annick; Flipse, Wim; Pierard, Denis; Huygen, Kris; Schirvel, Carole; Hens, Niel

    2016-01-01

    Introduction Surveillance networks are often neither exhaustive nor completely complementary. In such situations, capture-recapture methods can be used for incidence estimation. However, the choice of estimator and the estimators' robustness with respect to the homogeneity and independence assumptions are not well documented. Methods We investigated the performance of five different capture-recapture estimators in a simulation study. Eight different scenarios were used to detect and combine case-information. The scenarios increasingly violated assumptions of independence of samples and homogeneity of detection probabilities. Belgian datasets on invasive pneumococcal disease (IPD) and pertussis provided motivating examples. Results No estimator was unbiased in all scenarios. Performance of the parametric estimators depended on how much of the dependency and heterogeneity were correctly modelled. Model building was limited by parameter estimability, availability of additional information (e.g. covariates) and the possibilities inherent to the method. In the most complex scenario, methods that allowed for detection probabilities conditional on previous detections estimated the total population size within a 20–30% error-range. Parametric estimators remained stable if individual data sources lost up to 50% of their data. The investigated non-parametric methods were more susceptible to data loss and their performance was linked to the dependence between samples; overestimating in scenarios with little dependence, underestimating in others. Issues with parameter estimability made it impossible to model all suggested relations between samples for the IPD and pertussis datasets. For IPD, the estimates for the Belgian incidence for cases aged 50 years and older ranged from 44 to 58/100,000 in 2010. The estimates for pertussis (all ages, Belgium, 2014) ranged from 24.2 to 30.8/100,000.
Conclusion We encourage the use of capture-recapture methods, but epidemiologists should preferably include datasets for which the underlying dependency structure is not too complex, a priori investigate this structure, compensate for it within the model and interpret the results with the remaining unmodelled heterogeneity in mind. PMID:27529167
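    The simplest capture-recapture estimator for two surveillance sources is the Lincoln-Petersen estimator, usually applied in Chapman's bias-corrected form N = (n1 + 1)(n2 + 1)/(m + 1) - 1. It is shown here only to make the independence and homogeneity assumptions concrete; the abstract does not say it was one of the five estimators compared, and the function name is hypothetical.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected two-source (Lincoln-Petersen) population estimate.

    n1, n2: cases detected by source 1 and source 2; m: cases found in both.
    Assumes the sources detect cases independently and that every case has
    the same detection probability within each source.
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

If the two sources are positively dependent (a case found by one is more likely to be found by the other), m is inflated and the estimate is biased low, which is one reason the study recommends investigating the dependency structure a priori.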

  11. Density of American black bears in New Mexico

    USGS Publications Warehouse

    Gould, Matthew J.; Cain, James W.; Roemer, Gary W.; Gould, William R.; Liley, Stewart

    2018-01-01

    Considering advances in noninvasive genetic sampling and spatially explicit capture–recapture (SECR) models, the New Mexico Department of Game and Fish sought to update their density estimates for American black bear (Ursus americanus) populations in New Mexico, USA, to aid in setting sustainable harvest limits. We estimated black bear density in the Sangre de Cristo, Sandia, and Sacramento Mountains, New Mexico, 2012–2014. We collected hair samples from black bears using hair traps and bear rubs and used a sex marker and a suite of microsatellite loci to individually genotype hair samples. We then estimated density in a SECR framework using sex, elevation, land cover type, and time to model heterogeneity in detection probability and the spatial scale over which detection probability declines. We sampled the populations using 554 hair traps and 117 bear rubs and collected 4,083 hair samples. We identified 725 (367 male, 358 female) individuals. Our density estimates varied from 16.5 bears/100 km² (95% CI = 11.6–23.5) in the southern Sacramento Mountains to 25.7 bears/100 km² (95% CI = 13.2–50.1) in the Sandia Mountains. Overall, detection probability at the activity center (g0) was low across all study areas and ranged from 0.00001 to 0.02. The low values of g0 were primarily a result of half of all hair samples for which genotypes were attempted failing to produce a complete genotype. We speculate that the low success we had genotyping hair samples was due to exceedingly high levels of ultraviolet (UV) radiation that degraded the DNA in the hair. Despite sampling difficulties, we were able to produce density estimates with levels of precision comparable to those estimated for black bears elsewhere in the United States.
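    SECR models describe detection probability as a function of the distance between a detector and an animal's activity center; g0 is its value at distance zero, and a scale parameter (sigma) controls how fast it declines. The half-normal form below is a common choice but is an assumption here, since the abstract does not name the detection function; the helper names are hypothetical.

```python
import math


def halfnormal_detection(g0, sigma, dx, dy):
    """Half-normal SECR detection function g(d) = g0 * exp(-d^2 / (2 sigma^2))."""
    d2 = dx * dx + dy * dy
    return g0 * math.exp(-d2 / (2.0 * sigma * sigma))


def prob_detected_at_least_once(g0, sigma, center, traps, occasions):
    """P(an animal with activity center `center` is detected at least once).

    Assumes detections are independent across traps and occasions.
    """
    p_miss = 1.0
    for tx, ty in traps:
        p = halfnormal_detection(g0, sigma, tx - center[0], ty - center[1])
        p_miss *= (1.0 - p) ** occasions
    return 1.0 - p_miss
```

With g0 as small as the values reported above (0.00001 to 0.02), many traps or occasions are needed before an animal has a reasonable chance of being detected at all.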

  12. Change-in-ratio methods for estimating population size

    USGS Publications Warehouse

    Udevitz, Mark S.; Pollock, Kenneth H.; McCullough, Dale R.; Barrett, Reginald H.

    2002-01-01

    Change-in-ratio (CIR) methods can provide an effective, low cost approach for estimating the size of wildlife populations. They rely on being able to observe changes in proportions of population subclasses that result from the removal of a known number of individuals from the population. These methods were first introduced in the 1940s to estimate the size of populations with 2 subclasses under the assumption of equal subclass encounter probabilities. Over the next 40 years, closed population CIR models were developed to consider additional subclasses and use additional sampling periods. Models with assumptions about how encounter probabilities vary over time, rather than between subclasses, also received some attention. Recently, all of these CIR models have been shown to be special cases of a more general model. Under the general model, information from additional samples can be used to test assumptions about the encounter probabilities and to provide estimates of subclass sizes under relaxations of these assumptions. These developments have greatly extended the applicability of the methods. CIR methods are attractive because they do not require the marking of individuals, and subclass proportions often can be estimated with relatively simple sampling procedures. However, CIR methods require a carefully monitored removal of individuals from the population, and the estimates will be of poor quality unless the removals induce substantial changes in subclass proportions. In this paper, we review the state of the art for closed population estimation with CIR methods. Our emphasis is on the assumptions of CIR methods and on identifying situations where these methods are likely to be effective. We also identify some important areas for future CIR research.
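    For the classic two-subclass case, write X for one subclass, p1 and p2 for its proportion before and after the removal, R for the total number removed, and Rx for the number of X removed. Then p1·N1 - Rx = p2·(N1 - R), which gives N1 = (Rx - p2·R) / (p1 - p2). A minimal sketch under the closed-population, equal-encounter-probability assumption (function name hypothetical):

```python
def cir_two_class(p1, p2, removed_x, removed_total):
    """Two-subclass change-in-ratio estimate of pre-removal population size N1.

    p1, p2: proportion of subclass X before and after the removal.
    removed_x, removed_total: known removals of subclass X and of all animals.
    Assumes a closed population and equal encounter probabilities for both subclasses.
    """
    if p1 == p2:
        raise ValueError("removal must change the subclass proportion")
    return (removed_x - p2 * removed_total) / (p1 - p2)
```

The denominator p1 - p2 makes the abstract's caution concrete: when removals barely shift the proportions, small sampling errors in p1 and p2 are greatly amplified in the estimate.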

  13. Modeling misidentification errors that result from use of genetic tags in capture-recapture studies

    USGS Publications Warehouse

    Yoshizaki, J.; Brownie, C.; Pollock, K.H.; Link, W.A.

    2011-01-01

    Misidentification of animals is potentially important when naturally existing features (natural tags) such as DNA fingerprints (genetic tags) are used to identify individual animals. For example, when misidentification leads to multiple identities being assigned to an animal, traditional estimators tend to overestimate population size. Accounting for misidentification in capture-recapture models requires detailed understanding of the mechanism. Using genetic tags as an example, we outline a framework for modeling the effect of misidentification in closed population studies when individual identification is based on natural tags that are consistent over time (non-evolving natural tags). We first assume a single sample is obtained per animal for each capture event, and then generalize to the case where multiple samples (such as hair or scat samples) are collected per animal per capture occasion. We introduce methods for estimating population size and, using a simulation study, we show that our new estimators perform well for cases with moderately high capture probabilities or high misidentification rates. In contrast, conventional estimators can seriously overestimate population size when errors due to misidentification are ignored. © 2009 Springer Science+Business Media, LLC.

  14. Railroads and the Environment : Estimation of Fuel Consumption in Rail Transportation : Volume 1. Analytical Model

    DOT National Transportation Integrated Search

    1975-05-01

    The report describes an analytical approach to estimation of fuel consumption in rail transportation, and provides sample computer calculations suggesting the sensitivity of fuel usage to various parameters. The model used is based upon careful delin...

  15. IMPROVED DERIVATION OF INPUT FUNCTION IN DYNAMIC MOUSE [18F]FDG PET USING BLADDER RADIOACTIVITY KINETICS

    PubMed Central

    Wong, Koon-Pong; Zhang, Xiaoli; Huang, Sung-Cheng

    2013-01-01

    Purpose Accurate determination of the plasma input function (IF) is essential for absolute quantification of physiological parameters in positron emission tomography (PET). However, it requires an invasive and tedious procedure of arterial blood sampling that is challenging in mice because of the limited blood volume. In this study, a hybrid modeling approach is proposed to estimate the plasma IF of 2-deoxy-2-[18F]fluoro-D-glucose ([18F]FDG) in mice using accumulated radioactivity in urinary bladder together with a single late-time blood sample measurement. Methods Dynamic PET scans were performed on nine isoflurane-anesthetized male C57BL/6 mice after a bolus injection of [18F]FDG at the lateral caudal vein. During a 60- or 90-min scan, serial blood samples were taken from the femoral artery. Image data were reconstructed using filtered backprojection with CT-based attenuation correction. Total accumulated radioactivity in the urinary bladder was fitted to a renal compartmental model with the last blood sample and a 1-exponential function that described the [18F]FDG clearance in blood. Multiple late-time blood sample estimates were calculated by the blood [18F]FDG clearance equation. A sum of four exponentials was assumed for the plasma IF that served as a forcing function to all tissues. The estimated plasma IF was obtained by simultaneously fitting the [18F]FDG model to the time-activity curves (TACs) of liver and muscle and the forcing function to early (0–1 min) left-ventricle data (corrected for delay, dispersion, partial-volume effects and erythrocyte uptake) and the late-time blood estimates. Using only the blood sample acquired at the end of the study to estimate the IF and the use of liver TAC as an alternative IF were also investigated. Results The area under the plasma TACs calculated for all studies using the hybrid approach was not significantly different from that using all blood samples.
[18F]FDG uptake constants in brain, myocardium, skeletal muscle and liver computed by the Patlak analysis using estimated and measured plasma TACs were in excellent agreement (slope ~ 1; R2 > 0.938). The IF estimated using only the last blood sample acquired at the end of the study and the use of liver TAC as plasma IF provided less reliable results. Conclusions The estimated plasma IFs obtained with the hybrid model agreed well with those derived from arterial blood sampling. Importantly, the proposed method obviates the need for arterial catheterization, making it possible to perform repeated dynamic [18F]FDG PET studies on the same animal. Liver TAC is unsuitable as an input function for absolute quantification of [18F]FDG PET data. PMID:23322346
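    Patlak graphical analysis, used in this study to compute uptake constants, plots C_T(t)/C_p(t) against the normalized plasma integral (integral of C_p from 0 to t) / C_p(t). After equilibration the points fall on a line whose slope is the net uptake constant Ki and whose intercept is V0. A minimal sketch with trapezoidal integration and an ordinary least-squares line; the function names and the cutoff time t_star are assumptions, not details from the study.

```python
def cumulative_trapezoid(t, y):
    """Running trapezoidal integral of y(t), starting at 0."""
    out = [0.0]
    for k in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[k] + y[k - 1]) * (t[k] - t[k - 1]))
    return out


def patlak_ki(t, plasma, tissue, t_star):
    """Patlak plot: slope Ki and intercept V0 of tissue/plasma vs. normalized plasma integral.

    Only frames with t >= t_star (assumed to be after equilibration) are used.
    """
    integral = cumulative_trapezoid(t, plasma)
    xs, ys = [], []
    for tk, cp, ct, ip in zip(t, plasma, tissue, integral):
        if tk >= t_star and cp > 0:
            xs.append(ip / cp)   # "Patlak time": normalized plasma integral
            ys.append(ct / cp)   # tissue-to-plasma ratio
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx  # (Ki, V0)
```

Because Ki is computed from the plasma curve, any error in an estimated input function propagates directly into the uptake constants, which is why the study compares Ki from estimated and measured plasma TACs.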

  16. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  17. Bayesian evidence computation for model selection in non-linear geoacoustic inference problems.

    PubMed

    Dettmer, Jan; Dosso, Stan E; Osler, John C

    2010-12-01

    This paper applies a general Bayesian inference approach, based on Bayesian evidence computation, to geoacoustic inversion of interface-wave dispersion data. Quantitative model selection is carried out by computing the evidence (normalizing constants) for several model parameterizations using annealed importance sampling. The resulting posterior probability density estimate is compared to estimates obtained from Metropolis-Hastings sampling to ensure consistent results. The approach is applied to interface-wave dispersion data collected on the Scotian Shelf, off the east coast of Canada, to invert for the sediment shear-wave velocity profile. Results are consistent with previous work on these data but extend the analysis to a rigorous approach including model selection and uncertainty analysis. The results are also consistent with core samples and seismic reflection measurements carried out in the area.

  18. Estimation of stature from sternum - Exploring the quadratic models.

    PubMed

    Saraf, Ashish; Kanchan, Tanuj; Krishan, Kewal; Ateriya, Navneet; Setia, Puneet

    2018-04-14

    Identification of the dead is significant in examination of unknown, decomposed and mutilated human remains. Establishing the biological profile is the central issue in such a scenario, and stature estimation remains one of the important criteria in this regard. The present study was undertaken to estimate stature from different parts of the sternum. A sample of 100 sterna was obtained from individuals during medicolegal autopsies. The body length of the deceased and various dimensions of the sternum were measured. Student's t-test was performed to test for sex differences in stature and in the sternal measurements included in the study. Correlations between stature and sternal measurements were analysed using Karl Pearson's correlation, and linear and quadratic regression models were derived. All the measurements were found to be significantly larger in males than females. Stature correlated best with the combined length of sternum, among males (R = 0.894), females (R = 0.859), and for the total sample (R = 0.891). The study showed that the models derived for stature estimation from combined length of sternum are likely to give the most accurate estimates of stature in forensic case work when compared to manubrium and mesosternum. Accuracy of stature estimation further increased with quadratic models derived for the mesosternum among males and combined length of sternum among males and females when compared to linear regression models. Future studies in different geographical locations and with a larger sample size are proposed to confirm the study observations. Copyright © 2018 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
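    The linear and quadratic models compared here are least-squares fits of stature on a sternal measurement. A minimal sketch of the quadratic case, stature = a + b·x + c·x², solved through the 3×3 normal equations with Cramer's rule; names are hypothetical and no coefficients from the study are reproduced.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """Least-squares fit y = a + b*x + c*x**2; returns (a, b, c)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                 # sums of x^0..x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]  # sums of y*x^0..x^2
    a_mat = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = _det3(a_mat)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        a_col = [row[:] for row in a_mat]
        for r in range(3):
            a_col[r][col] = t[r]
        coeffs.append(_det3(a_col) / d)
    return tuple(coeffs)
```

Solving the normal equations directly is adequate for a few well-scaled values; for real sternal lengths, centring x before fitting (or using a QR-based solver) gives better numerical conditioning.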

  19. Spatially explicit models for inference about density in unmarked or partially marked populations

    USGS Publications Warehouse

    Chandler, Richard B.; Royle, J. Andrew

    2013-01-01

    Recently developed spatial capture–recapture (SCR) models represent a major advance over traditional capture–recapture (CR) models because they yield explicit estimates of animal density instead of population size within an unknown area. Furthermore, unlike nonspatial CR methods, SCR models account for heterogeneity in capture probability arising from the juxtaposition of animal activity centers and sample locations. Although the utility of SCR methods is gaining recognition, the requirement that all individuals can be uniquely identified excludes their use in many contexts. In this paper, we develop models for situations in which individual recognition is not possible, thereby allowing SCR concepts to be applied in studies of unmarked or partially marked populations. The data required for our model are spatially referenced counts made on one or more sample occasions at a collection of closely spaced sample units such that individuals can be encountered at multiple locations. Our approach includes a spatial point process for the animal activity centers and uses the spatial correlation in counts as information about the number and location of the activity centers. Camera-traps, hair snares, track plates, sound recordings, and even point counts can yield spatially correlated count data, and thus our model is widely applicable. A simulation study demonstrated that while the posterior mean exhibits frequentist bias on the order of 5–10% in small samples, the posterior mode is an accurate point estimator as long as adequate spatial correlation is present. Marking a subset of the population substantially increases posterior precision and is recommended whenever possible. We applied our model to avian point count data collected on an unmarked population of the northern parula (Parula americana) and obtained a density estimate (posterior mode) of 0.38 (95% CI: 0.19–1.64) birds/ha. 
Our paper challenges sampling and analytical conventions in ecology by demonstrating that neither spatial independence nor individual recognition is needed to estimate population density—rather, spatial dependence can be informative about individual distribution and density.

  20. A performance model for GPUs with caches

    DOE PAGES

    Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...

    2014-06-24

    To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies, we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, Radeon HD 6970, the model's error rate is 8 percent. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.

  1. Sampling Errors of SSM/I and TRMM Rainfall Averages: Comparison with Error Estimates from Surface Data and a Sample Model

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.; Kundu, Prasun K.; Kummerow, Christian D.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Quantitative use of satellite-derived maps of monthly rainfall requires some measure of the accuracy of the satellite estimates. The rainfall estimate for a given map grid box is subject to both remote-sensing error and, in the case of low-orbiting satellites, sampling error due to the limited number of observations of the grid box provided by the satellite. A simple model of rain behavior predicts that root-mean-square (RMS) random error in grid-box averages should depend in a simple way on the local average rain rate, and the predicted behavior has been seen in simulations using surface rain-gauge and radar data. This relationship was examined using satellite SSM/I data obtained over the western equatorial Pacific during TOGA COARE. RMS error inferred directly from SSM/I rainfall estimates was found to be larger than predicted from surface data, and to depend less on local rain rate than was predicted. Preliminary examination of TRMM microwave estimates shows better agreement with surface data. A simple method of estimating RMS error in satellite rainfall estimates is suggested, based on quantities that can be directly computed from the satellite data.

  2. Night sampling improves indices used for management of yellow perch in Lake Erie

    USGS Publications Warehouse

    Kocovsky, P.M.; Stapanian, M.A.; Knight, C.T.

    2010-01-01

    Catch rate (catch per hour, CPH) was examined for age-0 and age-1 yellow perch, Perca flavescens (Mitchill), captured in bottom trawls from 1991 to 2005 in western Lake Erie: (1) to examine variation of catch rate among years, seasons, diel periods and their interactions; and (2) to determine whether sampling during particular diel periods improved the management value of CPH data used in models to project abundance of age-2 yellow perch. Catch rate varied with year, season and the diel period during which sampling was conducted as well as by the interaction between year and season. Indices of abundance of age-0 and age-1 yellow perch estimated from night samples typically produced better fitting models and lower estimates of age-2 abundance than those using morning or afternoon samples, whereas indices using afternoon samples typically produced less precise and higher estimates of abundance. The diel period during which sampling is conducted will not affect observed population trends but may affect estimates of abundance of age-0 and age-1 yellow perch, which in turn affect recommended allowable harvest. A field experiment throughout western Lake Erie is recommended to examine potential benefits of night sampling to management of yellow perch. Published 2010. The article is a US Government work and is in the public domain in the USA.

  3. Elimination Rates of Dioxin Congeners in Former Chlorophenol Workers from Midland, Michigan

    PubMed Central

    Collins, James J.; Bodner, Kenneth M.; Wilken, Michael; Bodnar, Catherine M.

    2012-01-01

    Background: Exposure reconstructions and risk assessments for 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and other dioxins rely on estimates of elimination rates. Limited data are available on elimination rates for congeners other than TCDD. Objectives: We estimated apparent elimination rates using a simple first-order one-compartment model for selected dioxin congeners based on repeated blood sampling in a previously studied population. Methods: Blood samples collected from 56 former chlorophenol workers in 2004–2005 and again in 2010 were analyzed for dioxin congeners. We calculated the apparent elimination half-life in each individual for each dioxin congener and examined factors potentially influencing elimination rates and the impact of estimated ongoing background exposures on rate estimates. Results: Mean concentrations of all dioxin congeners in the sampled participants declined between sampling times. Median apparent half-lives of elimination based on changes in estimated mass in the body were generally consistent with previous estimates and ranged from 6.8 years (1,2,3,7,8,9-hexachlorodibenzo-p-dioxin) to 11.6 years (pentachlorodibenzo-p-dioxin), with a composite half-life of 9.3 years for TCDD toxic equivalents. None of the factors examined, including age, smoking status, body mass index or change in body mass index, initial measured concentration, or chloracne diagnosis, was consistently associated with the estimated elimination rates in this population. Inclusion of plausible estimates of ongoing background exposures decreased apparent half-lives by approximately 10%. Available concentration-dependent toxicokinetic models for TCDD underpredicted observed elimination rates for concentrations < 100 ppt. Conclusions: The estimated elimination rates from this relatively large serial sampling study can inform occupational and environmental exposure and serum evaluations for dioxin compounds. PMID:23063871
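    Under a first-order one-compartment model, a body burden decays as C(t) = C(0)·exp(-k·t), so two serial samples Δt apart give k = ln(C1/C2)/Δt and an apparent half-life of ln 2 / k. A minimal sketch (function name hypothetical); it ignores ongoing background exposure, which the study found, when included, decreased apparent half-lives by about 10%.

```python
import math


def apparent_half_life(c1, c2, years_between):
    """Apparent half-life (years) under first-order, one-compartment elimination.

    c1, c2: body burden (or serum concentration) at the first and second sampling.
    years_between: time between the two samplings, in years.
    """
    if not (c1 > c2 > 0):
        raise ValueError("requires c1 > c2 > 0 (a measurable decline)")
    k = math.log(c1 / c2) / years_between  # first-order elimination rate constant
    return math.log(2) / k
```

For example, a burden that halves between two samples 9.3 years apart yields a half-life of 9.3 years, the composite TCDD-equivalent value reported above.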

  4. Travel cost demand model based river recreation benefit estimates with on-site and household surveys: Comparative results and a correction procedure

    NASA Astrophysics Data System (ADS)

    Loomis, John

    2003-04-01

    Past recreation studies have noted that on-site or visitor intercept surveys are subject to over-sampling of avid users (i.e., endogenous stratification) and have offered econometric solutions to correct for this. However, past papers do not estimate the empirical magnitude of the bias in benefit estimates with a real data set, nor do they compare the corrected estimates to benefit estimates derived from a population sample. This paper empirically examines the magnitude of the recreation benefits per trip bias by comparing estimates from an on-site river visitor intercept survey to a household survey. The difference in average benefits is quite large, with the on-site visitor survey yielding $24 per day trip, while the household survey yields $9.67 per day trip. A simple econometric correction for endogenous stratification in our count data model lowers the benefit estimate to $9.60 per day trip, a mean value nearly identical and not statistically different from the household survey estimate.

  5. Robust estimation for partially linear models with large-dimensional covariates

    PubMed Central

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2014-01-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087

  6. Robust estimation for partially linear models with large-dimensional covariates.

    PubMed

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.

  7. Modeling trends from North American Breeding Bird Survey data: a spatially explicit approach

    USGS Publications Warehouse

    Bled, Florent; Sauer, John R.; Pardieck, Keith L.; Doherty, Paul; Royle, J. Andy

    2013-01-01

    Population trends, defined as interval-specific proportional changes in population size, are often used to help identify species of conservation interest. Efficient modeling of such trends depends on the consideration of the correlation of population changes with key spatial and environmental covariates. This can provide insights into causal mechanisms and allow spatially explicit summaries at scales that are of interest to management agencies. We expand the hierarchical modeling framework used in the North American Breeding Bird Survey (BBS) by developing a spatially explicit model of temporal trend using a conditional autoregressive (CAR) model. By adopting a formal spatial model for abundance, we produce spatially explicit abundance and trend estimates. Analyses based on large-scale geographic strata such as Bird Conservation Regions (BCR) can suffer from basic imbalances in spatial sampling. Our approach addresses this issue by providing an explicit weighting based on the fundamental sample allocation unit of the BBS. We applied the spatial model to three species from the BBS. Species were chosen based on their well-known population change patterns, which allows us to evaluate the quality of our model and the biological meaning of our estimates. We also compare our results with those obtained for BCRs using a nonspatial hierarchical model (Sauer and Link 2011). Overall, estimates of mean trends are consistent between the two approaches, but spatial estimates provide much more precise trend estimates in regions on the edges of species ranges that were poorly estimated in nonspatial analyses. Incorporating a spatial component in the analysis not only allows us to obtain relevant and biologically meaningful estimates of population trends, but also provides a flexible framework to obtain trend estimates for any area.
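    In a CAR model, each spatial unit's effect is conditionally Gaussian given its neighbours, with a mean that is a weighted average of neighbouring effects. The abstract does not state which CAR variant is used, so the sketch below assumes a proper CAR with symmetric 0/1 adjacency weights and joint precision tau*(D - rho*W), where D holds neighbour counts; the grid, rho, and tau are illustrative. In the BBS model such a term sits inside a larger hierarchical model for counts and trends, which is not reproduced here.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def grid_neighbours(nrow: int, ncol: int) -> Dict[Cell, List[Cell]]:
    """Rook (edge-sharing) adjacency for an nrow x ncol grid of cells."""
    nb: Dict[Cell, List[Cell]] = {}
    for r in range(nrow):
        for c in range(ncol):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nb[(r, c)] = [(i, j) for i, j in cand if 0 <= i < nrow and 0 <= j < ncol]
    return nb


def car_conditional(cell: Cell, values: Dict[Cell, float], nb: Dict[Cell, List[Cell]],
                    rho: float, tau: float) -> Tuple[float, float]:
    """Conditional mean and variance of one cell under a proper CAR, precision tau*(D - rho*W).

    |rho| < 1 keeps the joint precision matrix positive definite for symmetric 0/1 weights.
    """
    neighbours = nb[cell]
    n_i = len(neighbours)
    mean = rho * sum(values[j] for j in neighbours) / n_i  # shrunken neighbour average
    var = 1.0 / (tau * n_i)                                # more neighbours, less variance
    return mean, var


if __name__ == "__main__":
    nb = grid_neighbours(2, 2)
    values = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # hypothetical cell effects
    print(car_conditional((0, 0), values, nb, rho=0.5, tau=2.0))
```

    The conditional variance shrinking with the number of neighbours is one way to see how a spatial model can borrow strength for poorly sampled cells, such as those at range edges.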

  8. Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals.

    PubMed

    Engemann, Denis A; Gramfort, Alexandre

    2015-03-01

    Magnetoencephalography and electroencephalography (M/EEG) measure non-invasively the weak electromagnetic fields induced by post-synaptic neural currents. The estimation of the spatial covariance of the signals recorded on M/EEG sensors is a building block of modern data analysis pipelines. Such covariance estimates are used in brain-computer interface (BCI) systems, in nearly all source localization methods for spatial whitening, and for data covariance estimation in beamformers. The rationale for such models is that the signals can be modeled by a zero-mean Gaussian distribution. While maximizing the Gaussian likelihood seems natural, it leads to a covariance estimate known as the empirical covariance (EC). It turns out that the EC is a poor estimate of the true covariance when the number of samples is small. To address this issue, the estimate needs to be regularized. The most common approach downweights off-diagonal coefficients, while more advanced regularization methods are based on shrinkage techniques or generative models with low-rank assumptions: probabilistic PCA (PPCA) and factor analysis (FA). Using cross-validation, all of these models can be tuned and compared based on the Gaussian likelihood computed on unseen data. We investigated these models on simulations, one electroencephalography (EEG) dataset, and magnetoencephalography (MEG) datasets from the most common MEG systems. First, our results demonstrate that different models can be the best, depending on the number of samples, the heterogeneity of sensor types, and noise properties. Second, we show that the models tuned by cross-validation are superior to models with hand-selected regularization. Hence, we propose an automated solution to the often overlooked problem of covariance estimation of M/EEG signals. The relevance of the procedure is demonstrated here for spatial whitening and source localization of MEG signals. Copyright © 2015 Elsevier Inc. All rights reserved.
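    A minimal sketch of the tuning idea described in the abstract, restricted to one regularizer: shrinkage of the empirical covariance toward a scaled identity, (1 - alpha) * S + alpha * (trace(S)/p) * I, with alpha chosen by K-fold cross-validated zero-mean Gaussian log-likelihood. The PPCA and FA models, the alpha grid, and the fold scheme are not taken from the paper and are assumptions; the code uses only the Python standard library so the likelihood computation (Cholesky factor, log-determinant, quadratic form) is explicit.

```python
import math
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def empirical_cov(X: Sequence[Sequence[float]]) -> Matrix:
    """Zero-mean empirical covariance (1/n) * sum x x^T, i.e. the Gaussian MLE."""
    n, p = len(X), len(X[0])
    return [[sum(x[i] * x[j] for x in X) / n for j in range(p)] for i in range(p)]


def shrink(S: Matrix, alpha: float) -> Matrix:
    """(1 - alpha) * S + alpha * mu * I, with mu = trace(S) / p."""
    p = len(S)
    mu = sum(S[i][i] for i in range(p)) / p
    return [[(1 - alpha) * S[i][j] + (alpha * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T; raises ValueError if A is not positive definite."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mean_gauss_loglik(X: Sequence[Sequence[float]], Sigma: Matrix) -> float:
    """Average zero-mean Gaussian log-likelihood of the rows of X under Sigma."""
    p = len(Sigma)
    L = cholesky(Sigma)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    total = 0.0
    for x in X:
        # Forward substitution solves L z = x, so z.z equals x^T Sigma^{-1} x.
        z: List[float] = []
        for i in range(p):
            z.append((x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        total += -0.5 * (p * math.log(2 * math.pi) + logdet + sum(v * v for v in z))
    return total / len(X)


def cv_shrinkage(X: List[List[float]], alphas: Sequence[float], k: int = 5) -> Optional[float]:
    """Return the alpha whose shrunk estimate scores best on held-out folds."""
    folds = [X[i::k] for i in range(k)]
    best_alpha, best_score = None, -math.inf
    for a in alphas:
        try:
            score = 0.0
            for f in range(k):
                train = [x for g in range(k) if g != f for x in folds[g]]
                score += mean_gauss_loglik(folds[f], shrink(empirical_cov(train), a))
        except ValueError:  # singular estimate, e.g. alpha = 0 with fewer samples than sensors
            score = -math.inf
        if score > best_score:
            best_alpha, best_score = a, score
    return best_alpha


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(12)]  # 10 "sensors", 12 samples
    print(cv_shrinkage(X, [0.0, 0.1, 0.5, 0.9, 1.0], k=4))
```

    With fewer training samples than sensors, the unregularized EC (alpha = 0) is singular and cannot even be scored, which is the small-sample failure the abstract describes.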

  9. The Model Human Processor and the Older Adult: Parameter Estimation and Validation within a Mobile Phone Task

    ERIC Educational Resources Information Center

    Jastrzembski, Tiffany S.; Charness, Neil

    2007-01-01

    The authors estimate weighted mean values for nine information processing parameters for older adults using the Card, Moran, and Newell (1983) Model Human Processor model. The authors validate a subset of these parameters by modeling two mobile phone tasks using two different phones and comparing model predictions to a sample of younger (N = 20;…

  10. Uncertainty in Population Growth Rates: Determining Confidence Intervals from Point Estimates of Parameters

    PubMed Central

    Devenish Nelson, Eleanor S.; Harris, Stephen; Soulsbury, Carl D.; Richards, Shane A.; Stephens, Philip A.

    2010-01-01

    Background: Demographic models are widely used in conservation and management, and their parameterisation often relies on data collected for other purposes. When underlying data lack clear indications of associated uncertainty, modellers often fail to account for that uncertainty in model outputs, such as estimates of population growth. Methodology/Principal Findings: We applied a likelihood approach to infer uncertainty retrospectively from point estimates of vital rates. Combining this with resampling techniques and projection modelling, we show that confidence intervals for population growth estimates are easy to derive. We used similar techniques to examine the effects of sample size on uncertainty. Our approach is illustrated using data on the red fox, Vulpes vulpes, a predator of ecological and cultural importance, and the most widespread extant terrestrial mammal. We show that uncertainty surrounding estimated population growth rates can be high, even for relatively well-studied populations. Halving that uncertainty typically requires a quadrupling of sampling effort. Conclusions/Significance: Our results compel caution when comparing demographic trends between populations without accounting for uncertainty. Our methods will be widely applicable to demographic studies of many species. PMID:21049049
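    The rule that halving uncertainty needs roughly four times the sampling effort follows from standard errors shrinking like 1/sqrt(n). A minimal illustration, assuming a single vital rate (say, annual survival) estimated as a binomial proportion: the analytic standard error and a parametric-bootstrap standard error both halve when n is quadrupled. The rate 0.45 and the sample sizes are hypothetical, and this is not the authors' likelihood reconstruction or projection model.

```python
import math
import random


def binomial_se(p_hat: float, n: int) -> float:
    """Large-sample standard error of a proportion such as an annual survival rate."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


def bootstrap_se(p_hat: float, n: int, reps: int = 2000, seed: int = 1) -> float:
    """Parametric bootstrap: redraw n Bernoulli fates at rate p_hat and re-estimate each time."""
    rng = random.Random(seed)
    ests = [sum(rng.random() < p_hat for _ in range(n)) / n for _ in range(reps)]
    m = sum(ests) / reps
    return math.sqrt(sum((e - m) ** 2 for e in ests) / (reps - 1))


if __name__ == "__main__":
    p_hat = 0.45  # hypothetical survival rate
    for n in (25, 100, 400):
        print(n, round(binomial_se(p_hat, n), 4), round(bootstrap_se(p_hat, n), 4))
```

    Each fourfold increase in n halves both columns; in a projection model this uncertainty then propagates into the growth-rate estimate.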

  11. Machine learning approaches for estimation of prediction interval for the model output.

    PubMed

    Shrestha, Durga L; Solomatine, Dimitri P

    2006-03-01

    A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of empirical distributions of the errors associated with all instances belonging to the cluster under consideration and propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods estimating the prediction interval. A new method for evaluating performance for estimating prediction interval is proposed as well.
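    A simplified sketch of the clustering and propagation steps, under stated assumptions: inputs are one-dimensional, clusters come from plain fuzzy c-means, each cluster's error quantiles are computed from all training errors weighted by their membership grades (one reading of "errors associated with instances belonging to the cluster"), and each instance's limits are the membership-weighted blend of cluster limits. The final step of the method, a regression model trained on these limits to predict them out of sample, is omitted. Names such as `fuzzy_error_limits` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def memberships(x: Sequence[float], centers: Sequence[float], m: float) -> List[List[float]]:
    """Fuzzy c-means membership grades u[i][k] of point i in cluster k (rows sum to 1)."""
    u = []
    for xi in x:
        d = [abs(xi - ck) for ck in centers]
        if min(d) == 0.0:
            # Point coincides with a center: give it crisp membership there.
            row = [1.0 if dk == 0.0 else 0.0 for dk in d]
            s = sum(row)
            row = [r / s for r in row]
        else:
            row = [1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d) for dk in d]
        u.append(row)
    return u


def fuzzy_cmeans_1d(x: Sequence[float], c: int, m: float = 2.0, iters: int = 100, seed: int = 0):
    """Plain fuzzy c-means on scalar inputs; returns (centers, memberships)."""
    rng = random.Random(seed)
    centers = rng.sample(list(x), c)
    for _ in range(iters):
        u = memberships(x, centers, m)
        centers = [
            sum(u[i][k] ** m * x[i] for i in range(len(x)))
            / sum(u[i][k] ** m for i in range(len(x)))
            for k in range(c)
        ]
    return centers, memberships(x, centers, m)


def weighted_quantile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Smallest value whose cumulative weight reaches q of the total weight."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= target:
            return v
    return pairs[-1][0]


def fuzzy_error_limits(x, errors, c: int = 3, alpha: float = 0.1,
                       m: float = 2.0) -> List[Tuple[float, float]]:
    """Per-instance (lower, upper) error limits blended from per-cluster error quantiles."""
    _, u = fuzzy_cmeans_1d(x, c, m)
    cluster_limits = []
    for k in range(c):
        w = [row[k] for row in u]
        cluster_limits.append((weighted_quantile(errors, w, alpha / 2),
                               weighted_quantile(errors, w, 1 - alpha / 2)))
    return [
        (sum(row[k] * cluster_limits[k][0] for k in range(c)),
         sum(row[k] * cluster_limits[k][1] for k in range(c)))
        for row in u
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.uniform(0, 10) for _ in range(300)]
    errors = [rng.gauss(0, 0.2 + 0.3 * xi) for xi in x]  # synthetic errors growing with x
    lims = fuzzy_error_limits(x, errors)
    print(lims[:3])
```

    The prediction interval for instance i is then (prediction_i + lower_i, prediction_i + upper_i), so regions of the input space with larger model errors receive wider intervals.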

  12. Robust geostatistical analysis of spatial data

    NASA Astrophysics Data System (ADS)

    Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.

    2013-04-01

    Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, particularly in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation, and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise, because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and on ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of the estimating equations for Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations, and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results with which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter.

    Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72.
    Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439.
    Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics, Vol. 15. Elsevier, pp. 343-384.
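    The abstract notes that earlier robust geostatistics focused on robust estimation of the sample variogram. As background only (this is not the robust REML method the authors propose), the sketch below compares the classical Matheron semivariance with the Cressie-Hawkins robust estimator, which averages square-root absolute differences, raises the mean to the fourth power, and divides by 0.457 + 0.494/N(h) to correct bias under Gaussianity. The equally spaced one-dimensional transect and the injected outliers are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def lag_pairs(z: Sequence[float], h: int) -> List[Tuple[float, float]]:
    """All pairs of observations h steps apart on an equally spaced transect."""
    return [(z[i], z[i + h]) for i in range(len(z) - h)]


def matheron(z: Sequence[float], h: int) -> float:
    """Classical method-of-moments semivariance at lag h."""
    pairs = lag_pairs(z, h)
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def cressie_hawkins(z: Sequence[float], h: int) -> float:
    """Cressie-Hawkins robust semivariance: fourth power of the mean square-root difference."""
    pairs = lag_pairs(z, h)
    n = len(pairs)
    mean_root = sum(abs(a - b) ** 0.5 for a, b in pairs) / n
    return mean_root ** 4 / (0.457 + 0.494 / n) / 2


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [rng.gauss(0.0, 1.0) for _ in range(400)]  # white noise: true semivariance 1 at all lags
    dirty = [v + (20.0 if i in (50, 130, 210, 290, 370) else 0.0) for i, v in enumerate(clean)]
    for h in (1, 2, 3):
        print(h, round(matheron(clean, h), 2), round(cressie_hawkins(clean, h), 2),
              round(matheron(dirty, h), 2), round(cressie_hawkins(dirty, h), 2))
```

    Five gross outliers among 400 points inflate the classical estimate several-fold, while the robust estimate moves much less; estimators like this still say nothing about the external drift or kriging variances, which is the gap the REML approach addresses.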

  13. Comparison of estimators of standard deviation for hydrologic time series

    USGS Publications Warehouse

    Tasker, Gary D.; Gilroy, Edward J.

    1982-01-01

    Unbiasing factors for the sample standard deviation of a lag-one autoregressive model, as a function of serial correlation ρ and sample size n, were generated by random-number simulation. Monte Carlo experiments were used to compare the performance of several alternative methods for estimating the standard deviation σ of a lag-one autoregressive model in terms of bias, root mean square error, probability of underestimation, and expected opportunity design loss. Three methods provided estimates of σ that were much less biased but had greater mean square errors than the usual estimate of σ, $s = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2}$. The three methods may be briefly characterized as (1) a method using a maximum likelihood estimate of the unbiasing factor, (2) a method using an empirical Bayes estimate of the unbiasing factor, and (3) a robust nonparametric estimate of σ suggested by Quenouille. Because s tends to underestimate σ, its use as an estimate of a model parameter results in a tendency to underdesign. If underdesign losses are considered more serious than overdesign losses, then the choice of one of the less biased methods may be wise.
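    The unbiasing factor can be generated by simulation in the way the abstract describes: simulate many stationary AR(1) series of length n with known σ and lag-one correlation ρ, average the sample standard deviation s, and divide by σ. The sketch below assumes Gaussian innovations and σ = 1; the replicate count and seed are arbitrary choices. For ρ = 0 the factor should approach the familiar constant c4(n), about 0.973 for n = 10, and it falls further below 1 as ρ grows.

```python
import math
import random
from typing import List


def simulate_ar1(n: int, rho: float, sigma: float, rng: random.Random) -> List[float]:
    """Stationary lag-one autoregressive series with marginal standard deviation sigma."""
    x = [rng.gauss(0.0, sigma)]                    # start in the stationary distribution
    innov_sd = sigma * math.sqrt(1.0 - rho * rho)  # keeps Var(x_t) = sigma^2 for every t
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, innov_sd))
    return x


def sample_sd(x: List[float]) -> float:
    """The usual estimate s with the n - 1 divisor."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def unbiasing_factor(n: int, rho: float, reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[s] / sigma for an AR(1) series of length n."""
    rng = random.Random(seed)
    sigma = 1.0
    mean_s = sum(sample_sd(simulate_ar1(n, rho, sigma, rng)) for _ in range(reps)) / reps
    return mean_s / sigma


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(rho, round(unbiasing_factor(10, rho), 3))
```

    A de-biased estimate is then s / unbiasing_factor(n, rho). In practice ρ itself must be estimated, which is where the maximum likelihood and empirical Bayes variants compared in the study come in.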

  14. Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution.

    PubMed

    Han, Fang; Liu, Han

    2017-02-01

    The correlation matrix plays a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on the use of Pearson's sample correlation matrix. Although Pearson's sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator when facing heavy-tailed distributions with possible outliers. As a robust alternative, Han and Liu (2013b) advocated the use of a transformed version of the Kendall's tau sample correlation matrix in estimating the high-dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that, after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall's tau sample correlation matrix and its transformed version proposed in Han and Liu (2013b) for estimating the population Kendall's tau correlation matrix and the latent Pearson's correlation matrix under both spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of "effective rank" in quantifying the rate of convergence. With regard to the restricted spectral norm, we present for the first time a "sign subgaussian condition", which is sufficient to guarantee that the rank-based correlation matrix estimator attains the optimal rate of convergence. In both cases, we do not need any moment condition.
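    For elliptical distributions, and hence for the latent elliptical copula of a transelliptical family, Kendall's tau and the latent Pearson correlation are linked by ρ = sin(πτ/2). The transformed estimator plugs the sample tau into this relation entry by entry. A minimal standard-library sketch, assuming no ties (tau-a) and using an O(n²) pair loop for clarity rather than speed:

```python
import math
from typing import List, Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant pairs) / (n choose 2). Assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def latent_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Transformed Kendall's tau, sin(pi/2 * tau), an estimate of the latent Pearson correlation."""
    return math.sin(math.pi / 2 * kendall_tau_a(x, y))


def latent_corr_matrix(cols: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry-wise transformed Kendall's tau matrix for a list of variables (columns)."""
    d = len(cols)
    return [[1.0 if i == j else latent_correlation(cols[i], cols[j]) for j in range(d)]
            for i in range(d)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.0, 3.0, 2.0, 4.0, 5.0]
    # Rank-based, so any monotone transformation of a margin leaves the estimate unchanged.
    print(latent_correlation(x, y), latent_correlation([math.exp(v) for v in x], y))
```

    Because only the ordering of observations enters, the estimate is unaffected by the unspecified monotone marginal transformations and is insensitive to heavy tails. The resulting matrix need not be positive semidefinite in finite samples.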

  15. Bayesian Estimation of the DINA Model with Gibbs Sampling

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew

    2015-01-01

    A Bayesian model formulation of the deterministic inputs, noisy "and" gate (DINA) model is presented. Gibbs sampling is employed to simulate from the joint posterior distribution of item guessing and slipping parameters, subject attribute parameters, and latent class probabilities. The procedure extends concepts in Béguin and Glas,…
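    The record is truncated, but the DINA item response function it builds on is standard: an examinee answers item j correctly with probability 1 - s_j (slipping) if they have mastered every attribute the item requires according to the Q-matrix, and with probability g_j (guessing) otherwise. The sketch computes that function and the resulting log-likelihood for given attribute profiles; it does not implement the Gibbs sampler, and the toy Q-matrix and parameter values are hypothetical.

```python
import math
from typing import Sequence


def eta(alpha: Sequence[int], q: Sequence[int]) -> int:
    """1 if the examinee has mastered every attribute the item requires (q = 1), else 0."""
    return int(all(a == 1 for a, need in zip(alpha, q) if need == 1))


def p_correct(alpha: Sequence[int], q: Sequence[int], guess: float, slip: float) -> float:
    """DINA: P(X = 1) is 1 - slip for masters of the item's attributes, guess otherwise."""
    return 1.0 - slip if eta(alpha, q) else guess


def loglik(responses, alphas, Q, guess, slip) -> float:
    """Log-likelihood of a 0/1 response matrix given attribute profiles and item parameters."""
    ll = 0.0
    for x_i, alpha in zip(responses, alphas):
        for j, x in enumerate(x_i):
            p = p_correct(alpha, Q[j], guess[j], slip[j])
            ll += math.log(p if x == 1 else 1.0 - p)
    return ll


if __name__ == "__main__":
    Q = [[1, 0], [1, 1]]  # hypothetical: item 1 needs attribute 1, item 2 needs both
    print(loglik([[1, 0]], [[1, 0]], Q, guess=[0.2, 0.25], slip=[0.1, 0.15]))
```

    In a Gibbs sampler, this likelihood is what links the full conditional distributions of the guessing and slipping parameters, the attribute profiles, and the latent class probabilities.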

  16. Analysis of spatial correlation in predictive models of forest variables that use LiDAR auxiliary information

    Treesearch

    F. Mauro; Vicente J. Monleon; H. Temesgen; L.A. Ruiz

    2017-01-01

    Accounting for spatial correlation of LiDAR model errors can improve the precision of model-based estimators. To estimate spatial correlation, sample designs that provide close observations are needed, but their implementation might be prohibitively expensive. To quantify the gains obtained by accounting for the spatial correlation of model errors, we examined (

  17. A call to improve methods for estimating tree biomass for regional and national assessments

    Treesearch

    Aaron R. Weiskittel; David W. MacFarlane; Philip J. Radtke; David L.R. Affleck; Hailemariam Temesgen; Christopher W. Woodall; James A. Westfall; John W. Coulston

    2015-01-01

    Tree biomass is typically estimated using statistical models. This review highlights five limitations of most tree biomass models, which include the following: (1) biomass data are costly to collect and alternative sampling methods are used; (2) belowground data and models are generally lacking; (3) models are often developed from small and geographically limited data...

  18. The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration

    ERIC Educational Resources Information Center

    McNeish, Daniel M.; Stapleton, Laura M.

    2016-01-01

    Multilevel models are an increasingly popular method to analyze data that originate from a clustered or hierarchical structure. To effectively utilize multilevel models, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals for this paper are to (1) raise awareness of the…

  19. Generalized Likelihood Uncertainty Estimation (GLUE) Using Multi-Optimization Algorithm as Sampling Method

    NASA Astrophysics Data System (ADS)

    Wang, Z.

    2015-12-01

    For decades, distributed and lumped hydrological models have furthered our understanding of hydrological systems. Large-scale, high-precision hydrological simulation has refined the description of spatial patterns and hydrological behavior. This trend, however, has been accompanied by growing model complexity and numbers of parameters, which pose new challenges for uncertainty quantification. Generalized Likelihood Uncertainty Estimation (GLUE), which couples Monte Carlo sampling with Bayesian estimation, has been widely used for uncertainty analysis of hydrological models. However, the stochastic sampling of prior parameters adopted by GLUE can be inefficient, especially in high-dimensional parameter spaces. Heuristic optimization algorithms based on iterative evolution show faster convergence and better optimum-searching performance. Building on these features, this study adopted a genetic algorithm, differential evolution, and the shuffled complex evolution algorithm to search the parameter space and obtain parameter sets with high likelihoods. Based on this multi-algorithm sampling, hydrological model uncertainty analysis is conducted within the standard GLUE framework. To demonstrate the superiority of the new method, two hydrological models of different complexity are examined. The results show that the adaptive method tends to be efficient in sampling and effective in uncertainty analysis, providing an alternative path for uncertainty quantification.
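    For reference, the baseline that the study modifies can be sketched as standard GLUE: draw parameter sets from uniform priors, score each simulation with an informal likelihood measure, keep the "behavioural" sets above a threshold, and form likelihood-weighted prediction bounds. Everything specific here is an assumption for illustration: the two-parameter exponential recession stands in for a hydrological model, the likelihood measure is the Nash-Sutcliffe efficiency (NSE), the threshold is 0.7, and the weights are NSE minus the threshold. The study's contribution would replace the uniform sampling loop with genetic-algorithm, differential-evolution, and shuffled-complex-evolution searches.

```python
import math
import random
from typing import List, Sequence


def recession(q0: float, k: float, times: Sequence[float]) -> List[float]:
    """Toy stand-in for a hydrological model: exponential recession Q(t) = q0 * exp(-t / k)."""
    return [q0 * math.exp(-t / k) for t in times]


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency, a common informal GLUE likelihood measure (1 = perfect fit)."""
    mo = sum(obs) / len(obs)
    return 1.0 - sum((s - o) ** 2 for s, o in zip(sim, obs)) / sum((o - mo) ** 2 for o in obs)


def weighted_quantile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Smallest value whose cumulative weight reaches q of the total weight."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= target:
            return v
    return pairs[-1][0]


def glue(obs, times, n_samples: int = 5000, threshold: float = 0.7, seed: int = 0):
    """Uniform Monte Carlo GLUE: return behavioural sets and 5-95% weighted bounds per time."""
    rng = random.Random(seed)
    behavioural = []
    for _ in range(n_samples):
        q0, k = rng.uniform(1.0, 20.0), rng.uniform(1.0, 30.0)  # assumed uniform priors
        sim = recession(q0, k, times)
        score = nse(sim, obs)
        if score > threshold:
            behavioural.append((score - threshold, (q0, k), sim))
    if not behavioural:
        raise RuntimeError("no behavioural parameter sets; lower the threshold or sample more")
    weights = [b[0] for b in behavioural]  # rescaled likelihood weights
    bounds = []
    for t in range(len(times)):
        vals = [b[2][t] for b in behavioural]
        bounds.append((weighted_quantile(vals, weights, 0.05),
                       weighted_quantile(vals, weights, 0.95)))
    return behavioural, bounds


if __name__ == "__main__":
    rng = random.Random(3)
    times = list(range(21))
    obs = [q + rng.gauss(0.0, 0.2) for q in recession(10.0, 8.0, times)]  # synthetic observations
    behavioural, bounds = glue(obs, times)
    print(len(behavioural), bounds[0], bounds[10])
```

    Most uniform draws fall outside the behavioural region and are wasted, which is the inefficiency that motivates using optimization algorithms to concentrate samples where the likelihood is high.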

  20. Estimating site occupancy and abundance using indirect detection indices

    USGS Publications Warehouse

    Stanley, T.R.; Royle, J. Andrew

    2005-01-01

    Knowledge of factors influencing animal distribution and abundance is essential in many areas of ecological research, management, and policy-making. Because common methods for modeling and estimating abundance (e.g., capture-recapture, distance sampling) are sometimes not practical for large areas or elusive species, indices are sometimes used as surrogate measures of abundance. We present an extension of the Royle and Nichols (2003) generalization of the MacKenzie et al. (2002) site-occupancy model that incorporates the length of the sampling interval into the model for detection probability. As a result, we obtain a modeling framework that shows how useful information can be extracted from a class of index methods we call indirect detection indices (IDIs). Examples of IDIs include scent station, tracking tube, snow track, tracking plate, and hair snare surveys. Our model is fit by maximum likelihood, and it can be used to estimate site occupancy and to model factors influencing patterns of occupancy and abundance in space. Under certain circumstances, it can also be used to estimate abundance. We evaluated model properties using Monte Carlo simulations and illustrate the method with tracking tube and scent station data. We believe this model will be a useful tool for determining factors that influence animal distribution and abundance.
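    In the Royle-Nichols formulation that this work extends, a site holding N animals yields a detection on a survey with probability 1 - (1 - r)^N, where r is the per-animal detection probability, and N is Poisson with mean λ, so occupancy is 1 - exp(-λ). The abstract does not give the functional form linking detection to interval length, so the sketch below assumes, purely for illustration, a constant per-animal detection rate β, giving r(t) = 1 - exp(-βt) for an interval of length t. The likelihood of one site's detection history sums over N up to a truncation point `n_max` (a hypothetical parameter).

```python
import math
from typing import Sequence


def per_animal_detection(beta: float, t: float) -> float:
    """ASSUMED link: one animal detected over an interval of length t at constant rate beta."""
    return 1.0 - math.exp(-beta * t)


def site_likelihood(history: Sequence[int], intervals: Sequence[float],
                    lam: float, beta: float, n_max: int = 100) -> float:
    """P(detection history) under a Royle-Nichols model with N ~ Poisson(lam), summed to n_max."""
    total = 0.0
    pois = math.exp(-lam)  # P(N = 0)
    for n in range(n_max + 1):
        if n > 0:
            pois *= lam / n  # Poisson recursion: P(N = n) = P(N = n - 1) * lam / n
        like_n = 1.0
        for y, t in zip(history, intervals):
            p = 1.0 - (1.0 - per_animal_detection(beta, t)) ** n  # at least one animal detected
            like_n *= p if y else 1.0 - p
        total += pois * like_n
    return total


def occupancy(lam: float) -> float:
    """Implied probability that a site is occupied, P(N > 0)."""
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    # Hypothetical site surveyed over intervals of 1 and 2 days; detected only on the second.
    print(site_likelihood((0, 1), [1.0, 2.0], lam=1.5, beta=0.3), occupancy(1.5))
```

    Summing the log of `site_likelihood` over sites gives a log-likelihood that can be maximized over λ and β (or over covariate models for them).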

  1. Study of the influence of the parameters of an experiment on the simulation of pole figures of polycrystalline materials using electron microscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Antonova, A. O., E-mail: aoantonova@mail.ru; Savyolova, T. I.

    2016-05-15

    A two-dimensional mathematical model of a polycrystalline sample and of an electron backscatter diffraction (EBSD) experiment is considered. The measurement parameters are taken to be the scanning step and the threshold grain-boundary angle. Discrete pole figures for materials with hexagonal symmetry have been calculated from the results of the model experiment. Discrete and smoothed (by the kernel method) pole figures of the model sample and of the samples in the model experiment are compared using the χ² homogeneity criterion, an estimate of the pole figure maximum and its coordinate, the deviation of the pole figures of the model in the experiment from the sample in the space of L₁ measurable functions, and the RP-criterion for estimating pole figure errors. It is shown that the problem of calculating pole figures is ill-posed and that their determination with respect to the measurement parameters is not reliable.
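    If two discrete pole figures are represented as counts over the same set of orientation bins (an assumption about the representation; the paper's binning and normalization are not given in the abstract), the χ² homogeneity statistic and an L1 distance between the normalized pole figures can be computed as below. Kernel smoothing, the maximum-location comparison, and the RP-criterion are not sketched.

```python
from typing import Sequence, Tuple


def chi2_homogeneity(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample chi-square homogeneity statistic and degrees of freedom over shared bins."""
    na, nb = sum(a), sum(b)
    n = na + nb
    stat = 0.0
    used = 0
    for ca, cb in zip(a, b):
        col = ca + cb
        if col == 0:
            continue  # an empty bin has no expected count and contributes nothing
        used += 1
        ea, eb = na * col / n, nb * col / n  # expected counts if both share one distribution
        stat += (ca - ea) ** 2 / ea + (cb - eb) ** 2 / eb
    return stat, used - 1


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between the two count vectors after normalizing each to sum to 1."""
    na, nb = sum(a), sum(b)
    return sum(abs(ca / na - cb / nb) for ca, cb in zip(a, b))


if __name__ == "__main__":
    model_pf = [12, 30, 45, 8, 5]       # hypothetical counts per orientation bin
    experiment_pf = [10, 25, 50, 10, 5]
    print(chi2_homogeneity(model_pf, experiment_pf), l1_distance(model_pf, experiment_pf))
```

    The χ² statistic would be referred to a chi-square distribution with the returned degrees of freedom; a large value signals that the measured pole figure does not reproduce the model sample's texture.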

  2. Hybrid Cubature Kalman filtering for identifying nonlinear models from sampled recording: Estimation of neuronal dynamics.

    PubMed

    Madi, Mahmoud K; Karameh, Fadi N

    2017-01-01

    Kalman filtering methods have long been regarded as efficient adaptive Bayesian techniques for estimating hidden states in models of linear dynamical systems under Gaussian uncertainty. The recent advent of the Cubature Kalman filter (CKF) has extended this efficient estimation property to nonlinear systems, and also to hybrid nonlinear problems whereby the processes are continuous and the observations are discrete (continuous-discrete CD-CKF). Employing CKF techniques, therefore, carries high promise for modeling many biological phenomena where the underlying processes exhibit inherently nonlinear, continuous, and noisy dynamics and the associated measurements are uncertain and time-sampled. This paper investigates the performance of cubature filtering (CKF and CD-CKF) in two flagship problems arising in the field of neuroscience upon relating brain functionality to aggregate neurophysiological recordings: (i) estimation of the firing dynamics and the neural circuit model parameters from electric potential (EP) observations, and (ii) estimation of the hemodynamic model parameters and the underlying neural drive from BOLD (fMRI) signals. First, in simulated neural circuit models, estimation accuracy was investigated under varying levels of observation noise (SNR), process noise structures, and observation sampling intervals (dt). When compared to the CKF, the CD-CKF consistently exhibited better accuracy for a given SNR, a sharp increase in accuracy with higher SNR, and persistent error reduction with smaller dt. Remarkably, CD-CKF accuracy shows only a mild deterioration for non-Gaussian process noise, specifically with Poisson noise, a commonly assumed form of background fluctuations in neuronal systems. Second, in simulated hemodynamic models, parametric estimates were consistently improved under CD-CKF. 
Critically, time-localization of the underlying neural drive, a determinant factor in fMRI-based functional connectivity studies, was significantly more accurate under CD-CKF. In conclusion, and with the CKF recently benchmarked against other advanced Bayesian techniques, the CD-CKF framework could provide significant gains in robustness and accuracy when estimating a variety of biological phenomena models where the underlying process dynamics unfold at time scales faster than those seen in collected measurements.
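    The core of the CKF is the third-degree spherical-radial cubature rule: a Gaussian N(m, P) in n dimensions is represented by 2n equally weighted points m ± sqrt(n) times the columns of a square root of P. The sketch below propagates these points through a nonlinear function and recombines them, as in the prediction (time-update) step of a discrete-time CKF, with an optional process-noise covariance Q. The continuous-discrete variant studied in the paper additionally integrates the continuous-time dynamics between observation times, which is not shown, and the measurement update is omitted.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular square root L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance must be positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def cubature_points(mean: Sequence[float], cov: Matrix) -> List[Vector]:
    """2n points mean +/- sqrt(n) * (column i of chol(cov)), each with weight 1/(2n)."""
    n = len(mean)
    S = cholesky(cov)
    scale = math.sqrt(n)
    pts = []
    for i in range(n):
        col = [S[r][i] * scale for r in range(n)]
        pts.append([m + c for m, c in zip(mean, col)])
        pts.append([m - c for m, c in zip(mean, col)])
    return pts


def cubature_predict(f: Callable[[Vector], Vector], mean: Sequence[float], cov: Matrix,
                     Q: Optional[Matrix] = None):
    """Propagate N(mean, cov) through f; return predicted mean and covariance (+ Q if given)."""
    ys = [f(p) for p in cubature_points(mean, cov)]
    k, d = len(ys), len(ys[0])
    ym = [sum(y[j] for y in ys) / k for j in range(d)]
    P = [[sum((y[a] - ym[a]) * (y[b] - ym[b]) for y in ys) / k for b in range(d)]
         for a in range(d)]
    if Q is not None:
        P = [[P[a][b] + Q[a][b] for b in range(d)] for a in range(d)]
    return ym, P


if __name__ == "__main__":
    # Hypothetical 2-state nonlinear map, one step of a discrete-time model.
    f = lambda x: [x[0] + 0.1 * math.sin(x[1]), 0.9 * x[1]]
    print(cubature_predict(f, [0.0, 1.0], [[0.1, 0.0], [0.0, 0.2]], Q=[[0.01, 0.0], [0.0, 0.01]]))
```

    The rule is exact for linear maps, and for f(x) = x² in one dimension it returns E[x²] = m² + P exactly, which makes it a convenient check.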

  3. Hybrid Cubature Kalman filtering for identifying nonlinear models from sampled recording: Estimation of neuronal dynamics

    PubMed Central

    2017-01-01

    Kalman filtering methods have long been regarded as efficient adaptive Bayesian techniques for estimating hidden states in models of linear dynamical systems under Gaussian uncertainty. The recent advent of the Cubature Kalman filter (CKF) has extended this efficient estimation property to nonlinear systems, and also to hybrid nonlinear problems whereby the processes are continuous and the observations are discrete (continuous-discrete CD-CKF). Employing CKF techniques, therefore, carries high promise for modeling many biological phenomena where the underlying processes exhibit inherently nonlinear, continuous, and noisy dynamics and the associated measurements are uncertain and time-sampled. This paper investigates the performance of cubature filtering (CKF and CD-CKF) in two flagship problems arising in the field of neuroscience upon relating brain functionality to aggregate neurophysiological recordings: (i) estimation of the firing dynamics and the neural circuit model parameters from electric potential (EP) observations, and (ii) estimation of the hemodynamic model parameters and the underlying neural drive from BOLD (fMRI) signals. First, in simulated neural circuit models, estimation accuracy was investigated under varying levels of observation noise (SNR), process noise structures, and observation sampling intervals (dt). When compared to the CKF, the CD-CKF consistently exhibited better accuracy for a given SNR, a sharp increase in accuracy with higher SNR, and persistent error reduction with smaller dt. Remarkably, CD-CKF accuracy shows only a mild deterioration for non-Gaussian process noise, specifically with Poisson noise, a commonly assumed form of background fluctuations in neuronal systems. Second, in simulated hemodynamic models, parametric estimates were consistently improved under CD-CKF. 
Critically, time-localization of the underlying neural drive, a determinant factor in fMRI-based functional connectivity studies, was significantly more accurate under CD-CKF. In conclusion, and with the CKF recently benchmarked against other advanced Bayesian techniques, the CD-CKF framework could provide significant gains in robustness and accuracy when estimating a variety of biological phenomena models where the underlying process dynamics unfold at time scales faster than those seen in collected measurements. PMID:28727850

  4. A land-use regression model for estimating microenvironmental diesel exposure given multiple addresses from birth through childhood.

    PubMed

    Ryan, Patrick H; Lemasters, Grace K; Levin, Linda; Burkle, Jeff; Biswas, Pratim; Hu, Shaohua; Grinshpun, Sergey; Reponen, Tiina

    2008-10-01

    The Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS) is a prospective birth cohort whose purpose is to determine whether exposure to high levels of diesel exhaust particles (DEP) during early childhood increases the risk of developing allergic diseases. To estimate exposure to DEP, a land-use regression (LUR) model was developed using geographic data as independent variables and sampled levels of a marker of DEP as the dependent variable. A continuous wind direction variable was also created. The LUR model explained 74% of the variability in sampled values with four variables: wind direction, length of bus routes within 300 m of the sampling site, a measure of truck intensity within 300 m of the sampling site, and elevation. The LUR model was subsequently applied to all locations where the child had spent more than eight hours per week from birth through age three. A time-weighted average (TWA) microenvironmental exposure estimate was derived for four time periods: 0-6 months, 7-12 months, 13-24 months, and 25-36 months. By age two, one third of the children were spending significant time at locations other than home, and by 36 months, 39% of the children had changed their residential addresses. The mean cumulative DEP exposure estimate increased from 70 to 414 µg/m³-days between 6 and 36 months of age. Findings indicate that using birth addresses to estimate a child's exposure may result in exposure misclassification for some children who spend a significant amount of time at a location with high exposure to DEP.
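    The time-weighted average combines LUR-predicted concentrations at each location a child frequents, weighted by the time spent there, and cumulative exposure multiplies concentration by duration. The sketch assumes the location-specific concentrations have already been predicted, applies the abstract's more-than-eight-hours-per-week rule, and normalizes over the retained hours; how the remaining hours were handled is not stated, so that normalization is an assumption. All numbers in the usage example are hypothetical.

```python
from typing import Sequence, Tuple


def time_weighted_average(locations: Sequence[Tuple[float, float]], min_hours: float = 8.0) -> float:
    """TWA of (concentration, hours/week) pairs, keeping locations with more than min_hours."""
    kept = [(c, h) for c, h in locations if h > min_hours]
    total = sum(h for _, h in kept)
    if total == 0:
        raise ValueError("no location exceeds the time threshold")
    return sum(c * h for c, h in kept) / total


def cumulative_exposure(periods: Sequence[Tuple[float, float]]) -> float:
    """Sum of TWA concentration (ug/m3) x duration (days) over periods, in ug/m3-days."""
    return sum(c * d for c, d in periods)


if __name__ == "__main__":
    # Hypothetical: home 120 h/week, daycare 40 h/week, a relative's house 5 h/week (excluded).
    twa = time_weighted_average([(0.5, 120), (0.8, 40), (2.0, 5)])
    print(twa, cumulative_exposure([(twa, 182.5)]))
```

    Relying on the birth address alone corresponds to keeping only the home entry, which is the source of the misclassification the study reports.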

  5. Inverse sampling regression for pooled data.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Eskridge, Kent; Crossa, José

    2017-06-01

    Because pools are tested instead of individuals in group testing, this technique is helpful for estimating prevalence in a population or for classifying a large number of individuals into two groups at a low cost. For this reason, group testing is a well-known means of saving costs and producing precise estimates. In this paper, we developed a mixed-effect group testing regression that is useful when the data-collecting process is performed using inverse sampling. This model allows covariate information to be included at the individual level to incorporate heterogeneity among individuals and to identify which covariates are associated with positive individuals. We present an approach to fit this model using maximum likelihood, and we performed a simulation study to evaluate the quality of the estimates. Based on the simulation study, we found that the proposed regression method for inverse sampling with group testing produces parameter estimates with low bias when the pre-specified number of positive pools (r) to stop the sampling process is at least 10 and the number of clusters in the sample is also at least 10. We applied the method to real data, and we provide NLMIXED code that researchers can use to implement this method.
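    Under inverse sampling, pools are tested until the r-th positive, so the total number of pools N is the random quantity. A minimal sketch of the covariate-free building blocks, not the authors' mixed-effect regression: the maximum likelihood estimate r/N of the pool-positive probability θ, Haldane's unbiased estimator (r - 1)/(N - 1), and the conversion θ = 1 - (1 - p)^k from pools of size k to individual prevalence p, assuming a perfect test. The simulation helper and its parameter values are illustrative.

```python
import random


def pool_positive_mle(r: int, n_pools: int) -> float:
    """MLE of the pool-positive probability when sampling stopped at the r-th positive pool."""
    return r / n_pools


def haldane_unbiased(r: int, n_pools: int) -> float:
    """(r - 1) / (N - 1): unbiased for the pool-positive probability under inverse sampling."""
    if r < 2:
        raise ValueError("needs r >= 2")
    return (r - 1) / (n_pools - 1)


def prevalence_from_pools(theta: float, pool_size: int) -> float:
    """Individual prevalence p from theta = 1 - (1 - p)^k, assuming a perfect test."""
    return 1.0 - (1.0 - theta) ** (1.0 / pool_size)


def simulate_inverse_sampling(theta: float, r: int, rng: random.Random) -> int:
    """Number of pools tested until r positive pools are observed."""
    n = positives = 0
    while positives < r:
        n += 1
        positives += rng.random() < theta
    return n


if __name__ == "__main__":
    rng = random.Random(7)
    n_pools = simulate_inverse_sampling(0.25, 10, rng)  # hypothetical theta = 0.25, r = 10
    theta_hat = pool_positive_mle(10, n_pools)
    print(n_pools, theta_hat, prevalence_from_pools(theta_hat, 5))
```

    The regression in the paper replaces the single θ with a model for each individual's probability of being positive, including covariates and cluster-level random effects.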

  6. Vertical distribution of fish biomass in Lake Superior: Implications for day bottom trawl surveys

    USGS Publications Warehouse

    Stockwell, J.D.; Yule, D.L.; Hrabik, T.R.; Adams, J.V.; Gorman, O.T.; Holbrook, B.V.

    2007-01-01

    Evaluation of the biases in sampling methodology is essential for understanding the limitations of abundance and biomass estimates of fish populations. Estimates from surveys that rely solely on bottom trawls may be particularly vulnerable to bias if pelagic fish are numerous. We evaluated the variability in the vertical distribution of fish biomass during the U.S. Geological Survey's annual spring bottom trawl survey of Lake Superior using concurrent hydroacoustic observations to (1) test the assumption that fish are generally demersal during the day and (2) evaluate the potential for predictive models to improve bottom trawl–determined biomass estimates. Our results indicate that the assumption that fish exhibit demersal behavior during the annual spring bottom trawl survey in Lake Superior is unfounded. Bottom trawl biomass (BBT) estimates (mean ± SE) for species known to exhibit pelagic behavior (cisco Coregonus artedi, bloater C. hoyi, kiyi C. kiyi, and rainbow smelt Osmerus mordax; 3.01 ± 0.73 kg/ha) were not significantly greater than mean acoustic pelagic zone biomass (BAPZ) estimates (6.39 ± 2.03 kg/ha). Mean BAPZ estimates were 1.6- to 4.8-fold greater than mean BBT estimates over 4 years of sampling. The relationship between concurrent BAPZ and BBT estimates was marginally significant and highly variable. Predicted BAPZ estimates using cross-validation models were sensitive to adjustments for back-transforming from the logarithmic to the linear scale and poorly corresponded to observed BAPZ estimates. We conclude that statistical models to predict BAPZ from day BBT cannot be developed. We propose that night sampling with multiple gears will be necessary to generate better biomass estimates for management needs.
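    The sensitivity to back-transformation noted above arises because a model fitted on the log scale predicts the mean of log biomass, and exponentiating that estimates (under symmetric log-scale errors) the median, not the mean. A minimal sketch of two standard corrections, assuming a simple log-log regression: the lognormal correction exp(s²/2), which assumes normal log-scale errors, and Duan's smearing estimator, which averages the exponentiated residuals. Which correction the authors used is not stated; the simulated data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_loglinear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, List[float]]:
    """OLS of log(y) on log(x); returns intercept, slope and log-scale residuals."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(lx, ly)]
    return intercept, slope, resid


def back_transform(intercept: float, slope: float, resid: Sequence[float], x_new: float):
    """Naive, lognormal-corrected and smearing back-transformed predictions at x_new."""
    eta = intercept + slope * math.log(x_new)
    n = len(resid)
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance on the log scale
    naive = math.exp(eta)                     # a median-type prediction
    lognormal = math.exp(eta + s2 / 2)        # assumes normal log-scale errors
    smearing = math.exp(eta) * sum(math.exp(e) for e in resid) / n  # Duan's smearing estimate
    return naive, lognormal, smearing


if __name__ == "__main__":
    rng = random.Random(5)
    x = [rng.uniform(1.0, 10.0) for _ in range(50)]  # hypothetical trawl biomass, kg/ha
    y = [2.0 * xi ** 0.8 * math.exp(rng.gauss(0.0, 0.8)) for xi in x]  # hypothetical acoustic
    a, b, e = fit_loglinear(x, y)
    print(back_transform(a, b, e, 5.0))
```

    With noisy relationships like the one reported, the corrections can change predictions substantially, which is one reason predicted values can depend strongly on the back-transformation chosen.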

  7. A rapid assessment method to estimate the distribution of juvenile Chinook Salmon in tributary habitats using eDNA and occupancy estimation

    USGS Publications Warehouse

    Matter, A.; Falke, Jeffrey A.; López, J. Andres; Savereide, James W.

    2018-01-01

    Identification and protection of water bodies used by anadromous species are critical in light of increasing threats to fish populations, yet often challenging given budgetary and logistical limitations. Noninvasive, rapid‐assessment sampling techniques may reduce costs and effort while increasing species detection efficiencies. We used an intrinsic potential (IP) habitat model to identify high‐quality rearing habitats for Chinook Salmon Oncorhynchus tshawytscha and to select sites to sample throughout the Chena River basin, Alaska, for juvenile occupancy using an environmental DNA (eDNA) approach. Water samples were collected from 75 tributary sites in 2014 and 2015. The presence of Chinook Salmon DNA in water samples was assessed using a species‐specific quantitative PCR (qPCR) assay. The IP model predicted over 900 stream kilometers in the basin to support high‐quality (IP ≥ 0.75) rearing habitat. Occupancy estimation based on eDNA samples indicated that 80% and 56% of previously unsampled sites classified as high or low IP (IP < 0.75), respectively, were occupied. The probability of detection (p) of Chinook Salmon DNA from three replicate water samples was high (p = 0.76) but varied with drainage area (km²). A power analysis indicated high power to detect proportional changes in occupancy based on parameter values estimated from eDNA occupancy models, although power curves were not symmetrical around zero, indicating greater power to detect positive than negative proportional changes in occupancy. Overall, the combination of IP habitat modeling and occupancy estimation provided a useful, rapid‐assessment method to predict and subsequently quantify the distribution of juvenile salmon in previously unsampled tributary habitats. Additionally, these methods are flexible and can be modified for application to other species and in other locations, which may contribute towards improved population monitoring and management.
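    Occupancy estimation treats replicate water samples at a site as repeat surveys, separating true absence from non-detection. A minimal sketch of the standard single-season occupancy likelihood with constant occupancy ψ and per-sample detection p (no covariates such as drainage area), fit by a coarse grid search for transparency rather than a numerical optimizer. A helper converts a per-sample p into the probability of at least one detection among K replicates; the abstract does not make clear whether the reported p = 0.76 is per sample or cumulative, so that helper is only illustrative. The detection histories are hypothetical.

```python
import math
from typing import Sequence, Tuple


def site_likelihood(history: Sequence[int], psi: float, p: float) -> float:
    """Occupied-and-observed term plus, for all-zero histories, the unoccupied term."""
    k = len(history)
    d = sum(history)
    occupied = psi * p ** d * (1.0 - p) ** (k - d)
    return occupied + ((1.0 - psi) if d == 0 else 0.0)


def fit_occupancy(histories, grid: int = 99) -> Tuple[float, float, float]:
    """Grid-search MLE of (psi, p); returns (psi_hat, p_hat, log-likelihood)."""
    best = (float("nan"), float("nan"), -math.inf)
    for i in range(1, grid + 1):
        psi = i / (grid + 1)
        for j in range(1, grid + 1):
            p = j / (grid + 1)
            ll = sum(math.log(site_likelihood(h, psi, p)) for h in histories)
            if ll > best[2]:
                best = (psi, p, ll)
    return best


def cumulative_detection(p: float, k: int) -> float:
    """Probability of at least one detection in k independent replicate samples."""
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical eDNA detection histories (three water samples per site).
    histories = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)] * 2 + [(0, 0, 0)] * 10
    print(fit_occupancy(histories), cumulative_detection(0.76, 3))
```

    Here half the sites have at least one detection, but the estimated occupancy is slightly higher because some all-zero sites are expected to be occupied but missed.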

  8. Evaluating performance of stormwater sampling approaches using a dynamic watershed model.

    PubMed

    Ackerman, Drew; Stein, Eric D; Ritter, Kerry J

    2011-09-01

    Accurate quantification of stormwater pollutant levels is essential for estimating overall contaminant discharge to receiving waters. Numerous sampling approaches exist that attempt to balance accuracy against the costs associated with the sampling method. This study employs a novel and practical approach of evaluating the accuracy of different stormwater monitoring methodologies using stormflows and constituent concentrations produced by a fully validated continuous simulation watershed model. A major advantage of using a watershed model to simulate pollutant concentrations is that a large number of storms representing a broad range of conditions can be applied in testing the various sampling approaches. Seventy-eight distinct methodologies were evaluated by "virtual samplings" of 166 simulated storms of varying size, intensity and duration, representing 14 years of storms in Ballona Creek near Los Angeles, California. The 78 methods can be grouped into four general strategies: volume-paced compositing, time-paced compositing, pollutograph sampling, and microsampling. The performance of each sampling strategy was evaluated by comparing (1) the median relative error between the virtually sampled and the true modeled event mean concentration (EMC) of each storm (accuracy), (2) the median absolute deviation (MAD) of the relative error about its median (precision), and (3) the percentage of storms where sampling methods were within 10% of the true EMC (a combined measure of accuracy and precision). Finally, costs associated with site setup, sampling, and laboratory analysis were estimated for each method. Pollutograph sampling consistently outperformed the other three methods in terms of both accuracy and precision, but was the most costly method evaluated. Time-paced sampling consistently underestimated, while volume-paced sampling overestimated, the storm EMCs. Microsampling performance approached that of pollutograph sampling at a substantial cost savings. The most efficient method for routine stormwater monitoring, in terms of a balance between performance and cost, was volume-paced microsampling, with variable sample pacing to ensure that the entirety of the storm was captured. Pollutograph sampling is recommended if the data are to be used for detailed analysis of runoff dynamics.
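    The three performance measures described above can be computed directly from paired sampled and true EMCs. This is a minimal sketch; the sign convention (sampled - true) / true and the function names are assumptions, and the example values are made up rather than taken from the study.

```python
from statistics import median


def relative_errors(sampled, true):
    """Relative error of each virtually sampled EMC against the true modeled EMC."""
    return [(s - t) / t for s, t in zip(sampled, true)]


def performance_summary(sampled, true, tolerance=0.10):
    """Return (accuracy, precision, fraction within tolerance).

    accuracy  : median relative error (negative = underestimation)
    precision : median absolute deviation (MAD) of the relative errors
                about their median
    within    : fraction of storms whose |relative error| <= tolerance
    """
    errors = relative_errors(sampled, true)
    med = median(errors)
    mad = median(abs(e - med) for e in errors)
    within = sum(abs(e) <= tolerance for e in errors) / len(errors)
    return med, mad, within


# Hypothetical EMCs for four storms.
print(performance_summary([10.5, 19.0, 40.0, 60.0], [10.0, 20.0, 40.0, 50.0]))
```

    A consistently negative median relative error would correspond to the underestimation reported for time-paced sampling, and a positive one to the overestimation reported for volume-paced sampling.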

  9. Estimating leaf nitrogen accumulation in maize based on canopy hyperspectrum data

    NASA Astrophysics Data System (ADS)

    Gu, Xiaohe; Wang, Lizhi; Song, Xiaoyu; Xu, Xingang

    2016-10-01

    Leaf nitrogen accumulation (LNA) has an important influence on the formation of crop yield and grain protein. Monitoring the LNA of the crop canopy quantitatively and in real time helps in understanding crop nutrition status, diagnosing crop population growth, and managing fertilization precisely. The study aimed to develop a universal method to monitor the LNA of maize with hyperspectral data, which could provide a basis for mapping maize LNA at the county scale. The correlations between LNA and hyperspectral reflectance and its mathematical transformations were analyzed. Then the feature bands and their transformations were screened to develop the optimal model for estimating LNA based on multiple linear regression. In-situ samples were used to evaluate the accuracy of the estimating model. Results showed that the estimating model using the first-order differential of the logarithmically transformed reflectance (lgP') reached the highest correlation coefficient (0.889) with the lowest RMSE (0.646 g·m⁻²), and was therefore considered the optimal model for estimating LNA in maize. The determination coefficient (R²) of the testing samples was 0.831, while the RMSE was 1.901 g·m⁻². This indicates that the first-order differential logarithmic transformation of the hyperspectral data responded well to maize LNA. Based on this transformation, the optimal LNA estimating model achieved good accuracy with high stability.
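    The transformation singled out above can be approximated numerically. This sketch assumes that lgP' means the first derivative, with respect to wavelength, of log10-transformed reflectance, estimated by finite differences between adjacent bands; the band positions and values are hypothetical. Selected derivative bands would then serve as predictors in a multiple linear regression for LNA.

```python
import math


def log_first_derivative(wavelengths, reflectance):
    """Approximate d(log10 R)/d(lambda) by finite differences between
    adjacent bands. Returns (midpoint wavelengths, derivative values)."""
    if len(wavelengths) != len(reflectance) or len(wavelengths) < 2:
        raise ValueError("need matching sequences with at least two bands")
    logs = [math.log10(r) for r in reflectance]  # lg-transformed reflectance
    mids, derivs = [], []
    for i in range(len(wavelengths) - 1):
        step = wavelengths[i + 1] - wavelengths[i]
        mids.append((wavelengths[i] + wavelengths[i + 1]) / 2)
        derivs.append((logs[i + 1] - logs[i]) / step)
    return mids, derivs
```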

  10. Advantage of population pharmacokinetic method for evaluating the bioequivalence and accuracy of parameter estimation of pidotimod.

    PubMed

    Huang, Jihan; Li, Mengying; Lv, Yinghua; Yang, Juan; Xu, Ling; Wang, Jingjing; Chen, Junchao; Wang, Kun; He, Yingchun; Zheng, Qingshan

    2016-09-01

    This study aimed to explore the accuracy of the population pharmacokinetic method in evaluating the bioequivalence of pidotimod with sparse data profiles, and whether this method is suitable for bioequivalence evaluation in special populations, such as children, from whom fewer samples can be taken. In this single-dose, two-period crossover study, 20 healthy male Chinese volunteers were randomized 1 : 1 to receive either the test or reference formulation, with a 1-week washout before receiving the alternative formulation. Noncompartmental and population compartmental pharmacokinetic analyses were conducted. Simulated data were analyzed to graphically evaluate the model and the pharmacokinetic characteristics of the two pidotimod formulations. Various sparse sampling scenarios were generated from the real bioequivalence clinical trial data and evaluated by the population pharmacokinetic method. Using noncompartmental analysis, the 90% confidence intervals (CIs) for AUC0-12h, AUC0-∞, and Cmax were 97.3 - 118.7%, 96.9 - 118.7%, and 95.1 - 109.8%, respectively, within the 80 - 125% range for bioequivalence. The population compartmental pharmacokinetics of pidotimod were described using a one-compartment model with first-order absorption and lag time. When estimates from the different datasets were compared, random three-point and fixed four-point sampling strategies provided results similar to those obtained through rich sampling. The nonlinear mixed-effects model requires fewer data points. Moreover, compared with the noncompartmental analysis method, the pharmacokinetic parameters can be estimated more accurately using the nonlinear mixed-effects model. The population pharmacokinetic modeling method was used to assess the bioequivalence of the two pidotimod formulations with relatively few sampling points, and further validated the bioequivalence of the two formulations. This method may provide useful information for regulating bioequivalence evaluation in special populations.
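    The structural model named above, a one-compartment model with first-order absorption and a lag time, has a standard closed form. The sketch below evaluates it and computes a linear trapezoidal AUC; all parameter values in the check are hypothetical, not pidotimod estimates. For a single dose, AUC0-∞ = F·Dose/CL, which a densely sampled trapezoidal AUC should approach; comparing it with an AUC from only a few sampling times is one way to see what sparse designs lose without a model.

```python
import math


def conc_one_compartment_lag(t, dose, ka, cl, v, tlag, f=1.0):
    """Single-dose plasma concentration for a one-compartment model with
    first-order absorption (ka), first-order elimination (ke = cl / v),
    bioavailability f and absorption lag time tlag."""
    ke = cl / v
    if math.isclose(ka, ke):
        raise ValueError("ka == ke needs the limiting form of the equation")
    if t <= tlag:
        return 0.0  # nothing has been absorbed before the lag time
    tau = t - tlag
    return (f * dose * ka) / (v * (ka - ke)) * (math.exp(-ke * tau) - math.exp(-ka * tau))


def auc_trapezoid(times, concs):
    """Linear trapezoidal AUC over the observed sampling times."""
    return sum((t1 - t0) * (c0 + c1) / 2
               for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:]))
```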

  11. Local Intrinsic Dimension Estimation by Generalized Linear Modeling.

    PubMed

    Hino, Hideitsu; Fujiki, Jun; Akaho, Shotaro; Murata, Noboru

    2017-07-01

    We propose a method for intrinsic dimension estimation. We fit a regression model to the relationship between a power of the distance from an inspection point and the number of samples inside a ball whose radius equals that distance, and evaluate the goodness of fit. Then, using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we show experimentally that the proposed method outperforms a conventional local dimension estimation method.
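    The scaling idea behind this approach is that, near a point on a d-dimensional manifold, the number of samples within radius r grows roughly like r^d. The sketch below is not the authors' generalized linear model or maximum likelihood estimator; it is a simpler log-log least-squares slope that illustrates the same relationship. The choice of radii is an assumption and must stay small relative to the data's extent.

```python
import math


def local_dimension_loglog(points, center, radii):
    """Estimate local intrinsic dimension at `center` as the least-squares
    slope of log N(r) against log r, where N(r) counts points within
    distance r of the center (points coinciding with the center are excluded)."""
    xs, ys = [], []
    for r in radii:
        n = sum(1 for p in points if 0 < math.dist(p, center) <= r)
        if n > 0:
            xs.append(math.log(r))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need at least two radii with non-empty balls")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# A flat 2-D grid embedded in 3-D space should give a slope near 2.
grid = [(float(x), float(y), 0.0) for x in range(-30, 31) for y in range(-30, 31)]
print(local_dimension_loglog(grid, (0.0, 0.0, 0.0), range(3, 16)))
```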

  12. An employee total health management-based survey of Iowa employers.

    PubMed

    Merchant, James A; Lind, David P; Kelly, Kevin M; Hall, Jennifer L

    2013-12-01

    This study aimed to implement an Employee Total Health Management (ETHM) model-based questionnaire and provide estimates of model program elements among a statewide sample of Iowa employers. A stratified random sample of Iowa employers was surveyed to characterize and estimate employer participation in ETHM program elements. Each of the 12 ETHM components is being implemented by fewer than 30% of Iowa employers, with the exceptions of occupational safety and health (46.6%) and workers' compensation insurance coverage (89.2%), but employers intend a modest expansion of all components in the coming year. The ETHM questionnaire-based survey provides estimates of the progress Iowa employers are making toward implementing components of Total Worker Health programs.
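    Estimates from a stratified random sample, such as the percentages of employers implementing a component, are usually weighted by stratum size. The abstract does not describe the strata or the weighting used, so the sketch below is only the textbook stratified estimator of a proportion with a finite population correction; stratum sizes and counts in the check are hypothetical.

```python
import math


def stratified_proportion(pop_sizes, sample_sizes, positives):
    """Stratified estimate of a population proportion and its standard error.

    pop_sizes[h]    : number of units (e.g. employers) in stratum h
    sample_sizes[h] : number sampled in stratum h (at least 2)
    positives[h]    : sampled units in stratum h having the attribute
    """
    total = sum(pop_sizes)
    estimate, variance = 0.0, 0.0
    for big_n, n, x in zip(pop_sizes, sample_sizes, positives):
        if n < 2:
            raise ValueError("each stratum needs at least two sampled units")
        w = big_n / total          # stratum weight W_h
        p_h = x / n                # within-stratum sample proportion
        fpc = 1.0 - n / big_n      # finite population correction
        estimate += w * p_h
        variance += w ** 2 * fpc * p_h * (1.0 - p_h) / (n - 1)
    return estimate, math.sqrt(variance)
```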

  13. Does Bootstrap Procedure Provide Biased Estimates? An Empirical Examination for a Case of Multiple Regression.

    ERIC Educational Resources Information Center

    Fan, Xitao

    This paper empirically and systematically assessed the performance of the bootstrap resampling procedure as it was applied to a regression model. Parameter estimates from Monte Carlo experiments (repeated sampling from the population) and bootstrap experiments (repeated resampling from one original sample) were generated and compared. Sample…
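    The contrast between the two kinds of experiment can be sketched for a simple linear regression slope. The population model (x ~ N(0, 1), y = 1 + 0.5x + noise), the sample size and the number of replications are assumptions for illustration, not the study's design; the bootstrap here uses case resampling.

```python
import random
from statistics import mean, stdev


def ols_slope(xs, ys):
    """Least-squares slope of y on x for simple linear regression."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values have zero variance")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def draw_sample(rng, n, beta0=1.0, beta1=0.5, sigma=1.0):
    """One sample of size n from a hypothetical population:
    x ~ N(0, 1), y = beta0 + beta1 * x + N(0, sigma^2)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def monte_carlo_slopes(rng, n, reps):
    """Monte Carlo: repeated samples drawn from the population itself."""
    return [ols_slope(*draw_sample(rng, n)) for _ in range(reps)]


def bootstrap_slopes(rng, xs, ys, reps):
    """Bootstrap: cases resampled with replacement from one observed sample."""
    n = len(xs)
    slopes = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


rng = random.Random(2024)
mc = monte_carlo_slopes(rng, n=50, reps=1000)
xs, ys = draw_sample(rng, 50)
bs = bootstrap_slopes(rng, xs, ys, reps=1000)
print("Monte Carlo mean/SD:", mean(mc), stdev(mc))
print("Bootstrap mean/SD:", mean(bs), stdev(bs), "original slope:", ols_slope(xs, ys))
```

    The Monte Carlo slopes center on the population value, whereas the bootstrap slopes center on the one original sample's estimate. This is why comparing the two distributions is a natural way to ask whether bootstrap estimates are biased.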

  14. Optimizing occupational exposure measurement strategies when estimating the log-scale arithmetic mean value--an example from the reinforced plastics industry.

    PubMed

    Lampa, Erik G; Nilsson, Leif; Liljelind, Ingrid E; Bergdahl, Ingvar A

    2006-06-01

    When assessing occupational exposures, repeated measurements are in most cases required. Repeated measurements are more resource intensive than a single measurement, so careful planning of the measurement strategy is necessary to assure that resources are spent wisely. The optimal strategy depends on the objectives of the measurements. Here, two different models of random effects analysis of variance (ANOVA) are proposed for the optimization of measurement strategies by the minimization of the variance of the estimated log-transformed arithmetic mean value of a worker group, i.e. the strategies are optimized for precise estimation of that value. The first model is a one-way random effects ANOVA model. For that model it is shown that the best precision in the estimated mean value is always obtained by including as many workers as possible in the sample while restricting the number of replicates to two or at most three regardless of the size of the variance components. The second model introduces the 'shared temporal variation' which accounts for those random temporal fluctuations of the exposure that the workers have in common. It is shown for that model that the optimal sample allocation depends on the relative sizes of the between-worker component and the shared temporal component, so that if the between-worker component is larger than the shared temporal component more workers should be included in the sample and vice versa. The results are illustrated graphically with an example from the reinforced plastics industry. If there exists a shared temporal variation at a workplace, that variability needs to be accounted for in the sampling design and the more complex model is recommended.
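    For a balanced design with k workers and n repeated measurements per worker, the one-way random-effects model gives Var(mean) = σB²/k + σW²/(kn). With a fixed total of kn measurements, the second term is constant, so adding workers always helps. The second function is an assumed form of the shared-temporal model, in which all workers are measured on the same n occasions and the shared component σT² therefore averages over occasions only. The variance components in the check are hypothetical, and min_n = 2 mimics the recommendation of two or three replicates.

```python
def var_mean_one_way(sb2, sw2, k, n):
    """Variance of the estimated group mean (log scale) in a balanced one-way
    random-effects design with k workers and n repeats per worker."""
    return sb2 / k + sw2 / (k * n)


def var_mean_shared_temporal(sb2, st2, se2, k, n):
    """Assumed form with a shared temporal component: all k workers are
    measured on the same n occasions, so st2 averages over occasions only."""
    return sb2 / k + st2 / n + se2 / (k * n)


def best_allocation(var_fn, total, min_n=1, **components):
    """Among designs with k * n == total measurements and n >= min_n,
    return the (k, n) pair giving the smallest variance of the mean."""
    options = [(k, total // k) for k in range(1, total + 1)
               if total % k == 0 and total // k >= min_n]
    if not options:
        raise ValueError("no allocation satisfies the constraints")
    return min(options, key=lambda kn: var_fn(k=kn[0], n=kn[1], **components))


print(best_allocation(var_mean_one_way, 24, min_n=2, sb2=0.5, sw2=1.0))
print(best_allocation(var_mean_shared_temporal, 24, sb2=0.2, st2=1.0, se2=0.5))
print(best_allocation(var_mean_shared_temporal, 24, sb2=1.0, st2=0.2, se2=0.5))
```

    With these made-up components, a large shared temporal component favors fewer workers sampled on more occasions. A large between-worker component favors the reverse, matching the qualitative result described in the abstract.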

  15. Statistical aspects of point count sampling

    USGS Publications Warehouse

    Barker, R.J.; Sauer, J.R.; Ralph, C.J.; Sauer, J.R.; Droege, S.

    1995-01-01

    The dominant feature of point counts is that they do not census birds, but instead provide incomplete counts of individuals present within a survey plot. Considering a simple model for point count sampling, we demonstrate that use of these incomplete counts can bias estimators and testing procedures, leading to inappropriate conclusions. A large portion of the variability in point counts is caused by the incomplete counting, and this within-count variation can be confounded with ecologically meaningful variation. We recommend caution in the analysis of estimates obtained from point counts. Using our model, we also consider optimal allocation of sampling effort. The critical step in the optimization process is in determining the goals of the study and methods that will be used to meet these goals. By explicitly defining the constraints on sampling and by estimating the relationship between precision and bias of estimators and time spent counting, we can predict the optimal time at a point for each of several monitoring goals. In general, time spent at a point will differ depending on the goals of the study.
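    One way to see how incomplete counting mixes with ecological variation is to assume the simplest binomial model: the count C, given true abundance N, is Binomial(N, p). This is an assumption about what the authors' "simple model" looks like. By the law of total variance, E[C] = p·E[N] and Var(C) = p(1 - p)·E[N] + p²·Var(N), so the first term is pure counting noise.

```python
def point_count_variance(mean_n, var_n, p):
    """Assuming C | N ~ Binomial(N, p), return (E[C], Var(C), share of
    Var(C) due to incomplete counting rather than variation in N)."""
    expected = p * mean_n
    counting = p * (1.0 - p) * mean_n  # binomial (detection) noise
    ecological = p ** 2 * var_n        # variation in true abundance
    total = counting + ecological
    return expected, total, counting / total
```

    With no variation in true abundance (var_n = 0), all count variability comes from detection, while E[C] still depends on both p and N. A change in counts therefore cannot be attributed to abundance without knowledge of p.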

  16. Estimation of a partially linear additive model for data from an outcome-dependent sampling design with a continuous outcome

    PubMed Central

    Tan, Ziwen; Qin, Guoyou; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) designs have been well recognized as a cost-effective way to enhance study efficiency, both in the statistical literature and in biomedical and epidemiologic studies. A partially linear additive model (PLAM) is widely applied in real problems because it allows a flexible specification of the dependence of the response on some covariates in a linear fashion and on other covariates in a nonlinear, nonparametric fashion. Motivated by an epidemiological study investigating the effect of prenatal polychlorinated biphenyl exposure on children's intelligence quotient (IQ) at age 7 years, we propose a PLAM in this article that allows more flexible nonparametric inference on the relationships between the response and covariates under the ODS scheme. We propose the estimation method and establish the asymptotic properties of the proposed estimator. Simulation studies are conducted to show the improved efficiency of the proposed ODS estimator for PLAM compared with that from a traditional simple random sampling design with the same sample size. Data from the above-mentioned study are analyzed to illustrate the proposed method. PMID:27006375

  17. Estimating residual fault hitting rates by recapture sampling

    NASA Technical Reports Server (NTRS)

    Lee, Larry; Gupta, Rajan

    1988-01-01

    For the recapture debugging design introduced by Nayak (1988), the problem of estimating the hitting rates of the faults remaining in the system is considered. In the context of a conditional likelihood, moment estimators are derived and are shown to be asymptotically normal and fully efficient. Fixed-sample properties of the moment estimators are compared, through simulation, with those of the conditional maximum likelihood estimators. Properties of the conditional model are investigated, such as the asymptotic distribution of linear functions of the fault hitting frequencies and a representation of the full data vector in terms of a sequence of independent random vectors. It is assumed that the residual hitting rates follow a log-linear rate model and that the testing process is truncated when the gaps between the detection of new errors exceed a fixed amount of time.

  18. Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA

    NASA Astrophysics Data System (ADS)

    Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz

    2018-04-01

    External testing (ET) is preferred over auto-prediction (AP) or k-fold cross-validation for estimating a more realistic predictive ability of a statistical model. With IR spectra, the Kennard-Stone (KS) sampling algorithm is often used to split the data into training and test sets, used respectively for model construction and model testing. On the other hand, iterative random sampling (IRS) has not been the favored choice, though it is theoretically more likely to produce reliable estimates. The aim of this preliminary work is to compare the performance of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The 'best' performance achievable from the dataset is estimated with AP on the full dataset (APF, error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). The error rate of each model was estimated repeatedly via: (a) AP on the full data (APF, error); (b) AP on the training set (APS, error); and (c) ET on the respective test set (ETS, error). A good PLS2-DA model is expected to produce APS, error and ETS, error values that are similar to the APF, error. Bearing that in mind, the similarities between (a) APS, error vs. APF, error; (b) ETS, error vs. APF, error; and (c) APS, error vs. ETS, error were evaluated using correlation tests (i.e. Pearson's and Spearman's rank tests), using series of PLS2-DA models computed from the KS-set and IRS-set, respectively. Overall, models constructed from the IRS-set exhibit more similarity between the internal and external error rates than those from the KS-set, i.e. a lower risk of overfitting. In conclusion, IRS is more reliable than KS for sampling a representative training set.
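    The Kennard-Stone algorithm referred to above is deterministic. It starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbor is farthest away. This minimal sketch uses Euclidean distance on raw feature vectors, which is an assumption, since spectra are often preprocessed first. The selected indices would form the training set, for example 70% of samples for a 7:3 split, and the rest the test set.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone
    algorithm using Euclidean distance."""
    m = len(samples)
    if not 2 <= n_select <= m:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(m) for j in range(i + 1, m)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(m) if k not in (i0, j0)]
    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: min_d[k])  # farthest from the selected set
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected
```

    Because the same data always give the same split, KS yields a single training set. IRS, by contrast, repeats random 7:3 splits (n = 200 in the study), so its error estimates average over many partitions.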

  19. Non-invasive genetic censusing and monitoring of primate populations.

    PubMed

    Arandjelovic, Mimi; Vigilant, Linda

    2018-03-01

    Knowing the density or abundance of primate populations is essential for their conservation management and for contextualizing socio-demographic and behavioral observations. When direct counts of animals are not possible, genetic analysis of non-invasive samples collected from wildlife populations allows estimates of population size with higher accuracy and precision than is possible using indirect signs. Furthermore, in contrast to traditional indirect survey methods, prolonged or periodic genetic sampling across months or years enables inference of group membership, movement, dynamics, and some kin relationships. Data may also be used to estimate sex ratios and sex differences in dispersal distances, and to detect gene flow among locations. Recent advances in capture-recapture models have further improved the precision of population estimates derived from non-invasive samples. Simulations using these methods have shown that the confidence interval of point estimates includes the true population size when assumptions of the models are met, and therefore this range of population size minima and maxima should be emphasized in population monitoring studies. Innovations such as the use of sniffer dogs or anti-poaching patrols for sample collection are important to ensure adequate sampling, and the expected development of efficient and cost-effective genotyping-by-sequencing methods for DNA derived from non-invasive samples will automate and speed analyses. © 2018 Wiley Periodicals, Inc.
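    In genetic capture-recapture, an individual counts as "captured" when its genotype is identified in a sampling session. The recent models mentioned above are more elaborate and are not reproduced here. As a baseline, this sketch implements Chapman's bias-corrected two-session Lincoln-Petersen estimator with its usual variance, assuming a closed population and equal identifiability of individuals. The counts in the check are hypothetical.

```python
import math


def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimator.

    n1: individuals (genotypes) identified in session 1
    n2: individuals identified in session 2
    m2: individuals identified in both sessions
    Returns (estimated population size, standard error).
    """
    if not 0 <= m2 <= min(n1, n2):
        raise ValueError("m2 must be between 0 and min(n1, n2)")
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)
           / ((m2 + 1) ** 2 * (m2 + 2)))
    return n_hat, math.sqrt(var)
```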

  20. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, existing expression estimation methods usually process each RNA-seq sample separately and ignore the fact that read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of non-uniform read distributions for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.

  1. Estimating rates of local species extinction, colonization and turnover in animal communities

    USGS Publications Warehouse

    Nichols, James D.; Boulinier, T.; Hines, J.E.; Pollock, K.H.; Sauer, J.R.

    1998-01-01

    Species richness has been identified as a useful state variable for conservation and management purposes. Changes in richness over time provide a basis for predicting and evaluating community responses to management, to natural disturbance, and to changes in factors such as community composition (e.g., the removal of a keystone species). Probabilistic capture-recapture models have been used recently to estimate species richness from species count and presence-absence data. These models do not require the common assumption that all species are detected in sampling efforts. We extend this approach to the development of estimators useful for studying the vital rates responsible for changes in animal communities over time: the rates of local species extinction, turnover, and colonization. Our approach to estimation is based on capture-recapture models for closed animal populations that permit heterogeneity in detection probabilities among the different species in the sampled community. We have developed a computer program, COMDYN, to compute many of these estimators and associated bootstrap variances. Analyses using data from the North American Breeding Bird Survey (BBS) suggested that the estimators performed reasonably well. We recommend estimators based on probabilistic modeling for future work on community responses to management efforts as well as on basic questions about community dynamics.

  2. Generalized estimators of avian abundance from count survey data

    USGS Publications Warehouse

    Royle, J. Andrew

    2004-01-01

    I consider modeling avian abundance from spatially referenced bird count data collected according to common protocols such as capture-recapture, multiple observer, removal sampling and simple point counts. Small sample sizes and large numbers of parameters have motivated many analyses that disregard the spatial indexing of the data, and thus do not provide an adequate treatment of spatial structure. I describe a general framework for modeling spatially replicated data that regards local abundance as a random process, motivated by the view that the set of spatially referenced local populations (at the sample locations) constitutes a metapopulation. Under this view, attention can be focused on developing a model for the variation in local abundance independent of the sampling protocol being considered. The metapopulation model structure, when combined with the data-generating model, defines a simple hierarchical model that can be analyzed using conventional methods. The proposed modeling framework is completely general in the sense that broad classes of metapopulation models may be considered, site-level covariates on detection and abundance may be considered, and estimates of abundance and related quantities may be obtained for sample locations, groups of locations, and unsampled locations. Two brief examples are given, the first involving simple point counts, and the second based on temporary removal counts. Extension of these models to open systems is briefly discussed.
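    For simple repeated point counts, one common instance of this hierarchical structure pairs a Poisson(λ) model for local abundance N with a Binomial(N, p) model for each count. The unobserved N at each site is summed out of the likelihood. The sketch below computes that log-likelihood, assuming constant λ and p across sites and truncating the sum at a hypothetical upper bound n_max. It is not presented as the paper's exact implementation. Maximizing it over (λ, p), for example with a numerical optimizer, would give the estimates.

```python
import math


def poisson_logpmf(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def binom_logpmf(y, n, p):
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log1p(-p)


def site_loglik(counts, lam, p, n_max):
    """log P(counts at one site), summing local abundance N out of the
    Poisson(lam) x Binomial(N, p) hierarchy, truncated at n_max."""
    if max(counts) > n_max:
        raise ValueError("n_max must be at least the largest observed count")
    terms = [poisson_logpmf(n, lam) + sum(binom_logpmf(y, n, p) for y in counts)
             for n in range(max(counts), n_max + 1)]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def nmixture_loglik(data, lam, p, n_max=200):
    """Total log-likelihood over sites; data is a list of per-site count lists."""
    if lam <= 0 or not 0 < p < 1:
        raise ValueError("need lam > 0 and 0 < p < 1")
    return sum(site_loglik(counts, lam, p, n_max) for counts in data)
```

    A quick sanity check is that a single zero count has probability exp(-λp) under this model, because binomial thinning of a Poisson(λ) abundance is Poisson(λp).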

  3. Investigations of potential bias in the estimation of lambda using Pradel's (1996) model for capture-recapture data

    USGS Publications Warehouse

    Hines, James E.; Nichols, James D.

    2002-01-01

    Pradel's (1996) temporal symmetry model, permitting direct estimation and modelling of the population growth rate λ_i, provides a potentially useful tool for the study of population dynamics using marked animals. Because of its recent publication date, the approach has not seen much use, and there have been virtually no investigations directed at the robustness of the resulting estimators. Here we consider several potential sources of bias, all motivated by specific uses of this estimation approach. We consider sampling situations in which the study area expands with time and present an analytic expression for the bias in λ_i. We next consider trap response in capture probabilities and heterogeneous capture probabilities and compute large-sample and simulation-based approximations of the resulting bias in λ_i. These approximations indicate that trap response is an especially important assumption violation that can produce substantial bias. Finally, we consider losses on capture and emphasize the importance of selecting the estimator for λ_i that is appropriate to the question being addressed. For studies based only on sighting and resighting data, Pradel's (1996) λ_i' is the appropriate estimator.

  4. Performance of Bootstrapping Approaches To Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Nevitt, Jonathan; Hancock, Gregory R.

    2001-01-01

    Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…

  5. The effects of sampling frequency on the climate statistics of the European Centre for Medium-Range Weather Forecasts

    NASA Astrophysics Data System (ADS)

    Phillips, Thomas J.; Gates, W. Lawrence; Arpe, Klaus

    1992-12-01

    The effects of sampling frequency on the first- and second-moment statistics of selected European Centre for Medium-Range Weather Forecasts (ECMWF) model variables are investigated in a simulation of "perpetual July" with a diurnal cycle included and with surface and atmospheric fields saved at hourly intervals. The shortest characteristic time scales (as determined by the e-folding time of lagged autocorrelation functions) are those of ground heat fluxes and temperatures, precipitation and runoff, convective processes, cloud properties, and atmospheric vertical motion, while the longest time scales are exhibited by soil temperature and moisture, surface pressure, and atmospheric specific humidity, temperature, and wind. The time scales of surface heat and momentum fluxes and of convective processes are substantially shorter over land than over oceans. An appropriate sampling frequency for each model variable is obtained by comparing the estimates of first- and second-moment statistics determined at intervals ranging from 2 to 24 hours with the "best" estimates obtained from hourly sampling. Relatively accurate estimation of first- and second-moment climate statistics (10% errors in means, 20% errors in variances) can be achieved by sampling a model variable at intervals that usually are longer than the bandwidth of its time series but that often are shorter than its characteristic time scale. For the surface variables, sampling at intervals that are nonintegral divisors of a 24-hour day yields relatively more accurate time-mean statistics because of a reduction in errors associated with aliasing of the diurnal cycle and higher-frequency harmonics. The superior estimates of first-moment statistics are accompanied by inferior estimates of the variance of the daily means due to the presence of systematic biases, but these probably can be avoided by defining a different measure of low-frequency variability. Estimates of the intradiurnal variance of accumulated precipitation and surface runoff are also strongly affected by the length of the storage interval. In light of these results, several alternative strategies for storage of the ECMWF model variables are recommended.
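    The characteristic time scale used above, the e-folding time of the lagged autocorrelation function, can be estimated from a single time series. This minimal sketch reports the first lag at which the sample autocorrelation drops below 1/e. Using the first crossing without interpolating between lags is a simplifying assumption. The check uses an idealized 24-hour sinusoid rather than model output.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (normalized by the lag-0 sum)."""
    n = len(series)
    mu = sum(series) / n
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        raise ValueError("series has zero variance")
    cov = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return cov / var


def e_folding_time(series, dt=1.0, max_lag=None):
    """First lag (in units of dt) at which the autocorrelation falls below
    1/e, or None if it stays above 1/e up to max_lag."""
    max_lag = max_lag or len(series) - 1
    threshold = 1.0 / math.e
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < threshold:
            return lag * dt
    return None


# Hourly values of an idealized diurnal cycle over 100 days.
hourly = [math.sin(2 * math.pi * t / 24) for t in range(2400)]
print(e_folding_time(hourly, dt=1.0))
```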

  6. A limited sampling model for estimation of total and unbound mycophenolic acid (MPA) area under the curve (AUC) in hematopoietic cell transplantation (HCT).

    PubMed

    Ng, Juki; Rogosheske, John; Barker, Juliet; Weisdorf, Daniel; Jacobson, Pamala A

    2006-06-01

    Renal transplant patients with suboptimal mycophenolic acid (MPA) areas under the curve (AUCs) are at greater risk of acute rejection. In hematopoietic cell transplantation, a low MPA AUC is also associated with a higher incidence of acute graft-versus-host disease. Therefore, a limited sampling model was developed and validated to simultaneously estimate total and unbound MPA AUC0-12 in hematopoietic cell transplantation patients. Intensive pharmacokinetic sampling was performed at steady state between days 3 and 7 posttransplant in 73 adult subjects receiving prophylactic mycophenolate mofetil 1 g every 12 hours orally or intravenously plus cyclosporine. Total and unbound MPA plasma concentrations were measured, and total and unbound AUC0-12 were determined using noncompartmental analysis. Regression analysis was then performed on the first 34 subjects to build total and unbound AUC0-12 models for intravenous (IV) and oral (PO) dosing. The predictive performance of these models was tested in the next 39 subjects. Trough concentrations poorly estimated observed total and unbound AUC0-12 (r < 0.48). A model with 3 concentrations (2, 4, and 6 hours after the start of infusion) best estimated observed total and unbound AUC0-12 after IV dosing (r > 0.99). Oral total and unbound AUC0-12 was more difficult to estimate and required at least 4 concentrations (0, 1, 2, and 6 hours post dose) in the model (r > 0.85). The predictive performance of the final models was good. Eighty-three percent of IV and 70% of PO AUC0-12 predictions fell within ±20% of the observed values without significant bias. Trough MPA concentrations do not accurately describe MPA AUC0-12. Three intravenous (2, 4, and 6 hours after the start of infusion) or 4 oral (0, 1, 2, and 6 hours post dose) MPA plasma concentrations measured over a 12-hour dosing interval will estimate the total and unbound AUC0-12 nearly as well as intensive pharmacokinetic sampling, with good precision and low bias. This approach simplifies AUC0-12 targeting of MPA after hematopoietic cell transplantation.
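    A limited sampling model of this kind is typically a linear regression of the full AUC on a few timed concentrations. It is fitted in one group of patients and validated in another by checking how many predictions fall within ±20% of the observed AUC. The sketch below fits ordinary least squares through the normal equations in plain Python and computes that validation fraction. The linear form with an intercept is an assumption, and the data in the check are synthetic.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    Each row of X must start with 1.0 for the intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        b[i] = (aug[i][p] - sum(aug[i][j] * b[j] for j in range(i + 1, p))) / aug[i][i]
    return b


def predict_auc(b, concentrations):
    """Predicted AUC from an intercept plus one coefficient per timed concentration."""
    return b[0] + sum(bi * c for bi, c in zip(b[1:], concentrations))


def fraction_within(predicted, observed, tol=0.20):
    """Share of predictions within +/- tol (relative) of the observed AUC."""
    return sum(abs(p - o) / o <= tol for p, o in zip(predicted, observed)) / len(observed)
```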

  7. Evaluation of Bayesian estimation of a hidden continuous-time Markov chain model with application to threshold violation in water-quality indicators

    USGS Publications Warehouse

    Deviney, Frank A.; Rice, Karen; Brown, Donald E.

    2012-01-01

    Natural resource managers require information concerning the frequency, duration, and long-term probability of occurrence of water-quality indicator (WQI) violations of defined thresholds. The timing of these threshold crossings often is hidden from the observer, who is restricted to relatively infrequent observations. Here, a model for the hidden process is linked with a model for the observations, and the parameters describing duration, return period, and long-term probability of occurrence are estimated using Bayesian methods. A simulation experiment is performed to evaluate the approach under scenarios based on the equivalent of a total monitoring period of 5-30 years and an observation frequency of 1-50 observations per year. Given a constant threshold crossing rate, the accuracy and precision of parameter estimates increased with longer total monitoring periods and more frequent observations. Given a fixed monitoring period and observation frequency, the accuracy and precision of parameter estimates increased with longer times between threshold crossings. For most cases where the long-term probability of being in violation is greater than 0.10, it was determined that at least 600 observations are needed to achieve precise estimates. An application of the approach is presented using 22 years of quasi-weekly observations of acid-neutralizing capacity from Deep Run, a stream in Shenandoah National Park, Virginia. The time series also was sub-sampled to simulate monthly and semi-monthly sampling protocols. Estimates of the long-term probability of violation were unbiased regardless of sampling frequency; however, the expected duration and return period were overestimated using the sub-sampled time series relative to the full quasi-weekly time series.
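    If the hidden process is taken to be a two-state continuous-time Markov chain (compliant or in violation), with rate α of crossing into violation and rate β of returning, the quantities named above have simple closed forms. This is our assumption about the model's structure. The long-term probability of violation is α/(α + β). The expected duration of a violation is 1/β. The return period, interpreted here as the mean time between successive onsets of violation, is 1/α + 1/β. The transition matrix over an observation interval dt is what links the hidden process to sparse observations. Rates in the check are hypothetical.

```python
import math


def two_state_summary(alpha, beta):
    """Long-run quantities for a two-state continuous-time Markov chain.

    alpha: rate of crossing into violation (per unit time)
    beta:  rate of crossing back into compliance (per unit time)
    Returns (long-term probability of violation, mean duration of a
    violation, mean time between successive onsets of violation).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("rates must be positive")
    return alpha / (alpha + beta), 1.0 / beta, 1.0 / alpha + 1.0 / beta


def transition_matrix(alpha, beta, dt):
    """P(state at t + dt | state at t); states 0 = compliant, 1 = violation."""
    s = alpha + beta
    decay = math.exp(-s * dt)
    p01 = alpha / s * (1.0 - decay)
    p10 = beta / s * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]
```

    When dt is long compared with 1/(α + β), both rows approach the stationary distribution. Widely spaced observations then still inform the long-term probability but carry little information about durations. This offers one intuition for why sub-sampling affected duration and return period estimates more than the long-term probability.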

  8. An evaluation of sex-age-kill (SAK) model performance

    USGS Publications Warehouse

    Millspaugh, Joshua J.; Skalski, John R.; Townsend, Richard L.; Diefenbach, Duane R.; Boyce, Mark S.; Hansen, Lonnie P.; Kammermeyer, Kent

    2009-01-01

    The sex-age-kill (SAK) model is widely used to estimate abundance of harvested large mammals, including white-tailed deer (Odocoileus virginianus). Despite a long history of use, few formal evaluations of SAK performance exist. We investigated how violations of the stable age distribution and stationary population assumptions, changes to male or female harvest, stochastic effects (i.e., random fluctuations in recruitment and survival), and sampling efforts influenced SAK estimation. When the simulated population had a stable age distribution and λ > 1, the SAK model underestimated abundance. Conversely, when λ < 1, the SAK model overestimated abundance. When changes to male harvest were introduced, SAK estimates moved opposite to the true population trend. In contrast, SAK estimates were robust to changes in female harvest rates. Stochastic effects caused SAK estimates to fluctuate about their equilibrium abundance, but the effect dampened as the size of the surveyed population increased. When we considered both stochastic effects and sampling error at a deer management unit scale, the resultant abundance estimates were within ±121.9% of the true population level 95% of the time. These combined results demonstrate extreme sensitivity to model violations and scale of analysis. Without changes to model formulation, the SAK model will be biased when λ ≠ 1. Furthermore, any factor that alters the male harvest rate, such as changes to regulations or changes in hunter attitudes, will bias population estimates. Sex-age-kill estimates may be precise at large spatial scales, such as the state level, but less so at the individual management unit level. Alternative models, such as statistical age-at-harvest models, which require similar data types, might allow for more robust, broad-scale demographic assessments.

  9. Sampling considerations for disease surveillance in wildlife populations

    USGS Publications Warehouse

    Nusser, S.M.; Clark, W.R.; Otis, D.L.; Huang, L.

    2008-01-01

    Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.
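    A minimal simulation in the spirit of the landscape-based comparison described above, in Python with NumPy. Every ingredient is a hypothetical assumption, not the authors' model: deer abundance per cell depends on a random habitat score, disease prevalence is raised in the first quarter of cells (standing in for an accessible road-side area), and the convenience sample draws animals only from that accessible area. It contrasts the average of simple-random-sample prevalence estimates with the average of convenience-sample estimates.

```python
import numpy as np


def simulate_landscape(n_cells=400, seed=0):
    """Hypothetical landscape of deer counts and infections per cell."""
    rng = np.random.default_rng(seed)
    habitat = rng.uniform(0.0, 1.0, n_cells)
    abundance = rng.poisson(5 + 20 * habitat)          # better habitat, more deer
    accessible = np.arange(n_cells) < n_cells // 4     # assumed road-side cells
    prevalence = np.where(accessible, 0.15, 0.02)      # assumed disease clustering
    infected = rng.binomial(abundance, prevalence)
    return abundance, infected, accessible


def animal_records(abundance, infected, accessible):
    """Expand cell totals into one infection flag (1/0) per animal."""
    status = np.concatenate(
        [np.r_[np.ones(i), np.zeros(a - i)] for a, i in zip(abundance, infected)]
    )
    cell = np.repeat(np.arange(abundance.size), abundance)
    return status, accessible[cell]


def compare_designs(n=200, reps=500, seed=1):
    """Return (true prevalence, mean SRS estimate, mean convenience estimate)."""
    abundance, infected, accessible = simulate_landscape()
    status, reachable = animal_records(abundance, infected, accessible)
    rng = np.random.default_rng(seed)
    pool = np.flatnonzero(reachable)                   # animals a convenience sample can reach
    srs = [status[rng.choice(status.size, n, replace=False)].mean() for _ in range(reps)]
    conv = [status[rng.choice(pool, n, replace=False)].mean() for _ in range(reps)]
    return status.mean(), np.mean(srs), np.mean(conv)


true_p, srs_mean, conv_mean = compare_designs()
print(f"true {true_p:.3f}  SRS {srs_mean:.3f}  convenience {conv_mean:.3f}")
```

    Under these assumptions the probability sample centers on the true prevalence, while the convenience sample reflects only the accessible, high-prevalence area; the size of that gap is entirely a product of the assumed disease clustering.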

  10. SU-E-I-46: Sample-Size Dependence of Model Observers for Estimating Low-Contrast Detection Performance From CT Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reiser, I; Lu, Z

    2014-06-01

    Purpose: Recently, task-based assessment of diagnostic CT systems has attracted much attention. Detection task performance can be estimated using human observers or mathematical observer models. While most models are well established, considerable bias can be introduced when performance is estimated from a limited number of image samples. Thus, the purpose of this work was to assess the effect of sample size on the bias and uncertainty of two channelized Hotelling observers and a template-matching observer. Methods: The image data used for this study consisted of 100 signal-present and 100 signal-absent regions of interest, which were extracted from CT slices. The experimental conditions included two signal sizes and five different x-ray beam current settings (mAs). Human observer performance for these images was determined in two-alternative forced choice (2AFC) experiments. These data were provided by the Mayo Clinic in Rochester, MN. Detection performance was estimated from three observer models: channelized Hotelling observers (CHO) with Gabor or Laguerre-Gauss (LG) channels, and a template-matching observer (TM). Different sample sizes were generated by randomly selecting a subset of image pairs (N = 20, 40, 60, 80). Observer performance was quantified as the proportion of correct responses (PC). Bias was quantified as the relative difference of PC for 20 and 80 image pairs. Results: For N = 100, all observer models predicted human performance across mAs and signal sizes. Bias was 23% for CHO (Gabor), 7% for CHO (LG), and 3% for TM. The relative standard deviation, σ(PC)/PC, at N = 20 was highest for the TM observer (11%) and lowest for the CHO (Gabor) observer (5%). Conclusion: To make image quality assessment feasible in clinical practice, a statistically efficient observer model that can predict performance from few samples is needed. Our results identified two observer models that may be suited for this task.
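    The sample-size experiment can be sketched as follows, under loudly stated assumptions: synthetic 8×8 regions of interest in white Gaussian noise with a Gaussian blob signal (not the Mayo CT data), and a template observer whose template is the difference of class means estimated from the same image pairs it is scored on (resubstitution). No CHO channels are modeled. PC is the 2AFC proportion correct over all present/absent pairings, and bias follows the definition above, the relative difference of PC at N = 20 and N = 80.

```python
import numpy as np


def make_rois(n_pairs=100, size=8, amplitude=0.3, seed=0):
    """Synthetic signal-present / signal-absent ROIs in white Gaussian noise."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:size, :size] - (size - 1) / 2
    signal = amplitude * np.exp(-(xx**2 + yy**2) / (2 * 2.0**2)).ravel()
    absent = rng.normal(0.0, 1.0, (n_pairs, size * size))
    present = signal + rng.normal(0.0, 1.0, (n_pairs, size * size))
    return present, absent


def percent_correct(present, absent):
    """2AFC PC of a template observer whose template is estimated from the
    same images it is evaluated on (resubstitution, hence optimistic)."""
    template = present.mean(axis=0) - absent.mean(axis=0)
    t_p, t_a = present @ template, absent @ template
    # Every present/absent pairing counts as one 2AFC trial; ties score 0.5.
    wins = t_p[:, None] > t_a[None, :]
    ties = t_p[:, None] == t_a[None, :]
    return np.mean(wins) + 0.5 * np.mean(ties)


def sample_size_study(sizes=(20, 40, 60, 80), n_subsets=200, seed=1):
    """Mean PC and relative SD per sample size, plus relative bias (PC_20 - PC_80) / PC_80."""
    present, absent = make_rois()
    rng = np.random.default_rng(seed)
    stats = {}
    for n in sizes:
        pcs = []
        for _ in range(n_subsets):
            idx = rng.choice(present.shape[0], n, replace=False)
            pcs.append(percent_correct(present[idx], absent[idx]))
        stats[n] = (np.mean(pcs), np.std(pcs) / np.mean(pcs))
    bias = (stats[sizes[0]][0] - stats[sizes[-1]][0]) / stats[sizes[-1]][0]
    return stats, bias


stats, bias = sample_size_study()
```

    Because the template is fitted and scored on the same pairs, small N inflates PC; an observer that needs fewer estimated parameters would show less of this bias.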

  11. Relative Performance of Rescaling and Resampling Approaches to Model Chi Square and Parameter Standard Error Estimation in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Nevitt, Johnathan; Hancock, Gregory R.

    Though common structural equation modeling (SEM) methods are predicated upon the assumption of multivariate normality, applied researchers often find themselves with data clearly violating this assumption and without sufficient sample size to use distribution-free estimation methods. Fortunately, promising alternatives are being integrated into…

  12. An Improved Estimation Using Polya-Gamma Augmentation for Bayesian Structural Equation Models with Dichotomous Variables

    ERIC Educational Resources Information Center

    Kim, Seohyun; Lu, Zhenqiu; Cohen, Allan S.

    2018-01-01

    Bayesian algorithms have been used successfully in the social and behavioral sciences to analyze dichotomous data particularly with complex structural equation models. In this study, we investigate the use of the Polya-Gamma data augmentation method with Gibbs sampling to improve estimation of structural equation models with dichotomous variables.…
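    The core of Polya-Gamma augmentation (Polson, Scott & Windle, 2013) can be illustrated on plain Bayesian logistic regression rather than the structural equation models studied here. Given latent ω_i ~ PG(1, x_iᵀβ), the conditional for β is Gaussian, so Gibbs sampling alternates two closed-form draws. Assumptions in this sketch: a N(0, 10·I) prior on β, and PG(1, z) draws approximated by truncating the infinite gamma-sum representation at 100 terms (production samplers use exact methods).

```python
import numpy as np


def sample_pg1(z, rng, n_terms=100):
    """Approximate PG(1, z) draws via the truncated sum
    (1 / 2 pi^2) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4 pi^2)), g_k ~ Gamma(1, 1)."""
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(1.0, 1.0, size=(z.size, n_terms))
    denom = (k - 0.5) ** 2 + (z[:, None] ** 2) / (4 * np.pi**2)
    return (g / denom).sum(axis=1) / (2 * np.pi**2)


def gibbs_logistic(X, y, n_iter=600, burn=200, prior_var=10.0, seed=0):
    """Gibbs sampler for logistic regression with PG augmentation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prior_prec = np.eye(p) / prior_var        # N(0, prior_var * I) prior on beta
    kappa = y - 0.5
    beta = np.zeros(p)
    draws = []
    for it in range(n_iter):
        omega = sample_pg1(X @ beta, rng)                      # omega | beta
        V = np.linalg.inv(X.T @ (X * omega[:, None]) + prior_prec)
        m = V @ (X.T @ kappa)                                  # prior mean is zero
        beta = rng.multivariate_normal(m, V)                   # beta | omega, y
        if it >= burn:
            draws.append(beta)
    return np.array(draws)
```

    The appeal is that no Metropolis tuning is needed: both conditionals are sampled directly.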

  13. Recovery of Graded Response Model Parameters: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Estimation

    ERIC Educational Resources Information Center

    Kieftenbeld, Vincent; Natesan, Prathiba

    2012-01-01

    Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
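    Both MML and Gibbs sampling rely on the same item-level likelihood. As a sketch, Samejima's graded response model gives category probabilities as differences of adjacent cumulative logistic curves, P(X ≥ k | θ) = 1 / (1 + exp(−a(θ − b_k))). The function name and argument layout below are illustrative choices.

```python
import numpy as np


def grm_category_probs(theta, a, b):
    """Graded response model category probabilities.

    theta: latent trait value(s); a: item discrimination;
    b: increasing thresholds b_1 < ... < b_{K-1}.
    Returns an array of shape (n_theta, K) with P(X = k | theta), k = 0..K-1.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k) for k = 1..K-1.
    cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    upper = np.hstack([ones, cum])     # P(X >= k),     k = 0..K-1
    lower = np.hstack([cum, zeros])    # P(X >= k + 1), k = 0..K-1
    return upper - lower


probs = grm_category_probs([-1.0, 0.0, 1.0], a=1.2, b=[-1.0, 0.0, 1.0])
```

    Each row sums to one; a marginal likelihood integrates these over the latent trait distribution, while a Gibbs sampler evaluates them at sampled θ values.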

  14. Estimating fluvial wood discharge from timelapse photography with varying sampling intervals

    NASA Astrophysics Data System (ADS)

    Anderson, N. K.

    2013-12-01

    There has been a recent focus on calculating wood budgets for streams and rivers to help inform management decisions, ecological studies, and carbon/nutrient cycling models. Most work has measured in situ wood in temporary storage along stream banks or estimated wood inputs from banks. Little effort has been devoted to monitoring and quantifying wood in transport during high flows. This paper outlines a procedure for estimating total seasonal wood loads using non-continuous, coarse-interval sampling and examines differences in estimates between sampling at 1, 5, 10, and 15 minute intervals. Analysis is performed on wood transport for the Slave River in the Northwest Territories, Canada. Relative to the 1 minute dataset, precision decreased by 23%, 46%, and 60% for the 5, 10, and 15 minute datasets, respectively. Five and 10 minute sampling intervals provided unbiased, equal-variance estimates relative to 1 minute sampling, whereas 15 minute intervals were biased towards underestimation by 6%. Stratifying estimates by day and by discharge increased precision over non-stratification by 4% and 3%, respectively. Excluding wood transported during ice break-up, the total minimum wood load estimated at this site is 3300 ± 800 m³ for the 2012 runoff season. The vast majority of the imprecision in total wood volumes came from variance in estimating average volume per log.
    [Figure: Comparison of proportions and variance across sample intervals, using bootstrap sampling to achieve equal n (each trial sampled at n = 100, 10,000 times, and averaged; all trials then averaged to obtain an estimate for each sample interval). Dashed lines represent values from the 1 minute dataset.]
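    A hedged sketch of how interval sampling can be expanded to a seasonal total. It assumes each stratum (for example, a day) is sampled systematically every k minutes, treats that systematic sample as a simple random sample when computing variance (an approximation), and converts counts to volume with a first-order delta-method variance that assumes the count and mean-volume estimates are independent. The function names are hypothetical, not the paper's procedure.

```python
import numpy as np


def stratified_total(samples_by_stratum, stratum_sizes):
    """Stratified expansion estimate of a total and its variance:
    T = sum_h N_h * ybar_h,  Var = sum_h N_h^2 (1 - n_h/N_h) s_h^2 / n_h."""
    total, var = 0.0, 0.0
    for y, N in zip(samples_by_stratum, stratum_sizes):
        y = np.asarray(y, dtype=float)
        n = y.size
        total += N * y.mean()
        var += N**2 * (1 - n / N) * y.var(ddof=1) / n
    return total, var


def total_volume(total_count, var_count, log_volumes):
    """Volume = count x mean volume per log, with delta-method variance."""
    v = np.asarray(log_volumes, dtype=float)
    vbar, var_vbar = v.mean(), v.var(ddof=1) / v.size
    return total_count * vbar, vbar**2 * var_count + total_count**2 * var_vbar


# Hypothetical use: per-minute log counts for two days, sampled every 5 minutes.
rng = np.random.default_rng(0)
minutes = [rng.poisson(3, 600), rng.poisson(8, 600)]
samples = [day[::5] for day in minutes]
count, count_var = stratified_total(samples, [day.size for day in minutes])
```

    The second term of the volume variance, total_count² × Var(mean volume), grows with the square of the count, which is consistent with the abstract's observation that most imprecision came from estimating average volume per log.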

  15. Evaluating abundance and trends in a Hawaiian avian community using state-space analysis

    USGS Publications Warehouse

    Camp, Richard J.; Brinck, Kevin W.; Gorresen, P.M.; Paxton, Eben H.

    2016-01-01

    Estimating population abundances and patterns of change over time is important in both ecology and conservation. Trend assessment typically entails fitting a regression to a time series of abundances to estimate population trajectory. However, year-to-year changes in abundance estimates are due to both true variation in population size (process variation) and variation due to imperfect sampling and model fit. State-space models are a relatively new method that can be used to partition the error components and quantify trends based only on process variation. We compare a state-space modelling approach with a more traditional linear regression approach to assess trends in uncorrected raw counts and detection-corrected abundance estimates of forest birds at Hakalau Forest National Wildlife Refuge, Hawai‘i. Most species demonstrated similar trends using either method. In general, evidence for trends using state-space models was less strong than for linear regression, as measured by estimates of precision. However, while the state-space models may sacrifice precision, the expectation is that these estimates provide a better representation of the real-world biological processes of interest because they partition process variation (environmental and demographic variation) and observation variation (sampling and model variation). The state-space approach also provides annual estimates of abundance, which can be used by managers to set conservation strategies and can be linked to factors that vary by year, such as climate, to better understand processes that drive population trends.
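    The separation of process and observation variation can be illustrated with the simplest linear-Gaussian state-space model on log abundance, filtered with a Kalman filter. This is a sketch under assumptions: the drift (trend), process variance q, and observation variance r are treated as known, whereas real analyses, including the one summarized above, estimate them. Missing survey years (NaN) receive a prediction step only.

```python
import numpy as np


def kalman_filter(y, drift, q, r, x0, p0):
    """Local level with drift on the log scale:
    state:       x_t = x_{t-1} + drift + w_t,  w_t ~ N(0, q)  (process variation)
    observation: y_t = x_t + v_t,              v_t ~ N(0, r)  (observation variation)
    Returns filtered state means and variances."""
    x, p = x0, p0
    xs, ps = [], []
    for obs in y:
        x_pred, p_pred = x + drift, p + q          # predict one year ahead
        if np.isnan(obs):
            x, p = x_pred, p_pred                  # missed survey: no update
        else:
            k = p_pred / (p_pred + r)              # Kalman gain
            x = x_pred + k * (obs - x_pred)        # blend prediction and count
            p = (1 - k) * p_pred
        xs.append(x)
        ps.append(p)
    return np.array(xs), np.array(ps)


# Hypothetical 20-year series: slow decline, noisy counts, one missed year.
rng = np.random.default_rng(0)
true_x = np.cumsum(np.full(20, -0.02) + rng.normal(0, 0.05, 20)) + np.log(500)
obs = true_x + rng.normal(0, 0.2, 20)
obs[7] = np.nan
filt, filt_var = kalman_filter(obs, drift=-0.02, q=0.05**2, r=0.2**2, x0=obs[0], p0=1.0)
```

    When q is small relative to r, the filter shrinks noisy counts toward the smooth trajectory, so the trend reflects process variation rather than sampling noise.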

  16. Space-Time Smoothing of Complex Survey Data: Small Area Estimation for Child Mortality.

    PubMed

    Mercer, Laina D; Wakefield, Jon; Pantazis, Athena; Lutambi, Angelina M; Masanja, Honorati; Clark, Samuel

    2015-12-01

    Many people living in low- and middle-income countries are not covered by civil registration and vital statistics systems. Consequently, a wide variety of other types of data, including many household sample surveys, are used to estimate health and population indicators. In this paper we combine data from sample surveys and demographic surveillance systems to produce small area estimates of child mortality through time. Small area estimates are necessary to understand geographical heterogeneity in health indicators when full-coverage vital statistics are not available. For this endeavor, spatio-temporal smoothing is beneficial to alleviate problems of data sparsity. The use of conventional hierarchical models requires careful thought, since the survey weights may need to be considered to alleviate bias due to non-random sampling and non-response. The application that motivated this work is estimation of child mortality rates in five-year time intervals in regions of Tanzania. Data come from Demographic and Health Surveys conducted over the period 1991-2010 and two demographic surveillance system sites. We derive a variance estimator of under-five child mortality that accounts for the complex survey weighting. For our application, the hierarchical models we consider include random effects for area, time, and survey, and we compare models using a variety of measures, including the conditional predictive ordinate (CPO). The method we propose is implemented via the fast and accurate integrated nested Laplace approximation (INLA).

  17. Monitoring bald eagles using lists of nests: Response to Watts and Duerr

    USGS Publications Warehouse

    Sauer, John R.; Otto, Mark C.; Kendall, William L.; Zimmerman, Guthrie S.

    2011-01-01

    The post-delisting monitoring plan for bald eagles (Haliaeetus leucocephalus) proposed use of a dual-frame sample design, in which sampling of known nest sites in combination with additional area-based sampling is used to estimate the total number of nesting bald eagle pairs. Watts and Duerr (2010) used data from repeated observations of bald eagle nests in Virginia, USA to estimate a nest turnover rate and used this rate to simulate the decline in the number of occupied nests on the list over time. Results of Watts and Duerr suggest that, given the rates of loss of nests from the list of known nest sites in Virginia, the list information will be of little value to sampling unless lists are constantly updated. Those authors criticize the plan for not placing sufficient emphasis on updating and maintaining lists of bald eagle nests. Watts and Duerr's metric of turnover rate does not distinguish detectability or temporary nonuse of nests from permanent loss of nests and likely overestimates turnover rate. We describe a multi-state capture–recapture model that allows appropriate estimation of rates of loss of nests, and we use the model to estimate rates of loss from a sample of nests from Maine, USA. The post-delisting monitoring plan addresses the need to maintain and update the lists of nests, and we show that dual-frame sampling is an effective approach for sampling nesting bald eagle populations.

  18. SMALL AREA ESTIMATION OF INDICATORS OF STREAM CONDITION FOR MAIA USING HIERARCHICAL BAYES PREDICTION MODELS

    EPA Science Inventory

    Probability surveys of stream and river resources (hereafter referred to as streams) provide reliable estimates of stream condition when the areas for the estimates have sufficient number of sample sites. Monitoring programs are frequently asked to provide estimates for areas th...

  19. A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution.

    PubMed

    Harrison, Xavier A

    2015-01-01

    Overdispersion is a common feature of models of biological data, but researchers often fail to model the excess variation driving the overdispersion, resulting in biased parameter estimates and standard errors. Quantifying and modeling overdispersion when it is present is therefore critical for robust biological inference. One means to account for overdispersion is to add an observation-level random effect (OLRE) to a model, where each data point receives a unique level of a random effect that can absorb the extra-parametric variation in the data. Although some studies have investigated the utility of OLRE to model overdispersion in Poisson count data, studies doing so for Binomial proportion data are scarce. Here I use a simulation approach to investigate the ability of both OLRE models and Beta-Binomial models to recover unbiased parameter estimates in mixed effects models of Binomial data under various degrees of overdispersion. In addition, as ecologists often fit random intercept terms to models when the random effect sample size is low (<5 levels), I investigate the performance of both model types under a range of random effect sample sizes when overdispersion is present. Simulation results revealed that the efficacy of OLRE depends on the process that generated the overdispersion; OLRE failed to cope with overdispersion generated from a Beta-Binomial mixture model, leading to biased slope and intercept estimates, but performed well for overdispersion generated by adding random noise to the linear predictor. Comparison of parameter estimates from an OLRE model with those from its corresponding Beta-Binomial model readily identified when OLRE were performing poorly due to disagreement between effect sizes, and this strategy should be employed whenever OLRE are used for Binomial data to assess their reliability. Beta-Binomial models performed well across all contexts, but showed a tendency to underestimate effect sizes when modelling non-Beta-Binomial data. 
Finally, both OLRE and Beta-Binomial models performed poorly when models contained <5 levels of the random intercept term, especially for estimating variance components, and this effect appeared independent of total sample size. These results suggest that OLRE are a useful tool for modelling overdispersion in Binomial data, but that they do not perform well in all circumstances and researchers should take care to verify the robustness of parameter estimates of OLRE models.
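    The two overdispersion mechanisms contrasted above can be simulated directly. This sketch (hypothetical helper names, NumPy only) generates Beta-Binomial counts with intra-class correlation ρ, and Binomial counts with Gaussian noise added on the logit scale, which is the process an OLRE assumes. It then computes the Pearson χ²/df dispersion statistic for an intercept-only Binomial model. It only diagnoses overdispersion; fitting OLRE or Beta-Binomial mixed models would require a mixed-model library.

```python
import numpy as np


def beta_binomial_data(n_obs, n_trials, mean, rho, rng):
    """Beta-Binomial counts: p_i ~ Beta(a, b) with intra-class correlation rho."""
    a = mean * (1 - rho) / rho
    b = (1 - mean) * (1 - rho) / rho
    return rng.binomial(n_trials, rng.beta(a, b, n_obs))


def olre_style_data(n_obs, n_trials, mean, sigma, rng):
    """Binomial counts with N(0, sigma^2) noise on the logit scale."""
    eta = np.log(mean / (1 - mean)) + rng.normal(0.0, sigma, n_obs)
    return rng.binomial(n_trials, 1 / (1 + np.exp(-eta)))


def pearson_dispersion(y, n_trials):
    """Pearson chi-square / df for an intercept-only Binomial model (about 1 if not overdispersed)."""
    p_hat = y.sum() / (n_trials * y.size)
    resid = (y - n_trials * p_hat) / np.sqrt(n_trials * p_hat * (1 - p_hat))
    return np.sum(resid**2) / (y.size - 1)


rng = np.random.default_rng(0)
for label, y in [
    ("binomial", rng.binomial(20, 0.5, 1000)),
    ("beta-binomial", beta_binomial_data(1000, 20, 0.5, 0.2, rng)),
    ("logit-normal", olre_style_data(1000, 20, 0.5, 1.0, rng)),
]:
    print(label, round(pearson_dispersion(y, 20), 2))
```

    Both mechanisms inflate the statistic well above 1, which is why the dispersion statistic alone cannot tell which model generated the data, and why the abstract recommends comparing OLRE and Beta-Binomial fits.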

  20. Uncertainty quantification of resonant ultrasound spectroscopy for material property and single crystal orientation estimation on a complex part

    NASA Astrophysics Data System (ADS)

    Aldrin, John C.; Mayes, Alexander; Jauriqui, Leanne; Biedermann, Eric; Heffernan, Julieanne; Livings, Richard; Goodlet, Brent; Mazdiyasni, Siamack

    2018-04-01

    A case study is presented evaluating uncertainty in resonant ultrasound spectroscopy (RUS) inversion for single crystal (SX) Ni-based superalloy Mar-M247 cylindrical dog-bone specimens. A number of surrogate models were developed from FEM model solutions, using different sampling schemes (regular grid, Monte Carlo sampling, Latin hypercube sampling) and modeling approaches (N-dimensional cubic spline interpolation and Kriging). Repeated studies were used to quantify the well-posedness of the inversion problem, and the uncertainty in material property and crystallographic orientation estimates was assessed given typical geometric dimension variability in aerospace components. Surrogate model quality was found to be an important factor in inversion results when the model more closely represents the test data. One important finding was that, when the model matches the test data well, a Kriging surrogate model using unsorted Latin hypercube sampled data performed as well as the best results from an N-dimensional interpolation model using sorted data. However, both surrogate model quality and mode sorting were found to be less critical when inverting properties from either experimental data or simulated test cases with uncontrolled geometric variation.
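    Of the sampling schemes listed, Latin hypercube sampling is the least obvious. A minimal NumPy sketch: each dimension is cut into n equal-width strata, each stratum receives exactly one point, and the strata are paired across dimensions by independent random permutations. The helper name is illustrative; SciPy users can instead use `scipy.stats.qmc.LatinHypercube`.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng=None):
    """n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    rng = np.random.default_rng(rng)
    jitter = rng.uniform(size=(n_samples, n_dims))              # position inside each stratum
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(n_dims)])
    return (strata + jitter) / n_samples


def scale_to_bounds(unit_points, lower, upper):
    """Map unit-cube points onto the parameter box [lower, upper]."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + unit_points * (upper - lower)


# Hypothetical 3-parameter design (e.g. elastic constants) with 20 FEM runs.
design = scale_to_bounds(latin_hypercube(20, 3, rng=0), [0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

    Compared with a regular grid, the number of FEM runs does not grow exponentially with dimension, and each one-dimensional projection stays evenly covered, which suits Kriging surrogates.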

  1. Estimation of the Proportion of Underachieving Students in Compulsory Secondary Education in Spain: An Application of the Rasch Model

    PubMed Central

    Veas, Alejandro; Gilar, Raquel; Miñano, Pablo; Castejón, Juan-Luis

    2016-01-01

    There are very few studies in Spain that treat underachievement rigorously, and those that do are typically related to gifted students. The present study examined the proportion of underachieving students using the Rasch measurement model. A sample of 643 first-year high school students (mean age = 12.09; SD = 0.47) from 8 schools in the province of Alicante (Spain) completed the Battery of Differential and General Skills (Badyg), and these students' Grade Point Averages (GPAs) were provided by their teachers. Dichotomous and partial credit Rasch models were fitted. After adjusting the measurement instruments, the individual underachievement index identified 181 underachieving students, or 28.14% of the total sample across the ability levels. This study confirms that the Rasch measurement model can accurately estimate the construct validity of both the intelligence test and the academic grades for the identification of underachieving students. Furthermore, the present study constitutes a pioneering framework for estimating the prevalence of underachievement in Spain. PMID:26973586
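    As background for the dichotomous Rasch model, P(correct) = 1 / (1 + exp(−(θ − b))). A person's ability θ can be estimated by maximum likelihood when item difficulties b are treated as known, using Newton-Raphson on Σ(x − P) = 0. This is a generic sketch, not the study's underachievement index. Zero and perfect raw scores have no finite MLE and are rejected.

```python
import numpy as np


def rasch_prob(theta, b):
    """Dichotomous Rasch probability of a correct response to items with difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - np.asarray(b, dtype=float))))


def rasch_theta_mle(x, b, tol=1e-8, max_iter=50):
    """Newton-Raphson MLE of ability for a 0/1 response vector x."""
    x = np.asarray(x, dtype=float)
    score = x.sum()
    if score == 0 or score == x.size:
        raise ValueError("no finite MLE for zero or perfect raw scores")
    theta = 0.0
    for _ in range(max_iter):
        p = rasch_prob(theta, b)
        step = np.sum(x - p) / np.sum(p * (1 - p))   # score / information
        theta += step
        if abs(step) < tol:
            break
    return theta


theta_hat = rasch_theta_mle([1, 1, 1, 0], [-1.0, -0.5, 0.5, 1.0])
```

    Under the Rasch model the raw score is a sufficient statistic, so respondents with the same score on the same items receive the same θ estimate.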

  2. Using a Modification of the Capture-Recapture Model To Estimate the Need for Substance Abuse Treatment.

    ERIC Educational Resources Information Center

    Maxwell, Jane Carlisle; Pullum, Thomas W.

    2001-01-01

    Applied the capture-recapture model, through a Poisson regression, to a time series of data for admissions to treatment from 1987 to 1996 to estimate the number of heroin addicts in Texas who are "at-risk" for treatment. The entire data set produced estimates that were lower and more plausible than those produced by drawing samples,…
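    For orientation, the classic two-source capture-recapture estimate, in Chapman's bias-corrected form, is sketched below. It is the textbook starting point, not the authors' Poisson-regression modification. Here n1 and n2 are the numbers of individuals found on two lists or sampling occasions and m is the number found on both. The variance is Seber's expression for the Chapman estimator.

```python
def chapman_estimate(n1, n2, m):
    """Chapman estimator of population size and its variance.

    N_hat = (n1 + 1)(n2 + 1) / (m + 1) - 1
    Var   = (n1 + 1)(n2 + 1)(n1 - m)(n2 - m) / ((m + 1)^2 (m + 2))
    """
    if not (0 <= m <= min(n1, n2)):
        raise ValueError("overlap m must lie between 0 and min(n1, n2)")
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var


# Hypothetical lists: 100 and 100 individuals, 20 appearing on both.
n_hat, var = chapman_estimate(100, 100, 20)
```

    The estimate assumes the two sources are independent and the population is closed; dependence between sources biases it, which is one motivation for regression-based extensions.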

  3. Using an EM Covariance Matrix to Estimate Structural Equation Models with Missing Data: Choosing an Adjusted Sample Size to Improve the Accuracy of Inferences

    ERIC Educational Resources Information Center

    Enders, Craig K.; Peugh, James L.

    2004-01-01

    Two methods, direct maximum likelihood (ML) and the expectation maximization (EM) algorithm, can be used to obtain ML parameter estimates for structural equation models with missing data (MD). Although the two methods frequently produce identical parameter estimates, it may be easier to satisfy missing at random assumptions using EM. However, no…

  4. Comparing Different Approaches of Bias Correction for Ability Estimation in IRT Models. Research Report. ETS RR-08-13

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2008-01-01

    The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…

  5. Impact of hindcast length on estimates of seasonal climate predictability.

    PubMed

    Shi, W; Schaller, N; MacLeod, D; Palmer, T N; Weisheimer, A

    2015-03-16

    It has recently been argued that single-model seasonal forecast ensembles are overdispersive, implying that the real world is more predictable than indicated by estimates of so-called perfect model predictability, particularly over the North Atlantic. However, such estimates are based on relatively short forecast data sets comprising just 20 years of seasonal predictions. Here we study longer 40 year seasonal forecast data sets from multimodel seasonal forecast ensemble projects and show that sampling uncertainty due to the length of the hindcast periods is large. The skill of forecasting the North Atlantic Oscillation during winter varies within the 40 year data sets, with high levels of skill found for some subperiods. It is demonstrated that while 20 year estimates of seasonal reliability can show evidence of overdispersive behavior, the 40 year estimates are more stable and show no evidence of overdispersion. Instead, the predominant feature on these longer time scales is underdispersion, particularly in the tropics. Key points: predictions can appear overdispersive due to hindcast-length sampling error; longer hindcasts are more robust and underdispersive, especially in the tropics; twenty hindcasts are an inadequate sample size to assess seasonal forecast skill.
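    The sampling-uncertainty argument can be reproduced with a toy perfect ensemble. Members and observations are drawn from the same distribution around a shared predictable signal, so the true spread-error ratio is 1. The ratio is then recomputed over many synthetic hindcast sets of 20 and 40 years. Assumptions: Gaussian noise, 15 members, and the usual √((m+1)/m) finite-ensemble correction; a ratio above 1 suggests overdispersion and below 1 suggests underdispersion.

```python
import numpy as np


def spread_error_ratio(ens, obs):
    """ens: (n_years, m) ensemble forecasts; obs: (n_years,) observations.
    Returns sqrt((m + 1) / m) * mean ensemble spread / RMSE of the ensemble mean."""
    m = ens.shape[1]
    spread = np.sqrt(np.mean(ens.var(axis=1, ddof=1)))
    rmse = np.sqrt(np.mean((ens.mean(axis=1) - obs) ** 2))
    return np.sqrt((m + 1) / m) * spread / rmse


def ratio_sampling_spread(n_years, m=15, reps=2000, seed=0):
    """Spread-error ratios of a statistically perfect ensemble over many hindcast sets."""
    rng = np.random.default_rng(seed)
    ratios = np.empty(reps)
    for i in range(reps):
        signal = rng.normal(size=(n_years, 1))          # predictable component per year
        ens = signal + rng.normal(size=(n_years, m))    # members share the signal
        obs = signal[:, 0] + rng.normal(size=n_years)   # observation is one more "member"
        ratios[i] = spread_error_ratio(ens, obs)
    return ratios


r20, r40 = ratio_sampling_spread(20), ratio_sampling_spread(40, seed=1)
print(f"20 yr: sd {r20.std():.3f}   40 yr: sd {r40.std():.3f}")
```

    Even for a perfectly reliable system, 20-year samples scatter noticeably around 1, so an apparent overdispersion signal from a short hindcast can be sampling error.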

  6. Reliability of environmental sampling culture results using the negative binomial intraclass correlation coefficient.

    PubMed

    Aly, Sharif S; Zhao, Jianyang; Li, Ben; Jiang, Jiming

    2014-01-01

    The Intraclass Correlation Coefficient (ICC) is commonly used to estimate the similarity between quantitative measures obtained from different sources. Overdispersed data are traditionally transformed so that a linear mixed model (LMM)-based ICC can be estimated. A common transformation is the natural logarithm. The reliability of environmental sampling of fecal slurry on freestall pens has been estimated for Mycobacterium avium subsp. paratuberculosis using natural-logarithm-transformed culture results. Recently, the negative binomial ICC was defined based on a generalized linear mixed model for negative binomial distributed data. The current study reports on a negative binomial ICC estimate that includes fixed effects, using culture results of environmental samples. Simulations using a wide variety of inputs and negative binomial distribution parameters (r; p) showed better performance of the new negative binomial ICC compared with the LMM-based ICC, even when the negative binomial data were log- or square-root-transformed. A second comparison that targeted a wider range of ICC values showed that the mean of the estimated ICCs closely approximated the true ICC.
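    For comparison with the negative binomial ICC, the traditional approach transforms the counts and computes a one-way random-effects ICC. A minimal sketch of the balanced-design ANOVA estimator ICC(1) = (MSB − MSW) / (MSB + (k − 1)·MSW) follows. Assumptions: equal replicates per source, and log(x + 1) rather than log(x) so that zero culture counts are allowed. The counts in the usage line are invented.

```python
import numpy as np


def icc_oneway(groups):
    """One-way random-effects ICC(1) for a balanced (sources x replicates) array."""
    data = np.asarray(groups, dtype=float)
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("need a balanced (sources x replicates) array with >= 2 replicates")
    a, k = data.shape
    group_means = data.mean(axis=1)
    msb = k * np.sum((group_means - data.mean()) ** 2) / (a - 1)        # between-source mean square
    msw = np.sum((data - group_means[:, None]) ** 2) / (a * (k - 1))    # within-source mean square
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical culture counts from three pens, two samples each, log(x + 1)-transformed.
counts = np.array([[12, 30], [150, 90], [0, 4]])
icc_log = icc_oneway(np.log1p(counts))
```

    The result depends on the transformation, which is one motivation for defining the ICC directly on the negative binomial scale.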

  7. Spectral imaging using consumer-level devices and kernel-based regression.

    PubMed

    Heikkinen, Ville; Cámara, Clara; Hirvonen, Tapani; Penttinen, Niko

    2016-06-01

    Hyperspectral reflectance factor images were estimated in the 400-700 nm wavelength range using a portable consumer-level laptop display as an adjustable light source for a trichromatic camera. Targets of interest were ColorChecker Classic samples, Munsell Matte samples, geometrically challenging tempera icon paintings from the turn of the 20th century, and human hands. Measurements and simulations were performed using a Nikon D80 RGB camera and a Dell Vostro 2520 laptop screen as the light source. Estimations were performed without knowledge of the spectral characteristics of the devices, emphasizing simplicity in the training sets and in the optimization of the estimation models. Spectral and color error images are shown for the estimations, using line-scanned hyperspectral images as the ground truth. Estimations were performed with kernel-based regression models using a first-degree inhomogeneous polynomial kernel and a Matérn kernel; for the latter, the median heuristic approach for model optimization and a link function for bounded estimation were evaluated. Results suggest modest requirements for a training set and show that, when small representative training data were used, all estimation models achieved markedly improved accuracy with respect to the DE00 color distance (up to 99% for paintings and hands) and the Pearson distance (up to 98% for paintings and 99% for hands), relative to the weak training set (Digital ColorChecker SG) case.
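    A compact sketch of kernel-based regression from camera responses to spectra, using the two kernel families named above. Assumptions: kernel ridge regression as the estimator, a Matérn smoothness of ν = 3/2 (the abstract does not state ν), the median heuristic setting the length scale to the median pairwise distance between training inputs, and 31 spectral bands. The link function for bounded estimation is not modeled.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between rows of A (n, d) and rows of B (m, d)."""
    return np.sqrt(np.maximum(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1), 0.0))


def matern32(A, B, ell):
    """Matern kernel with nu = 3/2 (assumed smoothness)."""
    s = np.sqrt(3.0) * pairwise_dist(A, B) / ell
    return (1.0 + s) * np.exp(-s)


def poly1(A, B):
    """First-degree inhomogeneous polynomial kernel k(x, z) = 1 + x.z."""
    return 1.0 + A @ B.T


def median_heuristic(X):
    """Length scale = median pairwise distance between training inputs."""
    d = pairwise_dist(X, X)
    return np.median(d[np.triu_indices(len(X), k=1)])


def fit_kernel_ridge(X, Y, kernel, lam=1e-3):
    """Kernel ridge regression: alpha = (K + lam I)^-1 Y; returns a predictor."""
    alpha = np.linalg.solve(kernel(X, X) + lam * np.eye(len(X)), Y)
    return lambda X_new: kernel(X_new, X) @ alpha


# Hypothetical training data: 50 RGB responses and 31-band reflectance spectra.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
Y = X @ rng.uniform(size=(3, 31))
ell = median_heuristic(X)
predict = fit_kernel_ridge(X, Y, lambda A, B: matern32(A, B, ell))
```

    The polynomial kernel reproduces affine RGB-to-spectrum maps, while the Matérn kernel can capture nonlinear relations; the median heuristic avoids a separate search over the length scale.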

  8. Age-specific survival of male golden-cheeked warblers on the Fort Hood Military Reservation, Texas

    USGS Publications Warehouse

    Duarte, Adam; Hines, James E.; Nichols, James D.; Hatfield, Jeffrey S.; Weckerly, Floyd W.

    2014-01-01

    Population models are essential components of large-scale conservation and management plans for the federally endangered Golden-cheeked Warbler (Setophaga chrysoparia; hereafter GCWA). However, existing models are based on vital rate estimates calculated using relatively small data sets that are now more than a decade old. We estimated more current, precise adult and juvenile apparent survival (Φ) probabilities and their associated variances for male GCWAs. In addition to providing estimates for use in population modeling, we tested hypotheses about spatial and temporal variation in Φ. We assessed whether a linear trend in Φ or a change in the overall mean Φ corresponded to an observed increase in GCWA abundance during 1992-2000 and whether Φ varied among study plots. To accomplish these objectives, we analyzed long-term GCWA capture-resight data from 1992 through 2011, collected across seven study plots on the Fort Hood Military Reservation, using a Cormack-Jolly-Seber model structure within program MARK. We also estimated Φ process and sampling variances using a variance-components approach. Our results did not provide evidence of site-specific variation in adult Φ on the installation. Because of a lack of data, we could not assess whether juvenile Φ varied spatially. We did not detect a strong temporal association between GCWA abundance and Φ. Across all years analyzed, mean Φ was 0.47 for adult males (process variance 0.0120, sampling variance 0.0113) and 0.28 for juvenile males (process variance 0.0076, sampling variance 0.0149). Although juvenile Φ did not differ greatly from previous estimates, our adult Φ estimate suggests previous GCWA population models were overly optimistic with respect to adult survival. These updated Φ probabilities and their associated variances will be incorporated into new population models to assist with GCWA conservation decision making.
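    The idea behind separating process from sampling variance is that the spread of annual survival estimates equals true year-to-year variation plus estimation noise. A simple method-of-moments sketch subtracts the mean sampling variance from the empirical variance of the annual estimates. This is an illustrative simplification; the variance-components procedure used in the study (within program MARK) may differ.

```python
import numpy as np


def process_variance(estimates, sampling_vars):
    """Method-of-moments process variance:
    Var(phi_hat_t) ~= sigma^2_process + mean(sampling variance_t), truncated at zero."""
    est = np.asarray(estimates, dtype=float)
    sv = np.asarray(sampling_vars, dtype=float)
    return max(est.var(ddof=1) - sv.mean(), 0.0)


# Hypothetical annual apparent-survival estimates and their sampling variances.
phi_hat = [0.42, 0.55, 0.38, 0.51, 0.47]
phi_var = [0.004, 0.006, 0.005, 0.004, 0.005]
sigma2_process = process_variance(phi_hat, phi_var)
```

    Population models should draw annual survival using the process variance only; using the raw variance of the estimates would overstate environmental stochasticity.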

  9. Efficient Monte Carlo Estimation of the Expected Value of Sample Information Using Moment Matching.

    PubMed

    Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca

    2018-02-01

    The Expected Value of Sample Information (EVSI) is used to calculate the economic value of a new research strategy. Although this value would be important to both researchers and funders, there are very few practical applications of the EVSI. This is due to computational difficulties associated with calculating the EVSI in practical health economic models using nested simulations. We present an approximation method for the EVSI that is framed in a Bayesian setting and is based on estimating the distribution of the posterior mean of the incremental net benefit across all possible future samples, known as the distribution of the preposterior mean. Specifically, this distribution is estimated using moment matching coupled with simulations that are available for probabilistic sensitivity analysis, which is typically mandatory in health economic evaluations. This novel approximation method is applied to a health economic model that has previously been used to assess the performance of other EVSI estimators and accurately estimates the EVSI. The computational time for this method is competitive with other methods. We have developed a new calculation method for the EVSI which is computationally efficient and accurate. This novel method relies on some additional simulation and so can be expensive in models with a large computational cost.
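    A toy illustration of the moment-matching idea for a two-option decision, under strong simplifying assumptions. The incremental net benefit (INB) has a normal prior. The future study observes n draws of INB with known noise variance, so posterior means are available in closed form. PSA draws are linearly rescaled so that their variance equals the preposterior variance. In a real model, that variance would come from a small number of nested simulations, mimicked here by `prepost_var_by_simulation`. All numbers are hypothetical. The exact normal-theory EVSI is included only as a check.

```python
import math
import numpy as np


def evsi_moment_matching(psa_inb, prepost_var):
    """Rescale PSA draws of INB to the preposterior variance, then
    EVSI = E[max(posterior-mean INB, 0)] - max(E[INB], 0)."""
    s = np.asarray(psa_inb, dtype=float)
    mu = s.mean()
    prepost = mu + (s - mu) * math.sqrt(prepost_var / s.var())
    return np.mean(np.maximum(prepost, 0.0)) - max(mu, 0.0)


def prepost_var_by_simulation(mu0, var0, noise_var, n, q, rng):
    """Variance of posterior means over q simulated future studies (conjugate normal)."""
    theta = rng.normal(mu0, math.sqrt(var0), q)            # plausible true INB values
    xbar = rng.normal(theta, math.sqrt(noise_var / n))     # each study's sample mean
    post_prec = 1 / var0 + n / noise_var
    post_mean = (mu0 / var0 + n * xbar / noise_var) / post_prec
    return post_mean.var(ddof=1)


def exact_normal_evsi(mu0, prepost_var):
    """E[max(X, 0)] - max(mu0, 0) for X ~ N(mu0, prepost_var)."""
    s = math.sqrt(prepost_var)
    z = mu0 / s
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return mu0 * cdf + s * pdf - max(mu0, 0.0)


mu0, var0, noise_var, n = 500.0, 1e6, 4e6, 50
v_true = var0 - 1 / (1 / var0 + n / noise_var)             # analytic preposterior variance
rng = np.random.default_rng(0)
psa = rng.normal(mu0, math.sqrt(var0), 200_000)
evsi_mm = evsi_moment_matching(psa, prepost_var_by_simulation(mu0, var0, noise_var, n, 50, rng))
```

    The expensive step is estimating one variance rather than running a full nested simulation for every PSA draw, which is where the method's speed comes from.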

  10. State of charge monitoring of vanadium redox flow batteries using half cell potentials and electrolyte density

    NASA Astrophysics Data System (ADS)

    Ressel, Simon; Bill, Florian; Holtz, Lucas; Janshen, Niklas; Chica, Antonio; Flower, Thomas; Weidlich, Claudia; Struckmann, Thorsten

    2018-02-01

    The operation of vanadium redox flow batteries requires reliable in situ state of charge (SOC) monitoring. In this study, two SOC estimation approaches for the negative half cell are investigated. First, in situ open circuit potential measurements are combined with Coulomb counting in a one-step calibration of SOC and Nernst potential that does not require additional reference SOCs. In-sample and out-of-sample SOCs are estimated and analyzed, yielding estimation errors ≤ 0.04. In the second approach, temperature-corrected in situ electrolyte density measurements are used for the first time in vanadium redox flow batteries for SOC estimation. In-sample and out-of-sample SOC estimation errors ≤ 0.04 demonstrate the feasibility of this approach. Both methods allow recalibration during battery operation. The actual capacity obtained from SOC calibration can be used in a state of health model.
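    The two relations that the first approach combines can be written down for the negative half cell (V³⁺ + e⁻ ⇌ V²⁺). These are the Nernst equation, with SOC defined as the V²⁺ fraction, and Coulomb counting. This sketch assumes one electron per vanadium ion and 100% coulombic efficiency. The formal potential e0 is exactly what a calibration would fit; -0.26 V, near the standard potential of the V³⁺/V²⁺ couple, is used below only as an illustrative value.

```python
import numpy as np

F = 96485.33212      # Faraday constant, C/mol
R = 8.314462618      # gas constant, J/(mol K)


def nernst_potential(soc, e0, temp_k=298.15):
    """E = E0 + (RT/F) ln(c(V3+)/c(V2+)) = E0 + (RT/F) ln((1 - SOC)/SOC),
    with SOC = c(V2+) / (c(V2+) + c(V3+))."""
    soc = np.asarray(soc, dtype=float)
    return e0 + R * temp_k / F * np.log((1 - soc) / soc)


def soc_from_potential(e, e0, temp_k=298.15):
    """Inverse of nernst_potential."""
    return 1.0 / (1.0 + np.exp((np.asarray(e, dtype=float) - e0) * F / (R * temp_k)))


def soc_coulomb(soc0, charge_c, vanadium_mol):
    """Coulomb counting: SOC changes by Q / (F n_V); positive charge_c means charging."""
    return soc0 + charge_c / (F * vanadium_mol)


# Illustrative: potential at 30% SOC for an assumed formal potential of -0.26 V.
e_30 = nernst_potential(0.3, -0.26)
```

    Fitting measured potentials against Coulomb-counted charge with these two relations yields the unknown starting SOC and formal potential together, which is the sense in which no extra reference SOCs are needed.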

  11. An integrated study of earth resources in the state of California using remote sensing techniques

    NASA Technical Reports Server (NTRS)

    Colwell, R. N. (Principal Investigator)

    1975-01-01

    The author has identified the following significant results. A weighted stratified double sample design using hardcopy LANDSAT-1 and ground data was used in developmental studies for snow water content estimation. Study results gave a correlation coefficient of 0.80 between LANDSAT sample-unit estimates of snow water content and ground subsamples. The allowable error of a basin snow water content estimate was given as 1.00 percent at the 99 percent confidence level, at the same budget level used in conventional snow surveys. Several evapotranspiration estimation models were selected for efficient application at each level of data to be sampled. An area estimation procedure was developed for impervious surface types of differing impermeability adjacent to stream channels. This technique employs a double sample of 1:125,000 color infrared high-flight transparency data with ground or large-scale photography.
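    Double (two-phase) sampling exploits a cheap auxiliary variable measured on many units, here the LANDSAT estimates, together with an expensive variable measured on a subsample, here the ground data. A sketch of the unstratified, unweighted regression estimator follows; the study's weighted stratified design is more elaborate. The estimator is ȳ_reg = ȳ₂ + b(x̄₁ − x̄₂). Its approximate variance is s²_resid/n₂ + (s²_y − s²_resid)/n₁, ignoring finite-population corrections.

```python
import numpy as np


def double_sampling_regression(x_phase1, x_phase2, y_phase2):
    """Two-phase regression estimator of the mean of y and its approximate variance.

    x_phase1: auxiliary values on the large first-phase sample (e.g. image estimates)
    x_phase2, y_phase2: auxiliary and target values on the second-phase subsample
    """
    x1 = np.asarray(x_phase1, dtype=float)
    x2 = np.asarray(x_phase2, dtype=float)
    y2 = np.asarray(y_phase2, dtype=float)
    n1, n2 = x1.size, x2.size
    b = np.cov(x2, y2)[0, 1] / x2.var(ddof=1)                 # slope from the subsample
    est = y2.mean() + b * (x1.mean() - x2.mean())
    resid = y2 - y2.mean() - b * (x2 - x2.mean())
    s2_resid = resid @ resid / (n2 - 2)
    var = s2_resid / n2 + max(y2.var(ddof=1) - s2_resid, 0.0) / n1
    return est, var


# Hypothetical: 6 image-based unit estimates, 3 of them ground-checked.
est, var = double_sampling_regression([1, 2, 3, 4, 5, 6], [1, 3, 5], [2, 6, 10])
```

    The stronger the correlation between image and ground values (0.80 in the study), the smaller the residual term, and the more precision each cheap first-phase unit buys.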

  12. Estimating effect of environmental contaminants on women's subfecundity for the MoBa study data with an outcome-dependent sampling scheme

    PubMed Central

    Ding, Jieli; Zhou, Haibo; Liu, Yanyan; Cai, Jianwen; Longnecker, Matthew P.

    2014-01-01

    Motivated by the needs of our ongoing environmental study in the Norwegian Mother and Child Cohort (MoBa) study, we consider an outcome-dependent sampling (ODS) scheme for failure-time data with censoring. Like the case-cohort design, the ODS design enriches the observed sample by selectively including certain failure subjects. We present an estimated maximum semiparametric empirical likelihood estimator (EMSELE) under the proportional hazards model framework. The asymptotic properties of the proposed estimator were derived. Simulation studies were conducted to evaluate the small-sample performance of our proposed method. Our analyses show that the proposed estimator and design are more efficient than the current default approach and other competing approaches. Applying the proposed approach to the data set from the MoBa study, we found a significant effect of an environmental contaminant on fecundability. PMID:24812419

  13. Generalizing the Network Scale-Up Method: A New Estimator for the Size of Hidden Populations*

    PubMed Central

    Feehan, Dennis M.; Salganik, Matthew J.

    2018-01-01

    The network scale-up method enables researchers to estimate the size of hidden populations, such as drug injectors and sex workers, using sampled social network data. The basic scale-up estimator offers advantages over other size estimation techniques, but it depends on problematic modeling assumptions. We propose a new generalized scale-up estimator that can be used in settings with non-random social mixing and imperfect awareness about membership in the hidden population. Further, the new estimator can be used when data are collected via complex sample designs and from incomplete sampling frames. However, the generalized scale-up estimator also requires data from two samples: one from the frame population and one from the hidden population. In some situations these data from the hidden population can be collected by adding a small number of questions to already planned studies. For other situations, we develop interpretable adjustment factors that can be applied to the basic scale-up estimator. We conclude with practical recommendations for the design and analysis of future studies. PMID:29375167
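    For reference, the basic scale-up estimator that the generalized estimator extends is usually written as the ratio of reported ties to the hidden population over total respondent network size, scaled by the frame population size. A minimal sketch of that basic form is below (not the generalized estimator, which additionally needs the hidden-population sample); the function and argument names are hypothetical.

```python
def basic_scale_up(reported_hidden_ties, network_sizes, frame_population_size):
    """Basic network scale-up estimate of a hidden population's size.

    reported_hidden_ties[i]: number of hidden-population members respondent i reports knowing.
    network_sizes[i]: estimated personal network size (degree) of respondent i.
    Assumes random mixing and perfect awareness, the assumptions the generalized
    estimator relaxes.
    """
    total_degree = sum(network_sizes)
    if total_degree <= 0:
        raise ValueError("network sizes must sum to a positive number")
    # Multiply before dividing to keep the arithmetic exact for integer inputs
    return sum(reported_hidden_ties) * frame_population_size / total_degree
```

    For example, four respondents with network sizes summing to 600 who report 4 hidden-population contacts in total, drawn from a frame of 30,000 people, give an estimate of 4 × 30,000 / 600 = 200.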

  14. Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models

    ERIC Educational Resources Information Center

    Price, Larry R.

    2012-01-01

    The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…

  15. Comparison of methods for estimating the attributable risk in the context of survival analysis.

    PubMed

    Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M

    2017-01-23

    The attributable risk (AR) measures the proportion of disease cases in the population that can be attributed to an exposure. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions (two nonparametric methods based on the Kaplan-Meier estimator, one semiparametric method based on Cox's model, and one parametric method based on the piecewise constant hazards model), as well as one simpler method based on estimated exposure prevalence at baseline and the hazard ratio from Cox's model. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with a constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation, and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than the other approaches. All methods showed satisfactory coverage except the nonparametric methods at the end of follow-up, especially for a sample size of 1,000. With nonproportional hazards, nonparametric methods yielded results similar to those under proportional hazards, whereas the semiparametric and parametric approaches, which both rely on the proportional hazards assumption, performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. In practice, our study suggests using the semiparametric or parametric approaches to estimate AR as a function of time in cohort studies if the proportional hazards assumption appears appropriate.
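    Two formulas help make the comparison concrete. One common survival-based definition is AR(t) = [F(t) − F0(t)] / F(t), where F = 1 − S is the cumulative incidence in the population and F0 the cumulative incidence had no one been exposed. The "simpler method" is sketched here under the assumption that it plugs the exposure prevalence and the Cox hazard ratio into Levin's formula in place of a relative risk; that reading is ours, not confirmed by the abstract, and the function names are hypothetical.

```python
def levin_attributable_risk(exposure_prevalence, relative_risk):
    """Levin's formula: AR = p(RR - 1) / (1 + p(RR - 1)).

    Assumption: a hazard ratio is used as a stand-in for the relative risk.
    """
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def attributable_risk_from_survival(surv_population, surv_unexposed):
    """AR(t) = [F(t) - F0(t)] / F(t), with F = 1 - S (cumulative incidence at time t)."""
    f = 1.0 - surv_population
    f0 = 1.0 - surv_unexposed
    if f <= 0:
        raise ValueError("no events by time t, so AR(t) is undefined")
    return (f - f0) / f
```

    The survival-based version depends on t, which is why the study evaluated estimators at several time points; Levin's formula with a single hazard ratio yields one time-constant value, which is only reasonable under proportional hazards.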

  16. Pairing field methods to improve inference in wildlife surveys while accommodating detection covariance.

    PubMed

    Clare, John; McKinney, Shawn T; DePue, John E; Loftin, Cynthia S

    2017-10-01

    It is common to use multiple field sampling methods when implementing wildlife surveys to compare method efficacy or cost efficiency, integrate distinct pieces of information provided by separate methods, or evaluate method-specific biases and misclassification error. Existing models that combine information from multiple field methods or sampling devices permit rigorous comparison of method-specific detection parameters, enable estimation of additional parameters such as false-positive detection probability, and improve occurrence or abundance estimates, but they assume that the separate sampling methods produce detections independently of one another. This assumption is tenuous if methods are paired or deployed simultaneously in close proximity, a common practice that reduces the additional effort required to implement multiple methods and reduces the risk that differences between method-specific detection parameters are confounded by other environmental factors. We develop occupancy and spatial capture-recapture models that permit covariance between the detections produced by different methods, use simulation to compare the performance of the new models' estimators with that of models assuming independence, and provide an empirical application based on American marten (Martes americana) surveys using paired remote cameras, hair catches, and snow tracking. Simulation results indicate that existing models assuming independent detection by each method produce biased parameter estimates and substantially understate estimate uncertainty when this assumption is violated, whereas our reformulated models are robust to either methodological independence or covariance. Empirical results suggested that remote cameras and snow tracking had comparable probabilities of detecting martens that were present, but that snow tracking also produced false-positive marten detections that could substantially bias distribution estimates if not corrected for. Remote cameras detected individual martens more readily than passive hair catches. The inability to distinguish individuals' sex photographically did not appear to induce negative bias in camera density estimates; instead, hair catches appeared to produce detection competition between individuals that may have been a source of negative bias. Our model reformulations broaden the range of circumstances in which analyses incorporating multiple sources of information can be robustly used, and our empirical results demonstrate that using multiple field methods can enhance inferences regarding ecological parameters of interest and improve understanding of how reliably survey methods sample these parameters. © 2017 by the Ecological Society of America.

  17. Performance of maximum likelihood mixture models to estimate nursery habitat contributions to fish stocks: a case study on sea bream Sparus aurata

    PubMed Central

    Darnaude, Audrey M.

    2016-01-01

    Background: Mixture models (MM) can be used to describe mixed stocks using three sets of parameters: the total number of contributing sources, their chemical baseline signatures, and their mixing proportions. When all nursery sources have been previously identified and sampled for juvenile fish to produce baseline nursery signatures, mixing proportions are the only unknown set of parameters to be estimated from the mixed-stock data. Otherwise, the number of sources, as well as some or all nursery signatures, may also need to be estimated from the mixed-stock data. Our goal was to assess bias and uncertainty in these MM parameters when estimated using unconditional maximum likelihood approaches (ML-MM), under several incomplete sampling and nursery-signature separation scenarios.

    Methods: We used a comprehensive dataset containing otolith elemental signatures of 301 juvenile Sparus aurata, sampled in three contrasting years (2008, 2010, 2011) from four distinct nursery habitats (Mediterranean lagoons). Artificial nursery-source and mixed-stock datasets were produced considering five different sampling scenarios, in which 0–4 lagoons were excluded from the nursery-source dataset, and six nursery-signature separation scenarios that simulated data separated by 0.5, 1.5, 2.5, 3.5, 4.5, and 5.5 standard deviations among nursery-signature centroids. Bias (BI) and uncertainty (SE) were computed to assess reliability for each of the three sets of MM parameters.

    Results: Both bias and uncertainty in mixing proportion estimates were low (BI ≤ 0.14, SE ≤ 0.06) when all nursery sources were sampled, but they exhibited large variability among cohorts and increased with the number of non-sampled sources, up to BI = 0.24 and SE = 0.11. Bias and variability in baseline signature estimates also increased with the number of non-sampled sources, but these estimates tended to be less biased and more uncertain than mixing proportion estimates across all sampling scenarios (BI < 0.13, SE < 0.29). Increasing separation among nursery signatures improved the reliability of mixing proportion estimates but led to non-linear responses in baseline signature parameters. The estimated number of nursery sources showed low uncertainty but a consistent underestimation bias across all incomplete sampling scenarios.

    Discussion: ML-MM produced reliable estimates of mixing proportions and nursery signatures under a wide range of incomplete sampling and nursery-signature separation scenarios. However, the method failed to estimate the true number of nursery sources, reflecting a pervasive issue affecting mixture models within and beyond the ML framework. Large differences in bias and uncertainty among cohorts were linked to differences in the separation of chemical signatures among nursery habitats. Simulation approaches such as those presented here could be useful for evaluating the sensitivity of MM results to separation and variability in nursery signatures for other species, habitats, or cohorts. PMID:27761305
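    In the simplest case above, where every nursery source has been sampled and its baseline signature is known, only the mixing proportions remain to be estimated, and the EM algorithm gives the maximum likelihood estimate. A minimal sketch follows, assuming univariate Gaussian baseline signatures with known means and standard deviations; real otolith signatures are multivariate, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_mixing_proportions(data, means, sds, n_iter=200):
    """ML mixing proportions by EM, with baseline (source) signatures held fixed.

    Assumes each source signature is a known univariate normal N(means[j], sds[j]^2).
    """
    k = len(means)
    pi = [1.0 / k] * k  # start from equal contributions
    for _ in range(n_iter):
        # E-step: posterior probability that each fish came from each source
        totals = [0.0] * k
        for x in data:
            w = [pi[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            s = sum(w)
            for j in range(k):
                totals[j] += w[j] / s
        # M-step: new proportions are the average posterior memberships
        pi = [t / len(data) for t in totals]
    return pi
```

    When signatures are well separated, as in the larger separation scenarios above, posterior memberships are close to 0 or 1 and the estimates approach simple assignment counts; overlapping signatures make the proportions less certain. Estimating the number of sources as well is a harder model-selection problem that this sketch does not address.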

  18. Maximum Likelihood Estimations and EM Algorithms with Length-biased Data

    PubMed Central

    Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu

    2012-01-01

    SUMMARY Length-biased sampling has been well recognized in economics, industrial reliability, and etiologic, epidemiological, genetic, and cancer screening studies. Length-biased right-censored data have a unique data structure that differs from traditional survival data. The nonparametric and semiparametric estimation and inference methods for traditional survival data are not directly applicable to length-biased right-censored data. We propose new expectation-maximization algorithms for estimation based on full likelihoods involving infinite-dimensional parameters under three settings for length-biased data: estimating a nonparametric distribution function, estimating a nonparametric hazard function under an increasing failure rate constraint, and jointly estimating the baseline hazard function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and are more efficient than estimating-equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency of the estimators and establish the asymptotic normality of the semiparametric maximum likelihood estimators under the Cox model using modern empirical process theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840
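    To see why length-biased data need their own methods, consider the simplest case without censoring. Under length-biased sampling an observation x is drawn with probability proportional to x·f(x), so the nonparametric maximum likelihood estimate of the underlying distribution F puts mass proportional to 1/x_i on each observation. The sketch below covers only that uncensored case, not the EM algorithms for right-censored data or the Cox model proposed above; the function name is hypothetical.

```python
def length_biased_npmle_cdf(observations):
    """NPMLE of the unbiased CDF from uncensored length-biased observations.

    Each observed length x_i receives weight proportional to 1 / x_i, undoing the
    over-representation of long durations. Assumes all observations are > 0.
    """
    if any(x <= 0 for x in observations):
        raise ValueError("observations must be positive")
    inverse = [1.0 / x for x in observations]
    total = sum(inverse)

    def cdf(t):
        # Share of inverse-length weight at or below t
        return sum(w for x, w in zip(observations, inverse) if x <= t) / total

    return cdf
```

    A consequence is that the estimated mean of the underlying distribution is the harmonic mean of the observed lengths, which is smaller than their naive arithmetic mean.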

  19. Estimating disease prevalence from two-phase surveys with non-response at the second phase

    PubMed Central

    Gao, Sujuan; Hui, Siu L.; Hall, Kathleen S.; Hendrie, Hugh C.

    2010-01-01

    SUMMARY In this paper we compare several methods for estimating population disease prevalence from data collected by two-phase sampling when there is non-response at the second phase. The traditional weighting type estimator requires the missing completely at random assumption and may yield biased estimates if the assumption does not hold. We review two approaches and propose one new approach to adjust for non-response assuming that the non-response depends on a set of covariates collected at the first phase: an adjusted weighting type estimator using estimated response probability from a response model; a modelling type estimator using predicted disease probability from a disease model; and a regression type estimator combining the adjusted weighting type estimator and the modelling type estimator. These estimators are illustrated using data from an Alzheimer’s disease study in two populations. Simulation results are presented to investigate the performances of the proposed estimators under various situations. PMID:10931514
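    The weighting idea can be illustrated in its simplest form, in which first-phase subjects are grouped into screening strata and second-phase non-response is assumed to depend only on the stratum. Under that assumption, which is a special case of the covariate-based response model described above, each responder is weighted up to represent all first-phase subjects in their stratum. The function and field names below are hypothetical.

```python
def two_phase_prevalence(strata):
    """Weighted two-phase prevalence estimate with stratum-level non-response adjustment.

    strata: list of dicts with keys
        "n_phase1"     - number screened at phase 1 in the stratum
        "n_responders" - number assessed at phase 2 (selected and responded)
        "n_diseased"   - number of responders diagnosed with the disease
    Assumes response is missing at random within each stratum.
    """
    n1 = sum(s["n_phase1"] for s in strata)
    estimate = 0.0
    for s in strata:
        if s["n_responders"] == 0:
            raise ValueError("every stratum needs at least one phase-2 responder")
        # Stratum share of the phase-1 sample times the prevalence among responders
        estimate += (s["n_phase1"] / n1) * (s["n_diseased"] / s["n_responders"])
    return estimate
```

    For example, 800 screen-negatives with 4 of 40 responders diseased and 200 screen-positives with 25 of 50 responders diseased give 0.8 × 0.1 + 0.2 × 0.5 = 0.18, whereas pooling the 90 responders without weights would give 29/90 ≈ 0.32.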

  20. An efficient parallel sampling technique for Multivariate Poisson-Lognormal model: Analysis with two crash count datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhan, Xianyuan; Aziz, H. M. Abdul; Ukkusuri, Satish V.

    Our study investigates the multivariate Poisson-lognormal (MVPLN) model, which jointly models crash frequency and severity while accounting for correlations. Ordinary univariate count models analyze crashes of different severity levels separately, ignoring the correlations among severity levels. The MVPLN model can incorporate a general correlation structure and accounts for overdispersion in the data, which leads to a superior fit. However, the traditional estimation approach for the MVPLN model is computationally expensive, which often limits its use in practice. In this work, a parallel sampling scheme is introduced to improve the original Markov chain Monte Carlo (MCMC) estimation approach of the MVPLN model, which significantly reduces the model estimation time. Two MVPLN models are developed using pedestrian-vehicle crash data collected in New York City from 2002 to 2006 and highway-injury data from Washington State (5 years of data, from 1990 to 1994). The deviance information criterion (DIC) is used to evaluate model fit. The estimation results show that the MVPLN models provide a superior fit over univariate Poisson-lognormal (PLN), univariate Poisson, and negative binomial models. Moreover, the correlations among the latent effects of different severity levels are found to be significant in both datasets, which justifies jointly modeling crash frequency and severity while accounting for correlations.
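    The structure of the MVPLN model is easiest to see from its data-generating process: each site has correlated normal latent effects, one per severity level, and the counts are Poisson given those effects. The sketch below simulates that process for two severity levels, assuming hypothetical parameter values. It illustrates the model only, not the parallel MCMC estimation scheme described above.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_mvpln(n_sites, log_means, sds, rho, seed=0):
    """Simulate bivariate Poisson-lognormal counts (e.g. two severity levels per site).

    Latent effects (eps1, eps2) are bivariate normal with standard deviations `sds`
    and correlation `rho`; counts are conditionally independent Poisson given them.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # A 2x2 Cholesky factor turns independent z's into correlated latent effects
        eps1 = sds[0] * z1
        eps2 = sds[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        lam1 = math.exp(log_means[0] + eps1)
        lam2 = math.exp(log_means[1] + eps2)
        counts.append((poisson_draw(rng, lam1), poisson_draw(rng, lam2)))
    return counts
```

    The lognormal latent effects produce overdispersion (variance above the mean) within each severity level, and the correlation rho carries over into correlated counts across levels, which separate univariate models cannot represent.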

  1. An efficient parallel sampling technique for Multivariate Poisson-Lognormal model: Analysis with two crash count datasets

    DOE PAGES

    Zhan, Xianyuan; Aziz, H. M. Abdul; Ukkusuri, Satish V.

    2015-11-19

    Our study investigates the multivariate Poisson-lognormal (MVPLN) model, which jointly models crash frequency and severity while accounting for correlations. Ordinary univariate count models analyze crashes of different severity levels separately, ignoring the correlations among severity levels. The MVPLN model can incorporate a general correlation structure and accounts for overdispersion in the data, which leads to a superior fit. However, the traditional estimation approach for the MVPLN model is computationally expensive, which often limits its use in practice. In this work, a parallel sampling scheme is introduced to improve the original Markov chain Monte Carlo (MCMC) estimation approach of the MVPLN model, which significantly reduces the model estimation time. Two MVPLN models are developed using pedestrian-vehicle crash data collected in New York City from 2002 to 2006 and highway-injury data from Washington State (5 years of data, from 1990 to 1994). The deviance information criterion (DIC) is used to evaluate model fit. The estimation results show that the MVPLN models provide a superior fit over univariate Poisson-lognormal (PLN), univariate Poisson, and negative binomial models. Moreover, the correlations among the latent effects of different severity levels are found to be significant in both datasets, which justifies jointly modeling crash frequency and severity while accounting for correlations.

  2. Estimating a Logistic Discrimination Functions When One of the Training Samples Is Subject to Misclassification: A Maximum Likelihood Approach.

    PubMed

    Nagelkerke, Nico; Fidler, Vaclav

    2015-01-01

    The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to a zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those classified, possibly erroneously, as controls. Two examples are analyzed and discussed. A simulation study explores the properties of the maximum likelihood parameter estimates and of the estimates of the number of mislabeled observations.

  3. CONSISTENCY UNDER SAMPLING OF EXPONENTIAL RANDOM GRAPH MODELS.

    PubMed

    Shalizi, Cosma Rohilla; Rinaldo, Alessandro

    2013-04-01

    The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consist only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGMs' expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

  4. CONSISTENCY UNDER SAMPLING OF EXPONENTIAL RANDOM GRAPH MODELS

    PubMed Central

    Shalizi, Cosma Rohilla; Rinaldo, Alessandro

    2015-01-01

    The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM’s expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses. PMID:26166910

  5. Experimental design and efficient parameter estimation in preclinical pharmacokinetic studies.

    PubMed

    Ette, E I; Howie, C A; Kelman, A W; Whiting, B

    1995-05-01

    A Monte Carlo simulation technique used to evaluate the effect of the arrangement of concentration measurements on the efficiency of estimation of population pharmacokinetic parameters in the preclinical setting is described. Although the simulations were restricted to the one-compartment model with intravenous bolus input, they provide a basis for discussing some structural aspects of designing a destructive ("quantic") preclinical population pharmacokinetic study with a fixed sample size, as is usually the case in such studies. The efficiency of parameter estimation obtained with sampling strategies based on the three- and four-time-point designs was evaluated in terms of percent prediction error, design number, individual and joint confidence interval coverage for parameter estimates, and correlation analysis. The data sets contained random terms for both inter-animal and residual intra-animal variability. The results showed that the typical population parameter estimates for clearance and volume were efficiently (accurately and precisely) estimated for both designs, while inter-animal variability (the only random-effect parameter that could be estimated) was inefficiently (inaccurately and imprecisely) estimated with most sampling schedules of the two designs. The exact location of the third and fourth time points for the three- and four-time-point designs, respectively, was not critical to the efficiency of overall estimation of all population parameters of the model. However, some individual population pharmacokinetic parameters were sensitive to the location of these times.
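    A destructive design means each animal contributes a single concentration, so a simulated data set has one record per animal at its assigned time. A minimal sketch of such a simulation follows, assuming a one-compartment IV bolus model C(t) = (Dose / V) · exp(−(CL / V) · t), log-normal inter-animal variability on CL and V, and proportional residual error. All parameter values and function names are hypothetical, and fitting the population model to the simulated data is not shown.

```python
import math
import random


def one_compartment_iv_bolus(dose, cl, v, t):
    """Concentration at time t after an IV bolus in a one-compartment model."""
    return dose / v * math.exp(-cl / v * t)


def simulate_destructive_design(times, animals_per_time, dose, cl_pop, v_pop,
                                omega_cl, omega_v, sigma_prop, seed=0):
    """Simulate one concentration per animal (destructive sampling) at each design time.

    omega_cl, omega_v: SDs of log-normal inter-animal variability on CL and V.
    sigma_prop: SD of proportional residual error.
    """
    rng = random.Random(seed)
    records = []
    for t in times:
        for _ in range(animals_per_time):
            # Each animal gets its own clearance and volume
            cl_i = cl_pop * math.exp(rng.gauss(0.0, omega_cl))
            v_i = v_pop * math.exp(rng.gauss(0.0, omega_v))
            c = one_compartment_iv_bolus(dose, cl_i, v_i, t)
            c_obs = c * (1.0 + rng.gauss(0.0, sigma_prop))
            records.append((t, c_obs))
    return records
```

    Repeating such a simulation many times for different choices of `times`, and refitting each replicate, is how the efficiency of alternative three- and four-point schedules can be compared.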

  6. Modification of the Sandwich Estimator in Generalized Estimating Equations with Correlated Binary Outcomes in Rare Event and Small Sample Settings

    PubMed Central

    Rogers, Paul; Stoner, Julie

    2016-01-01

    Regression models for correlated binary outcomes are commonly fit using Generalized Estimating Equations (GEE). GEE uses the Liang and Zeger sandwich estimator to produce unbiased standard error estimators for regression coefficients in large-sample settings, even when the covariance structure is misspecified. The sandwich estimator performs optimally in balanced designs when the number of participants is large and there are few repeated measurements. The sandwich estimator is not without drawbacks: its asymptotic properties do not hold in small-sample settings, where it is biased downward and underestimates the variances. In this project, a modified form of the sandwich estimator is proposed to correct this deficiency. The performance of this new sandwich estimator is compared with the traditional Liang and Zeger estimator, as well as with alternative forms proposed by Morel, by Pan, and by Mancl and DeRouen. The performance of each estimator was assessed with 95% coverage probabilities for the regression coefficient estimators, using simulated data under various combinations of sample sizes and outcome prevalence values with independence (IND), autoregressive (AR), and compound symmetry (CS) correlation structures. This research is motivated by investigations involving rare-event outcomes in aviation data. PMID:26998504
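    The downward bias is easiest to see in the simplest GEE: an intercept-only model for a binary outcome with a working independence structure, whose estimate is the overall proportion. In that special case the Liang and Zeger sandwich variance of the proportion reduces to the sum of squared cluster residual totals divided by N², and the Mancl and DeRouen correction inflates each cluster total by 1 / (1 − n_i / N). This reduction is our own derivation for this special case, not the modified estimator proposed above, and the function names are hypothetical.

```python
def _mean_and_total(clusters):
    """Overall proportion and total number of observations across clusters."""
    n_total = sum(len(c) for c in clusters)
    mu = sum(sum(c) for c in clusters) / n_total
    return mu, n_total


def naive_binomial_variance(clusters):
    """Model-based variance of the proportion, ignoring within-cluster correlation."""
    mu, n_total = _mean_and_total(clusters)
    return mu * (1.0 - mu) / n_total


def sandwich_variance_of_mean(clusters, mancl_derouen=False):
    """Cluster-robust (sandwich) variance of the overall proportion.

    Intercept-only GEE with working independence. With mancl_derouen=True each
    cluster's residual total is divided by (1 - n_i / N), i.e. (I - H_ii)^-1 applied
    to the cluster residuals, where H_ii is the cluster's leverage block.
    """
    mu, n_total = _mean_and_total(clusters)
    meat = 0.0
    for c in clusters:
        s = sum(y - mu for y in c)  # cluster residual total (score contribution)
        if mancl_derouen:
            s /= 1.0 - len(c) / n_total
        meat += s * s
    return meat / n_total ** 2
```

    With only a few clusters, the leverage correction noticeably increases the variance estimate, which is the direction of adjustment needed to restore nominal coverage in small samples.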

  7. Accounting for animal movement in estimation of resource selection functions: sampling and data analysis.

    PubMed

    Forester, James D; Im, Hae Kyung; Rathouz, Paul J

    2009-12-01

    Patterns of resource selection by animal populations emerge as a result of the behavior of many individuals. Statistical models that describe these population-level patterns of habitat use can miss important interactions between individual animals and characteristics of their local environment; however, identifying these interactions is difficult. One approach to this problem is to incorporate models of individual movement into resource selection models. To do this, we propose a model for step selection functions (SSF) that is composed of a resource-independent movement kernel and a resource selection function (RSF). We show that standard case-control logistic regression may be used to fit the SSF; however, the sampling scheme used to generate control points (i.e., the definition of availability) must be accommodated. We used three sampling schemes to analyze simulated movement data and found that ignoring sampling and the resource-independent movement kernel yielded biased estimates of selection. The level of bias depended on the method used to generate control locations, the strength of selection, and the spatial scale of the resource map. Using empirical or parametric methods to sample control locations produced biased estimates under stronger selection; however, we show that the addition of a distance function to the analysis substantially reduced that bias. Assuming a uniform availability within a fixed buffer yielded strongly biased selection estimates that could be corrected by including the distance function but remained inefficient relative to the empirical and parametric sampling methods. As a case study, we used location data collected from elk in Yellowstone National Park, USA, to show that selection and bias may be temporally variable. Because under constant selection the amount of bias depends on the scale at which a resource is distributed in the landscape, we suggest that distance always be included as a covariate in SSF analyses. This approach to modeling resource selection is easily implemented using common statistical tools and promises to provide deeper insight into the movement ecology of animals.
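    Fitting an SSF by case-control logistic regression amounts to maximizing a conditional logistic likelihood in which each observed step is compared only with its own control steps. A minimal sketch follows, assuming covariate vectors that can hold both a resource value and the step distance recommended above, and using plain gradient ascent purely for illustration (a statistics package's conditional logistic routine would normally be used). The function names are hypothetical.

```python
import math


def clogit_loglik_and_grad(beta, strata):
    """Conditional logistic log-likelihood and gradient for step selection data.

    strata: list of (used_covariates, [control_covariates, ...]); each stratum is one
    observed step and its sampled control steps. Covariate vectors may include a
    resource value and a distance term.
    """
    ll = 0.0
    grad = [0.0] * len(beta)
    for used, controls in strata:
        options = [used] + controls
        scores = [sum(b * x for b, x in zip(beta, z)) for z in options]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        ll += scores[0] - m - math.log(total)
        for k in range(len(beta)):
            expected = sum(w * z[k] for w, z in zip(weights, options)) / total
            grad[k] += used[k] - expected
    return ll, grad


def fit_ssf(strata, n_params, step=0.1, n_iter=2000):
    """Maximize the conditional likelihood by simple gradient ascent (illustrative only)."""
    beta = [0.0] * n_params
    for _ in range(n_iter):
        _, grad = clogit_loglik_and_grad(beta, strata)
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta
```

    Because the likelihood conditions on each stratum, anything constant within a step cancels out. The way control steps are sampled (the availability definition) therefore enters only through which alternatives appear in each stratum, which is why the sampling scheme has to be reflected in the covariates.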

  8. SEMIPARAMETRIC ADDITIVE RISKS REGRESSION FOR TWO-STAGE DESIGN SURVIVAL STUDIES

    PubMed Central

    Li, Gang; Wu, Tong Tong

    2011-01-01

    In this article we study a semiparametric additive risks model (McKeague and Sasieni (1994)) for two-stage design survival data where accurate information is available only on second stage subjects, a subset of the first stage study. We derive two-stage estimators by combining data from both stages. Large sample inferences are developed. As a by-product, we also obtain asymptotic properties of the single stage estimators of McKeague and Sasieni (1994) when the semiparametric additive risks model is misspecified. The proposed two-stage estimators are shown to be asymptotically more efficient than the second stage estimators. They also demonstrate smaller bias and variance for finite samples. The developed methods are illustrated using small intestine cancer data from the SEER (Surveillance, Epidemiology, and End Results) Program. PMID:21931467

  9. Data processing 1: Advancements in machine analysis of multispectral data

    NASA Technical Reports Server (NTRS)

    Swain, P. H.

    1972-01-01

    Multispectral data processing procedures are outlined, beginning with the data display process used for data editing and proceeding through clustering, feature selection criteria for error probability estimation, and sample clustering and classification. Formulating a three-stage sampling model for evaluating crop acreage estimates enables effective use of large quantities of remote sensing data and represents an improvement in determining the cost-benefit relationship associated with remote sensing technology.

  10. Developing a methodology for the inverse estimation of root architectural parameters from field based sampling schemes

    NASA Astrophysics Data System (ADS)

    Morandage, Shehan; Schnepf, Andrea; Vanderborght, Jan; Javaux, Mathieu; Leitner, Daniel; Laloy, Eric; Vereecken, Harry

    2017-04-01

    Root traits are increasingly important in breeding new crop varieties. For example, longer and fewer lateral roots have been suggested to improve drought resistance in wheat, so detailed root architectural parameters are important. However, classical field sampling of roots provides only aggregated information, such as root length density (coring), root counts per area (trenches), or root arrival curves at certain depths (rhizotubes). We investigate the possibility of obtaining information about the root system architecture of plants from classical field root sampling schemes, using sensitivity analysis and inverse parameter estimation. The methodology was developed in a virtual experiment in which a root architectural model, parameterized for winter wheat, was used to simulate root system development in a field. This simulation provided the ground truth, which is normally unknown in a real field experiment. The three sampling schemes (coring, trenching, and rhizotubes) were virtually applied, and the corresponding aggregated information was computed. The Morris one-at-a-time (OAT) global sensitivity analysis method was then used to determine the most sensitive parameters of the root architecture model for each of the three sampling methods. The means and standard deviations of the elementary effects of 37 parameters in total were evaluated. Upper and lower bounds of the parameters were obtained from the literature and published data on winter wheat root architectural parameters. Root length density profiles from coring, arrival curve characteristics observed in rhizotubes, and root counts in grids from the trench profile method were evaluated statistically with five different error functions to investigate the influence of each parameter. The number of branches, insertion angle, inter-nodal distance, and elongation rates are the most sensitive parameters, and parameter sensitivity varies slightly with depth. Most parameters, and their interactions with other parameters, have highly nonlinear effects on the model output. The most sensitive parameters will be subject to inverse estimation from the virtual field sampling data using the DREAMzs algorithm. The estimated parameters can then be compared with the ground truth to determine the suitability of the sampling schemes for identifying specific traits or parameters of the root growth model.
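    The Morris screening method summarizes each parameter by the mean of the absolute elementary effects (mu*, overall influence) and their standard deviation (sigma, nonlinearity or interactions). A simplified sketch follows, assuming parameters rescaled to [0, 1], trajectories that only step upward by a fixed delta, and a hypothetical toy model standing in for the root architecture simulator; the function name is also hypothetical.

```python
import random
import statistics


def morris_elementary_effects(model, n_params, r=20, levels=4, seed=0):
    """Simplified Morris one-at-a-time screening on the unit hypercube.

    Builds r random trajectories; along each one, every parameter is increased once
    by delta while the others stay fixed. Returns (mu_star, sigma) per parameter.
    """
    rng = random.Random(seed)
    delta = levels / (2 * (levels - 1))
    # Grid points from which a +delta step stays inside [0, 1]
    base_grid = [i / (levels - 1) for i in range(levels)
                 if i / (levels - 1) + delta <= 1 + 1e-12]
    effects = [[] for _ in range(n_params)]
    for _ in range(r):
        x = [rng.choice(base_grid) for _ in range(n_params)]
        y = model(x)
        order = list(range(n_params))
        rng.shuffle(order)  # perturb parameters in random order
        for i in order:
            x_new = list(x)
            x_new[i] += delta
            y_new = model(x_new)
            effects[i].append((y_new - y) / delta)
            x, y = x_new, y_new
    mu_star = [statistics.mean(abs(e) for e in ee) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma
```

    For a linear term the elementary effect is constant (sigma near zero), while a nonlinear term yields elementary effects that vary with the base point (sigma greater than zero). This is the pattern behind the "highly nonlinear effects" reported above.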

  11. Accuracy of sample dimension-dependent pedotransfer functions in estimation of soil saturated hydraulic conductivity

    USDA-ARS's Scientific Manuscript database

    Saturated hydraulic conductivity Ksat is a fundamental characteristic in modeling flow and contaminant transport in soils and sediments. Therefore, many models have been developed to estimate Ksat from easily measurable parameters, such as textural properties, bulk density, etc. However, Ksat is no...

  12. ESTIMATING CHILDREN'S DERMAL AND NON-DIETARY INGESTION EXPOSURE AND DOSE WITH EPA'S SHEDS MODEL

    EPA Science Inventory

    A physically-based stochastic model (SHEDS) has been developed to estimate pesticide exposure and dose to children via dermal residue contact and non-dietary ingestion. Time-location-activity data are sampled from national survey results to generate a population of simulated ch...

  13. Invariance Properties for General Diagnostic Classification Models

    ERIC Educational Resources Information Center

    Bradshaw, Laine P.; Madison, Matthew J.

    2016-01-01

    In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic…

  14. Mixed Estimation for a Forest Survey Sample Design

    Treesearch

    Francis A. Roesch

    1999-01-01

    Three methods of estimating the current state of forest attributes over small areas for the USDA Forest Service Southern Research Station's annual forest sampling design are compared. The three methods were (I) simple moving average, (II) single imputation of plot data that had been updated by externally developed models, and (III) local application of a global...

  15. Influence function based variance estimation and missing data issues in case-cohort studies.

    PubMed

    Mark, S D; Katki, H

    2001-12-01

    Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design, in which covariates are measured on all cases and on a random sample of the cohort. Since Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance and be completely at random, or may occur as part of the sampling design and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.

  16. Pore-throat radius and tortuosity estimation from formation resistivity data for tight-gas sandstone reservoirs

    NASA Astrophysics Data System (ADS)

    Ziarani, Ali S.; Aguilera, Roberto

    2012-08-01

    A new model is proposed for estimating pore-throat aperture size from formation resistivity factor and permeability data. The model is validated with data from the Mesaverde sandstone using brine salinities ranging from 20,000 to 200,000 ppm. The data analyzed, available in the literature, include various basins such as Green River, Piceance, Sand Wash, Powder River, Uinta, Washakie and Wind River. For pore-throat radii analysis, the methodology involves log-log plots of pore-throat radius versus the product of formation resistivity factor and permeability (r_T = a(FK)^b + c). The model fits over 280 samples from the Mesaverde formation with coefficients of determination between 0.95 and 0.99, depending primarily on the type of model used for pore-throat radius calculation. Brine salinity has some minor effects on the results. The model can provide better estimates of pore-throat radii if it is calibrated with experimental techniques such as mercury porosimetry. The results show pore-throat radii varying between 0.001 and 5 μm for the Mesaverde tight sandstone; however, most of the samples fall in a range between 0.01 and 1 μm. For tortuosity analysis, the calculation uses the product of formation factor and porosity. Results indicate that the estimated tortuosity values range mainly between 1 and 5. For samples with lower porosities (< 5%), tortuosity values show a wider scatter (between 1 and 8), whereas for samples with larger porosities (> 15%), the scatter in tortuosity decreases significantly. In general, for tortuosity calculation in tight-gas sandstone formations, a square-root model with a parameter (bf) representing various types of connecting pores, i.e., sheet-like and tubular pores, is recommended.

  17. The role of global cloud climatologies in validating numerical models

    NASA Technical Reports Server (NTRS)

    HARSHVARDHAN

    1993-01-01

    The purpose of this work is to estimate sampling errors of area-time averaged rain rate due to temporal sampling by satellites. In particular, the sampling errors will be estimated for the proposed low-inclination-orbit satellite of the Tropical Rainfall Measuring Mission (TRMM) (35 deg inclination and 350 km altitude), for one of the sun-synchronous polar orbiting satellites of the NOAA series (98.89 deg inclination and 833 km altitude), and for two simultaneous sun-synchronous polar orbiting satellites. Each satellite is assumed to carry a perfect passive microwave sensor for direct rainfall measurements. The estimate is obtained from a study of the satellite orbits and the autocovariance function of the area-averaged rain-rate time series. A model based on an exponential fit of the autocovariance function is used for the actual calculations. The model takes into account varying visiting intervals and the total coverage of the averaging area on each visit. The data are generated by a General Circulation Model (GCM), which has a diurnal cycle and parameterized convective processes. A special run of the GCM was made at NASA/GSFC in which the rainfall and precipitable water fields were retained globally for every hour of the run for the whole year.
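
The calculation described above, combining an exponential autocovariance model with a satellite visit schedule, can be sketched numerically. This is a minimal sketch, assuming a stationary series with autocovariance C(τ) = σ² exp(−|τ|/τ0), instantaneous visits at a fixed interval starting at t = 0, and full coverage of the area on every visit. The function name, grid step, and parameter values are hypothetical and are not taken from the TRMM study. The sampling error is the difference between the mean of the visit snapshots and the true time average, and its variance follows from Var(A − B) = Var(A) − 2 Cov(A, B) + Var(B).

```python
import numpy as np


def sampling_error_variance(sigma2, tau0, period, visit_interval, dt=0.5):
    """Variance of (mean of periodic snapshots) minus (true time average).

    Assumes C(lag) = sigma2 * exp(-|lag| / tau0), visits every
    `visit_interval` starting at t = 0, and full area coverage per visit.
    The continuous time average is approximated on a grid of step `dt`.
    """
    def cov(a, b):
        return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / tau0)

    grid = np.arange(0.0, period, dt) + dt / 2.0   # midpoints for the time integral
    visits = np.arange(0.0, period, visit_interval)
    var_sampled = cov(visits, visits).mean()       # Var(mean of snapshots)
    cross = cov(visits, grid).mean()               # Cov(snapshot mean, time average)
    var_true = cov(grid, grid).mean()              # Var(time average)
    return var_sampled - 2.0 * cross + var_true


# Hypothetical example: 10-day (240 h) average, 12 h decorrelation time.
for interval in (1.0, 6.0, 12.0, 24.0):
    print(interval, sampling_error_variance(1.0, 12.0, 240.0, interval))
```

Longer visit intervals leave more of the series unobserved, so the error variance grows with the interval; a fitted σ² and τ0 from real data would replace the placeholder values.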

  18. Water quality of storm runoff and comparison of procedures for estimating storm-runoff loads, volume, event-mean concentrations, and the mean load for a storm for selected properties and constituents for Colorado Springs, southeastern Colorado, 1992

    USGS Publications Warehouse

    Von Guerard, Paul; Weiss, W.B.

    1995-01-01

    The U.S. Environmental Protection Agency requires that municipalities that have a population of 100,000 or greater obtain National Pollutant Discharge Elimination System permits to characterize the quality of their storm runoff. In 1992, the U.S. Geological Survey, in cooperation with the Colorado Springs City Engineering Division, began a study to characterize the water quality of storm runoff and to evaluate procedures for the estimation of storm-runoff loads, volume and event-mean concentrations for selected properties and constituents. Precipitation, streamflow, and water-quality data were collected during 1992 at five sites in Colorado Springs. Thirty-five samples were collected, seven at each of the five sites. At each site, three samples were collected for permitting purposes; two of the samples were collected during rainfall runoff, and one sample was collected during snowmelt runoff. Four additional samples were collected at each site to obtain a large enough sample size to estimate storm-runoff loads, volume, and event-mean concentrations for selected properties and constituents using linear-regression procedures developed using data from the Nationwide Urban Runoff Program (NURP). Storm-water samples were analyzed for as many as 186 properties and constituents. The constituents measured include total-recoverable metals, volatile-organic compounds, acid-base/neutral organic compounds, and pesticides. Storm runoff sampled had large concentrations of chemical oxygen demand and 5-day biochemical oxygen demand. Chemical oxygen demand ranged from 100 to 830 milligrams per liter, and 5-day biochemical oxygen demand ranged from 14 to 260 milligrams per liter. Total-organic carbon concentrations ranged from 18 to 240 milligrams per liter. Of the total-recoverable metals analyzed, lead and zinc had the largest concentrations. Concentrations of lead ranged from 23 to 350 micrograms per liter, and concentrations of zinc ranged from 110 to 1,400 micrograms per liter. The data for 30 storms representing rainfall runoff from 5 drainage basins were used to develop single-storm local-regression models. The response variables (storm-runoff loads, volume, and event-mean concentrations) were modeled using explanatory variables for climatic, physical, and land-use characteristics. The r² for models that use ordinary least-squares regression ranged from 0.57 to 0.86 for storm-runoff loads and volume and from 0.25 to 0.63 for storm-runoff event-mean concentrations. Except for cadmium, standard errors of estimate ranged from 43 to 115 percent for storm-runoff loads and volume and from 35 to 66 percent for storm-runoff event-mean concentrations. Eleven of the 30 total-recoverable cadmium concentrations collected during rainfall runoff were censored (less-than) concentrations. Ordinary least-squares regression should not be used with censored data; however, censored data can be included with uncensored data using tobit regression. Standard errors of estimate for storm-runoff load and event-mean concentration for total-recoverable cadmium, computed using tobit regression, are 247 and 171 percent, respectively. Estimates from single-storm regional-regression models, developed from the Nationwide Urban Runoff Program database, were compared with observed storm-runoff loads, volume, and event-mean concentrations determined from samples collected in the study area. Single-storm regional-regression models tended to overestimate storm-runoff loads, volume, and event-mean concentrations. Therefore, single-storm local- and regional-regression models were combined using model-adjustment procedures to take advantage of the strengths of both models while minimizing the deficiencies of each model. Procedures were used to develop single-storm regression equations that were adjusted using local data and estimates from single-storm regional-regression equations. Single-storm regression models developed using model-adjustment proce...

  19. Simulation program for estimating statistical power of Cox's proportional hazards model assuming no specific distribution for the survival time.

    PubMed

    Akazawa, K; Nakamura, T; Moriguchi, S; Shimada, M; Nose, Y

    1991-07-01

    Small sample properties of the maximum partial likelihood estimates for Cox's proportional hazards model depend on the sample size, the true values of regression coefficients, covariate structure, censoring pattern and possibly baseline hazard functions. Therefore, it would be difficult to construct a formula or table to calculate the exact power of a statistical test for the treatment effect in any specific clinical trial. The simulation program, written in SAS/IML, described in this paper uses Monte-Carlo methods to provide estimates of the exact power for Cox's proportional hazards model. For illustrative purposes, the program was applied to real data obtained from a clinical trial performed in Japan. Since the program does not assume any specific function for the baseline hazard, it is, in principle, applicable to any censored survival data as long as they follow Cox's proportional hazards model.
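
The SAS/IML program itself is not reproduced here. As a rough illustration of the Monte Carlo idea only, the following Python sketch swaps in a simpler, named technique: the two-sample log-rank test, which is the score test of a Cox model with a single binary treatment covariate. Data generation needs some baseline hazard, so the sketch assumes, purely for illustration, an exponential baseline hazard of 1 in the control arm, uniform censoring on [0, follow_up], 1:1 allocation, and continuous times without ties. The function names and defaults are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def logrank_z(time, event, group):
    """Two-sample log-rank Z statistic (score test of a Cox model with one
    binary covariate). Assumes continuous times with no ties."""
    order = np.argsort(time)
    event, group = event[order], group[order]
    n_risk = np.arange(len(time), 0, -1)       # subjects at risk at each sorted time
    n1_risk = np.cumsum(group[::-1])[::-1]     # group-1 subjects still at risk
    p1 = n1_risk / n_risk
    observed_minus_expected = np.sum(event * (group - p1))
    variance = np.sum(event * p1 * (1.0 - p1))
    return observed_minus_expected / np.sqrt(variance)


def simulated_power(n_per_arm, hazard_ratio, n_sim=500, alpha=0.05,
                    follow_up=2.0, seed=0):
    """Monte Carlo power of a two-sided log-rank test under the
    illustrative assumptions stated above."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    group = np.repeat([0, 1], n_per_arm)
    rate = np.where(group == 1, hazard_ratio, 1.0)   # hazard per subject
    rejections = 0
    for _ in range(n_sim):
        event_time = rng.exponential(1.0 / rate)
        censor_time = rng.uniform(0.0, follow_up, size=group.size)
        time = np.minimum(event_time, censor_time)
        event = (event_time <= censor_time).astype(float)
        if abs(logrank_z(time, event, group)) > z_crit:
            rejections += 1
    return rejections / n_sim


print(simulated_power(100, 0.5))   # estimated power for a hazard ratio of 0.5
```

Under a hazard ratio of 1 the rejection rate should sit near alpha, which is a quick check that the simulation is calibrated before it is used for power.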

  20. Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume

    2013-01-01

    Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
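
The FAST idea, treating states from a moving window along one model trajectory as ensemble members, can be sketched as the sample covariance of the window anomalies. This is a minimal sketch, assuming the trajectory is stored as an (n_times, n_state) array and that the window ends at the analysis time index t. The function name and window convention are hypothetical, and any further processing of the covariance used by the authors is not reproduced.

```python
import numpy as np


def fast_covariance(trajectory, t, window):
    """Background-error covariance from states t-window+1 .. t of one run.

    trajectory: array (n_times, n_state) from a single model integration.
    The states inside the window act as pseudo-ensemble members.
    """
    if window < 2 or t - window + 1 < 0 or t >= len(trajectory):
        raise ValueError("window must be >= 2 and lie inside the trajectory")
    members = trajectory[t - window + 1 : t + 1]      # (window, n_state)
    anomalies = members - members.mean(axis=0)        # remove the window mean
    return anomalies.T @ anomalies / (window - 1)     # (n_state, n_state)


# Hypothetical 4-variable trajectory with 50 time steps.
rng = np.random.default_rng(0)
trajectory = rng.normal(size=(50, 4))
print(fast_covariance(trajectory, t=30, window=10))
```

Because the members come from one integration, the window length trades off sample size against how representative the older states are of the current flow.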

  1. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.

  2. Diagnostic test accuracy and prevalence inferences based on joint and sequential testing with finite population sampling.

    PubMed

    Su, Chun-Lung; Gardner, Ian A; Johnson, Wesley O

    2004-07-30

    The two-test two-population model, originally formulated by Hui and Walter for estimating test accuracy and prevalence, assumes conditionally independent tests, constant accuracy across populations and binomial sampling. The binomial assumption is incorrect if all individuals in a population (e.g., a child-care centre, a village in Africa, or a cattle herd) are sampled, or if the sample size is large relative to the population size. In this paper, we develop statistical methods for evaluating diagnostic test accuracy and prevalence estimation based on finite sample data in the absence of a gold standard. Moreover, two tests are often applied simultaneously to obtain a 'joint' testing strategy that has either higher overall sensitivity or higher specificity than either of the two tests considered singly. Sequential versions of such strategies are often applied to reduce the cost of testing. We thus discuss joint (simultaneous and sequential) testing strategies and inference for them. Using the developed methods, we analyse two real data sets and one simulated data set, and we compare 'hypergeometric' and 'binomial-based' inferences. Our findings indicate that the posterior standard deviations for prevalence (but not sensitivity and specificity) based on finite population sampling tend to be smaller than their counterparts for infinite population sampling. Finally, we make recommendations about how small the sample size should be relative to the population size to warrant use of the binomial model for prevalence estimation. Copyright 2004 John Wiley & Sons, Ltd.
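
The paper's methods are Bayesian and involve two imperfect tests, which the following sketch does not attempt. It only illustrates why hypergeometric (without-replacement) sampling gives smaller spread than binomial sampling when the sample is a large fraction of the population: the variance of a sample proportion is multiplied by the finite population correction (N − n)/(N − 1). The function name and example numbers are hypothetical.

```python
def prevalence_variance(p, n, N=None):
    """Sampling variance of the observed prevalence among n tested units.

    N=None -> binomial sampling (effectively infinite population).
    N given -> hypergeometric sampling without replacement from N units.
    """
    var = p * (1.0 - p) / n
    if N is None:
        return var
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if N == 1:
        return 0.0
    return var * (N - n) / (N - 1)   # finite population correction


# Hypothetical herd of 100 animals with true prevalence 0.2.
for n in (10, 50, 100):
    print(n, prevalence_variance(0.2, n), prevalence_variance(0.2, n, N=100))
```

When the whole herd is tested (n = N) the correction drives the sampling variance to zero, which is the situation the abstract flags as violating the binomial assumption.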

  3. Robust Portfolio Optimization Using Pseudodistances.

    PubMed

    Toma, Aida; Leoni-Aubin, Samuela

    2015-01-01

    The presence of outliers in financial asset returns is a frequently occurring phenomenon which may lead to unreliable mean-variance optimized portfolios. This is due to the unbounded influence that outliers can have on the mean returns and covariance estimators that are inputs in the optimization procedure. In this paper we present robust estimators of mean and covariance matrix obtained by minimizing an empirical version of a pseudodistance between the assumed model and the true model underlying the data. We prove and discuss theoretical properties of these estimators, such as affine equivariance, B-robustness, asymptotic normality and asymptotic relative efficiency. These estimators can be easily used in place of the classical estimators, thereby providing robust optimized portfolios. A Monte Carlo simulation study and applications to real data show the advantages of the proposed approach. We study both in-sample and out-of-sample performance of the proposed robust portfolios, comparing them with some other portfolios known in the literature.

  4. Robust Portfolio Optimization Using Pseudodistances

    PubMed Central

    2015-01-01

    The presence of outliers in financial asset returns is a frequently occurring phenomenon which may lead to unreliable mean-variance optimized portfolios. This is due to the unbounded influence that outliers can have on the mean returns and covariance estimators that are inputs in the optimization procedure. In this paper we present robust estimators of mean and covariance matrix obtained by minimizing an empirical version of a pseudodistance between the assumed model and the true model underlying the data. We prove and discuss theoretical properties of these estimators, such as affine equivariance, B-robustness, asymptotic normality and asymptotic relative efficiency. These estimators can be easily used in place of the classical estimators, thereby providing robust optimized portfolios. A Monte Carlo simulation study and applications to real data show the advantages of the proposed approach. We study both in-sample and out-of-sample performance of the proposed robust portfolios, comparing them with some other portfolios known in the literature. PMID:26468948

  5. Multinomial mixture model with heterogeneous classification probabilities

    USGS Publications Warehouse

    Holland, M.D.; Gray, B.R.

    2011-01-01

    Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. Although this method is useful, we demonstrate algebraically and by simulation that it yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct-classification probabilities when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.

  6. Estimation of respirable dust exposure among coal miners in South Africa.

    PubMed

    Naidoo, Rajen; Seixas, Noah; Robins, Thomas

    2006-06-01

    The use of retrospective occupational hygiene data for epidemiologic studies is useful in determining exposure-outcome relationships, but the potential for exposure misclassification is high. Although dust sampling in the South African coal industry has been a legal requirement for several decades, these historical data are not readily adequate for estimating past exposures. This study describes the respirable coal mine dust levels in three South African coal mines over time. Each of the participating mining operations had well-documented dust sampling information that was used to describe historical trends in dust exposure. Investigator-collected personal dust samples were taken using standardized techniques from the face, backbye (underground jobs not at the coal face), and surface from 50 miners at each mine, repeated over three sampling cycles. Job histories and exposure information were obtained from a sample of 684 current miners and 188 ex-miners. Linear models were developed to estimate the exposure levels associated with work in each mine, exposure zone, and over time using a combination of operator-collected historical data and investigator-collected samples. The estimated levels were then combined with work history information to calculate cumulative exposure metrics for the miner cohort. The mean historical and investigator-collected respirable dust levels were within international norms and South African standards. Silica content of the dust samples was also below the 5% regulatory action level. Mean respirable dust concentrations at the face, based on investigator-collected samples, were 0.9 mg/m³, 1.3 mg/m³, and 1.9 mg/m³ at Mines 1, 2, and 3, respectively. The operator-collected samples showed considerable variability across exposure zones, mines, and time, with the annual means at the face ranging from 0.4 mg/m³ to 2.9 mg/m³. Statistically significant differences were found between operator- and investigator-collected dust samples. Model-based arithmetic mean dust estimates at the face were 1.2 mg/m³, 2.0 mg/m³, and 0.9 mg/m³ for Mines 1, 2, and 3, respectively. Using these levels, the mean cumulative exposure for the cohort was 56.8 mg-years/m³. Current miners had a mean cumulative exposure of 66.5 mg-years/m³, compared with 26.8 mg-years/m³ for ex-miners. Improvements in dust management or the use of different sampling equipment could account for the significant differences seen between operator- and investigator-collected data. Regression modeling for estimating mean dust levels over time using combined historical and investigator-collected data seems a reasonable method and is useful in constructing models to describe cumulative exposures in a cohort of current and ex-miners.
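
The cumulative exposure metric combines estimated dust levels with work histories: each job spell contributes its mean level multiplied by its duration, giving mg-years/m³. This is a minimal sketch, assuming a simplified lookup that uses only the model-based face levels reported above for each mine. The study's models also vary by exposure zone and calendar time, which this sketch ignores, and the job history shown is hypothetical.

```python
# Model-based arithmetic-mean face levels reported in the abstract (mg/m^3).
FACE_LEVEL_MG_M3 = {"Mine 1": 1.2, "Mine 2": 2.0, "Mine 3": 0.9}


def cumulative_exposure(job_history, levels):
    """Cumulative exposure in mg-years/m^3: sum of level x years per job spell.

    job_history: list of (location, years) tuples; each location must be a
    key of `levels`.
    """
    return sum(levels[location] * years for location, years in job_history)


# Hypothetical miner: 5 years at the Mine 1 face, then 10 at the Mine 2 face.
history = [("Mine 1", 5.0), ("Mine 2", 10.0)]
print(cumulative_exposure(history, FACE_LEVEL_MG_M3))  # 1.2*5 + 2.0*10
```

A fuller version would key the lookup on (mine, zone, year) so that each spell picks up the level the regression model assigns to that combination.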

  7. A spatial mark–resight model augmented with telemetry data

    USGS Publications Warehouse

    Sollmann, Rachel; Gardner, Beth; Parsons, Arielle W.; Stocking, Jessica J.; McClintock, Brett T.; Simons, Theodore R.; Pollock, Kenneth H.; O’Connell, Allan F.

    2013-01-01

    Abundance and population density are fundamental pieces of information for population ecology and species conservation, but they are difficult to estimate for rare and elusive species. Mark-resight models are popular for estimating population abundance because they are less invasive and less expensive than traditional mark-recapture. However, density estimation using mark-resight is difficult because the area sampled must be explicitly defined, historically using ad hoc approaches. We develop a spatial mark-resight model for estimating population density that combines spatial resighting data and telemetry data. Incorporating telemetry data allows us to inform model parameters related to movement and individual location. Our model also allows 2. The model presented here will have widespread utility in future applications, especially for species that are not naturally marked.

  8. Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution

    PubMed Central

    Han, Fang; Liu, Han

    2016-01-01

    The correlation matrix plays a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on the use of Pearson's sample correlation matrix. Although Pearson's sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator for heavy-tailed distributions with possible outliers. As a robust alternative, Han and Liu (2013b) advocated the use of a transformed version of the Kendall's tau sample correlation matrix in estimating the high-dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that, after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall's tau sample correlation matrix and its transformed version proposed in Han and Liu (2013b) for estimating the population Kendall's tau correlation matrix and the latent Pearson's correlation matrix under both spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of "effective rank" in quantifying the rate of convergence. With regard to the restricted spectral norm, we for the first time present a "sign subgaussian condition" which is sufficient to guarantee that the rank-based correlation matrix estimator attains the optimal rate of convergence. In both cases, we do not need any moment condition. PMID:28337068
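
For elliptical distributions, and therefore for the transelliptical family after its marginal transformations, Kendall's tau τ and the latent Pearson correlation r are linked by r = sin(πτ/2). The transformed Kendall's tau matrix applies this sine map entrywise. This is a minimal sketch, assuming continuous data without ties and small sample sizes (it forms an n × n × d array of pairwise signs); the function names are hypothetical. Because only ranks enter, any strictly monotone marginal transformation leaves the estimate unchanged. Note that the entrywise-transformed matrix is not guaranteed to be positive semidefinite.

```python
import numpy as np


def kendall_tau_matrix(x):
    """Kendall's tau-a between every pair of columns of x (n_samples x d).

    Builds an (n, n, d) array of pairwise signs, so it is only meant for
    small n; assumes no ties.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    signs = np.sign(x[:, None, :] - x[None, :, :])   # sign(x_ik - x_jk)
    # Summing over all ordered pairs counts each unordered pair twice.
    return np.einsum("ijk,ijl->kl", signs, signs) / (n * (n - 1))


def latent_correlation(x):
    """Sine-transformed Kendall's tau: estimate of the latent Pearson matrix."""
    return np.sin(np.pi / 2.0 * kendall_tau_matrix(x))


# Hypothetical data: one latent variable seen through monotone transforms.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = np.column_stack([z, np.exp(z), -z**3])
print(np.round(latent_correlation(x), 3))
```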

  9. Individualized statistical learning from medical image databases: application to identification of brain lesions.

    PubMed

    Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos

    2014-04-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.

  10. A comparison of moment-based methods of estimation for the log Pearson type 3 distribution

    NASA Astrophysics Data System (ADS)

    Koutrouvelis, I. A.; Canavos, G. C.

    2000-06-01

    The log Pearson type 3 distribution is a very important model in statistical hydrology, especially for modeling annual flood series. In this paper we compare the various methods based on moments for estimating quantiles of this distribution. Besides the methods of direct and mixed moments which were found most successful in previous studies and the well-known indirect method of moments, we develop generalized direct moments and generalized mixed moments methods and a new method of adaptive mixed moments. The last method chooses the orders of two moments for the original observations by utilizing information contained in the sample itself. The results of Monte Carlo experiments demonstrated the superiority of this method in estimating flood events of high return periods when a large sample is available and in estimating flood events of low return periods regardless of the sample size. In addition, a comparison of simulation and asymptotic results shows that the adaptive method may be used for the construction of meaningful confidence intervals for design events based on the asymptotic theory even with small samples. The simulation results also point to the specific members of the class of generalized moments estimates which maintain small values for bias and/or mean square error.
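
As a baseline for the moment-based methods discussed above, quantiles of the log Pearson type 3 distribution are often computed from the mean, standard deviation, and skew of the logarithms of the annual peaks (commonly called the indirect method), with the Pearson type 3 frequency factor K approximated by Wilson–Hilferty: K ≈ (2/g)[(1 + gz/6 − g²/36)³ − 1]. This is a minimal sketch under those assumptions only. It does not implement the paper's generalized or adaptive mixed-moment methods, and the flow values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def frequency_factor(z, g):
    """Wilson-Hilferty approximation to the Pearson type 3 frequency factor K."""
    if abs(g) < 1e-6:          # zero skew: Pearson type 3 reduces to the normal
        return z
    return (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)


def lp3_quantile(flows, return_period):
    """Flood quantile from moments of log10(flows); needs >= 3 positive values."""
    y = [math.log10(q) for q in flows]
    n = len(y)
    m, s = mean(y), stdev(y)
    # Sample skew coefficient of the logs with the usual small-sample correction.
    g = n * sum((v - m) ** 3 for v in y) / ((n - 1) * (n - 2) * s ** 3)
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)   # standard normal quantile
    return 10 ** (m + frequency_factor(z, g) * s)


# Hypothetical annual peak flows (m^3/s).
annual_peaks = [120, 340, 560, 210, 890, 430, 150, 610, 275, 1020, 380, 460]
for T in (10, 50, 100):
    print(T, round(lp3_quantile(annual_peaks, T)))
```

The sample skew is the weakest of the three moments for short records, which is one reason the paper looks for moment combinations with smaller bias and mean square error.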

  11. Mechanical properties of porcine brain tissue in vivo and ex vivo estimated by MR elastography.

    PubMed

    Guertler, Charlotte A; Okamoto, Ruth J; Schmidt, John L; Badachhape, Andrew A; Johnson, Curtis L; Bayly, Philip V

    2018-03-01

    The mechanical properties of brain tissue in vivo determine the response of the brain to rapid skull acceleration. These properties are thus of great interest to the developers of mathematical models of traumatic brain injury (TBI) or neurosurgical simulations. Animal models provide valuable insight that can improve TBI modeling. In this study we compare estimates of mechanical properties of the Yucatan mini-pig brain in vivo and ex vivo using magnetic resonance elastography (MRE) at multiple frequencies. MRE allows estimations of properties in soft tissue, either in vivo or ex vivo, by imaging harmonic shear wave propagation. Most direct measurements of brain mechanical properties have been performed using samples of brain tissue ex vivo. It has been observed that direct estimates of brain mechanical properties depend on the frequency and amplitude of loading, as well as the time post-mortem and condition of the sample. Using MRE in the same animals at overlapping frequencies, we observe that porcine brain tissue in vivo appears stiffer than porcine brain tissue samples ex vivo at frequencies of 100 Hz and 125 Hz, but measurements show closer agreement at lower frequencies. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic analyses: Insights into spatial variability using high-resolution satellite data

    PubMed Central

    Alexeeff, Stacey E.; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A.

    2016-01-01

    Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R² yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with greater than 0.9 out-of-sample R² yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the standard errors. Land use regression models performed better in chronic effects simulations. These results can help researchers when interpreting health effect estimates in these types of studies. PMID:24896768

  13. Validation of two dilution models to predict chloramine-T concentrations in aquaculture facility effluent

    USGS Publications Warehouse

    Gaikowski, M.P.; Larson, W.J.; Steuer, J.J.; Gingerich, W.H.

    2004-01-01

    Accurate estimates of drug concentrations in hatchery effluent are critical to assess the environmental risk of hatchery drug discharge resulting from disease treatment. This study validated two simple dilution models to estimate chloramine-T environmental introduction concentrations by comparing measured and predicted chloramine-T concentrations, using the effluent of the US Geological Survey's Upper Midwest Environmental Sciences Center aquaculture facility as an example. The hydraulic characteristics of our treated raceway and effluent and the accuracy of our water flow rate measurements were confirmed with the marker dye rhodamine WT. We also used the rhodamine WT data to develop dilution models that would (1) estimate the chloramine-T concentration at a given time and location in the effluent system and (2) estimate the average chloramine-T concentration at a given location over the entire discharge period. To test our models, we predicted the chloramine-T concentration at two sample points based on effluent flow and the maintenance of chloramine-T at 20 mg/l for 60 min in the same raceway used with rhodamine WT. The effluent sample points selected (sample points A and B) represented 47% and 100% of the total effluent flow, respectively. Sample point B is analogous to the discharge of a hatchery that does not have a detention lagoon, i.e., the sample site was downstream of the last dilution water addition following treatment. We then applied four chloramine-T flow-through treatments at 20 mg/l for 60 min and measured the chloramine-T concentration in water samples collected every 15 min for about 180 min from the treated raceway and sample points A and B during and after application. The predicted chloramine-T concentration at each sampling interval was similar to the measured chloramine-T concentration at sample points A and B and was generally bounded by the measured 90% confidence intervals.
The predicted average chloramine-T concentrations at sample points A and B (2.8 and 1.3 mg/l, respectively) were not significantly different (P > 0.05) from the average measured chloramine-T concentrations (2.7 and 1.3 mg/l, respectively). The close agreement between our predicted and measured chloramine-T concentrations indicates that either of the dilution models could be used to adequately predict the chloramine-T environmental introduction concentration in Upper Midwest Environmental Sciences Center effluent. © 2003 Elsevier B.V. All rights reserved.
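    The two dilution models above can be sketched as a flow-weighted mass balance. This is a minimal illustration, not the authors' models: it assumes complete mixing of treated raceway water with untreated dilution water, no chloramine-T degradation in the effluent, and steady flows. The function names and flow values are hypothetical.

```python
def instantaneous_concentration(c_raceway, q_raceway, q_point):
    """Analogue of model (1): concentration (mg/l) at a sample point at one time.

    Assumes the treated raceway discharge (flow q_raceway) mixes completely with
    untreated water, so the total flow at the point is q_point (same units).
    """
    if q_point < q_raceway:
        raise ValueError("flow at the sample point cannot be less than raceway flow")
    return c_raceway * q_raceway / q_point


def period_average_concentration(c_raceway_series, dt, q_raceway, q_point, period):
    """Analogue of model (2): average concentration at a point over a discharge period.

    c_raceway_series holds raceway concentrations (mg/l) at intervals of dt minutes;
    flows are in l/min and period is in minutes. The average is the total drug mass
    discharged divided by the total water volume passing the point.
    """
    mass_mg = sum(c * q_raceway * dt for c in c_raceway_series)
    volume_l = q_point * period
    return mass_mg / volume_l
```

    For example, a hypothetical raceway held at 20 mg/l for 60 min (four 15-min intervals) with 100 l/min of raceway flow and 400 l/min at the sample point, averaged over 180 min, gives 20 × 100 × 60 / (400 × 180) ≈ 1.67 mg/l. Passing measured raceway concentrations after treatment ends would capture the flushing tail.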

  14. An Employee Total Health Management–Based Survey of Iowa Employers

    PubMed Central

    Merchant, James A.; Lind, David P.; Kelly, Kevin M.; Hall, Jennifer L.

    2015-01-01

    Objective: To implement an Employee Total Health Management (ETHM) model-based questionnaire and provide estimates of model program elements among a statewide sample of Iowa employers. Methods: A stratified random sample of Iowa employers was surveyed to characterize and estimate employer participation in ETHM program elements. Results: Fewer than 30% of Iowa employers are implementing each of the 12 ETHM components, with the exception of occupational safety and health (46.6%) and workers' compensation insurance coverage (89.2%), but employers intend a modest expansion of all components in the coming year. Conclusions: The ETHM questionnaire-based survey provides estimates of the progress Iowa employers are making toward implementing components of total worker health programs. PMID:24284757
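    Statewide participation rates from a stratified random sample are typically computed with the standard stratified estimator p̂_st = Σ W_h p̂_h, where W_h = N_h/N. The sketch below shows that textbook estimator and its finite-population variance. It assumes simple random sampling within strata and ignores nonresponse adjustment. The strata and counts are hypothetical, not the survey's data or weighting scheme.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    population_size: int  # N_h: employers in the stratum (e.g., a size class)
    sample_size: int      # n_h: responding employers sampled from the stratum
    yes_count: int        # responding employers that implement the program element


def stratified_proportion(strata):
    """Stratified estimate of a population proportion and its standard error.

    p_st = sum_h W_h * p_h with W_h = N_h / N, and
    v(p_st) = sum_h W_h^2 * (1 - n_h / N_h) * p_h * (1 - p_h) / (n_h - 1).
    """
    total = sum(s.population_size for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        weight = s.population_size / total
        p = s.yes_count / s.sample_size
        fpc = 1.0 - s.sample_size / s.population_size  # finite population correction
        estimate += weight * p
        variance += weight**2 * fpc * p * (1.0 - p) / (s.sample_size - 1)
    return estimate, math.sqrt(variance)
```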

  15. Hybrid Optimal Design of the Eco-Hydrological Wireless Sensor Network in the Middle Reach of the Heihe River Basin, China

    PubMed Central

    Kang, Jian; Li, Xin; Jin, Rui; Ge, Yong; Wang, Jinfeng; Wang, Jianghao

    2014-01-01

    The eco-hydrological wireless sensor network (EHWSN) in the middle reaches of the Heihe River Basin in China is designed to capture spatial and temporal variability and to estimate the ground truth for validating remote sensing products. However, no prior information about the target variable is available. To meet both requirements, a hybrid model-based sampling method that makes no spatial autocorrelation assumptions is developed to optimize the distribution of EHWSN nodes based on geostatistics. This hybrid model incorporates two sub-criteria: one for variogram modeling, to represent the variability, and another for improving spatial prediction, to evaluate remote sensing products. The optimized EHWSN is validated in terms of representativeness, variogram modeling, and spatial accuracy, using 15 types of simulated fields generated by unconditional geostatistical stochastic simulation. The sampling design shows good representativeness; variograms estimated from the samples have less than 3% mean error relative to the true variograms. Fields are then predicted at multiple scales. As the scale increases, the estimated fields become more similar to the simulated fields at block sizes exceeding 240 m. The validations show that this hybrid sampling method is effective for both objectives when the characteristics of the variable being optimized are unknown. PMID:25317762
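    Judging whether "variograms estimated by samples" match the true variograms requires an empirical variogram. A minimal sketch of the classical Matheron estimator, γ̂(h) = (1 / 2|N(h)|) Σ (z_i − z_j)² over pairs whose separation falls in lag bin h, is shown below. This is the standard estimator, not the authors' hybrid optimization criterion. The bin edges and data are illustrative.

```python
import numpy as np


def empirical_variogram(coords, values, bin_edges):
    """Matheron semivariance for each distance bin (lo, hi].

    Returns bin centers, semivariances (nan for empty bins), and pair counts.
    """
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    # Each unordered pair of locations is counted once (upper triangle).
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    centers, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dist > lo) & (dist <= hi)
        n = int(mask.sum())
        centers.append(0.5 * (lo + hi))
        counts.append(n)
        gammas.append(sqdiff[mask].mean() / 2.0 if n else np.nan)
    return np.array(centers), np.array(gammas), np.array(counts)
```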

  16. Hybrid optimal design of the eco-hydrological wireless sensor network in the middle reach of the Heihe River Basin, China.

    PubMed

    Kang, Jian; Li, Xin; Jin, Rui; Ge, Yong; Wang, Jinfeng; Wang, Jianghao

    2014-10-14

    The eco-hydrological wireless sensor network (EHWSN) in the middle reaches of the Heihe River Basin in China is designed to capture spatial and temporal variability and to estimate the ground truth for validating remote sensing products. However, no prior information about the target variable is available. To meet both requirements, a hybrid model-based sampling method that makes no spatial autocorrelation assumptions is developed to optimize the distribution of EHWSN nodes based on geostatistics. This hybrid model incorporates two sub-criteria: one for variogram modeling, to represent the variability, and another for improving spatial prediction, to evaluate remote sensing products. The optimized EHWSN is validated in terms of representativeness, variogram modeling, and spatial accuracy, using 15 types of simulated fields generated by unconditional geostatistical stochastic simulation. The sampling design shows good representativeness; variograms estimated from the samples have less than 3% mean error relative to the true variograms. Fields are then predicted at multiple scales. As the scale increases, the estimated fields become more similar to the simulated fields at block sizes exceeding 240 m. The validations show that this hybrid sampling method is effective for both objectives when the characteristics of the variable being optimized are unknown.

  17. Monitoring landscape metrics by point sampling: accuracy in estimating Shannon's diversity and edge density.

    PubMed

    Ramezani, Habib; Holm, Sören; Allard, Anna; Ståhl, Göran

    2010-05-01

    Environmental monitoring of landscapes is of increasing interest. To quantify landscape patterns, a number of metrics are used, of which Shannon's diversity, edge length, and edge density are studied here. As an alternative to complete mapping, point sampling was applied to estimate the metrics for already mapped landscapes selected from the National Inventory of Landscapes in Sweden (NILS). Monte Carlo simulation was applied to study the performance of different designs. Random and systematic sampling designs were applied with four sample sizes and five buffer widths. The buffer width was relevant for edge length, since length was estimated from the number of points falling in buffer areas around edges. In addition, two levels of landscape complexity were tested by applying two classification schemes, with seven or 20 land cover classes, to the NILS data. As expected, the root mean square error (RMSE) of the estimators decreased with increasing sample size. The estimators of both metrics were slightly biased, but the bias of the Shannon's diversity estimator decreased as sample size increased. In the edge length case, increasing the buffer width resulted in larger bias due to the increased impact of boundary conditions; this effect was independent of sample size. However, we also developed adjusted estimators that eliminate the bias of the edge length estimator. The rates of decrease of RMSE with increasing sample size and buffer width were quantified by a regression model. Finally, indicative cost-accuracy relationships were derived, showing that point sampling could be a competitive alternative to complete wall-to-wall mapping.
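    Both point-sampling estimators can be sketched in a few lines, under stated assumptions. Shannon's diversity uses the plug-in form H = −Σ p_k ln p_k, with p_k the share of points falling in land cover class k. This plug-in estimator is known to be biased low in small samples, which is consistent with the bias decreasing as sample size grows. Edge length assumes the buffer area is approximately 2wL for a buffer of half-width w on each side of edges of total length L, ignoring overlaps and ends. This naive form is the one the abstract describes as biased by boundary conditions; the authors' adjusted estimators are not reproduced. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_from_points(classes):
    """Plug-in Shannon diversity from the land-cover class recorded at each point."""
    n = len(classes)
    counts = Counter(classes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def edge_length_from_points(n_points, n_in_buffer, landscape_area, buffer_half_width):
    """Edge length estimate assuming buffer area ~= 2 * half_width * edge length.

    The buffer area is estimated as the landscape area times the share of points
    that fall inside the buffer. Edge density is this length divided by the area.
    """
    buffer_area = landscape_area * n_in_buffer / n_points
    return buffer_area / (2.0 * buffer_half_width)
```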

  18. Enhancement of low-temperature thermometry by strong coupling

    NASA Astrophysics Data System (ADS)

    Correa, Luis A.; Perarnau-Llobet, Martí; Hovhannisyan, Karen V.; Hernández-Santana, Senaida; Mehboudi, Mohammad; Sanpera, Anna

    2017-12-01

    We consider the problem of estimating the temperature T of a very cold equilibrium sample. The temperature estimates are drawn from measurements performed on a quantum Brownian probe strongly coupled to it. We model this scenario by resorting to the canonical Caldeira-Leggett Hamiltonian and find analytically the exact stationary state of the probe for arbitrary coupling strength. In general, the probe does not reach thermal equilibrium with the sample, due to their nonperturbative interaction. We argue that this is advantageous for low-temperature thermometry, as we show in our model that (i) the thermometric precision at low T can be significantly enhanced by strengthening the probe-sample coupling, (ii) the variance of a suitable quadrature of our Brownian thermometer can yield temperature estimates with nearly minimal statistical uncertainty, and (iii) the spectral density of the probe-sample coupling may be engineered to further improve thermometric performance. These observations may find applications in practical nanoscale thermometry at low temperatures, a regime that is particularly relevant to quantum technologies.
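    For contrast with the strong-coupling result above, the textbook weak-coupling baseline is easy to compute. A harmonic probe of frequency ω that fully thermalizes to a Gibbs state carries quantum Fisher information F(T) = Var(H)/T⁴ about the temperature (units ħ = k_B = 1), with Var(H) = ω² n̄(n̄ + 1) and n̄ the Bose-Einstein occupation. F(T) vanishes exponentially as T → 0, which is the low-temperature precision problem the paper addresses. This sketch illustrates only that baseline. It does not implement the Caldeira-Leggett strong-coupling state derived by the authors.

```python
import math


def thermal_qfi_harmonic(temperature, omega=1.0):
    """Quantum Fisher information about T for a harmonic oscillator in a Gibbs state.

    Units hbar = k_B = 1. For a Gibbs state F(T) = Var(H) / T**4, and for an
    oscillator Var(H) = omega**2 * n * (n + 1), with n the mean occupation.
    """
    n = 1.0 / math.expm1(omega / temperature)  # Bose-Einstein occupation
    return omega**2 * n * (n + 1.0) / temperature**4
```

    At high temperature, F(T) approaches the classical value 1/T²; at low temperature, it is suppressed roughly as e^(−ω/T).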

  19. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to model the odds of its categories, and when those categories are ordered, the model is an ordinal logistic regression model. The GWOLR model is an ordinal logistic regression model whose parameters depend on the geographical location of the observation site. Parameter estimation is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City, with 144 villages in Semarang City as the observation units. The results give a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
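    GWOLR fits a separate ordinal logistic model at each location, weighting every observation by its proximity to that location. The sketch below shows the two ingredients. The first is a geographic weight, here a Gaussian kernel w_ij = exp(−½(d_ij/b)²), which is one common choice we assume rather than the paper's confirmed kernel. The second is cumulative-logit category probabilities under the assumed convention P(Y ≤ k | x) = logistic(θ_k − xᵀβ). Local estimates would maximize the weighted log-likelihood Σ_j w_ij log P(y_j | x_j), which is not shown. The study used R; this is a Python illustration with hypothetical names.

```python
import numpy as np


def gaussian_kernel_weights(coords, site_index, bandwidth):
    """Geographic weights w_ij = exp(-0.5 * (d_ij / b)**2) for fitting at one site."""
    coords = np.asarray(coords, float)
    d = np.linalg.norm(coords - coords[site_index], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def ordinal_logit_probs(x, thresholds, beta):
    """Category probabilities from a cumulative-logit model.

    P(Y <= k | x) = logistic(theta_k - x @ beta) with increasing thresholds;
    K thresholds give K + 1 category probabilities.
    """
    eta = np.asarray(thresholds, float) - float(np.dot(x, beta))
    cum = 1.0 / (1.0 + np.exp(-eta))
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)
```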

  20. Iterative Importance Sampling Algorithms for Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

    In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicability of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.
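    The idea of iterating over the mean and covariance of a Gaussian proposal can be sketched as follows. In each round, the sketch draws independent samples from the current proposal, computes self-normalized importance weights against the unnormalized log posterior, and refits the proposal to the weighted mean and covariance, in the style of population Monte Carlo. This is an illustrative variant under our own assumptions. It omits the multivariate t proposals and the MCMC or mixture-model initialization the authors describe, and all names are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate normal evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2 * np.pi))


def iterative_importance_sampling(log_target, mean, cov, n_samples=5000,
                                  n_iter=10, seed=0):
    """Adapt a Gaussian proposal by weighted moment matching (n_iter >= 1).

    Returns the final importance-sampling estimate of the posterior mean, the
    adapted covariance, and the effective sample size of the last iteration.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, float)
    cov = np.asarray(cov, float)
    for _ in range(n_iter):
        x = rng.multivariate_normal(mean, cov, size=n_samples)  # independent draws
        logw = log_target(x) - log_gauss(x, mean, cov)
        w = np.exp(logw - logw.max())     # stabilized, then self-normalized
        w /= w.sum()
        mean = w @ x
        diff = x - mean
        cov = (w[:, None] * diff).T @ diff + 1e-9 * np.eye(len(mean))
        ess = 1.0 / np.sum(w**2)
    return mean, cov, ess
```

    Because the draws within an iteration are independent, the log_target evaluations are the part that parallelizes across cores.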

  1. Influence of scanning parameters on the estimation accuracy of control points of B-spline surfaces

    NASA Astrophysics Data System (ADS)

    Aichinger, Julia; Schwieger, Volker

    2018-04-01

    This contribution deals with the influence of scanning parameters, such as scanning distance, incidence angle, surface quality, and sampling width, on the average estimated standard deviations of the positions of the control points of B-spline surfaces, which are used to model surfaces from terrestrial laser scanning (TLS) data. The influence of the scanning parameters is analyzed by Monte Carlo-based variance analysis. Samples were generated for uncorrelated and correlated data using Latin hypercube and replicated Latin hypercube sampling algorithms. The investigations show that the most influential scanning parameter is the distance from the laser scanner to the object. The angle of incidence has a significant effect at distances of 50 m and longer, while surface quality contributes only negligible effects. The sampling width has no influence. The optimal scanning configuration is therefore the smallest possible object distance, an angle of incidence close to 0°, and the highest surface quality. Considering correlations improves the estimation accuracy and underlines the importance of complete stochastic models for TLS measurements.
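    Latin hypercube sampling stratifies each input dimension so that a small Monte Carlo sample still spans every parameter's range. A minimal sketch of the basic algorithm on the unit hypercube follows. Each dimension is split into n equal strata, each stratum receives exactly one point, and strata are paired across dimensions by independent random permutations. The replicated variant and the correlated-data scheme used in the paper are not shown.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, seed=None):
    """Latin hypercube sample on [0, 1)^d.

    Each dimension is split into n_samples equal strata; every stratum gets
    exactly one point, placed uniformly at random within it.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((n_samples, n_dims))  # position within each stratum
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(n_dims)])
    return (strata + u) / n_samples
```

    To use it for scanning parameters, scale each column to a range, e.g. `lows + X * (highs - lows)` for hypothetical distance and incidence-angle bounds, or apply an inverse CDF for non-uniform inputs.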

  2. Sampling scales define occupancy and underlying occupancy-abundance relationships in animals.

    PubMed

    Steenweg, Robin; Hebblewhite, Mark; Whittington, Jesse; Lukacs, Paul; McKelvey, Kevin

    2018-01-01

    Occupancy-abundance (OA) relationships are a foundational ecological phenomenon and field of study, and occupancy models are increasingly used to track population trends and understand ecological interactions. However, these two fields of ecological inquiry remain largely isolated, despite growing appreciation of the importance of integration. For example, using occupancy models to infer trends in abundance is predicated on positive OA relationships. Many occupancy studies collect data that violate geographical closure assumptions due to the choice of sampling scales and application to mobile organisms, which may change how occupancy and abundance are related. Little research, however, has explored how different occupancy sampling designs affect OA relationships. We develop a conceptual framework for understanding how sampling scales affect the definition of occupancy for mobile organisms, which drives OA relationships. We explore how spatial and temporal sampling scales, and the choice of sampling unit (areal vs. point sampling), affect OA relationships. We develop predictions using simulations, and test them using empirical occupancy data from remote cameras on 11 medium-large mammals. Surprisingly, our simulations demonstrate that when using point sampling, OA relationships are unaffected by spatial sampling grain (i.e., cell size). In contrast, when using areal sampling (e.g., species atlas data), OA relationships are affected by spatial grain. Furthermore, OA relationships are also affected by temporal sampling scales, where the curvature of the OA relationship increases with temporal sampling duration. Our empirical results support these predictions, showing that at any given abundance, the spatial grain of point sampling does not affect occupancy estimates, but longer surveys do increase occupancy estimates. For rare species (low occupancy), estimates of occupancy will quickly increase with longer surveys, even while abundance remains constant. 
Our results also clearly demonstrate that occupancy for mobile species without geographical closure is not true occupancy. The independence of occupancy estimates from spatial sampling grain depends on the sampling unit. Point-sampling surveys can, however, provide unbiased estimates of occupancy for multiple species simultaneously, irrespective of home-range size. The use of occupancy for trend monitoring needs to explicitly articulate how the chosen sampling scales define occupancy and affect the occupancy-abundance relationship. © 2017 by the Ecological Society of America.
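    A toy calculation illustrates why longer surveys inflate point-sampling "occupancy" at constant abundance. It is not the authors' simulation framework. It assumes each of N individuals independently passes a given camera on any day with fixed probability v, and it ignores imperfect detection. Under those assumptions, the probability that a site records use during a T-day survey is ψ(T) = 1 − (1 − v)^(NT), which rises toward 1 with T even though N is fixed. The parameter values in the test are hypothetical.

```python
def point_occupancy(n_individuals, daily_visit_prob, survey_days):
    """Probability that a point site records at least one visit during the survey.

    Toy model: each individual visits independently each day with probability
    daily_visit_prob; detection is assumed perfect.
    """
    p_no_visit = (1.0 - daily_visit_prob) ** (n_individuals * survey_days)
    return 1.0 - p_no_visit
```

    For a rare species (N = 2, v = 0.01), ψ grows from about 0.18 for a 10-day survey to about 0.70 for a 60-day survey, with abundance unchanged.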

  3. Methods and equations for estimating aboveground volume, biomass, and carbon for trees in the U.S. forest inventory, 2010

    Treesearch

    Christopher W. Woodall; Linda S. Heath; Grant M. Domke; Michael C. Nichols

    2011-01-01

    The U.S. Forest Service, Forest Inventory and Analysis (FIA) program uses numerous models and associated coefficients to estimate aboveground volume, biomass, and carbon for live and standing dead trees for most tree species in forests of the United States. The tree attribute models are coupled with FIA's national inventory of sampled trees to produce estimates of...

  4. Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data

    ERIC Educational Resources Information Center

    Savalei, Victoria

    2010-01-01

    Maximum likelihood is the most common estimation method in structural equation modeling. Standard errors for maximum likelihood estimates are obtained from the associated information matrix, which can be estimated from the sample using either expected or observed information. It is known that, with complete data, estimates based on observed or…

  5. BayeSED: A General Approach to Fitting the Spectral Energy Distribution of Galaxies

    NASA Astrophysics Data System (ADS)

    Han, Yunkun; Han, Zhanwen

    2014-11-01

    We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large Ks-selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has been performed for the first time. We found that the 2003 model by Bruzual & Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the Ks-selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.

  6. BayeSED: A GENERAL APPROACH TO FITTING THE SPECTRAL ENERGY DISTRIBUTION OF GALAXIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Han, Yunkun; Han, Zhanwen, E-mail: hanyk@ynao.ac.cn, E-mail: zhanwenhan@ynao.ac.cn

    2014-11-01

    We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large Ks-selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has been performed for the first time. We found that the 2003 model by Bruzual and Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the Ks-selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.

  7. Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty

    PubMed Central

    Baele, Guy; Lemey, Philippe; Suchard, Marc A.

    2016-01-01

    Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of "working distributions" to facilitate, or shorten, the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this end, we introduce a "working" distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different "working" distributions that help GSS outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation of the marginal likelihood when using PS and SS, while both GSS approaches still retrieve the correct marginal likelihood. The methods used in this article are available in BEAST, a powerful, user-friendly software package for performing Bayesian evolutionary analyses. PMID:26526428
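    Stepping-stone sampling estimates log Z as a sum of log ratios between successive power posteriors p_β ∝ L^β π, with 0 = β_0 < β_1 < … < β_K = 1. Each ratio is estimated as the mean of L(θ)^(β_k − β_(k−1)) over draws from p_(β_(k−1)). The sketch below uses a toy conjugate model with normal data, known σ, and a normal prior on the mean. In this model, each power posterior can be sampled exactly instead of by MCMC, and the answer can be checked against the analytic marginal likelihood. It illustrates plain SS with the prior as the reference distribution. GSS would replace the prior with a working distribution, which is not shown. The β schedule (cubed uniform grid) is our assumption.

```python
import numpy as np


def log_lik(theta, y, sigma):
    """Normal log-likelihood of data y for each draw in the array theta."""
    n = len(y)
    ss = ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * ss / sigma**2


def power_posterior_draws(beta, y, sigma, mu0, tau, size, rng):
    """Exact draws from L^beta * prior for a N(mu0, tau^2) prior on the mean."""
    prec = 1 / tau**2 + beta * len(y) / sigma**2
    mean = (mu0 / tau**2 + beta * y.sum() / sigma**2) / prec
    return rng.normal(mean, 1 / np.sqrt(prec), size)


def stepping_stone(y, sigma, mu0, tau, n_steps=32, draws=4000, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0, 1, n_steps + 1) ** 3  # more rungs near the prior
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        theta = power_posterior_draws(b_prev, y, sigma, mu0, tau, draws, rng)
        lw = (b_next - b_prev) * log_lik(theta, y, sigma)
        m = lw.max()                               # log-sum-exp for stability
        log_z += m + np.log(np.mean(np.exp(lw - m)))
    return log_z


def exact_log_marginal(y, sigma, mu0, tau):
    """Analytic log marginal: y ~ N(mu0 * 1, sigma^2 I + tau^2 11^T)."""
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    diff = y - mu0
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + n * np.log(2 * np.pi))
```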

  8. Model Choice and Sample Size in Item Response Theory Analysis of Aphasia Tests

    ERIC Educational Resources Information Center

    Hula, William D.; Fergadiotis, Gerasimos; Martin, Nadine

    2012-01-01

    Purpose: The purpose of this study was to identify the most appropriate item response theory (IRT) measurement model for aphasia tests requiring 2-choice responses and to determine whether small samples are adequate for estimating such models. Method: Pyramids and Palm Trees (Howard & Patterson, 1992) test data that had been collected from…

  9. Precipitation and Latent Heating Distributions from Satellite Passive Microwave Radiometry. Part 1; Improved Method and Uncertainties

    NASA Technical Reports Server (NTRS)

    Olson, William S.; Kummerow, Christian D.; Yang, Song; Petty, Grant W.; Tao, Wei-Kuo; Bell, Thomas L.; Braun, Scott A.; Wang, Yansen; Lang, Stephen E.; Johnson, Daniel E.; et al.

    2006-01-01

    A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and nonconvective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud-resolving model simulations, and from the Bayesian formulation itself. Synthetic rain-rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in TMI instantaneous rain-rate estimates at 0.5° resolution range from approximately 50% at 1 mm/h to 20% at 14 mm/h. Errors in collocated spaceborne radar rain-rate estimates are roughly 50%-80% of the TMI errors at this resolution. The estimated algorithm random error in TMI rain rates at monthly, 2.5° resolution is relatively small (less than 6% at 5 mm day⁻¹) in comparison with the random error resulting from infrequent satellite temporal sampling (8%-35% at the same rain rate).
Percentage errors resulting from sampling decrease with increasing rain rate, and sampling errors in latent heating rates follow the same trend. Averaging over 3 months reduces sampling errors in rain rates to 6%-15% at 5 mm day⁻¹, with proportionate reductions in latent heating sampling errors.
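    The database search and compositing step can be sketched as a likelihood-weighted average. Each simulated profile is weighted by exp(−½ Σ_c ((T_obs,c − T_sim,c)/σ_c)²) over channels c, and the retrieved rain rate and its spread are the weighted mean and standard deviation. This sketch assumes a diagonal observation-plus-model error covariance. The database, channel errors, and function name are hypothetical, and the operational algorithm's error model and database structure are not reproduced.

```python
import numpy as np


def bayesian_composite(observed_tb, database_tb, database_rain, channel_error_sd):
    """Composite database rain rates, weighting each simulated profile by the
    Gaussian likelihood of the observed radiances (diagonal error covariance).

    Returns the posterior-mean rain rate and its weighted standard deviation.
    """
    obs = np.asarray(observed_tb, float)
    tb = np.asarray(database_tb, float)      # shape (profiles, channels)
    rain = np.asarray(database_rain, float)  # rain rate of each profile
    z = (tb - obs) / np.asarray(channel_error_sd, float)
    log_w = -0.5 * np.sum(z**2, axis=1)
    w = np.exp(log_w - log_w.max())          # stabilized before normalizing
    w /= w.sum()
    mean = np.sum(w * rain)
    sd = np.sqrt(np.sum(w * (rain - mean) ** 2))
    return mean, sd
```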

  10. Model-Based Design of Long-Distance Tracer Transport Experiments in Plants.

    PubMed

    Bühler, Jonas; von Lieres, Eric; Huber, Gregor J

    2018-01-01

    Studies of long-distance transport of tracer isotopes in plants offer high potential for functional phenotyping, but measurement time has so far been a bottleneck, because continuous time series of at least 1 h are required to obtain reliable estimates of transport properties. Hence, typical throughput is between 0.5 and 1 samples h⁻¹. Here, we propose to increase sample throughput by introducing temporal gaps in the data acquisition of each plant sample and measuring multiple plants one after another in a rotating scheme. In contrast to common time series analysis methods, mechanistic tracer transport models allow the analysis of interrupted time series. The uncertainties of the model parameter estimates are used as a measure of how much information was lost compared with complete time series. A case study was set up to systematically investigate different experimental schedules for throughput scenarios ranging from 1 to 12 samples h⁻¹. Selected designs with only a small number of data points were found to be sufficient for adequate parameter estimation, implying that the presented approach enables a substantial increase in sample throughput. The presented general framework for automated generation and evaluation of experimental schedules allows the determination of the maximal sample throughput and the respective optimal measurement schedule, depending on the required statistical reliability of data acquired in future experiments.
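    How parameter uncertainty can quantify information lost to temporal gaps is easiest to see with a linearized Fisher information calculation. For a model f(t; θ) observed with independent Gaussian noise of standard deviation s, Cov(θ̂) ≈ s² (JᵀJ)⁻¹, where J is the Jacobian at the sampling times. The sketch substitutes a hypothetical one-compartment uptake curve a(1 − e^(−kt)) for the authors' mechanistic transport model and compares a complete schedule with an interrupted one. The model, parameter values, and schedule are assumptions for illustration.

```python
import numpy as np


def model(t, a, k):
    """Hypothetical first-order uptake curve standing in for a transport model."""
    return a * (1.0 - np.exp(-k * t))


def jacobian(t, a, k):
    """Partial derivatives of model(t, a, k) with respect to (a, k)."""
    e = np.exp(-k * t)
    return np.column_stack([1.0 - e, a * t * e])


def parameter_sd(t, a, k, noise_sd):
    """Approximate standard deviations of (a, k) from the Fisher information
    J^T J / noise_sd**2, assuming independent Gaussian measurement noise."""
    J = jacobian(np.asarray(t, float), a, k)
    cov = noise_sd**2 * np.linalg.inv(J.T @ J)
    return np.sqrt(np.diag(cov))
```

    Dropping measurement times can only reduce JᵀJ, so the interrupted schedule always yields larger standard deviations. The question a design study answers is whether the increase stays within an acceptable tolerance.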

  11. Mixed model approaches for diallel analysis based on a bio-model.

    PubMed

    Zhu, J; Weir, B S

    1996-12-01

    A MINQUE(1) procedure, which is the minimum norm quadratic unbiased estimation (MINQUE) method with 1 for all prior values, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML), and MINQUE(θ), which uses the parameter values as the prior values. MINQUE(1) is almost as efficient as MINQUE(θ) for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jackknife procedure is suggested for estimating the sampling variances of estimated variance and covariance components and of predicted genetic effects. Worked examples are given for estimation of variance and covariance components and for prediction of genetic merits.
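    The jackknife step is generic. Recompute the estimator with each unit deleted in turn, then use v = ((n − 1)/n) Σ (θ̂_(i) − θ̄)². A minimal delete-one sketch follows. The unit that is deleted in a diallel analysis (for example, a block or replication) is not specified above, so the function simply drops one row of its input. The estimator argument could wrap a MINQUE variance-component routine, which is not implemented here.

```python
import numpy as np


def jackknife_variance(estimator, data):
    """Delete-one jackknife estimate of the sampling variance of estimator(data).

    estimator maps an array of units (rows) to a scalar estimate.
    """
    data = np.asarray(data)
    n = len(data)
    leave_one_out = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    mean_loo = leave_one_out.mean()
    return (n - 1) / n * np.sum((leave_one_out - mean_loo) ** 2)
```

    As a sanity check, for the sample mean the jackknife variance equals s²/n exactly.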

  12. Inventory implications of using sampling variances in estimation of growth model coefficients

    Treesearch

    Albert R. Stage; William R. Wykoff

    2000-01-01

    Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...

  13. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS are increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.

  14. Statistical theory and methodology for remote sensing data analysis

    NASA Technical Reports Server (NTRS)

    Odell, P. L.

    1974-01-01

    A model is developed for evaluating the acreages (proportions) of different crop types over a geographical area using a classification approach, and methods for estimating the crop acreages are given. In estimating the acreage of a specific crop type such as wheat, it is suggested to treat the problem as a two-crop problem, wheat vs. nonwheat, since this simplifies the estimation problem considerably. The error analysis and the sample size problem are investigated for the two-crop approach. Numerical results for sample sizes are given for a JSC-ERTS-1 data example on wheat identification performance in Hill County, Montana, and Burke County, North Dakota. Lastly, a sampling scheme is suggested for acquiring sample data for a large-area crop acreage inventory, and the problems of crop acreage estimation and error analysis are discussed.
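    Reducing the problem to wheat vs. nonwheat makes each sample unit a binomial trial, so a first-cut sample size follows from the normal approximation, n = z² p(1 − p)/e², for estimating the wheat proportion p within ±e. The sketch shows that textbook formula only. It ignores the classification error that the report's error analysis addresses, and it is not necessarily the formula the report uses.

```python
import math
from statistics import NormalDist


def two_class_sample_size(p_guess, margin, confidence=0.95):
    """Sample units needed to estimate a two-class (wheat vs. nonwheat) proportion
    within +/- margin, using the normal approximation to the binomial."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)
```

    Using p_guess = 0.5 gives the most conservative (largest) sample size when no prior estimate of the wheat proportion is available.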

  15. Variance of discharge estimates sampled using acoustic Doppler current profilers from moving boats

    USGS Publications Warehouse

    Garcia, Carlos M.; Tarrab, Leticia; Oberg, Kevin; Szupiany, Ricardo; Cantero, Mariano I.

    2012-01-01

    This paper presents a model for quantifying the random errors (i.e., variance) of acoustic Doppler current profiler (ADCP) discharge measurements from moving boats for different sampling times. The model focuses on the random processes in the sampled flow field and has been developed using statistical methods currently available for uncertainty analysis of velocity time series. Analysis of field data collected with ADCPs from moving boats on three natural rivers of varying sizes and flow conditions shows that, even though the estimated integral time scale of the actual turbulent flow field is larger than the sampling interval, the integral time scale of the sampled flow field is on the order of the sampling interval. Thus, an equation that computes the variance error in discharge measurements for different sampling times by assuming an uncorrelated flow field is appropriate. The approach is used to help define optimal sampling strategies by choosing the exposure time required for ADCPs to accurately measure flow discharge.
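    The variance of a time-averaged measurement depends on how correlated successive samples are: Var(x̄) = (σ²/n)[1 + 2 Σ_k (1 − k/n) ρ_k]. When the integral time scale of the sampled series is on the order of the sampling interval, the autocorrelations ρ_k are negligible and the formula reduces to the uncorrelated result σ²/n, with n = T/Δt for sampling time T. The sketch below computes this general form from a sampled series. Truncating the sum at the first non-positive autocorrelation is a common heuristic we assume here, not the paper's method.

```python
import numpy as np


def variance_of_mean(series, max_lag=None):
    """Variance of the time-averaged value of a sampled series.

    Starts from the uncorrelated result sigma^2 / n and inflates it by the
    summed sample autocorrelation, truncated at the first non-positive lag.
    """
    x = np.asarray(series, float)
    n = len(x)
    d = x - x.mean()
    var = d @ d / n
    if max_lag is None:
        max_lag = n // 4
    rho_sum = 0.0
    for k in range(1, max_lag + 1):
        rho = (d[:-k] @ d[k:]) / (n * var)
        if rho <= 0:
            break
        rho_sum += (1 - k / n) * rho
    return var / n * (1 + 2 * rho_sum)
```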

  16. Weighted regression analysis and interval estimators

    Treesearch

    Donald W. Seegrist

    1974-01-01

    A method is presented for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values and prediction intervals for the means of future samples are given.
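
    The following NumPy sketch illustrates the estimators described above. It computes the weighted least squares coefficients and two standard errors. One underlies a confidence interval for an expected value, and the other a prediction interval for the mean of m future observations. It assumes the usual model in which observation i has variance σ²/wᵢ. Multiplying each standard error by a Student t quantile with n − p degrees of freedom gives the interval half-widths. Function names are hypothetical, and this is not the report's own derivation.

```python
import numpy as np


def wls_fit(X, y, w):
    """Weighted least squares: beta = (X^T W X)^{-1} X^T W y with W = diag(w).

    Returns the coefficients, their estimated covariance matrix, and the
    weighted residual variance sigma^2 (observation i has variance sigma^2 / w_i).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w                      # equals X.T @ diag(w) without building diag(w)
    xtwx = XtW @ X
    beta = np.linalg.solve(xtwx, XtW @ y)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = float(resid @ (w * resid)) / (n - p)
    cov_beta = sigma2 * np.linalg.inv(xtwx)
    return beta, cov_beta, sigma2


def interval_std_errors(x0, cov_beta, sigma2, m=1, w0=1.0):
    """Standard errors for (a) the expected value at x0 and (b) the mean of
    m future observations at x0, each with weight w0."""
    x0 = np.asarray(x0, dtype=float)
    var_mean = float(x0 @ cov_beta @ x0)
    var_pred = var_mean + sigma2 / (m * w0)
    return np.sqrt(var_mean), np.sqrt(var_pred)
```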

  17. Modeling returns volatility: Realized GARCH incorporating realized risk measure

    NASA Astrophysics Data System (ADS)

    Jiang, Wei; Ruan, Qingsong; Li, Jianfeng; Li, Ye

    2018-06-01

    This study applies realized GARCH models, introducing several risk measures of intraday returns into the measurement equation, to model the daily volatility of E-mini S&P 500 index futures returns. Two conventional realized measures, realized volatility and realized kernel, serve as benchmarks. Beyond these, we also use generalized realized risk measures: realized absolute deviation and two realized tail risk measures, realized value-at-risk and realized expected shortfall. The empirical results show that realized GARCH models using the generalized realized risk measures provide better in-sample volatility estimation and substantially improved out-of-sample volatility forecasting. In particular, realized expected shortfall performs best among all of the alternative realized measures. Our empirical results reveal that future volatility may be more attributable to present losses (risk measures). The results are robust to different sample estimation windows.
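
    The realized measures named above are all computed from the intraday returns of a single trading day. The following is a minimal sketch that assumes simple empirical definitions: the sum of squared returns, the sum of absolute returns, and the empirical lower-tail quantile and tail mean at level `alpha`. The paper's exact scalings and tail estimators may differ, and the realized GARCH measurement equation itself is not shown.

```python
import math


def realized_measures(intraday_returns, alpha=0.05):
    """Daily realized measures from one day's intraday returns.

    rv  : realized variance (sum of squared returns)
    rad : realized absolute deviation, here the unscaled sum of |r|
    var : realized value-at-risk, the loss at the empirical alpha-quantile
    es  : realized expected shortfall, the average loss in that tail
    """
    r = sorted(intraday_returns)
    n = len(r)
    rv = sum(x * x for x in r)
    rad = sum(abs(x) for x in r)
    k = max(1, math.floor(alpha * n))   # number of observations in the lower tail
    tail = r[:k]
    var_ = -tail[-1]                    # reported as a positive loss
    es = -sum(tail) / k
    return {"rv": rv, "rad": rad, "var": var_, "es": es}
```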

  18. Modeling the distribution of colonial species to improve estimation of plankton concentration in ballast water

    NASA Astrophysics Data System (ADS)

    Rajakaruna, Harshana; VandenByllaardt, Julie; Kydd, Jocelyn; Bailey, Sarah

    2018-03-01

    The International Maritime Organization (IMO) has set limits on allowable plankton concentrations in ballast water discharge to minimize aquatic invasions globally. Previous guidance on ballast water sampling and compliance decision thresholds was based on the assumption that probability distributions of plankton are Poisson when spatially homogeneous, or negative binomial when heterogeneous. We propose a hierarchical probability model to estimate the average plankton concentration in ballast water. The model incorporates distributions at the level of particles (i.e., discrete individuals plus colonies per unit volume) and also within particles (i.e., individuals per particle). We examined the performance of the models using data for plankton in the size class ≥ 10 μm and < 50 μm, collected from five different depths of a ballast tank of a commercial ship in three independent surveys. We show that the negative binomial and the hierarchical probability models fit the data equally well, with both models performing better than the Poisson model at the scale of our sampling. The hierarchical probability model, which accounts for both the individuals and the colonies in a sample, reduces the uncertainty associated with the concentration estimate. It also increases the power to reject a ship's compliance when the ship does not truly comply with the standard. We show examples of how to test ballast water compliance using the above models.
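
    Suppose particles (single individuals or colonies) arrive as a Poisson process and each contains a random number of individuals. The total count is then compound Poisson, with mean λVμ and variance λV(s² + μ²). Here λ is particles per unit volume, V the sampled volume, and μ and s² the mean and variance of individuals per particle. This is one way to see why a particle-level model alone understates the spread of individual counts. The sketch below assumes this Poisson-at-the-particle-level specification, which is only one of the hierarchical variants the abstract describes. The function name is hypothetical.

```python
def compound_poisson_moments(particles_per_litre: float, volume_litres: float,
                             mean_per_particle: float, var_per_particle: float):
    """Mean and variance of the total number of individuals in a sample when
    particle counts are Poisson and individuals per particle are i.i.d."""
    lam = particles_per_litre * volume_litres          # expected particles in the sample
    mean = lam * mean_per_particle
    var = lam * (var_per_particle + mean_per_particle ** 2)
    return mean, var
```

    With only solitary individuals (μ = 1, s² = 0), the variance equals the mean, as for a Poisson model. Colonies push the variance-to-mean ratio above 1, which is the kind of overdispersion that a negative binomial model would otherwise absorb.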

  19. Evaluating the Impact of Genomic Data and Priors on Bayesian Estimates of the Angiosperm Evolutionary Timescale.

    PubMed

    Foster, Charles S P; Sauquet, Hervé; van der Merwe, Marlien; McPherson, Hannah; Rossetto, Maurizio; Ho, Simon Y W

    2017-05-01

    The evolutionary timescale of angiosperms has long been a key question in biology. Molecular estimates of this timescale have shown considerable variation, being influenced by differences in taxon sampling, gene sampling, fossil calibrations, evolutionary models, and choices of priors. Here, we analyze a data set comprising 76 protein-coding genes from the chloroplast genomes of 195 taxa spanning 86 families, including novel genome sequences for 11 taxa, to evaluate the impact of models, priors, and gene sampling on Bayesian estimates of the angiosperm evolutionary timescale. Using a Bayesian relaxed molecular-clock method, with a core set of 35 minimum and two maximum fossil constraints, we estimated that crown angiosperms arose 221 (251-192) Ma during the Triassic. Based on a range of additional sensitivity and subsampling analyses, we found that our date estimates were generally robust to large changes in the parameters of the birth-death tree prior and of the model of rate variation across branches. We found an exception to this when we implemented fossil calibrations in the form of highly informative gamma priors rather than as uniform priors on node ages. Under all other calibration schemes, including trials of seven maximum age constraints, we consistently found that the earliest divergences of angiosperm clades substantially predate the oldest fossils that can be assigned unequivocally to their crown group. Overall, our results and experiments with genome-scale data suggest that reliable estimates of the angiosperm crown age will require increased taxon sampling, significant methodological changes, and new information from the fossil record. [Angiospermae, chloroplast, genome, molecular dating, Triassic.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  20. Uncertainty in predicting soil hydraulic properties at the hillslope scale with indirect methods

    NASA Astrophysics Data System (ADS)

    Chirico, G. B.; Medina, H.; Romano, N.

    2007-02-01

    Several hydrological applications require the characterisation of soil hydraulic properties at large spatial scales. Pedotransfer functions (PTFs) are being developed as simplified methods to estimate soil hydraulic properties as an alternative to direct measurements, which are unfeasible in most practical circumstances. The objective of this study is to quantify the uncertainty in PTF spatial predictions at the hillslope scale as related to the sampling density. Two sources of uncertainty are considered: (i) the error in estimated soil physico-chemical properties and (ii) PTF model error. The analysis is carried out on a 2-km-long experimental hillslope in southern Italy. The method adopted is based on stochastic generation of patterns of soil variables using sequential Gaussian simulation, conditioned on the observed sample data. The following PTFs are applied: Vereecken's PTF [Vereecken, H., Diels, J., van Orshoven, J., Feyen, J., Bouma, J., 1992. Functional evaluation of pedotransfer functions for the estimation of soil hydraulic properties. Soil Sci. Soc. Am. J. 56, 1371-1378] and the HYPRES PTF [Wösten, J.H.M., Lilly, A., Nemes, A., Le Bas, C., 1999. Development and use of a database of hydraulic properties of European soils. Geoderma 90, 169-185]. The two PTFs reliably estimate the soil water retention characteristic even at a relatively coarse sampling resolution. Their prediction uncertainties are comparable to the uncertainties in direct laboratory or field measurements. The uncertainty of soil water retention prediction due to model error is as significant as, or more significant than, the uncertainty associated with the estimated input, even at a relatively coarse sampling resolution. Prediction uncertainties are much larger when PTFs are applied to estimate the saturated hydraulic conductivity. In this case model error dominates the overall prediction uncertainty, making the effect of the input error negligible.

  1. Atmospheric aerosol source identification and estimates of source contributions to air pollution in Dundee, UK

    NASA Astrophysics Data System (ADS)

    Qin, Y.; Oduyemi, K.

    Anthropogenic aerosol (PM10) emission sources sampled at an air quality monitoring station in Dundee have been analysed. However, information on local natural aerosol emission sources was unavailable. A method that combines a receptor model and an atmospheric dispersion model was used to identify aerosol sources and estimate source contributions to air pollution. The receptor model identified five sources:
    - an aged marine aerosol source with some chlorine replaced by sulphate
    - a secondary ammonium sulphate aerosol source
    - a secondary ammonium nitrate aerosol source
    - a soil and construction dust source
    - an incinerator and fuel oil burning emission source
    The vehicle emission source is comprehensively described in the atmospheric emission inventory but cannot be identified by the receptor model, so an atmospheric dispersion model was used to estimate its contributions. In Dundee, a substantial share (67.5%) of the aerosol mass sampled at the study station could be attributed to the six sources named above.

  2. Bayesian hierarchical model of ceftriaxone resistance proportions among Salmonella serotype Heidelberg infections.

    PubMed

    Gu, Weidong; Medalla, Felicita; Hoekstra, Robert M

    2018-02-01

    The National Antimicrobial Resistance Monitoring System (NARMS) at the Centers for Disease Control and Prevention tracks resistance among Salmonella infections. The annual number of Salmonella isolates of a particular serotype from states may be small, making direct estimation of resistance proportions unreliable. We developed a Bayesian hierarchical model to improve estimation by borrowing strength from relevant sampling units. We illustrate the models with different specifications of spatio-temporal interaction using 2004-2013 NARMS data for ceftriaxone-resistant Salmonella serotype Heidelberg. Our results show that Bayesian estimates of resistance proportions were smoother than observed values, and the difference between predicted and observed proportions was inversely related to the number of submitted isolates. The model with interaction allowed annual changes in resistance proportions to be tracked at the state level. We demonstrated that Bayesian hierarchical models provide a useful tool for examining spatio-temporal patterns in data with small sample sizes, such as those found in NARMS. Published by Elsevier Ltd.
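
    "Borrowing strength" can be illustrated with the simplest conjugate case: a beta-binomial model in which each state's resistance proportion is shrunk toward a common prior mean. This is a deliberately simpler stand-in for the authors' spatio-temporal hierarchical model. `prior_mean` and `prior_strength` are hypothetical inputs that a full hierarchical model would estimate from the data.

```python
def shrunk_proportion(resistant: int, isolates: int,
                      prior_mean: float, prior_strength: float) -> float:
    """Posterior mean of a resistance proportion under a Beta prior.

    The Beta(a, b) prior has mean prior_mean and a + b = prior_strength,
    so prior_strength acts like a number of pseudo-isolates."""
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (resistant + a) / (isolates + a + b)
```

    Take a prior mean of 0.1 worth 18 pseudo-isolates. Under that prior, 1 resistant isolate out of 2 (observed 0.5) is pulled to 0.14, whereas 50 out of 100 stays near 0.44. The gap between estimate and observation shrinks as the number of isolates grows, mirroring the pattern reported above.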

  3. A novel application of artificial neural network for wind speed estimation

    NASA Astrophysics Data System (ADS)

    Fang, Da; Wang, Jianzhou

    2017-05-01

    Accurate multi-step wind speed estimation models are increasingly important because of the technical and economic impacts of wind speed on power grid security and environmental benefits. In this study, combined strategies for wind speed forecasting are proposed based on an intelligent data processing system using artificial neural networks (ANNs). A generalized regression neural network and an Elman neural network are employed to form two hybrid models. The approach employs one ANN to model the samples, achieving data denoising and assimilation, and applies the other to predict wind speed from the pre-processed samples. The proposed method is evaluated by the prediction improvements of the hybrid models compared with a single ANN and a typical forecasting method. To provide sufficient cases for the study, monthly average wind speeds for four given years at four observation sites in Western China were used to test the models. Multiple evaluation methods demonstrated that the proposed method provides a promising alternative technique for monthly average wind speed estimation.

  4. Sample selection and spatial models of housing price indexes, and, A disequilibrium analysis of the U.S. gasoline market using panel data

    NASA Astrophysics Data System (ADS)

    Hu, Haixin

    This dissertation consists of two parts. The first part studies sample selection and spatial models of housing price indexes using transaction data on detached single-family houses in two California metropolitan areas from 1990 through 2008. House prices are often spatially correlated because of shared amenities, or when the properties are viewed as close substitutes in a housing submarket. Many studies have addressed spatial correlation in the context of housing markets. However, to the best of my knowledge, none has used spatial models to construct housing price indexes at the zip code level for the entire time period analyzed in this dissertation. In this paper, I study a first-order autoregressive spatial model with four different weighting matrix schemes and construct four sets of housing price indexes accordingly. Gatzlaff and Haurin (1997, 1998) study the sample selection problem in housing indexes using Heckman's two-step method. This method, however, is generally inefficient and can cause multicollinearity problems. It also requires data on unsold houses to carry out the first-step probit regression. The maximum likelihood (ML) method can be used to estimate a truncated incidental model, which allows one to correct for sample selection based on transaction data only. However, convergence problems are very common in practice. In this paper I adopt Lewbel's (2007) sample selection correction method. Apart from some very general assumptions, it does not require one to model or estimate the selection model. I then extend this method to correct for spatial correlation. In the second part, I analyze the U.S. gasoline market with a disequilibrium model that allows lagged-latent variables, endogenous prices, and panel data with fixed effects. Most existing studies of the gasoline market (see the survey of Espey, 1998, Energy Economics) assume equilibrium. In practice, however, prices do not always adjust fast enough to clear the market.
Equilibrium assumptions greatly simplify statistical inference, but they are very restrictive and can produce conflicting estimates. For example, econometric models of markets that assume equilibrium often produce more elastic estimates of the price elasticity of demand than their disequilibrium counterparts (Holt and Johnson, 1989, Review of Economics and Statistics; Oczkowski, 1998, Economics Letters). The few studies that allow disequilibrium, however, have been limited to macroeconomic time-series data without lagged-latent variables. While time-series data allow one to investigate national trends, they cannot be used to identify and analyze regional differences and the role of local markets. Excluding lagged-latent variables is also undesirable because such variables capture adjustment costs and inter-temporal spillovers. Simulation methods offer tractable solutions to dynamic and panel data disequilibrium models (Lee, 1997, Journal of Econometrics), but they assume normally distributed errors. This paper compares estimates of price/income elasticity and excess supply/demand across time periods, regions, and model specifications, using both equilibrium and disequilibrium methods. In the equilibrium model, I compare the within-group estimator with Anderson and Hsiao's first-difference 2SLS estimator. In the disequilibrium model, I extend Amemiya's 2SLS by using Newey's efficient estimator with optimal instruments.
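
    The first-order spatial autoregressive model mentioned above has the form y = ρWy + Xβ + ε, so its mean is (I − ρW)⁻¹Xβ. The NumPy sketch below computes that implied mean. It assumes a hypothetical contiguity matrix that is row-standardized, which is one common weighting scheme and not necessarily any of the dissertation's four. It does not implement the Lewbel selection correction.

```python
import numpy as np


def row_standardize(W):
    """Scale each row of a spatial weights matrix to sum to 1 (zero rows kept)."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    return W / sums


def sar_mean(W, X, beta, rho):
    """Mean of y under the SAR model y = rho*W*y + X*beta + e,
    i.e. the solution of (I - rho*W) y = X*beta."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, X @ np.asarray(beta, dtype=float))
```

    With a row-standardized W and a constant Xβ = c, every unit's mean is c / (1 − ρ). This is a quick way to see how ρ amplifies a common level through the neighbour structure.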

  5. A Bayesian state-space formulation of dynamic occupancy models

    USGS Publications Warehouse

    Royle, J. Andrew; Kery, M.

    2007-01-01

    Species occurrence and its dynamic components, extinction and colonization probabilities, are focal quantities in biogeography and metapopulation biology, and for species conservation assessments. It has been increasingly appreciated that these parameters must be estimated separately from detection probability to avoid the biases induced by nondetection error. Hence, there is now considerable theoretical and practical interest in dynamic occupancy models that contain explicit representations of metapopulation dynamics such as extinction, colonization, and turnover as well as growth rates. We describe a hierarchical parameterization of these models that is analogous to the state-space formulation of models in time series. The model is represented by two components, one for the partially observable occupancy process and another for the observations conditional on that process. This parameterization naturally allows estimation of all parameters of the conventional approach to occupancy models. In addition, it yields great flexibility and extensibility, e.g., to modeling heterogeneity or latent structure in model parameters. We also highlight the important distinction between population and finite sample inference; the latter yields much more precise estimates for the particular sample at hand. Finite sample estimates can easily be obtained using the state-space representation of the model but are difficult to obtain under the conventional approach of likelihood-based estimation. We use R and WinBUGS to apply the model to two examples. In a standard analysis for the European Crossbill in a large Swiss monitoring program, we fit a model with year-specific parameters. Estimates of the dynamic parameters varied greatly among years, highlighting the irruptive population dynamics of that species. 
In the second example, we analyze route occupancy of Cerulean Warblers in the North American Breeding Bird Survey (BBS) using a model allowing for site-specific heterogeneity in model parameters. The results indicate relatively low turnover and a stable distribution of Cerulean Warblers which is in contrast to analyses of counts of individuals from the same survey that indicate important declines. This discrepancy illustrates the inertia in occupancy relative to actual abundance. Furthermore, the model reveals a declining patch survival probability, and increasing turnover, toward the edge of the range of the species, which is consistent with metapopulation perspectives on the genesis of range edges. Given detection/non-detection data, dynamic occupancy models as described here have considerable potential for the study of distributions and range dynamics.
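
    The state process of the dynamic occupancy model can be written as ψ(t+1) = ψ(t)(1 − ε(t)) + (1 − ψ(t))γ(t), where ε is the extinction probability and γ the colonization probability. The sketch below projects that recursion deterministically. It also shows how imperfect detection biases naive occupancy, assuming a constant detection probability `p_detect` over a hypothetical number of repeat surveys. It does not reproduce the Bayesian state-space fit in WinBUGS.

```python
def project_occupancy(psi1, extinction, colonization):
    """Occupancy probabilities psi_1..psi_T from yearly extinction and
    colonization probabilities (lists of equal length T - 1)."""
    psi = [psi1]
    for eps, gam in zip(extinction, colonization):
        prev = psi[-1]
        psi.append(prev * (1.0 - eps) + (1.0 - prev) * gam)
    return psi


def naive_occupancy(psi, p_detect, n_surveys):
    """Expected fraction of sites with at least one detection: occupancy
    is underestimated whenever detection is imperfect."""
    return psi * (1.0 - (1.0 - p_detect) ** n_surveys)
```

    With constant ε and γ, the recursion converges to γ/(γ + ε). This gives a quick check on the turnover implied by a set of estimates.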

  6. Empirical Bayes estimation of undercount in the decennial census.

    PubMed

    Cressie, N

    1989-12-01

    Empirical Bayes methods are used to estimate the extent of the undercount at the local level in the 1980 U.S. census. "Grouping of like subareas from areas such as states, counties, and so on into strata is a useful way of reducing the variance of undercount estimators. By modeling the subareas within a stratum to have a common mean and variances inversely proportional to their census counts, and by taking into account sampling of the areas (e.g., by dual-system estimation), empirical Bayes estimators that compromise between the (weighted) stratum average and the sample value can be constructed. The amount of compromise is shown to depend on the relative importance of stratum variance to sampling variance. These estimators are evaluated at the state level (51 states, including Washington, D.C.) and stratified on race/ethnicity (3 strata) using data from the 1980 postenumeration survey (PEP 3-8, for the noninstitutional population)." excerpt
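
    The compromise described in the excerpt can be sketched with a normal-normal model. Each subarea's sample value yᵢ has sampling variance vᵢ, for example inversely proportional to its census count. The subarea means share a stratum mean μ with between-area variance τ². The estimator is wᵢμ̂ + (1 − wᵢ)yᵢ with wᵢ = vᵢ/(vᵢ + τ²). This is a generic empirical Bayes sketch. Treating τ² as known is an assumption (in practice it is estimated), and the function name is hypothetical.

```python
def eb_estimates(y, sampling_var, tau2):
    """Empirical Bayes compromise between each sample value and the
    precision-weighted stratum average.

    y            : sample undercount estimates for subareas in one stratum
    sampling_var : sampling variance of each y_i
    tau2         : between-subarea (stratum) variance, assumed known here
    """
    precision = [1.0 / (v + tau2) for v in sampling_var]
    mu = sum(p * yi for p, yi in zip(precision, y)) / sum(precision)
    shrunk = []
    for yi, v in zip(y, sampling_var):
        w = v / (v + tau2)           # more sampling noise -> more weight on mu
        shrunk.append(w * mu + (1.0 - w) * yi)
    return mu, shrunk
```

    The weight wᵢ makes explicit the abstract's point that the amount of compromise depends on the stratum variance relative to the sampling variance.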

  7. Estimation of sex and stature using anthropometry of the upper extremity in an Australian population.

    PubMed

    Howley, Donna; Howley, Peter; Oxenham, Marc F

    2018-06-01

    Stature and a further 8 anthropometric dimensions were recorded from the arms and hands of a sample of 96 staff and students from the Australian National University and The University of Newcastle, Australia. These dimensions were used to create simple and multiple logistic regression models for sex estimation and simple and multiple linear regression equations for stature estimation of a contemporary Australian population. Overall sex classification accuracies using the models created were comparable to similar studies. The stature estimation models achieved standard errors of estimates (SEE) that were comparable to, and in many cases lower than, those achieved in similar research. Generic, non-sex-specific models achieved SEEs and R² values similar to those of the sex-specific models, indicating that stature may be accurately estimated when sex is unknown. Copyright © 2018 Elsevier B.V. All rights reserved.
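
    The standard error of estimate (SEE) quoted for stature equations is the residual standard deviation of the regression. The following is a minimal sketch for a simple linear stature equation with a single hypothetical predictor (for example, a forearm or hand length). It is not the authors' fitted equation and uses no data from the study.

```python
import math


def fit_stature(x, y):
    """Ordinary least squares fit y = a + b*x with its SEE and R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    see = math.sqrt(sse / (n - 2))       # n - 2 degrees of freedom for two parameters
    r2 = 1.0 - sse / syy
    return intercept, slope, see, r2
```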

  8. A pharmacometric case study regarding the sensitivity of structural model parameter estimation to error in patient reported dosing times.

    PubMed

    Knights, Jonathan; Rohatagi, Shashank

    2015-12-01

    Although there is a body of literature focused on minimizing the effect of dosing inaccuracies on pharmacokinetic (PK) parameter estimation, most of the work centers on missing doses. No attempt has been made to specifically characterize the effect of error in reported dosing times. Additionally, existing work has largely dealt with cases in which the compound of interest is dosed at an interval no less than its terminal half-life. This work provides a case study investigating how error in patient reported dosing times might affect the accuracy of structural model parameter estimation under sparse sampling conditions when the dosing interval is less than the terminal half-life of the compound, and the underlying kinetics are monoexponential. Additional effects due to noncompliance with dosing events are not explored and it is assumed that the structural model and reasonable initial estimates of the model parameters are known. Under the conditions of our simulations, with structural model CV % ranging from ~20 to 60 %, parameter estimation inaccuracy derived from error in reported dosing times was largely controlled around 10 % on average. Given that no observed dosing was included in the design and sparse sampling was utilized, we believe these error results represent a practical ceiling given the variability and parameter estimates for the one-compartment model. The findings suggest additional investigations may be of interest and are noteworthy given the inability of current PK software platforms to accommodate error in dosing times.
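
    For monoexponential (one-compartment, intravenous bolus) kinetics with repeated dosing, the concentration is a sum of decaying exponentials, one per dose. An error in a reported dosing time therefore shifts that dose's contribution. The sketch below assumes bolus dosing and uses hypothetical values for dose, volume, and elimination rate. It only illustrates how a misreported time propagates into predicted concentrations and does not reproduce the authors' simulation design.

```python
import math


def concentration(t, dose_times, dose, volume, k_el):
    """One-compartment, monoexponential model: superpose each dose given at
    or before time t as (dose / volume) * exp(-k_el * (t - t_dose))."""
    return sum(dose / volume * math.exp(-k_el * (t - td))
               for td in dose_times if td <= t)


if __name__ == "__main__":
    k = math.log(2) / 10.0                 # terminal half-life of 10 h (hypothetical)
    true_times = [0.0, 8.0]                # dosing interval shorter than the half-life
    reported = [0.0, 7.0]                  # second dose reported 1 h early
    c_true = concentration(12.0, true_times, 100.0, 10.0, k)
    c_rep = concentration(12.0, reported, 100.0, 10.0, k)
    print(f"true {c_true:.3f} vs reported {c_rep:.3f}")
```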

  9. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model.

    PubMed

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P; Zhou, Haibo

    2016-11-01

    We propose a cost-effective outcome-dependent sampling (ODS) design for failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criterion that can be used to optimally implement the ODS design in practice is proposed and studied. The small-sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure are shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.

  10. Generalized Sample Size Determination Formulas for Investigating Contextual Effects by a Three-Level Random Intercept Model.

    PubMed

    Usami, Satoshi

    2017-03-01

    Behavioral and psychological researchers have shown strong interest in investigating contextual effects (i.e., the influences of combinations of individual- and group-level predictors on individual-level outcomes). The present research provides generalized formulas for determining the sample size needed to investigate contextual effects according to the desired level of statistical power as well as the width of the confidence interval. These formulas are derived within a three-level random intercept model that includes one predictor/contextual variable at each level, so as to simultaneously cover the various kinds of contextual effects that researchers may be interested in. The relative influences of the indices included in the formulas on the standard errors of contextual effect estimates are investigated with the aim of further simplifying sample size determination procedures. In addition, simulation studies are performed to investigate the finite-sample behavior of calculated statistical power. They show that estimated sample sizes based on the derived formulas can be both positively and negatively biased. The bias arises from complex effects of unreliability of contextual variables, multicollinearity, and violation of the assumption regarding known variances. Thus, it is advisable to compare estimated sample sizes under various specifications of indices and to evaluate their potential bias, as illustrated in the example.

  11. Inventory of forest resources (including water) by multi-level sampling. [nine northern Virginia coastal plain counties]

    NASA Technical Reports Server (NTRS)

    Aldrich, R. C.; Dana, R. W.; Roberts, E. H. (Principal Investigator)

    1977-01-01

    The author has identified the following significant results. A stratified random sample using LANDSAT band 5 and 7 panchromatic prints resulted in estimates of water in counties with sampling errors less than ±9% (67% probability level). A forest inventory using a four-band LANDSAT color composite resulted in estimates of forest area by counties that were within ±6.7% and ±3.7%, respectively (67% probability level). Estimates of forest area for counties by computer-assisted techniques were within ±21% of operational forest survey figures, and for all counties the difference was only 1%. Correlations of airborne terrain reflectance measurements with LANDSAT radiance verified a linear atmospheric model with an additive (path radiance) term and a multiplicative (transmittance) term. Coefficients of determination exceeded 0.83 for 28 of the 32 modeling attempts, not adversely affected by a rain shower that occurred between the times of LANDSAT passage and aircraft overflights.
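
    The county estimates above come from a stratified random sample, and a sampling error at the 67% probability level corresponds roughly to plus or minus one standard error. The following is a minimal sketch of the textbook stratified estimator of a mean or proportion and its standard error, with a finite-population correction. The strata and the 0/1 photo-interpretation scores used in the check are hypothetical, not the study's data.

```python
import math


def stratified_estimate(strata):
    """Stratified random sampling estimate of a population mean (or proportion).

    strata: list of (N_h, sample_values) pairs, where N_h is the number of
    units in stratum h and sample_values are the observed values (e.g. 1 if a
    sampled plot is water or forest, else 0). Each stratum needs >= 2 samples.
    """
    N = sum(n_units for n_units, _ in strata)
    estimate = 0.0
    variance = 0.0
    for n_units, values in strata:
        n_h = len(values)
        mean_h = sum(values) / n_h
        s2_h = sum((v - mean_h) ** 2 for v in values) / (n_h - 1)
        weight = n_units / N
        estimate += weight * mean_h
        variance += weight ** 2 * (1.0 - n_h / n_units) * s2_h / n_h
    return estimate, math.sqrt(variance)
```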

  12. Joint Inference of Population Assignment and Demographic History

    PubMed Central

    Choi, Sang Chul; Hey, Jody

    2011-01-01

    A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy–Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets. PMID:21775468
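
    The partition distance between two assignments can be taken as the minimum number of individuals that must be removed so that the two partitions agree, ignoring arbitrary population labels. The following is a brute-force sketch that assumes a small number of populations, because it tries every label matching (factorial in the number of labels). The method's own implementation inside the sampler is not reproduced here.

```python
from itertools import permutations


def partition_distance(a, b):
    """Partition distance between two assignments of the same individuals.

    a, b : sequences of population labels (a[i] and b[i] refer to individual i).
    Returns len(a) minus the largest number of individuals that can be made to
    agree under a one-to-one relabelling of the populations in `a`.
    """
    if len(a) != len(b):
        raise ValueError("assignments must cover the same individuals")
    labels_a = list(dict.fromkeys(a))      # distinct labels, first-seen order
    labels_b = list(dict.fromkeys(b))
    k = max(len(labels_a), len(labels_b))
    # Pad the shorter label list with placeholders that never match anything.
    labels_a += [object() for _ in range(k - len(labels_a))]
    labels_b += [object() for _ in range(k - len(labels_b))]
    best = 0
    for perm in permutations(labels_b):
        mapping = dict(zip(labels_a, perm))
        agree = sum(1 for x, y in zip(a, b) if mapping[x] == y)
        best = max(best, agree)
    return len(a) - best
```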

  13. Intellectual Development within Transracial Adoptive Families: Retesting the Confluence Model.

    ERIC Educational Resources Information Center

    Berbaum, Michael L.; Moreland, Richard L.

    1985-01-01

    Estimates confluence model of intellectual development for a within-family sample of 321 children from 101 transracial adoptive families. Mental ages of children and their parents and birth or adoption intervals were used in a nonlinear least-squares estimation procedure to obtain children's predicted mental ages. Results suggest efficiency of the…

  14. Model-based estimation of individual fitness

    USGS Publications Warehouse

    Link, W.A.; Cooch, E.G.; Cam, E.

    2002-01-01

    Fitness is the currency of natural selection, a measure of the propagation rate of genotypes into future generations. Its various definitions have the common feature that they are functions of survival and fertility rates. At the individual level, the operative level for natural selection, these rates must be understood as latent features, genetically determined propensities existing at birth. This conception of rates requires that individual fitness be defined and estimated by consideration of the individual in a modelled relation to a group of similar individuals; the only alternative is to consider a sample of size one, unless a clone of identical individuals is available. We present hierarchical models describing individual heterogeneity in survival and fertility rates and allowing for associations between these rates at the individual level. We apply these models to an analysis of life histories of Kittiwakes (Rissa tridactyla) observed at several colonies on the Brittany coast of France. We compare Bayesian estimation of the population distribution of individual fitness with estimation based on treating individual life histories in isolation, as samples of size one (e.g. McGraw and Caswell, 1996).

  15. Estimating population trends with a linear model

    USGS Publications Warehouse

    Bart, Jonathan; Collins, Brian D.; Morrison, R.I.G.

    2003-01-01

    We describe a simple and robust method for estimating trends in population size. The method may be used with Breeding Bird Survey data, aerial surveys, point counts, or any other program of repeated surveys at permanent locations. Surveys need not be made at each location during each survey period. The method differs from most existing methods in being design based, rather than model based. The only assumptions are that the nominal sampling plan is followed and that sample size is large enough for use of the t-distribution. Simulations were based on two bird data sets from natural populations. They showed that the point estimate produced by the linear model was essentially unbiased, even when counts varied substantially and 25% of the complete data set was missing. The estimating-equation approach, often used to analyze Breeding Bird Survey data, performed similarly on one data set. It had substantial bias on the second data set, in which counts were highly variable. The advantages of the linear model are its simplicity, its flexibility, and its self-weighting property. A user-friendly computer program to carry out the calculations is available from the senior author.
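
    A design-based trend estimate of this general kind can be sketched in two steps. First, fit a least-squares slope to the counts at each permanent location, using whatever survey periods are available. Second, average the slopes across locations and use their spread for a t-based standard error. This is a hypothetical illustration of the idea under those assumptions, not necessarily the authors' estimator or their weighting.

```python
import math


def site_slope(counts_by_year):
    """Least-squares slope of count on year for one location; None if < 2 surveys."""
    years = list(counts_by_year)
    n = len(years)
    if n < 2:
        return None
    mean_year = sum(years) / n
    mean_count = sum(counts_by_year.values()) / n
    sxx = sum((yr - mean_year) ** 2 for yr in years)
    sxy = sum((yr - mean_year) * (counts_by_year[yr] - mean_count) for yr in years)
    return sxy / sxx


def mean_trend(sites):
    """Mean per-site slope and its standard error (needs >= 2 usable sites).
    Multiply the standard error by a t quantile with k - 1 df for an interval."""
    slopes = [s for s in (site_slope(c) for c in sites.values()) if s is not None]
    k = len(slopes)
    mean = sum(slopes) / k
    var = sum((s - mean) ** 2 for s in slopes) / (k - 1)
    return mean, math.sqrt(var / k)
```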

  17. Sampling hazelnuts for aflatoxin: uncertainty associated with sampling, sample preparation, and analysis.

    PubMed

    Ozay, Guner; Seyhan, Ferda; Yilmaz, Aysun; Whitaker, Thomas B; Slate, Andrew B; Giesbrecht, Francis

    2006-01-01

    The variability associated with the aflatoxin test procedure used to estimate aflatoxin levels in bulk shipments of hazelnuts was investigated. Sixteen 10 kg samples of shelled hazelnuts were taken from each of 20 lots that were suspected of aflatoxin contamination. The total variance associated with testing shelled hazelnuts was estimated and partitioned into sampling, sample preparation, and analytical variance components. Each variance component increased as aflatoxin concentration (either B1 or total) increased. With the use of regression analysis, mathematical expressions were developed to model the relationship between aflatoxin concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific aflatoxin concentration. The sampling, sample preparation, and analytical variances associated with estimating aflatoxin in a hazelnut lot at a total aflatoxin level of 10 ng/g and using a 10 kg sample, a 50 g subsample, dry comminution with a Robot Coupe mill, and a high-performance liquid chromatographic analytical method are 174.40, 0.74, and 0.27, respectively. The sampling, sample preparation, and analytical steps of the aflatoxin test procedure accounted for 99.4, 0.4, and 0.2% of the total variability, respectively.
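
    The reported variance components can be checked directly. Summing them and expressing each as a share of the total reproduces the 99.4, 0.4, and 0.2% breakdown. The second function illustrates how such components are often rescaled for a different sample size. It rests on an assumption not taken from the abstract: that a component is inversely proportional to the amount of material or number of analyses it depends on. The paper's own regression expressions also depend on aflatoxin concentration.

```python
def variance_shares(components):
    """Total variance and each component's percentage of it."""
    total = sum(components.values())
    return total, {name: 100.0 * v / total for name, v in components.items()}


def rescale_variance(var_ref, size_ref, size_new):
    """Assumption: the variance component scales as 1 / size."""
    return var_ref * size_ref / size_new


if __name__ == "__main__":
    # Values reported for 10 ng/g total aflatoxin, 10 kg sample, 50 g subsample.
    total, shares = variance_shares(
        {"sampling": 174.40, "sample preparation": 0.74, "analytical": 0.27})
    print(round(total, 2), {k: round(v, 1) for k, v in shares.items()})
    # Under the 1/size assumption, a 20 kg sample would halve the sampling variance.
    print(rescale_variance(174.40, 10.0, 20.0))
```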

  18. Comparing basal area growth models, consistency of parameters, and accuracy of prediction

    Treesearch

    J.J. Colbert; Michael Schuckers; Desta Fekedulegn

    2002-01-01

    We fit alternative sigmoid growth models to sample tree basal area historical data derived from increment cores and disks taken at breast height. We examine and compare the estimated parameters for these models across a range of sample sites. Models are rated on consistency of parameters and on their ability to fit growth data from four sites that are located across a...

  19. Monte Carlo simulation for uncertainty estimation on structural data in implicit 3-D geological modeling, a guide for disturbance distribution selection and parameterization

    NASA Astrophysics Data System (ADS)

    Pakyuz-Charrier, Evren; Lindsay, Mark; Ogarko, Vitaliy; Giraud, Jeremie; Jessell, Mark

    2018-04-01

    Three-dimensional (3-D) geological structural modeling aims to determine geological information in a 3-D space using structural data (foliations and interfaces) and topological rules as inputs. This is necessary in any project in which the properties of the subsurface matter; such models express our understanding of geometries at depth. For that reason, 3-D geological models have a wide range of practical applications including but not restricted to civil engineering, the oil and gas industry, the mining industry, and water management. These models, however, are fraught with uncertainties originating from the inherent flaws of the modeling engines (working hypotheses, interpolator's parameterization) and the inherent lack of knowledge in areas where there are no observations combined with input uncertainty (observational, conceptual and technical errors). Because 3-D geological models are often used for impactful decision-making, it is critical that all 3-D geological models provide accurate estimates of uncertainty. This paper focuses on the propagation of measurement uncertainty in structural input data through implicit 3-D geological modeling. This aim is achieved using Monte Carlo simulation for uncertainty estimation (MCUE), a stochastic method that samples from predefined disturbance probability distributions that represent the uncertainty of the original input data set. MCUE is used to produce hundreds to thousands of altered unique data sets. The altered data sets are used as inputs to produce a range of plausible 3-D models. The plausible models are then combined into a single probabilistic model as a means to propagate uncertainty from the input data to the final model. In this paper, several improved methods for MCUE are proposed. The methods pertain to distribution selection for input uncertainty, sample analysis and statistical consistency of the sampled distribution. 
Pole vector sampling is proposed as a more rigorous alternative to dip vector sampling for planar features, and the use of a Bayesian approach to disturbance distribution parameterization is suggested. The influence of incorrect disturbance distributions is discussed, and propositions are made and evaluated on synthetic and realistic cases to address the cited issues. The distribution of the errors of the observed data (i.e., scedasticity) is shown to affect the quality of prior distributions for MCUE. Results demonstrate that the proposed workflows improve the reliability of uncertainty estimation and diminish the occurrence of artifacts.
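    Pole vector sampling for a single orientation measurement can be sketched as follows: convert the plane to its pole, perturb the pole with a spherical distribution, and convert back. The choice of a von Mises-Fisher distribution, the concentration `kappa`, the east-north-up convention, and all function names are assumptions for illustration; the paper itself discusses how such distributions should be selected and parameterized.

```python
import numpy as np


def pole_from_plane(dip_deg, dip_dir_deg):
    """Upward unit normal (pole) of a plane, in (east, north, up) coordinates."""
    d, a = np.radians(dip_deg), np.radians(dip_dir_deg)
    return np.array([np.sin(d) * np.sin(a), np.sin(d) * np.cos(a), np.cos(d)])


def plane_from_pole(n):
    """Inverse of pole_from_plane; a downward pole is flipped to its upward equivalent."""
    n = n if n[2] >= 0 else -n
    dip = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))
    dip_dir = np.degrees(np.arctan2(n[0], n[1])) % 360.0
    return dip, dip_dir


def sample_vmf(mu, kappa, size, rng):
    """Unit vectors from a 3-D von Mises-Fisher distribution with mean direction mu."""
    mu = mu / np.linalg.norm(mu)
    u = 1.0 - rng.random(size)  # uniform on (0, 1]
    # Cosine of the angle to mu, via the exact inverse CDF available in 3-D.
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = rng.uniform(0.0, 2.0 * np.pi, size)  # uniform azimuth around mu
    # Orthonormal basis (e1, e2) perpendicular to mu.
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(mu, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    return w[:, None] * mu + r[:, None] * (np.cos(phi)[:, None] * e1 + np.sin(phi)[:, None] * e2)


rng = np.random.default_rng(0)
pole = pole_from_plane(30.0, 120.0)                        # one hypothetical measurement
poles = sample_vmf(pole, kappa=100.0, size=1000, rng=rng)  # hypothetical concentration
perturbed = [plane_from_pole(p) for p in poles]            # (dip, dip direction) per model run
```

    Sampling the pole on the sphere keeps the perturbation isotropic around the measured plane, which is one reason pole-based sampling can behave better than perturbing dip and dip direction as two independent angles.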

  20. Comparison and continuous estimates of fecal coliform and Escherichia coli bacteria in selected Kansas streams, May 1999 through April 2002

    USGS Publications Warehouse

    Rasmussen, Patrick P.; Ziegler, Andrew C.

    2003-01-01

    The sanitary quality of water and its use as a public-water supply and for recreational activities, such as swimming, wading, boating, and fishing, can be evaluated on the basis of fecal coliform and Escherichia coli (E. coli) bacteria densities. This report describes the overall sanitary quality of surface water in selected Kansas streams, the relation between fecal coliform and E. coli, the relation between turbidity and bacteria densities, and how continuous bacteria estimates can be used to evaluate the water-quality conditions in selected Kansas streams. Samples for fecal coliform and E. coli were collected at 28 surface-water sites in Kansas. Of the 318 samples collected, 18 percent exceeded the current Kansas Department of Health and Environment (KDHE) secondary contact recreational, single-sample criterion for fecal coliform (2,000 colonies per 100 milliliters of water). Of the 219 samples collected during the recreation months (April 1 through October 31), 21 percent exceeded the current (2003) KDHE single-sample fecal coliform criterion for secondary contact recreation (2,000 colonies per 100 milliliters of water) and 36 percent exceeded the U.S. Environmental Protection Agency (USEPA) recommended single-sample primary contact recreational criterion for E. coli (576 colonies per 100 milliliters of water). Comparisons of fecal coliform and E. coli criteria indicated that more than one-half of the streams sampled could exceed USEPA recommended E. coli criteria more frequently than the current KDHE fecal coliform criteria. In addition, the ratios of E. coli to fecal coliform (EC/FC) were smallest for sites with slightly saline water (specific conductance greater than 1,000 microsiemens per centimeter at 25 degrees Celsius), indicating that E. coli may not be a good indicator of sanitary quality for those streams. Enterococci bacteria may provide a more accurate assessment of the potential for swimming-related illnesses in these streams. 
Ratios of EC/FC and linear regression models were developed for estimating E. coli densities on the basis of measured fecal coliform densities for six individual and six groups of surface-water sites. Regression models developed for the six individual surface-water sites and six groups of sites explain at least 89 percent of the variability in E. coli densities. The EC/FC ratios and regression models are site specific and make it possible to convert historic fecal coliform bacteria data to estimated E. coli densities for the selected sites. The EC/FC ratios can be used to estimate E. coli for any range of historical fecal coliform densities, and in some cases with less error than the regression models. The basin- and statewide regression models explained at least 93 percent of the variance and best represent the sites where a majority of the data used to develop the models were collected (Kansas and Little Arkansas Basins). Comparison of the current (2003) KDHE geometric-mean primary contact criterion for fecal coliform bacteria of 200 col/100 mL to the 2002 USEPA recommended geometric-mean criterion of 126 col/100 mL for E. coli results in an EC/FC ratio of 0.63. The geometric-mean EC/FC ratio for all sites except Rattlesnake Creek (site 21) is 0.77, indicating that considerably more than 63 percent of the fecal coliform is E. coli. This potentially could lead to more exceedances of the recommended E. coli criterion, where the water now meets the current (2003) 200-col/100 mL fecal coliform criterion. In this report, turbidity was found to be a reliable estimator of bacteria densities. Regression models are provided for estimating fecal coliform and E. coli bacteria densities using continuous turbidity measurements. Prediction intervals also are provided to show the uncertainty associated with using the regression models. Eighty percent of all measured sample densities and individual turbidity-based estimates from the regression models were in agreement as exceedi
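    The criterion comparison above is a single ratio (126 / 200 = 0.63), and converting historical fecal coliform data with a site EC/FC ratio is a multiplication. This sketch assumes a site ratio is summarized as the geometric mean of paired EC/FC ratios; the paired values and function names are hypothetical, and the report's regression and turbidity models are not reproduced.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def site_ec_fc_ratio(ec, fc):
    """Geometric mean of paired E. coli / fecal coliform densities (col/100 mL)."""
    return geometric_mean([e / f for e, f in zip(ec, fc)])


def estimate_ec(historical_fc, ratio):
    """Convert historical fecal coliform densities to estimated E. coli densities."""
    return [ratio * f for f in historical_fc]


# USEPA E. coli geometric-mean criterion / KDHE fecal coliform geometric-mean criterion.
criterion_ratio = 126 / 200
ratio = site_ec_fc_ratio(ec=[80.0, 160.0, 450.0], fc=[100.0, 200.0, 500.0])  # hypothetical pairs
ec_estimates = estimate_ec([200.0, 1000.0], ratio)
```

    When a site's ratio exceeds 0.63, water that just meets the 200 col/100 mL fecal coliform criterion converts to an E. coli estimate above 126 col/100 mL, which is the mechanism behind the report's warning about more exceedances.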

  1. Inverse Analysis of Irradiated Nuclear Material Gamma Spectra via Nonlinear Optimization

    NASA Astrophysics Data System (ADS)

    Dean, Garrett James

    Nuclear forensics is the collection of technical methods used to identify the provenance of nuclear material interdicted outside of regulatory control. Techniques employed in nuclear forensics include optical microscopy, gas chromatography, mass spectrometry, and alpha, beta, and gamma spectrometry. This dissertation focuses on the application of inverse analysis to gamma spectroscopy to estimate the history of pulse-irradiated nuclear material. Previous work in this area has (1) used destructive analysis techniques to supplement the nondestructive gamma measurements, and (2) been applied to samples composed of spent nuclear fuel with long irradiation and cooling times. Previous analyses have employed local nonlinear solvers, simple empirical models of gamma spectral features, and simple detector models. The algorithm described in this dissertation uses a forward model of the irradiation and measurement process within a global nonlinear optimizer to estimate the unknown irradiation history of pulse-irradiated nuclear material. The forward model includes a detector response function for photopeaks only. The algorithm uses a novel hybrid of global and local search to quickly estimate the irradiation parameters, including neutron fluence, cooling time, and original composition. Sequential, time-correlated series of measurements are used to reduce the uncertainty in the estimated irradiation parameters. This algorithm allows for in situ measurements of interdicted irradiated material. The increase in analysis speed comes with a decrease in the information that can be determined, but the sample fluence, cooling time, and composition can be determined within minutes of a measurement. Furthermore, pulse-irradiated nuclear material has the characteristic that irradiation time and flux cannot be estimated independently. 
The algorithm has been tested against pulse-irradiated samples of pure special nuclear material with cooling times of four minutes to seven hours. The algorithm is capable of determining the cooling time and the fluence to which the sample was exposed to within 10%, as well as roughly estimating the relative concentrations of nuclides present in the original composition.
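    The hybrid global/local idea can be sketched on a toy forward model. Everything here is an assumption for illustration: the nuclide yields and half-lives are hypothetical, detector response is ignored, and the "global" stage is a plain log-spaced grid rather than the dissertation's optimizer. Because the toy model depends on flux and irradiation time only through their product (the fluence), only fluence and cooling time are estimated.

```python
import math


def activities(fluence, t_cool, yields, decay_consts):
    """Toy forward model: activity of each nuclide t_cool seconds after an instantaneous pulse.
    Flux and irradiation time enter only through their product, the fluence."""
    return [fluence * y * lam * math.exp(-lam * t_cool) for y, lam in zip(yields, decay_consts)]


def profile_misfit(log_t, measured, yields, decay_consts):
    """Relative least-squares misfit at one cooling time, with the best fluence solved in closed form."""
    ratios = [b / m for b, m in zip(activities(1.0, math.exp(log_t), yields, decay_consts), measured)]
    fluence = sum(ratios) / sum(r * r for r in ratios)
    return sum((fluence * r - 1.0) ** 2 for r in ratios), fluence


def golden_section(fn, lo, hi, tol=1e-8):
    """Local refinement: golden-section search for a minimum of fn on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if fn(c) < fn(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


def estimate(measured, yields, decay_consts, t_min=60.0, t_max=1.0e6, n_grid=200):
    """Hybrid search: coarse global grid over log cooling time, then local golden-section refinement."""
    misfit = lambda lt: profile_misfit(lt, measured, yields, decay_consts)[0]
    a, b = math.log(t_min), math.log(t_max)
    grid = [a + i * (b - a) / (n_grid - 1) for i in range(n_grid)]
    k = min(range(n_grid), key=lambda i: misfit(grid[i]))
    log_t = golden_section(misfit, grid[max(k - 1, 0)], grid[min(k + 1, n_grid - 1)])
    return math.exp(log_t), profile_misfit(log_t, measured, yields, decay_consts)[1]


# Hypothetical nuclides (half-lives in seconds) and relative yields per unit fluence.
half_lives = [600.0, 3600.0, 36000.0, 2.6e5]
decay_consts = [math.log(2.0) / h for h in half_lives]
yields = [1.0, 0.5, 0.8, 0.3]
measured = activities(2000.0, 3600.0, yields, decay_consts)  # noise-free synthetic measurement
t_hat, fluence_hat = estimate(measured, yields, decay_consts)
```

    Solving for fluence in closed form at each trial cooling time reduces the search to one dimension, so a cheap grid can locate the right basin before the local step polishes the estimate.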

  2. Investigations of potential bias in the estimation of lambda using Pradel's (1996) model for capture-recapture data

    USGS Publications Warehouse

    Hines, J.E.; Nichols, J.D.

    2002-01-01

    Pradel's (1996) temporal symmetry model, permitting direct estimation and modelling of the population growth rate λ_i, provides a potentially useful tool for the study of population dynamics using marked animals. Because of its recent publication date, the approach has not seen much use, and there have been virtually no investigations directed at robustness of the resulting estimators. Here we consider several potential sources of bias, all motivated by specific uses of this estimation approach. We consider sampling situations in which the study area expands with time and present an analytic expression for the bias in λ̂_i. We next consider trap response in capture probabilities and heterogeneous capture probabilities and compute large-sample and simulation-based approximations of resulting bias in λ̂_i. These approximations indicate that trap response is an especially important assumption violation that can produce substantial bias. Finally, we consider losses on capture and emphasize the importance of selecting the estimator for λ_i that is appropriate to the question being addressed. For studies based on only sighting and resighting data, Pradel's (1996) λ̂′_i is the appropriate estimator.
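    Pradel's growth rate can be written as λ_i = φ_i / γ_{i+1}, the ratio of apparent survival to the seniority probability (the chance that an animal present at i+1 was already present at i). The deterministic sketch below, with hypothetical survival and recruitment numbers and no sampling error, checks that this ratio equals N_{i+1}/N_i. It does not model any of the bias sources studied in the paper; the function names are illustrative.

```python
def project(n0, phi, recruits):
    """Deterministic population: N_{i+1} = phi_i * N_i + B_i (survivors plus recruits)."""
    sizes = [n0]
    for p, b in zip(phi, recruits):
        sizes.append(p * sizes[-1] + b)
    return sizes


def seniority(sizes, phi):
    """gamma_{i+1}: fraction of animals present at i+1 that were already present at i."""
    return [p * sizes[i] / sizes[i + 1] for i, p in enumerate(phi)]


def pradel_lambda(phi, gamma_next):
    """lambda_i = phi_i / gamma_{i+1}."""
    return [p / g for p, g in zip(phi, gamma_next)]


phi = [0.8, 0.7, 0.9]                            # hypothetical apparent survival
sizes = project(100.0, phi, [30.0, 10.0, 25.0])  # hypothetical recruits
lambdas = pradel_lambda(phi, seniority(sizes, phi))
```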

  3. Model Reduction via Principal Component Analysis and Markov Chain Monte Carlo (MCMC) Methods

    NASA Astrophysics Data System (ADS)

    Gong, R.; Chen, J.; Hoversten, M. G.; Luo, J.

    2011-12-01

    Geophysical and hydrogeological inverse problems often include a large number of unknown parameters, ranging from hundreds to millions, depending on the parameterization and the problem being addressed. This makes inverse estimation and uncertainty quantification very challenging, especially for problems in two- or three-dimensional spatial domains. Model reduction techniques have the potential to mitigate the curse of dimensionality by reducing the total number of unknowns while describing the complex subsurface systems adequately. In this study, we explore the use of principal component analysis (PCA) and Markov chain Monte Carlo (MCMC) sampling methods for model reduction through the use of synthetic datasets. We compare the performances of three different but closely related model reduction approaches: (1) PCA methods with geometric sampling (referred to as 'Method 1'), (2) PCA methods with MCMC sampling (referred to as 'Method 2'), and (3) PCA methods with MCMC sampling and inclusion of random effects (referred to as 'Method 3'). We consider a simple convolution model with five unknown parameters because our goal is to understand and visualize the advantages and disadvantages of each method by comparing their inversion results with the corresponding analytical solutions. We generated synthetic data with added noise and inverted them under two different situations: (1) the noisy data and the covariance matrix for PCA analysis are consistent (referred to as the unbiased case), and (2) the noisy data and the covariance matrix are inconsistent (referred to as the biased case). In the unbiased case, comparison between the analytical solutions and the inversion results shows that all three methods provide good estimates of the true values and that Method 1 is computationally more efficient. 
In terms of uncertainty quantification, Method 1 performs poorly because of the relatively small number of samples obtained, Method 2 performs best, and Method 3 overestimates uncertainty due to the inclusion of random effects. However, in the biased case, only Method 3 correctly estimates all the unknown parameters, whereas Methods 1 and 2 both provide wrong values for the biased parameters. The synthetic case study demonstrates that if the covariance matrix for PCA analysis is inconsistent with the true models, the PCA methods with geometric or MCMC sampling will provide incorrect estimates.
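    A toy version of the PCA-plus-MCMC idea (closest to "Method 2") is sketched below. The prior covariance, the two-tap convolution kernel, the noise level, and the step size are hypothetical. The five unknowns are reparameterized by the three leading principal components of the prior covariance, and the reduced coefficients are sampled with random-walk Metropolis. The synthetic truth lies inside the retained subspace, which mimics the "unbiased" case.

```python
import numpy as np


def conv_matrix(kernel, n):
    """Matrix G such that G @ m is the full discrete convolution of m with kernel."""
    G = np.zeros((n + len(kernel) - 1, n))
    for j in range(n):
        G[j:j + len(kernel), j] = kernel
    return G


def pca_basis(cov, n_keep):
    """Leading principal directions of cov, scaled so that m = B @ z with z ~ N(0, I)."""
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_keep]
    return vecs[:, order] * np.sqrt(vals[order])


def metropolis(log_post, z0, n_steps, step, rng):
    """Random-walk Metropolis sampler over the reduced coefficients z."""
    z, lp = z0.copy(), log_post(z0)
    chain = np.empty((n_steps, z0.size))
    for i in range(n_steps):
        prop = z + step * rng.standard_normal(z.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            z, lp = prop, lp_prop
        chain[i] = z
    return chain


rng = np.random.default_rng(1)
n_params, n_keep, sigma = 5, 3, 0.002
idx = np.arange(n_params)
prior_cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)  # hypothetical prior covariance
B = pca_basis(prior_cov, n_keep)
G = conv_matrix(np.array([1.0, 0.5]), n_params)                 # hypothetical two-tap kernel
m_true = B @ np.array([1.0, -0.5, 0.3])
y = G @ m_true + sigma * rng.standard_normal(G.shape[0])


def log_post(z):
    resid = y - G @ (B @ z)
    return -0.5 * (resid @ resid) / sigma**2 - 0.5 * (z @ z)  # Gaussian likelihood, N(0, I) prior


z0 = np.linalg.lstsq(G @ B, y, rcond=None)[0]  # start near the mode to shorten burn-in
chain = metropolis(log_post, z0, 20000, 0.002, rng)
m_post = B @ chain[2000:].mean(axis=0)
```

    If `m_true` were drawn from outside the span of `B` (the "biased" case), the sampler could only return the closest point in the reduced space, which is consistent with the failure mode the abstract reports for Methods 1 and 2.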

  4. ERTS data user investigation to develop a multistage forest sampling inventory system

    NASA Technical Reports Server (NTRS)

    Langley, P. G.; Vanroessel, J. W. (Principal Investigator)

    1973-01-01

    The author has identified the following significant results. A unique digital timber volume estimation system was developed for use with the MSS CCT tapes. The system was tested on a 64-square-mile area in Northern California's Trinity Alps. The outcome of a systematic experiment, in which several possible combinations of bands 5 and 7 and a contrast measure were tried, showed that an estimated gain in precision of 50% can be obtained in a multistage sampling design. The difference between bands 5 and 7 proved to be of special importance for the estimation of biomass in the form of timber volume. In addition, an interpretation model for high-flight U2 photographs was developed. A maximum multiple correlation coefficient of 0.74 was obtained for the regression model, explaining 55% of the variation in timber volume as estimated from aerial photos and ground measurements. An interpretation model for MSS color composites is in the testing stage.
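    Two points in this abstract can be checked or illustrated numerically. A multiple correlation of R = 0.74 corresponds to R² ≈ 0.55, matching the 55% of variation quoted. The second function is the textbook double-sampling regression estimator, included only as an assumed example of how a cheap photo or image variable can sharpen a ground-based volume estimate; the data and function names are hypothetical, and this is not the ERTS system described.

```python
def r_squared(r):
    """Proportion of variance explained implied by a multiple correlation coefficient."""
    return r * r


def double_sampling_regression(y_ground, x_ground, x_mean_large):
    """Regression estimator of mean ground volume.

    y_ground, x_ground: paired ground volume and photo/image variable on the small ground subsample.
    x_mean_large: mean of the photo/image variable over the larger, cheaper first-phase sample.
    """
    n = len(y_ground)
    y_bar = sum(y_ground) / n
    x_bar = sum(x_ground) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_ground, y_ground))
    sxx = sum((x - x_bar) ** 2 for x in x_ground)
    return y_bar + (sxy / sxx) * (x_mean_large - x_bar)


explained = r_squared(0.74)  # about 0.55
estimate = double_sampling_regression([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0], 5.0)
```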

  5. Space-Time Smoothing of Complex Survey Data: Small Area Estimation for Child Mortality

    PubMed Central

    Mercer, Laina D; Wakefield, Jon; Pantazis, Athena; Lutambi, Angelina M; Masanja, Honorati; Clark, Samuel

    2016-01-01

    Many people living in low- and middle-income countries are not covered by civil registration and vital statistics systems. Consequently, a wide variety of other types of data, including many household sample surveys, are used to estimate health and population indicators. In this paper we combine data from sample surveys and demographic surveillance systems to produce small area estimates of child mortality through time. Small area estimates are necessary to understand geographical heterogeneity in health indicators when full-coverage vital statistics are not available. For this endeavor, spatio-temporal smoothing is beneficial to alleviate problems of data sparsity. The use of conventional hierarchical models requires careful thought, since the survey weights may need to be considered to alleviate bias due to non-random sampling and non-response. The application that motivated this work is estimation of child mortality rates in five-year time intervals in regions of Tanzania. Data come from Demographic and Health Surveys conducted over the period 1991–2010 and two demographic surveillance system sites. We derive a variance estimator of under-five child mortality that accounts for the complex survey weighting. For our application, the hierarchical models we consider include random effects for area, time and survey, and we compare models using a variety of measures including the conditional predictive ordinate (CPO). The method we propose is implemented via the fast and accurate integrated nested Laplace approximation (INLA). PMID:27468328
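    Under-five mortality is commonly built from age-band probabilities of death as 5q0 = 1 - prod(1 - q_a). The sketch below assumes survey-weighted, ratio-type band estimates and hypothetical inputs; it does not reproduce the paper's variance estimator, space-time smoothing, or INLA fit, and the function names are illustrative.

```python
def band_probability(children):
    """Survey-weighted probability of dying within one age band.

    children: (weight, died) pairs for children who entered the band alive.
    """
    total = sum(w for w, _ in children)
    return sum(w for w, died in children if died) / total


def under_five_mortality(band_probs):
    """5q0 = 1 - product of the band survival probabilities (1 - q_a)."""
    survival = 1.0
    for q in band_probs:
        survival *= 1.0 - q
    return 1.0 - survival


# Hypothetical band with three children; the first has twice the design weight of the others.
q_band = band_probability([(2.0, True), (1.0, False), (1.0, False)])  # 2 / 4 = 0.5
```

    Ignoring the weights in the example above would give 1/3 instead of 0.5, which shows why non-random sampling has to be accounted for before any smoothing is applied.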

  6. State-space modeling of population sizes and trends in Nihoa Finch and Millerbird

    USGS Publications Warehouse

    Gorresen, P. Marcos; Brinck, Kevin W.; Camp, Richard J.; Farmer, Chris; Plentovich, Sheldon M.; Banko, Paul C.

    2016-01-01

    Both passerines endemic to Nihoa Island, Hawai‘i, USA, the Nihoa Millerbird (Acrocephalus familiaris kingi) and Nihoa Finch (Telespiza ultima), are listed as endangered by federal and state agencies. Their abundances have been estimated by irregularly implemented fixed-width strip-transect sampling from 1967 to 2012, from which area-based extrapolation of the raw counts produced highly variable abundance estimates for both species. To evaluate an alternative survey method and improve abundance estimates, we conducted variable-distance point-transect sampling between 2010 and 2014. We compared our results to those obtained from strip-transect samples. In addition, we applied state-space models to derive improved estimates of population size and trends from the legacy time series of strip-transect counts. Both species were fairly evenly distributed across Nihoa and occurred in all or nearly all available habitat. Population trends for Nihoa Millerbird were inconclusive because of high within-year variance. Trends for Nihoa Finch were positive, particularly since the early 1990s. Distance-based analysis of point-transect counts produced mean estimates of abundance similar to those from strip-transects but was generally more precise. However, both survey methods produced biologically unrealistic variability between years. State-space modeling of the long-term time series of abundances obtained from strip-transect counts effectively reduced uncertainty in both within- and between-year estimates of population size, and allowed short-term changes in abundance trajectories to be smoothed into a long-term trend.
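    A minimal state-space example is a Gaussian local-level model on log abundance, filtered with a Kalman filter. The model structure, variances, and counts are assumptions for illustration; the abstract does not specify the paper's model. Years without a survey are passed as `None`, so the filter simply widens its uncertainty there, which mirrors how an irregular legacy series can be handled. A full analysis would also run a backward smoother over the filtered values.

```python
def local_level_filter(log_counts, q, r, m0=0.0, p0=100.0):
    """Kalman filter for x_t = x_{t-1} + w_t, w_t ~ N(0, q) (true log abundance)
    and y_t = x_t + v_t, v_t ~ N(0, r) (survey estimate). None marks a year without a survey."""
    m, p = m0, p0
    means, variances = [], []
    for y in log_counts:
        p = p + q                # predict: uncertainty grows by the process variance
        if y is not None:
            k = p / (p + r)      # Kalman gain
            m = m + k * (y - m)  # move the estimate toward the observation
            p = (1.0 - k) * p
        means.append(m)
        variances.append(p)
    return means, variances


# Hypothetical log-abundance series with a missing survey year.
means, variances = local_level_filter([5.0, None, 5.0], q=0.01, r=0.25)
```

    The ratio of process variance `q` to observation variance `r` controls how strongly year-to-year noise is smoothed into a trend, the property the abstract credits for reducing between-year variability.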

  7. Comparing field- and model-based standing dead tree carbon stock estimates across forests of the US

    Treesearch

    Christopher W. Woodall; Grant M. Domke; David W. MacFarlane; Christopher M. Oswalt

    2012-01-01

    As a signatory to the United Nations Framework Convention on Climate Change, the US has been estimating standing dead tree (SDT) carbon (C) stocks using a model based on live tree attributes. The USDA Forest Service began sampling SDTs nationwide in 1999. With comprehensive field data now available, the objective of this study was to compare field- and model-based...

  8. Assessing statistical differences between parameters estimates in Partial Least Squares path modeling.

    PubMed

    Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten

    2018-01-01

    Structural equation modeling using partial least squares (PLS-SEM) has become a mainstream modeling approach in various disciplines. Nevertheless, prior literature still lacks practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis only allow one to assess differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that also allows one to assess whether two parameter estimates that are derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we refer in particular to a reduced version of the well-established technology acceptance model.

  9. Nest survival modelling using a multi-species approach in forests managed for timber and biofuel feedstock

    USGS Publications Warehouse

    Loman, Zachary G.; Monroe, Adrian; Riffell, Samuel K.; Miller, Darren A.; Vilella, Francisco; Wheat, Bradley R.; Rush, Scott A.; Martin, James A.

    2018-01-01

    Switchgrass (Panicum virgatum) intercropping is a novel forest management practice for biomass production intended to generate cellulosic feedstocks within intensively managed loblolly pine‐dominated landscapes. These pine plantations are important for early‐successional bird species, as short rotation times continually maintain early‐successional habitat. We tested the efficacy of using community models compared to individual surrogate species models in understanding influences on nest survival. We analysed nest data to test for differences in habitat use for 14 bird species in plots managed for switchgrass intercropping and controls within loblolly pine (Pinus taeda) plantations in Mississippi, USA. We adapted hierarchical models using hyper‐parameters to incorporate information from both common and rare species to understand community‐level nest survival. This approach incorporates rare species that are often discarded due to low sample sizes, but can inform community‐level demographic parameter estimates. We illustrate use of this approach in generating both species‐level and community‐wide estimates of daily survival rates for songbird nests. We were able to include rare species with low sample size (minimum n = 5) to inform a hyper‐prior, allowing us to estimate effects of covariates on daily survival at the community level, then compare this with a single‐species approach using surrogate species. Using single‐species models, we were unable to generate estimates below a sample size of 21 nests per species. Community model species‐level survival and parameter estimates were similar to those generated by five single‐species models, with improved precision in community model parameters. Covariates of nest placement indicated that switchgrass at the nest site (<4 m) reduced daily nest survival, although intercropping at the forest stand level increased daily nest survival. Synthesis and applications. 
Community models represent a viable method for estimating community nest survival rates and effects of covariates while incorporating limited data for rarely detected species. Intercropping switchgrass in loblolly pine plantations slightly increased daily nest survival at the research plot scale (0.1 km²), although at a local scale (50 m²) switchgrass negatively influenced nest survival. A likely explanation is intercropping shifted community composition, favouring species with greater disturbance tolerance.
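    The community model itself is beyond a short example, but the quantity it estimates, the daily survival rate (DSR), and how that rate compounds over a nesting period can be shown with the classical Mayfield estimator. This is a swapped-in, simpler technique, not the hierarchical model used in the study; the counts and the 24-day period are hypothetical.

```python
def mayfield_dsr(failures, exposure_days):
    """Mayfield daily survival rate: 1 - (nest failures / nest-days of exposure)."""
    return 1.0 - failures / exposure_days


def period_survival(dsr, days):
    """Probability a nest survives the whole period, assuming a constant daily rate."""
    return dsr ** days


dsr = mayfield_dsr(failures=5, exposure_days=500)  # hypothetical counts
nest_success = period_survival(dsr, days=24)       # hypothetical 24-day nesting period
```

    Because DSR is raised to the number of days, a small covariate effect on daily survival, such as the switchgrass effects reported above, translates into a noticeably larger difference in nest success over a full nesting cycle.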

  10. Non-convex Statistical Optimization for Sparse Tensor Graphical Model

    PubMed Central

    Sun, Wei; Wang, Zhaoran; Liu, Han; Cheng, Guang

    2016-01-01

    We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. The penalized maximum likelihood estimation of this model involves minimizing a non-convex objective function. In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence as well as consistent graph recovery. Notably, such an estimator achieves estimation consistency with only one tensor sample, which has not been observed in previous work. Our theoretical results are backed by thorough numerical studies. PMID:28316459
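    The alternating idea (update one mode's covariance while holding the others fixed) can be sketched for the two-way case, a matrix-normal model, with the classical "flip-flop" maximum-likelihood updates. This sketch omits the sparsity penalty that the paper's estimator adds, assumes zero-mean data, and uses many samples rather than one; the true covariance factors are hypothetical.

```python
import numpy as np


def flip_flop(X, n_iter=50):
    """Alternating maximum-likelihood estimates of row (U) and column (V) covariances
    for zero-mean matrix-normal samples X with shape (n, p, q); cov(vec X_i) = kron(V, U)."""
    n, p, q = X.shape
    U, V = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        U = np.einsum("nij,jk,nlk->il", X, np.linalg.inv(V), X) / (n * q)  # update U, V fixed
        V = np.einsum("nij,il,nlk->jk", X, np.linalg.inv(U), X) / (n * p)  # update V, U fixed
    # Only kron(V, U) is identified; U and V individually are defined up to a scale factor.
    return U, V


rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.2, 0.3, 1.0]])
C = np.array([[1.0, 0.0, 0.0, 0.0], [0.4, 1.0, 0.0, 0.0], [0.0, 0.4, 1.0, 0.0], [0.0, 0.0, 0.4, 1.0]])
Z = rng.standard_normal((5000, 3, 4))
X = A @ Z @ C.T  # each X_i has row covariance A A^T and column covariance C C^T
U_hat, V_hat = flip_flop(X)
```

    Estimating a p×p and a q×q factor instead of a pq×pq covariance is what makes the Kronecker structure data-efficient; the paper adds sparsity penalties to each factor on top of this alternating scheme.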

  11. Investigation of Properties of Nanocomposite Polyimide Samples Obtained by Fused Deposition Modeling

    NASA Astrophysics Data System (ADS)

    Polyakov, I. V.; Vaganov, G. V.; Yudin, V. E.; Ivan'kova, E. M.; Popova, E. N.; Elokhovskii, V. Yu.

    2018-03-01

    Nanomodified polyimide samples were obtained by fused deposition modeling (FDM) using an experimental setup for 3D printing of highly heat-resistant plastics. The mechanical properties and structure of these samples were studied by viscosimetry, differential scanning calorimetry, and scanning electron microscopy. A comparative estimation of the mechanical properties of laboratory samples obtained from a nanocomposite based on heat-resistant polyetherimide by FDM and injection molding is presented.

  12. Designing efficient nitrous oxide sampling strategies in agroecosystems using simulation models

    USDA-ARS's Scientific Manuscript database

    Cumulative nitrous oxide (N2O) emissions calculated from discrete chamber-based flux measurements have unknown uncertainty. This study used an agroecosystems simulation model to design sampling strategies that yield accurate cumulative N2O flux estimates with a known uncertainty level. Daily soil N2...

  13. Data Combination and Instrumental Variables in Linear Models

    ERIC Educational Resources Information Center

    Khawand, Christopher

    2012-01-01

    Instrumental variables (IV) methods allow for consistent estimation of causal effects, but suffer from poor finite-sample properties and data availability constraints. IV estimates also tend to have relatively large standard errors, often inhibiting the interpretability of differences between IV and non-IV point estimates. Lastly, instrumental…
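    With one instrument and one endogenous regressor, the IV estimator reduces to the ratio cov(z, y) / cov(z, x). The simulation below (hypothetical coefficients and an unobserved confounder `u`) shows OLS being biased while IV recovers the causal slope. It is a textbook illustration, not the data-combination method studied in the thesis.

```python
import numpy as np


def iv_estimate(y, x, z):
    """Just-identified IV (Wald) estimator with a single instrument: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return (zc @ (y - y.mean())) / (zc @ (x - x.mean()))


rng = np.random.default_rng(42)
n = 100_000
z = rng.standard_normal(n)                        # instrument: affects y only through x
u = rng.standard_normal(n)                        # unobserved confounder
x = z + u + rng.standard_normal(n)
y = 2.0 * x + 3.0 * u + rng.standard_normal(n)    # true causal slope is 2
beta_iv = iv_estimate(y, x, z)
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # biased upward by the confounder (about 3)
```

    Even in this large simulated sample the IV estimate is noisier than OLS, which is the finite-sample and standard-error weakness the abstract refers to.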

  14. Combining the boundary shift integral and tensor-based morphometry for brain atrophy estimation

    NASA Astrophysics Data System (ADS)

    Michalkiewicz, Mateusz; Pai, Akshay; Leung, Kelvin K.; Sommer, Stefan; Darkner, Sune; Sørensen, Lauge; Sporring, Jon; Nielsen, Mads

    2016-03-01

    Brain atrophy measured from structural magnetic resonance images (MRIs) is widely used as an imaging surrogate marker for Alzheimer's disease. Its utility has been limited by the large degree of variance and the consequently high sample size estimates. The only consistent and reasonably powerful atrophy estimation method has been the boundary shift integral (BSI). In this paper, we first propose a tensor-based morphometry (TBM) method to measure voxel-wise atrophy that we combine with BSI. The combined model decreases the sample size estimates significantly when compared to BSI and TBM alone.
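    The link between measurement variance and sample size estimates is usually made through a two-arm trial power calculation, n per arm = 2(z_{1-α/2} + z_{1-β})² σ² / Δ². The sketch assumes Δ is a 25% reduction in mean atrophy, α = 0.05, and 80% power; these are conventional choices, not values from the paper, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_arm(mean_atrophy, sd_atrophy, reduction=0.25, alpha=0.05, power=0.80):
    """Subjects per arm to detect a `reduction` fraction slowing of mean atrophy (two-sided test)."""
    z = NormalDist().inv_cdf
    delta = reduction * mean_atrophy
    return math.ceil(2.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 * sd_atrophy**2 / delta**2)


baseline = n_per_arm(mean_atrophy=1.0, sd_atrophy=1.0)        # 252 per arm
less_variable = n_per_arm(mean_atrophy=1.0, sd_atrophy=0.5)
```

    Since n scales with σ², halving the standard deviation of the atrophy measure cuts the required sample roughly fourfold, which is why lower-variance estimators translate directly into smaller sample size estimates.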

  15. A Bayesian kriging approach for blending satellite and ground precipitation observations

    USGS Publications Warehouse

    Verdin, Andrew P.; Rajagopalan, Balaji; Kleiber, William; Funk, Christopher C.

    2015-01-01

    Drought and flood management practices require accurate estimates of precipitation. Gauge observations, however, are often sparse in regions with complicated terrain, clustered in valleys, and of poor quality. Consequently, the spatial extent of wet events is poorly represented. Satellite-derived precipitation data are an attractive alternative, though they tend to underestimate the magnitude of wet events due to their dependency on retrieval algorithms and the indirect relationship between satellite infrared observations and precipitation intensities. Here we offer a Bayesian kriging approach for blending precipitation gauge data and the Climate Hazards Group Infrared Precipitation satellite-derived precipitation estimates for Central America, Colombia, and Venezuela. First, the gauge observations are modeled as a linear function of satellite-derived estimates and any number of other variables—for this research we include elevation. Prior distributions are defined for all model parameters and the posterior distributions are obtained simultaneously via Markov chain Monte Carlo sampling. The posterior distributions of these parameters are required for spatial estimation, and thus are obtained prior to implementing the spatial kriging model. This functional framework is applied to model parameters obtained by sampling from the posterior distributions, and the residuals of the linear model are subject to a spatial kriging model. Consequently, the posterior distributions and uncertainties of the blended precipitation estimates are obtained. We demonstrate this method by applying it to pentadal and monthly total precipitation fields during 2009. The model's performance and its inherent ability to capture wet events are investigated. We show that this blending method significantly improves upon the satellite-derived estimates and is also competitive in its ability to represent wet events. 
This procedure also provides a means to estimate a full conditional distribution of the “true” observed precipitation value at each grid cell.
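    The two-stage structure (a linear model of gauges on satellite estimates, then kriging of the residuals) can be sketched with plug-in estimates. This is a simplification: ordinary least squares replaces the Bayesian fit, simple kriging with an exponential covariance is assumed, and the sill, range, and nugget are hypothetical fixed values rather than MCMC samples, so the returned variance ignores parameter uncertainty. The function names are illustrative.

```python
import numpy as np


def exp_cov(d, sill, range_):
    """Exponential covariance model (an assumed choice)."""
    return sill * np.exp(-d / range_)


def blend(gauge_xy, gauge_val, sat_at_gauge, target_xy, sat_at_target,
          sill=1.0, range_=50.0, nugget=0.1):
    # 1) Linear model gauge = a + b * satellite, fitted by ordinary least squares.
    A = np.column_stack([np.ones_like(sat_at_gauge), sat_at_gauge])
    coef = np.linalg.lstsq(A, gauge_val, rcond=None)[0]
    resid = gauge_val - A @ coef
    # 2) Simple (zero-mean) kriging of the residuals.
    d_gg = np.linalg.norm(gauge_xy[:, None, :] - gauge_xy[None, :, :], axis=-1)
    d_tg = np.linalg.norm(target_xy[:, None, :] - gauge_xy[None, :, :], axis=-1)
    K = exp_cov(d_gg, sill, range_) + nugget * np.eye(len(gauge_val))
    k = exp_cov(d_tg, sill, range_)              # (n_target, n_gauge)
    w = np.linalg.solve(K, k.T)                  # kriging weights, (n_gauge, n_target)
    pred = coef[0] + coef[1] * sat_at_target + w.T @ resid
    krig_var = sill - np.sum(k * w.T, axis=1)    # variance of the smooth residual field
    return pred, krig_var


rng = np.random.default_rng(0)
gauge_xy = rng.uniform(0.0, 100.0, (10, 2))      # hypothetical gauge locations
sat = rng.uniform(0.0, 50.0, 10)                 # hypothetical satellite estimates at gauges
gauge = 2.0 + 0.8 * sat + rng.normal(0.0, 1.0, 10)
pred_at_gauges, var_at_gauges = blend(gauge_xy, gauge, sat, gauge_xy, sat, nugget=0.0)
```

    With a zero nugget, kriging honours the gauges exactly and its variance drops to zero there, while away from gauges the prediction falls back toward the satellite-based trend.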

  16. Prediction of functional aerobic capacity without exercise testing

    NASA Technical Reports Server (NTRS)

    Jackson, A. S.; Blair, S. N.; Mahar, M. T.; Wier, L. T.; Ross, R. M.; Stuteville, J. E.

    1990-01-01

    The purpose of this study was to develop functional aerobic capacity prediction models without using exercise tests (N-Ex) and to compare their accuracy with that of Astrand single-stage submaximal prediction methods. The data of 2,009 subjects (9.7% female) were randomly divided into validation (N = 1,543) and cross-validation (N = 466) samples. The validation sample was used to develop two N-Ex models to estimate VO2peak from gender, age, body composition, and self-reported activity. One model estimated percent fat from skinfolds (N-Ex %fat) and the other used body mass index (N-Ex BMI) to represent body composition. The multiple correlations for the developed models were R = 0.81 (SE = 5.3 ml·kg⁻¹·min⁻¹) and R = 0.78 (SE = 5.6 ml·kg⁻¹·min⁻¹). This accuracy was confirmed when the models were applied to the cross-validation sample. The N-Ex models were more accurate than the VO2peak estimates from the Astrand prediction models, whose SEs ranged from 5.5 to 9.7 ml·kg⁻¹·min⁻¹. The N-Ex models were cross-validated on 59 men on hypertensive medication and 71 men who were found to have a positive exercise ECG. The SEs of the N-Ex models ranged from 4.6 to 5.4 ml·kg⁻¹·min⁻¹ with these subjects. (ABSTRACT TRUNCATED AT 250 WORDS)

  17. Research in the application of spectral data to crop identification and assessment, volume 2

    NASA Technical Reports Server (NTRS)

    Daughtry, C. S. T. (Principal Investigator); Hixson, M. M.; Bauer, M. E.

    1980-01-01

    The development of spectrometry crop development stage models is discussed with emphasis on models for corn and soybeans. One photothermal and four thermal meteorological models are evaluated. Spectral data were investigated as a source of information for crop yield models. Intercepted solar radiation and soil productivity are identified as factors related to yield which can be estimated from spectral data. Several techniques for machine classification of remotely sensed data for crop inventory were evaluated. Early season estimation, training procedures, the relationship of scene characteristics to classification performance, and full frame classification methods were studied. The optimal level for combining area and yield estimates of corn and soybeans is assessed utilizing current technology: digital analysis of LANDSAT MSS data on sample segments to provide area estimates and regression models to provide yield estimates.
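    Combining area and yield estimates into production involves a product. For independent random variables with means μ_A, μ_Y and variances σ_A², σ_Y², the variance of the product is exactly μ_A²σ_Y² + μ_Y²σ_A² + σ_A²σ_Y². The sketch assumes the area and yield estimates are independent and plugs the estimates in for the unknown means; the numbers and the function name are hypothetical.

```python
def production_estimate(area, var_area, yield_per_area, var_yield):
    """Production = area x yield, with the variance of a product of independent estimates."""
    production = area * yield_per_area
    variance = area**2 * var_yield + yield_per_area**2 * var_area + var_area * var_yield
    return production, variance


production, variance = production_estimate(100.0, 4.0, 3.0, 0.25)  # hypothetical inputs
```

    The first two terms show that the precision of the combined estimate depends on how precise each component is, scaled by the size of the other, which is one way to frame the question of the optimal level at which to combine area and yield estimates.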

  18. Malaria prevalence metrics in low- and middle-income countries: an assessment of precision in nationally-representative surveys.

    PubMed

    Alegana, Victor A; Wright, Jim; Bosco, Claudio; Okiro, Emelda A; Atkinson, Peter M; Snow, Robert W; Tatem, Andrew J; Noor, Abdisalan M

    2017-11-21

    One pillar of monitoring progress towards the Sustainable Development Goals is investment in high-quality data to strengthen the scientific basis for decision-making. At present, nationally-representative surveys are the main source of data for establishing a scientific evidence base and for monitoring and evaluating health metrics. However, little is known about the precision of various population-level health and development indicators, which remains unquantified in nationally-representative household surveys. Here, a retrospective analysis of the precision of prevalence estimates from these surveys was conducted. Data on malaria indicators were assembled for nine sub-Saharan African countries with at least two nationally-representative surveys. A Bayesian statistical model was used to estimate between- and within-cluster variability for fever prevalence, malaria prevalence, and insecticide-treated bed net (ITN) use in children under 5 years of age. The intra-class correlation coefficient was estimated, along with the optimal sample size for each indicator and its associated uncertainty. Results suggest that the sample sizes required for the current nationally-representative surveys increase as malaria prevalence declines. Comparing actual sample sizes with the modelled estimates showed that the sample size for parasite prevalence would need to increase by up to 77.7% (95% Bayesian credible interval 74.7-79.4) for the 2015 Kenya MIS (estimated sample size of children aged 0-4 years: 7218 [7099-7288]) and by 54.1% [50.1-56.5] for the 2014-2015 Rwanda DHS (12,220 [11,950-12,410]). This study highlights the importance of defining indicator-relevant sample sizes to achieve the required precision in current national surveys. While expanding the current surveys would require additional investment, the study highlights the need for improved approaches to cost-effective sampling.
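    The study derives sample sizes from a Bayesian model. A common frequentist shortcut that uses the same two ingredients, the intra-class correlation ρ and a target precision, inflates a simple-random-sampling size by the design effect 1 + (m - 1)ρ for clusters of m children. If precision is specified relative to prevalence, the formula also reproduces the qualitative finding that required sample size grows as prevalence falls. The formula, parameter names, and default z = 1.96 are assumptions for illustration, not the authors' method.

```python
import math


def design_effect(icc, cluster_size):
    """Variance inflation from cluster sampling: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def required_sample_size(prevalence, rel_precision, icc, cluster_size, z=1.96):
    """Children needed so the CI half-width is rel_precision * prevalence (assumed target)."""
    d = rel_precision * prevalence                        # absolute margin of error
    n_srs = z ** 2 * prevalence * (1 - prevalence) / d ** 2  # simple random sampling size
    return math.ceil(n_srs * design_effect(icc, cluster_size))
```

Because n_srs reduces to z²(1 - p)/(r²p) under a relative-precision target, halving prevalence roughly doubles the required sample, before the design effect is applied.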

  19. A Simple Analytic Model for Estimating Mars Ascent Vehicle Mass and Performance

    NASA Technical Reports Server (NTRS)

    Woolley, Ryan C.

    2014-01-01

    The Mars Ascent Vehicle (MAV) is a crucial component in any sample return campaign. In this paper we present a universal model for a two-stage MAV along with the analytic equations and simple parametric relationships necessary to quickly estimate MAV mass and performance. Ascent trajectories can be modeled as two-burn transfers from the surface with appropriate loss estimations for finite burns, steering, and drag. Minimizing lift-off mass is achieved by balancing optimized staging and an optimized path-to-orbit. This model allows designers to quickly find optimized solutions and to see the effects of design choices.
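    The abstract does not reproduce Woolley's equations. As a generic illustration of how lift-off mass depends on staging, the sketch below applies the ideal rocket equation stage by stage and grid-searches the Δv split between two stages. It assumes gravity, steering, and drag losses are already folded into each stage's Δv budget and parameterizes each stage by a structural coefficient ε = m_struct / (m_struct + m_prop). The function names and any numbers are hypothetical, not values from the paper.

```python
import math

G0 = 9.80665  # standard gravity (m/s^2), used to convert Isp in seconds to exhaust velocity


def stage_mass_factor(dv, isp, eps):
    """(stage + everything above it) / (everything above it) for one stage.

    From the rocket equation with mass ratio R = exp(dv / (G0 * isp)):
    factor = R * (1 - eps) / (1 - R * eps). Returns math.inf when the
    stage cannot deliver dv at any size (R * eps >= 1).
    """
    r = math.exp(dv / (G0 * isp))
    if r * eps >= 1.0:
        return math.inf
    return r * (1.0 - eps) / (1.0 - r * eps)


def liftoff_mass(payload, stages):
    """stages: list of (dv, isp, eps), ordered from the first (lower) stage to the upper stage."""
    mass = payload
    for dv, isp, eps in reversed(stages):  # build up from the upper stage downward
        mass *= stage_mass_factor(dv, isp, eps)
    return mass


def best_split(payload, total_dv, isp1, eps1, isp2, eps2, steps=100):
    """Grid-search the fraction of total_dv assigned to stage 1; return (fraction, mass)."""
    best_frac, best_mass = None, math.inf
    for i in range(steps + 1):
        frac = i / steps
        m = liftoff_mass(payload, [(frac * total_dv, isp1, eps1),
                                   ((1 - frac) * total_dv, isp2, eps2)])
        if m < best_mass:
            best_frac, best_mass = frac, m
    return best_frac, best_mass
```

For two identical stages the search lands on an even Δv split; with differing Isp or ε it shifts Δv toward the more efficient stage, which is the kind of staging trade the abstract says the model is meant to expose quickly.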

  20. Estimating cross-validatory predictive p-values with integrated importance sampling for disease mapping models.

    PubMed

    Li, Longhai; Feng, Cindy X; Qiu, Shi

    2017-06-30

    An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because a Markov chain Monte Carlo analysis must be rerun for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values using only Markov chain samples drawn from the posterior based on the full data set. The key step in iIS is to integrate away the latent variables associated with the test observation with respect to their conditional distribution, without reference to the actual observation. Following the general theory of importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS with three other existing methods from the literature on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to those estimated with actual LOOCV and outperform those given by the three existing methods, namely posterior predictive checking, ordinary importance sampling, and the ghosting method of Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.
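    Both ordinary importance sampling and iIS reweight full-data posterior draws so that they stand in for the leave-one-out posterior: since p(θ | y₋ᵢ) ∝ p(θ | y) / p(yᵢ | θ), draw s gets weight 1 / p(yᵢ | θₛ). The minimal sketch below assumes the caller supplies per-draw log-likelihoods and per-draw tail probabilities P(Yᵢ^rep ≥ yᵢ | θₛ). For iIS, both inputs would be computed with the test unit's latent variables integrated out under their conditional distribution; this function does not perform that integration itself, and its name is hypothetical.

```python
import math


def loo_pvalue_is(log_lik, tail_prob):
    """Self-normalized importance-sampling estimate of one LOOCV predictive p-value.

    log_lik[s]   : log p(y_i | theta_s) for full-data posterior draw s
                   (for iIS: with the unit's latent variables integrated out)
    tail_prob[s] : P(Y_i_rep >= y_i | theta_s) for the same draw
    Weights are proportional to 1 / p(y_i | theta_s); computed in log space
    by subtracting the maximum to avoid overflow.
    """
    neg = [-l for l in log_lik]
    mx = max(neg)
    w = [math.exp(v - mx) for v in neg]
    return sum(wi * t for wi, t in zip(w, tail_prob)) / sum(w)
```

Draws under which the held-out observation looks unlikely receive more weight, mimicking a posterior that never saw it. Plain importance sampling can suffer from high-variance weights when an observation is influential, which is one motivation the abstract gives for comparing iIS against ordinary importance sampling.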
