Estimating Function Approaches for Spatial Point Processes
NASA Astrophysics Data System (ADS)
Deng, Chong
Spatial point pattern data consist of locations of events that are often of interest in biological and ecological studies. Such data are commonly viewed as a realization of a stochastic process called a spatial point process. To fit a parametric spatial point process model to such data, likelihood-based methods have been widely studied. However, while maximum likelihood estimation is often too computationally intensive for Cox and cluster processes, pairwise likelihood methods such as composite likelihood and Palm likelihood typically suffer a loss of information because they ignore the correlation among pairs. For many types of correlated data other than spatial point processes, estimating functions have been widely used for model fitting when likelihood-based approaches are not desirable. In this dissertation, we explore estimating function approaches for fitting spatial point process models. These approaches, which are based on the theory of asymptotically optimal estimating functions, can incorporate the correlation among data and yield more efficient estimators. We conducted a series of studies to demonstrate that these estimating function approaches are good alternatives for balancing the trade-off between computational complexity and estimation efficiency. First, we propose a new estimating procedure that improves the efficiency of the pairwise composite likelihood method in estimating clustering parameters. Our approach combines estimating functions derived from pairwise composite likelihood estimation with estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate its efficacy through a simulation study and an application to the longleaf pine data. Second, we further explore the quasi-likelihood approach to fitting the second-order intensity function of spatial point processes. The original second-order quasi-likelihood is barely feasible due to the intense computation and high memory requirement needed to solve a large linear system. Motivated by the existence of geometrically regular patterns in stationary point processes, we find a lower-dimensional representation of the optimal weight function and propose a reduced second-order quasi-likelihood approach. Through a simulation study, we show that the proposed method not only demonstrates superior performance in fitting the clustering parameter but also relaxes the constraint on the tuning parameter H. Third, we study the quasi-likelihood-type estimating function that is optimal in a certain class of first-order estimating functions for estimating the regression parameter in spatial point process models. Then, by using a novel spectral representation, we construct an implementation that is computationally much more efficient and can be applied to a more general setup than the original quasi-likelihood method.
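As a minimal illustration of the estimating-function idea underlying this work, consider the simplest point process model, a homogeneous Poisson process on a window W: the score-type estimating function U(lambda) = n/lambda - |W| is unbiased, and its root recovers the usual intensity estimator. The sketch below (window, seed, and intensity are illustrative, not from the dissertation) solves U(lambda) = 0 numerically.

```python
# Minimal sketch: the estimating-function idea on a homogeneous Poisson
# process. U(lambda) = n/lambda - |W| is the score of the Poisson
# likelihood; its root is the familiar estimator n/|W|.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
true_lambda, area = 200.0, 1.0          # unit square window W
n = rng.poisson(true_lambda * area)     # number of events in W
points = rng.uniform(size=(n, 2))       # event locations (unused by U here)

def U(lam):
    """Score-type estimating function for the intensity."""
    return n / lam - area

lam_hat = brentq(U, 1e-6, 1e6)          # solve U(lambda) = 0
print(lam_hat, n / area)                # identical by construction
```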
Jeon, Jihyoun; Hsu, Li; Gorfine, Malka
2012-07-01
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low; however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates a complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.
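A minimal sketch of joint maximization in the H-likelihood spirit, under assumptions chosen for brevity rather than taken from the paper: exponential event times, a shared log-normal frailty per cluster, fixed-time censoring, and the latent log-frailties maximized jointly with the model parameters. It shows the mechanics only; it does not implement the paper's bias correction.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
m, k = 50, 4                        # clusters, members per cluster
lam, sig, c = 1.0, 0.7, 1.5         # baseline rate, frailty SD, censoring time
v = rng.normal(0.0, sig, size=m)    # log-frailties, shared within cluster
rate = lam * np.exp(v)[:, None]
t = rng.exponential(1.0 / rate, size=(m, k))
d = (t < c).astype(float)           # event indicator
t = np.minimum(t, c)

def neg_h_lik(theta):
    # h = log f(data | frailties) + log f(frailties); frailties are "parameters"
    log_lam, log_sig, vi = theta[0], theta[1], theta[2:]
    r = np.exp(log_lam + vi)[:, None]
    data_part = np.sum(d * np.log(r) - r * t)      # censored exponential log-lik
    frailty_part = np.sum(norm.logpdf(vi, 0.0, np.exp(log_sig)))
    return -(data_part + frailty_part)

fit = minimize(neg_h_lik, np.zeros(2 + m), method="L-BFGS-B")
print("lambda_hat:", np.exp(fit.x[0]), "sigma_hat:", np.exp(fit.x[1]))
```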
Estimating the variance for heterogeneity in arm-based network meta-analysis.
Piepho, Hans-Peter; Madden, Laurence V; Roger, James; Payne, Roger; Williams, Emlyn R
2018-04-19
Network meta-analysis can be implemented by using arm-based or contrast-based models. Here we focus on arm-based models and fit them using generalized linear mixed model procedures. Full maximum likelihood (ML) estimation leads to biased trial-by-treatment interaction variance estimates for heterogeneity. Thus, our objective is to investigate alternative approaches to variance estimation that reduce bias compared with full ML. Specifically, we use penalized quasi-likelihood/pseudo-likelihood and hierarchical (h) likelihood approaches. In addition, we consider a novel model modification that yields estimators akin to the residual maximum likelihood estimator for linear mixed models. The proposed methods are compared by simulation, and 2 real datasets are used for illustration. Simulations show that penalized quasi-likelihood/pseudo-likelihood and h-likelihood reduce bias and yield satisfactory coverage rates. Sum-to-zero restriction and baseline contrasts for random trial-by-treatment interaction effects, as well as a residual ML-like adjustment, also reduce bias compared with an unconstrained model when ML is used, but coverage rates are not quite as good. Penalized quasi-likelihood/pseudo-likelihood and h-likelihood are therefore recommended.
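The bias that motivates this work is easy to reproduce in the simplest random-effects meta-analysis. The sketch below assumes a hypothetical normal-normal model with known within-trial variances (not the paper's arm-based network model) and contrasts full ML with REML estimation of the heterogeneity variance tau^2; full ML is typically biased downward.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
K, mu, tau2 = 10, 0.3, 0.05                  # trials, true effect, heterogeneity
s2 = rng.uniform(0.01, 0.1, size=K)          # known within-trial variances
y = rng.normal(mu, np.sqrt(tau2 + s2))       # observed trial effects

def neg_ll(t2, restricted):
    w = 1.0 / (t2 + s2)
    mu_hat = np.sum(w * y) / np.sum(w)       # profile out the mean effect
    ll = -0.5 * np.sum(np.log(t2 + s2) + w * (y - mu_hat) ** 2)
    if restricted:
        ll -= 0.5 * np.log(np.sum(w))        # REML correction term
    return -ll

ml = minimize_scalar(lambda t: neg_ll(t, False), bounds=(0, 5), method="bounded")
reml = minimize_scalar(lambda t: neg_ll(t, True), bounds=(0, 5), method="bounded")
print("tau2 ML:", ml.x, " tau2 REML:", reml.x)   # ML is typically smaller
```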
Changren Weng; Thomas L. Kubisiak; C. Dana Nelson; James P. Geaghan; Michael Stine
1999-01-01
Single marker regression and single marker maximum likelihood estimation were used to detect quantitative trait loci (QTLs) controlling the early height growth of longleaf pine and slash pine using a ((longleaf pine x slash pine) x slash pine) BC1 population consisting of 83 progeny. Maximum likelihood estimation was found to be more powerful than regression and could...
Chen, Rui; Hyrien, Ollivier
2011-01-01
This article deals with quasi- and pseudo-likelihood estimation in a class of continuous-time multi-type Markov branching processes observed at discrete points in time. “Conventional” and conditional estimation are discussed for both approaches. We compare their properties and identify situations where they lead to asymptotically equivalent estimators. Both approaches possess robustness properties, and coincide with maximum likelihood estimation in some cases. Quasi-likelihood functions involving only linear combinations of the data may be unable to estimate all model parameters. Remedial measures exist, including the resort either to non-linear functions of the data or to conditioning the moments on appropriate sigma-algebras. The method of pseudo-likelihood may also resolve this issue. We investigate the properties of these approaches in three examples: the pure birth process, the linear birth-and-death process, and a two-type process that generalizes the previous two examples. Simulation studies are conducted to evaluate performance in finite samples. PMID:21552356
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
NASA Astrophysics Data System (ADS)
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the maximum likelihood (ML) approach. Results show that the proposed model outperforms ML for small datasets.
Profile-Likelihood Approach for Estimating Generalized Linear Mixed Models with Factor Structures
ERIC Educational Resources Information Center
Jeon, Minjeong; Rabe-Hesketh, Sophia
2012-01-01
In this article, the authors suggest a profile-likelihood approach for estimating complex models by maximum likelihood (ML) using standard software and minimal programming. The method works whenever setting some of the parameters of the model to known constants turns the model into a standard model. An important class of models that can be…
NASA Astrophysics Data System (ADS)
Sutawanir
2015-12-01
Mortality tables play an important role in actuarial studies such as life annuities, premium determination, premium reserves, pension plan valuation, and pension funding. Some well-known mortality tables are the CSO mortality table, the Indonesian Mortality Table, the Bowers mortality table, and the Japan mortality table. For actuarial applications, tables are constructed under different environments such as single decrement, double decrement, and multiple decrement. There are two approaches to mortality table construction: a mathematical approach and a statistical approach. Distribution models and estimation theory are the statistical concepts used in mortality table construction. This article discusses the statistical approach to mortality table construction. The distributional assumptions are the uniform death distribution (UDD) and constant force (exponential). Moment estimation and maximum likelihood are used to estimate the mortality parameter. Moment estimation methods are easier to manipulate than maximum likelihood estimation (MLE), but they do not use the complete mortality data. Maximum likelihood exploits all available information in mortality estimation, though some MLE equations are complicated and must be solved numerically. The article focuses on single decrement estimation using moment and maximum likelihood estimation; an extension to double decrement is introduced. A simple dataset is used to illustrate the mortality estimation and the resulting mortality table.
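A small sketch of the two estimation routes on simulated one-year mortality data with random withdrawals (the mortality rate, censoring scheme, and sample size are hypothetical): the constant-force MLE uses the full exposure times, while the classical actuarial moment-type estimate only adjusts the denominator for withdrawals, assumed uniform over the year.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q_true = 10_000, 0.05
mu = -np.log(1 - q_true)                 # constant force giving q = 0.05
death = rng.exponential(1 / mu, n)       # time to death
withdraw = rng.uniform(0, 1, n)          # random withdrawal (censoring) time
is_withdrawn = withdraw < np.minimum(death, 1.0)
obs = np.minimum(np.minimum(death, withdraw), 1.0)   # observed exposure
died = (death <= 1.0) & ~is_withdrawn

# MLE under constant force: mu_hat = deaths / central exposure
mu_hat = died.sum() / obs.sum()
q_mle = 1 - np.exp(-mu_hat)

# Classical moment-type actuarial estimate, withdrawals assumed uniform
w = is_withdrawn.sum()
q_mom = died.sum() / (n - 0.5 * w)
print(q_mle, q_mom)                      # both near 0.05
```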
Finite mixture model: A maximum likelihood estimation approach on time series data
NASA Astrophysics Data System (ADS)
Yen, Phoong Seuk; Ismail, Mohd Tahir; Hamzah, Firdaus Mohamad
2014-09-01
Recently, statisticians have emphasized fitting finite mixture models by maximum likelihood estimation because of its asymptotic properties. In particular, the estimator is consistent as the sample size increases to infinity, and the parameter estimates obtained by maximum likelihood have the smallest variance compared with other statistical methods as the sample size increases. Thus, maximum likelihood estimation is adopted in this paper to fit a two-component mixture model in order to explore the relationship between rubber price and exchange rate for Malaysia, Thailand, the Philippines and Indonesia. Results show a negative relationship between rubber price and exchange rate for all selected countries.
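For concreteness, a generic EM fit of a two-component Gaussian mixture on simulated data is sketched below; the mixture form, starting values, and data are illustrative and unrelated to the rubber-price application.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 0.5, 700)])

# EM for a two-component Gaussian mixture
pi_, mu, sd = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior responsibility of component 2 for each point
    d1 = (1 - pi_) * norm.pdf(x, mu[0], sd[0])
    d2 = pi_ * norm.pdf(x, mu[1], sd[1])
    r = d2 / (d1 + d2)
    # M-step: weighted maximum likelihood updates
    pi_ = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sd = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                           np.average((x - mu[1]) ** 2, weights=r)]))
print(pi_, mu, sd)
```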
Profile-likelihood Confidence Intervals in Item Response Theory Models.
Chalmers, R Philip; Pek, Jolynn; Liu, Yang
2017-01-01
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
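The construction of a PL CI can be illustrated on a much simpler model than IRT. The sketch below, assuming an exponential sample rather than item responses, inverts the likelihood-ratio statistic at the chi-square cutoff and compares the resulting interval with the Wald interval.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=30)   # true rate = 0.5
n, s = len(x), x.sum()
lam_hat = n / s                            # MLE of the rate

def loglik(lam):
    return n * np.log(lam) - lam * s

# PL CI: all lam with 2*(loglik(lam_hat) - loglik(lam)) <= chi2_{1,0.95}
cut = loglik(lam_hat) - 0.5 * chi2.ppf(0.95, df=1)
lo = brentq(lambda l: loglik(l) - cut, 1e-8, lam_hat)
hi = brentq(lambda l: loglik(l) - cut, lam_hat, 50 * lam_hat)

wald = lam_hat + np.array([-1, 1]) * 1.96 * lam_hat / np.sqrt(n)
print("PL CI:", (lo, hi), " Wald CI:", tuple(wald))
```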
Five Methods for Estimating Angoff Cut Scores with IRT
ERIC Educational Resources Information Center
Wyse, Adam E.
2017-01-01
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Falk, Carl F; Cai, Li
2016-06-01
We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.
NASA Technical Reports Server (NTRS)
Walker, H. F.
1976-01-01
Likelihood equations determined by the two types of samples, which are necessary conditions for a maximum-likelihood estimate, are considered. These equations suggest certain successive-approximations iterative procedures for obtaining maximum-likelihood estimates. These are generalized steepest ascent (deflected gradient) procedures. It is shown that, with probability 1 as N_0 approaches infinity (regardless of the relative sizes of N_0 and N_i, i = 1, ..., m), these procedures converge locally to the strongly consistent maximum-likelihood estimates whenever the step size is between 0 and 2. Furthermore, the value of the step size which yields optimal local convergence rates is bounded from below by a number which always lies between 1 and 2.
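The role of the step size is easy to see in a scalar example (toy data, not from the report). For a Bernoulli sample, the Fisher-scoring ascent step reduces to p <- p + step * (xbar - p), so the error is multiplied by (1 - step) each iteration and the procedure converges exactly when the step size lies in (0, 2).

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.3, size=100)
xbar = x.mean()                     # the MLE for a Bernoulli sample

def scoring(p0, step, iters=50):
    # Fisher-scoring ascent: p <- p + step * I(p)^{-1} U(p) = p + step*(xbar - p)
    p = p0
    for _ in range(iters):
        p = p + step * (xbar - p)
    return p

for step in (0.5, 1.0, 1.9, 2.1):
    print(step, scoring(0.5, step))  # converges to xbar iff 0 < step < 2
```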
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and with classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification performance by 6% compared with maximum likelihood and by 34% compared with random classification. We found that the original DRG, the coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already led to improved audit efficiency in an operational capacity.
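A minimal sketch of the point-estimation side of this approach: with a Gaussian prior on the coefficients, the posterior mode of a Bayesian logistic regression is an L2-penalized ML fit, which is what stabilizes the parameters. The prior scale of 2.5 and the simulated design below are illustrative assumptions, not the paper's model, and a mode is of course only a summary of the full posterior.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([-1.0, 2.0, -1.5, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_log_post(beta, prior_sd=2.5):
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0, eta))    # Bernoulli log-likelihood
    logprior = -0.5 * np.sum((beta / prior_sd) ** 2)   # N(0, 2.5^2) prior
    return -(loglik + logprior)

map_fit = minimize(neg_log_post, np.zeros(p + 1), method="BFGS")
print(map_fit.x)    # posterior mode, shrunk toward zero by the prior
```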
Quantum state estimation when qubits are lost: a no-data-left-behind approach
Williams, Brian P.; Lougovski, Pavel
2017-04-06
We present an approach to Bayesian mean estimation of quantum states using hyperspherical parametrization and an experiment-specific likelihood which allows utilization of all available data, even when qubits are lost. With this method, we report the first closed-form Bayesian mean and maximum likelihood estimates for the ideal single qubit. Due to computational constraints, we utilize numerical sampling to determine the Bayesian mean estimate for a photonic two-qubit experiment in which our novel analysis reduces burdens associated with experimental asymmetries and inefficiencies. This method can be applied to quantum states of any dimension and experimental complexity.
SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.
Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman
2017-03-01
We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).
Model uncertainty estimation and risk assessment are essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood e...
Baele, Guy; Lemey, Philippe; Vansteelandt, Stijn
2013-03-06
Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational efforts in order to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
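The stepping-stone idea can be checked on a model whose marginal likelihood is known exactly. The sketch below assumes a conjugate normal-normal model, so each power posterior can be sampled in closed form, and compares the stepping-stone estimate of the log marginal likelihood with the analytic value; the temperature schedule and sample sizes are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(9)
n, sigma, mu0, tau0 = 20, 1.0, 0.0, 2.0
x = rng.normal(1.0, sigma, n)

def loglik(theta):                 # theta may be an array of posterior draws
    return np.sum(norm.logpdf(x[:, None], theta, sigma), axis=0)

def draw_power_posterior(b, size):
    # Power posterior at inverse temperature b is normal for this model
    prec = 1 / tau0**2 + b * n / sigma**2
    mean = (mu0 / tau0**2 + b * x.sum() / sigma**2) / prec
    return rng.normal(mean, np.sqrt(1 / prec), size)

betas = np.linspace(0, 1, 33) ** 3  # schedule concentrated near the prior
log_ml = 0.0
for b0, b1 in zip(betas[:-1], betas[1:]):
    ll = loglik(draw_power_posterior(b0, 2000))
    # Stable log of the stepping-stone ratio E_{b0}[L^(b1-b0)]
    log_ml += np.log(np.mean(np.exp((b1 - b0) * (ll - ll.max())))) \
              + (b1 - b0) * ll.max()

exact = multivariate_normal.logpdf(x, np.full(n, mu0),
                                   sigma**2 * np.eye(n) + tau0**2)
print(log_ml, exact)                # the two values should nearly agree
```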
Objectively combining AR5 instrumental period and paleoclimate climate sensitivity evidence
NASA Astrophysics Data System (ADS)
Lewis, Nicholas; Grünwald, Peter
2018-03-01
Combining instrumental period evidence regarding equilibrium climate sensitivity with largely independent paleoclimate proxy evidence should enable a more constrained sensitivity estimate to be obtained. Previous, subjective Bayesian approaches involved selection of a prior probability distribution reflecting the investigators' beliefs about climate sensitivity. Here a recently developed approach employing two different statistical methods—objective Bayesian and frequentist likelihood-ratio—is used to combine instrumental period and paleoclimate evidence based on data presented and assessments made in the IPCC Fifth Assessment Report. Probabilistic estimates from each source of evidence are represented by posterior probability density functions (PDFs) of physically-appropriate form that can be uniquely factored into a likelihood function and a noninformative prior distribution. The three-parameter form is shown accurately to fit a wide range of estimated climate sensitivity PDFs. The likelihood functions relating to the probabilistic estimates from the two sources are multiplicatively combined and a prior is derived that is noninformative for inference from the combined evidence. A posterior PDF that incorporates the evidence from both sources is produced using a single-step approach, which avoids the order-dependency that would arise if Bayesian updating were used. Results are compared with an alternative approach using the frequentist signed root likelihood ratio method. Results from these two methods are effectively identical, and provide a 5-95% range for climate sensitivity of 1.1-4.05 K (median 1.87 K).
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problems addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286
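A sketch of the double-penalized objective on a hypothetical perfectly separated dataset, where plain ML estimates diverge: the Firth term 0.5 log|X'WX| and a ridge term are added to the logistic log-likelihood and the result is maximized numerically. The ridge constant is an arbitrary illustrative choice, not a tuned value.

```python
import numpy as np
from scipy.optimize import minimize

# Perfectly separated toy data: plain ML estimates diverge to +/- infinity
X = np.column_stack([np.ones(8), np.array([-3., -2, -1, -.5, .5, 1, 2, 3])])
y = np.array([0., 0, 0, 0, 1, 1, 1, 1])

def neg_pen_loglik(beta, ridge=0.1):
    eta = X @ beta
    p = 1 / (1 + np.exp(-eta))
    loglik = np.sum(y * eta - np.logaddexp(0, eta))
    W = p * (1 - p)                                     # Fisher weights
    firth = 0.5 * np.linalg.slogdet(X.T @ (W[:, None] * X))[1]
    return -(loglik + firth - 0.5 * ridge * beta @ beta)

fit = minimize(neg_pen_loglik, np.zeros(2), method="BFGS")
print(fit.x)    # finite estimates despite complete separation
```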
Hock, Sabrina; Hasenauer, Jan; Theis, Fabian J
2013-01-01
Diffusion is a key component of many biological processes such as chemotaxis, developmental differentiation and tissue morphogenesis. Recently, it has become possible to assess the spatial gradients caused by diffusion in vitro and in vivo using microscopy-based imaging techniques. The resulting time series of two-dimensional, high-resolution images, in combination with mechanistic models, enable the quantitative analysis of the underlying mechanisms. However, such a model-based analysis is still challenging due to measurement noise and sparse observations, which result in uncertainties in the model parameters. We introduce a likelihood function for image-based measurements with log-normally distributed noise. Based upon this likelihood function, we formulate the maximum likelihood estimation problem, which is solved using PDE-constrained optimization methods. To assess the uncertainty and practical identifiability of the parameters, we introduce profile likelihoods for diffusion processes. As proof of concept, we model certain aspects of the guidance of dendritic cells towards lymphatic vessels, an example of haptotaxis. Using a realistic set of artificial measurement data, we estimate the five kinetic parameters of this model and compute profile likelihoods. Our novel approach for the estimation of model parameters from image data, as well as the proposed identifiability analysis approach, is widely applicable to diffusion processes. The profile likelihood based method provides more rigorous uncertainty bounds, in contrast to local approximation methods.
A Bayesian Approach to More Stable Estimates of Group-Level Effects in Contextual Studies.
Zitzmann, Steffen; Lüdtke, Oliver; Robitzsch, Alexander
2015-01-01
Multilevel analyses are often used to estimate the effects of group-level constructs. However, when using aggregated individual data (e.g., student ratings) to assess a group-level construct (e.g., classroom climate), the observed group mean might not provide a reliable measure of the unobserved latent group mean. In the present article, we propose a Bayesian approach that can be used to estimate a multilevel latent covariate model, which corrects for the unreliable assessment of the latent group mean when estimating the group-level effect. A simulation study was conducted to evaluate the choice of different priors for the group-level variance of the predictor variable and to compare the Bayesian approach with the maximum likelihood approach implemented in the software Mplus. Results showed that, under problematic conditions (i.e., small number of groups, predictor variable with a small ICC), the Bayesian approach produced more accurate estimates of the group-level effect than the maximum likelihood approach did.
Fang, Yun; Wu, Hulin; Zhu, Li-Xing
2011-07-01
We propose a two-stage estimation method for random coefficient ordinary differential equation (ODE) models. A maximum pseudo-likelihood estimator (MPLE) is derived based on a mixed-effects modeling approach and its asymptotic properties for population parameters are established. The proposed method does not require repeatedly solving ODEs, and is computationally efficient although it does pay a price with the loss of some estimation efficiency. However, the method does offer an alternative approach when the exact likelihood approach fails due to model complexity and high-dimensional parameter space, and it can also serve as a method to obtain the starting estimates for more accurate estimation methods. In addition, the proposed method does not need to specify the initial values of state variables and preserves all the advantages of the mixed-effects modeling approach. The finite sample properties of the proposed estimator are studied via Monte Carlo simulations and the methodology is also illustrated with application to an AIDS clinical data set.
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models, and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. The Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependence of test errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
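The baseline that these models adjust is the standard odds-scale update with unadjusted likelihood ratios, which assumes the tests are independent. A minimal sketch with made-up sensitivity, specificity, and pretest probability:

```python
def post_test_prob(pre_prob, lr):
    """Bayes' theorem on the odds scale: posttest odds = pretest odds x LR."""
    odds = pre_prob / (1 - pre_prob) * lr
    return odds / (1 + odds)

sens, spec = 0.80, 0.90
lr_pos = sens / (1 - spec)           # LR for a positive test = 8.0
lr_neg = (1 - sens) / spec           # LR for a negative test ~ 0.22
print(post_test_prob(0.30, lr_pos))  # ~0.77
print(post_test_prob(0.30, lr_neg))  # ~0.087
```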
Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty
Baele, Guy; Lemey, Philippe; Suchard, Marc A.
2016-01-01
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse, but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of “working distributions” to facilitate—or shorten—the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this purpose, we introduce a “working” distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different “working” distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation in marginal likelihood when using PS and SS, while still retrieving the correct marginal likelihood using both GSS approaches. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses. PMID:26526428
Dahabreh, Issa J; Trikalinos, Thomas A; Lau, Joseph; Schmid, Christopher H
2017-03-01
To compare statistical methods for meta-analysis of sensitivity and specificity of medical tests (e.g., diagnostic or screening tests). We constructed a database of PubMed-indexed meta-analyses of test performance from which 2 × 2 tables for each included study could be extracted. We reanalyzed the data using univariate and bivariate random effects models fit with inverse variance and maximum likelihood methods. Analyses were performed using both normal and binomial likelihoods to describe within-study variability. The bivariate model using the binomial likelihood was also fit using a fully Bayesian approach. We use two worked examples (thoracic computerized tomography to detect aortic injury and rapid prescreening of Papanicolaou smears to detect cytological abnormalities) to highlight that different meta-analysis approaches can produce different results. We also present results from reanalysis of 308 meta-analyses of sensitivity and specificity. Models using the normal approximation produced sensitivity and specificity estimates closer to 50% and smaller standard errors compared to models using the binomial likelihood; absolute differences of 5% or greater were observed in 12% and 5% of meta-analyses for sensitivity and specificity, respectively. Results from univariate and bivariate random effects models were similar, regardless of estimation method. Maximum likelihood and Bayesian methods produced almost identical summary estimates under the bivariate model; however, Bayesian analyses indicated greater uncertainty around those estimates. Bivariate models produced imprecise estimates of the between-study correlation of sensitivity and specificity. Differences between methods were larger with an increasing proportion of studies that were small or required a continuity correction. The binomial likelihood should be used to model within-study variability. Univariate and bivariate models give similar estimates of the marginal distributions for sensitivity and specificity. Bayesian methods fully quantify uncertainty and their ability to incorporate external evidence may be useful for imprecisely estimated parameters.
ERIC Educational Resources Information Center
Kieftenbeld, Vincent; Natesan, Prathiba
2012-01-01
Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
Clavel, Julien; Aristide, Leandro; Morlon, Hélène
2018-06-19
Working with high-dimensional phylogenetic comparative datasets is challenging because likelihood-based multivariate methods suffer from low statistical performance as the number of traits p approaches the number of species n and because some computational complications occur when p exceeds n. Alternative phylogenetic comparative methods have recently been proposed to deal with the large p small n scenario but their use and performance are limited. Here we develop a penalized likelihood framework to deal with high-dimensional comparative datasets. We propose various penalizations and methods for selecting the intensity of the penalties. We apply this general framework to the estimation of parameters (the evolutionary trait covariance matrix and parameters of the evolutionary model) and model comparison for the high-dimensional multivariate Brownian (BM), Early-burst (EB), Ornstein-Uhlenbeck (OU) and Pagel's lambda models. We show using simulations that our penalized likelihood approach dramatically improves the estimation of evolutionary trait covariance matrices and model parameters when p approaches n, and allows for their accurate estimation when p equals or exceeds n. In addition, we show that penalized likelihood models can be efficiently compared using the Generalized Information Criterion (GIC). We implement these methods, as well as the related estimation of ancestral states and the computation of phylogenetic PCA, in the R packages RPANDA and mvMORPH. Finally, we illustrate the utility of the new proposed framework by evaluating evolutionary model fit, analyzing integration patterns, and reconstructing evolutionary trajectories for a high-dimensional 3-D dataset of brain shape in the New World monkeys. We find clear support for an Early-burst model suggesting an early diversification of brain morphology during the ecological radiation of the clade. Penalized likelihood offers an efficient way to deal with high-dimensional multivariate comparative data.
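The core numerical difficulty (a singular trait covariance estimate when p is at or above n) and the way a penalty repairs it can be shown in a few lines. The sketch below uses a simple linear shrinkage toward the diagonal as a stand-in penalization; the paper's penalties and its GIC-based tuning are richer, and the dimensions and shrinkage intensity here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12)
n, p = 20, 50                      # fewer species than traits: S is singular
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)        # sample trait covariance
print(np.linalg.matrix_rank(S))    # at most n - 1 = 19, so not invertible

gamma = 0.2                        # penalty intensity (would be tuned, e.g. by GIC)
S_pen = (1 - gamma) * S + gamma * np.diag(np.diag(S))
print(np.all(np.linalg.eigvalsh(S_pen) > 0))   # positive definite again
```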
Exponential series approaches for nonparametric graphical models
NASA Astrophysics Data System (ADS)
Janofsky, Eric
Markov Random Fields (MRFs) or undirected graphical models are parsimonious representations of joint probability distributions. This thesis studies high-dimensional, continuous-valued pairwise Markov Random Fields. We are particularly interested in approximating pairwise densities whose logarithm belongs to a Sobolev space. For this problem we propose the method of exponential series which approximates the log density by a finite-dimensional exponential family with the number of sufficient statistics increasing with the sample size. We consider two approaches to estimating these models. The first is regularized maximum likelihood. This involves optimizing the sum of the log-likelihood of the data and a sparsity-inducing regularizer. We then propose a variational approximation to the likelihood based on tree-reweighted, nonparametric message passing. This approximation allows for upper bounds on risk estimates, leverages parallelization and is scalable to densities on hundreds of nodes. We show how the regularized variational MLE may be estimated using a proximal gradient algorithm. We then consider estimation using regularized score matching. This approach uses an alternative scoring rule to the log-likelihood, which obviates the need to compute the normalizing constant of the distribution. For general continuous-valued exponential families, we provide parameter and edge consistency results. As a special case we detail a new approach to sparse precision matrix estimation which has statistical performance competitive with the graphical lasso and computational performance competitive with the state-of-the-art glasso algorithm. We then describe results for model selection in the nonparametric pairwise model using exponential series. The regularized score matching problem is shown to be a convex program; we provide scalable algorithms based on consensus alternating direction method of multipliers (ADMM) and coordinate-wise descent. We use simulations to compare our method to others in the literature as well as the aforementioned TRW estimator.
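Score matching can be demonstrated on the simplest unnormalized model. Assuming log p(x; theta) = -theta x^2/2 + const with theta > 0 (a toy stand-in for the pairwise exponential families above), the Hyvarinen objective has a closed-form empirical version that never touches the normalizing constant, and its minimizer matches the ML precision.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(13)
x = rng.normal(0.0, 2.0, size=5000)    # true precision = 1/4

# Score matching: J(theta) = E[ psi'(x) + 0.5 * psi(x)^2 ], where
# psi = d/dx log p = -theta * x and psi' = -theta. No partition function needed.
def J(theta):
    return np.mean(-theta + 0.5 * (theta * x) ** 2)

fit = minimize_scalar(J, bounds=(1e-6, 10), method="bounded")
print(fit.x, 1 / np.mean(x**2))        # both ~0.25, the ML precision
```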
An Empirical Comparison of Heterogeneity Variance Estimators in 12,894 Meta-Analyses
ERIC Educational Resources Information Center
Langan, Dean; Higgins, Julian P. T.; Simmonds, Mark
2015-01-01
Heterogeneity in meta-analysis is most commonly estimated using a moment-based approach described by DerSimonian and Laird. However, this method has been shown to produce biased estimates. Alternative methods to estimate heterogeneity include the restricted maximum likelihood approach and those proposed by Paule and Mandel, Sidik and Jonkman, and…
Benedict, Matthew N.; Mundy, Michael B.; Henry, Christopher S.; Chia, Nicholas; Price, Nathan D.
2014-01-01
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface. PMID:25329157
Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level
Savalei, Victoria; Rhemtulla, Mijke
2017-01-01
In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data—that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study. PMID:29276371
NASA Technical Reports Server (NTRS)
Klein, V.
1980-01-01
A frequency domain maximum likelihood method is developed for the estimation of airplane stability and control parameters from measured data. The model of an airplane is represented by a discrete-type steady state Kalman filter with time variables replaced by their Fourier series expansions. The likelihood function of innovations is formulated, and by its maximization with respect to unknown parameters the estimation algorithm is obtained. This algorithm is then simplified to the output error estimation method with the data in the form of transformed time histories, frequency response curves, or spectral and cross-spectral densities. The development is followed by a discussion on the equivalence of the cost function in the time and frequency domains, and on advantages and disadvantages of the frequency domain approach. The algorithm developed is applied in four examples to the estimation of longitudinal parameters of a general aviation airplane using computer generated and measured data in turbulent and still air. The cost functions in the time and frequency domains are shown to be equivalent; therefore, both approaches are complementary and not contradictory. Despite some computational advantages of parameter estimation in the frequency domain, this approach is limited to linear equations of motion with constant coefficients.
Maximum Likelihood Estimations and EM Algorithms with Length-biased Data
Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu
2012-01-01
Length-biased sampling has been well recognized in economics, industrial reliability, etiology, epidemiological, genetic and cancer screening studies. Length-biased right-censored data have a unique data structure different from traditional survival data. The nonparametric and semiparametric estimation and inference methods for traditional survival data are not directly applicable to length-biased right-censored data. We propose new expectation-maximization algorithms for estimation based on full likelihoods involving infinite-dimensional parameters under three settings for length-biased data: estimating a nonparametric distribution function, estimating a nonparametric hazard function under an increasing failure rate constraint, and jointly estimating the baseline hazard function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and lead to more efficient estimators compared to the estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency properties of the estimators, and establish the asymptotic normality of the semiparametric maximum likelihood estimators under the Cox model using modern empirical process theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840
Approximated maximum likelihood estimation in multifractal random walks
NASA Astrophysics Data System (ADS)
Løvsletten, O.; Rypdal, M.
2012-04-01
We present an approximated maximum likelihood method for the multifractal random walk processes of [E. Bacry et al., Phys. Rev. E 64, 026103 (2001)]. The likelihood is computed using a Laplace approximation and a truncation in the dependency structure for the latent volatility. The procedure is implemented as a package in the R language. Its performance is tested on synthetic data and compared to an inference approach based on the generalized method of moments. The method is applied to estimate parameters for various financial stock indices.
Robust analysis of semiparametric renewal process models
Lin, Feng-Chang; Truong, Young K.; Fine, Jason P.
2013-01-01
A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood, with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated owing to the unknown dependence structure of the gap times. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. PMID:24550568
Nagelkerke, Nico; Fidler, Vaclav
2015-01-01
The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to a zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those, possibly erroneously, classified as controls. Two examples are analyzed and discussed. A simulation study explores properties of the maximum likelihood parameter estimates and the estimates of the number of mislabeled observations.
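A minimal sketch of the defective-logistic idea: if a fraction theta of true cases is mislabeled as controls, the observed label is Bernoulli with success probability (1 - theta) * p(x), and all parameters can be estimated by ordinary maximum likelihood. The variable names and simulated data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(params, X, y):
    """Zero-inflated Bernoulli likelihood: a true case is labeled as a
    case only with probability (1 - theta); theta is the mislabeling
    rate, so P(label=1 | x) = (1 - theta) * p(x), logistic p(x)."""
    beta, logit_theta = params[:-1], params[-1]
    theta = expit(logit_theta)                 # keep theta in (0, 1)
    q = (1.0 - theta) * expit(X @ beta)        # "defective" success prob
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

# Toy data: 20% of true cases are mislabeled as controls
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
y_true = rng.binomial(1, expit(X @ np.array([-0.5, 1.2])))
y_obs = y_true * rng.binomial(1, 0.8, size=2000)   # flip 20% of 1s to 0
fit = minimize(neg_loglik, x0=np.zeros(3), args=(X, y_obs), method="BFGS")
print(fit.x)   # [beta0, beta1, logit(theta)]; expect logit(theta) near logit(0.2)
```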
ERIC Educational Resources Information Center
Lee, Soo; Suh, Youngsuk
2018-01-01
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
Maximum likelihood estimation for Cox's regression model under nested case-control sampling.
Scheike, Thomas H; Juul, Anders
2004-04-01
Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM-algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM-aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease, with controls matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information beyond the relative risk estimates of covariates.
A Two-Stage Approach to Missing Data: Theory and Application to Auxiliary Variables
ERIC Educational Resources Information Center
Savalei, Victoria; Bentler, Peter M.
2009-01-01
A well-known ad-hoc approach to conducting structural equation modeling with missing data is to obtain a saturated maximum likelihood (ML) estimate of the population covariance matrix and then to use this estimate in the complete data ML fitting function to obtain parameter estimates. This 2-stage (TS) approach is appealing because it minimizes a…
ATAC Autocuer Modeling Analysis.
1981-01-01
The analysis of the simple rectangular segmentation (1) is based on detection and estimation theory (2). This approach uses the concept of maximum ...continuous wave forms. In order to develop the principles of maximum likelihood, it is convenient to develop the principles for the "classical...the concept of maximum likelihood is significant in that it provides the optimum performance of the detection/estimation problem. With a knowledge of
NASA Astrophysics Data System (ADS)
Lusiana, Evellin Dewi
2017-12-01
The parameters of the binary probit regression model are commonly estimated by the maximum likelihood estimation (MLE) method. However, MLE has a limitation when the binary data contain separation. Separation is the condition in which one or several independent variables exactly separate the categories of the binary response. It causes the MLE estimators to fail to converge, so that they cannot be used in modeling. One effort to resolve separation is to use Firth's approach instead. This research has two aims: first, to compare the chance of separation occurring in the binary probit regression model between the MLE method and Firth's approach; second, to compare the performance of the estimators obtained by the two methods using the RMSE criterion. Both are examined by simulation under different sample sizes. The results show that for small sample sizes the chance of separation under the MLE method is higher than under Firth's approach; for larger sample sizes the probability decreases and is nearly identical between the two methods. Meanwhile, Firth's estimators have smaller RMSE than the MLEs, especially for smaller sample sizes, while for larger sample sizes the RMSEs differ little. Overall, Firth's estimators outperform the MLE.
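Firth's correction is usually presented for logistic regression; a probit analog can be sketched by adding the Jeffreys-prior penalty 0.5 * log|X'WX| to the probit log-likelihood, with W the probit Fisher-information weights. The following sketch (all names are ours; treat it as an assumption-laden illustration, not the paper's implementation) shows that the penalized fit stays finite under complete separation, where plain MLE diverges.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def penalized_probit_negloglik(beta, X, y):
    """Probit log-likelihood plus a Jeffreys-prior (Firth-type) penalty
    0.5*log|X'WX|, where W holds the probit information weights
    phi(eta)^2 / (Phi(eta) * (1 - Phi(eta)))."""
    eta = X @ beta
    P = np.clip(norm.cdf(eta), 1e-10, 1 - 1e-10)
    loglik = np.sum(y * np.log(P) + (1 - y) * np.log(1 - P))
    w = norm.pdf(eta) ** 2 / (P * (1 - P))
    _, logdet = np.linalg.slogdet(X.T @ (w[:, None] * X))
    return -(loglik + 0.5 * logdet)

# Separated toy data: x > 0 perfectly predicts y = 1, so plain MLE diverges
rng = np.random.default_rng(2)
x = rng.normal(size=40)
y = (x > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])
fit = minimize(penalized_probit_negloglik, np.zeros(2), args=(X, y),
               method="BFGS")
print(fit.x)   # finite estimates despite complete separation
```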
NASA Astrophysics Data System (ADS)
Pan, Zhen; Anderes, Ethan; Knox, Lloyd
2018-05-01
One of the major targets for next-generation cosmic microwave background (CMB) experiments is the detection of the primordial B-mode signal. Planning is under way for Stage-IV experiments that are projected to have instrumental noise small enough to make lensing and foregrounds the dominant sources of uncertainty for estimating the tensor-to-scalar ratio r from polarization maps. This makes delensing a crucial part of future CMB polarization science. In this paper we present a likelihood method for estimating the tensor-to-scalar ratio r from CMB polarization observations, which combines the benefits of a full-scale likelihood approach with the tractability of the quadratic delensing technique. This method is a pixel-space, all-order likelihood analysis of the quadratically delensed B modes, and it essentially builds upon the quadratic delenser by taking into account all-order lensing and pixel-space anomalies. Its tractability relies on a crucial factorization of the pixel-space covariance matrix of the polarization observations, which allows one to compute the full Gaussian approximate likelihood profile, as a function of r, at the computational cost of a single likelihood evaluation.
Heersink, Daniel K; Caley, Peter; Paini, Dean R; Barry, Simon C
2016-05-01
The cost of an uncontrolled incursion of invasive alien species (IAS) arising from undetected entry through ports can be substantial, and knowledge of port-specific risks is needed to help allocate limited surveillance resources. Quantifying the establishment likelihood of such an incursion requires quantifying the ability of a species to enter, establish, and spread. Estimation of the approach rate of IAS into ports provides a measure of likelihood of entry. Data on the approach rate of IAS are typically sparse, and the combinations of risk factors relating to country of origin and port of arrival diverse. This presents challenges to making formal statistical inference on establishment likelihood. Here we demonstrate how these challenges can be overcome with judicious use of mixed-effects models when estimating the incursion likelihood into Australia of the European (Apis mellifera) and Asian (A. cerana) honeybees, along with the invasive parasites of biosecurity concern they host (e.g., Varroa destructor). Our results demonstrate how skewed the establishment likelihood is, with one-tenth of the ports accounting for 80% or more of the likelihood for both species. These results have been utilized by biosecurity agencies in the allocation of resources to the surveillance of maritime ports. © 2015 Society for Risk Analysis.
Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling
NASA Astrophysics Data System (ADS)
Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.
2017-04-01
Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood), the normalizing constant in the denominator of Bayes' theorem; while fundamental for model selection, the evidence is not required for Bayesian parameter inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as information criteria do; the larger a model's evidence, the more support it receives among a collection of hypotheses, as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models over the over-fitted ones selected by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
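The published GMIS estimator uses bridge sampling; a simplified plain-importance-sampling variant with a Gaussian-mixture proposal fitted to posterior samples conveys the core idea. The sketch below, with all function names our own, validates the estimate on a conjugate toy problem where the posterior is known in closed form.

```python
import numpy as np
from scipy.stats import norm as gauss
from sklearn.mixture import GaussianMixture

def evidence_gmis_like(posterior_samples, log_prior, log_lik,
                       n_draws=20000, n_components=3, seed=0):
    """Marginal-likelihood estimate by importance sampling with a
    Gaussian-mixture proposal fitted to posterior samples (e.g. from
    DREAM). A simplified sketch of the GMIS idea; the published
    estimator uses bridge sampling instead."""
    gm = GaussianMixture(n_components=n_components, random_state=seed)
    gm.fit(posterior_samples)
    theta, _ = gm.sample(n_draws)
    log_w = log_prior(theta) + log_lik(theta) - gm.score_samples(theta)
    m = log_w.max()                      # log-sum-exp for stability
    return m + np.log(np.mean(np.exp(log_w - m)))

# Conjugate toy problem: N(0,1) prior, unit-variance Gaussian likelihood
rng = np.random.default_rng(3)
data = rng.normal(0.5, 1.0, size=50)
post_mean = data.sum() / (1.0 + len(data))
post_sd = 1.0 / np.sqrt(1.0 + len(data))
samples = rng.normal(post_mean, post_sd, size=(5000, 1))
lp = lambda t: gauss.logpdf(t[:, 0], 0.0, 1.0)
ll = lambda t: gauss.logpdf(data[None, :], t[:, [0]], 1.0).sum(axis=1)
print(evidence_gmis_like(samples, lp, ll))   # estimated log-evidence
```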
Empirical likelihood inference in randomized clinical trials.
Zhang, Biao
2017-01-01
In individually randomized controlled trials, information on a number of covariates is often available prior to randomization, in addition to the primary outcome. This information is frequently utilized to adjust for baseline characteristics in order to increase the precision of estimated average treatment effects; such adjustment is usually performed via covariate adjustment in outcome regression models. Although covariate adjustment is widely seen as desirable for making treatment effect estimates more precise and the corresponding hypothesis tests more powerful, there are considerable concerns that objective inference in randomized clinical trials can potentially be compromised. In this paper, we study an empirical likelihood approach to covariate adjustment and propose two unbiased estimating functions that automatically decouple evaluation of average treatment effects from regression modeling of covariate-outcome relationships. The resulting empirical likelihood estimator of the average treatment effect is as efficient as the existing efficient adjusted estimators when separate treatment-specific working regression models are correctly specified, and is at least as efficient for any given treatment-specific working regression models, whether or not they coincide with the true treatment-specific covariate-outcome relationships. We present a simulation study comparing the finite sample performance of various methods, along with results from the analysis of a data set from an HIV clinical trial. The simulation results indicate that the proposed empirical likelihood approach is more efficient and powerful than its competitors when the working covariate-outcome relationships by treatment status are misspecified.
Technical Note: Approximate Bayesian parameterization of a complex tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2013-08-01
Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
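The parametric likelihood approximation can be illustrated, in the spirit of synthetic likelihood, by fitting a multivariate normal to summary statistics simulated at each proposed parameter value and embedding that approximate likelihood in a random-walk Metropolis sampler. The toy model, names, and tuning choices below are all our assumptions, not the FORMIND setup.

```python
import numpy as np

def synthetic_loglik(theta, obs_summary, simulate, n_sim=100, seed=None):
    """Synthetic-likelihood sketch: run the stochastic model n_sim
    times at theta, fit a multivariate normal to the simulated summary
    statistics, and evaluate the observed summaries under it."""
    rng = np.random.default_rng(seed)
    sims = np.array([simulate(theta, rng) for _ in range(n_sim)])
    mu = sims.mean(axis=0)
    cov = np.cov(sims.T) + 1e-8 * np.eye(sims.shape[1])
    diff = obs_summary - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))

# Toy stochastic "model": summaries are the mean and sd of a lognormal sample
def simulate(theta, rng):
    x = rng.lognormal(theta[0], theta[1], size=200)
    return np.array([x.mean(), x.std()])

obs = np.array([1.8, 2.1])
rng = np.random.default_rng(4)
theta, ll, chain = np.array([0.0, 1.0]), -np.inf, []
for it in range(500):                      # random-walk Metropolis
    prop = theta + rng.normal(0.0, 0.05, size=2)
    if prop[1] > 0.05:                     # keep the sd parameter positive
        ll_prop = synthetic_loglik(prop, obs, simulate, seed=it)
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop
    chain.append(theta.copy())
print(np.mean(chain[250:], axis=0))        # posterior-mean estimate
```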
Regression estimators for generic health-related quality of life and quality-adjusted life years.
Basu, Anirban; Manca, Andrea
2012-01-01
To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and to account for features typical of such data, including a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single-equation and a 2-part model are presented, along with estimation algorithms based on maximum likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests, such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One- and 2-part Beta regression models provide flexible approaches to regress outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcome distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
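A minimal sketch of the single-equation Beta regression with a logit mean link and common precision phi, fitted by maximum likelihood; the 2-part extension for spikes at 0/1 and the Bayesian quasi-likelihood estimator discussed above are not shown. All names and the simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def beta_reg_negloglik(params, X, y):
    """Beta regression: y ~ Beta(mu*phi, (1-mu)*phi) with logit-linear
    mean mu = expit(X beta) and a common precision phi = exp(log_phi)."""
    beta, log_phi = params[:-1], params[-1]
    mu = np.clip(expit(X @ beta), 1e-6, 1 - 1e-6)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1 - mu) * phi
    return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
mu = expit(X @ np.array([0.2, 0.7]))
y = rng.beta(mu * 30, (1 - mu) * 30)     # true precision phi = 30
fit = minimize(beta_reg_negloglik, np.zeros(3), args=(X, y), method="BFGS")
print(fit.x)   # approximately [0.2, 0.7, log(30)]
```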
Maximum likelihood phase-retrieval algorithm: applications.
Nahrstedt, D A; Southwell, W H
1984-12-01
The maximum likelihood estimator approach is shown to be effective in determining the wave front aberration in systems involving laser and flow field diagnostics and optical testing. The robustness of the algorithm enables convergence even in cases of severe wave front error and real, nonsymmetrical, obscured amplitude distributions.
An Improved Nested Sampling Algorithm for Model Selection and Assessment
NASA Astrophysics Data System (ADS)
Zeng, X.; Ye, M.; Wu, J.; WANG, D.
2017-12-01
The multimodel strategy is a general approach for treating model structure uncertainty in recent research. The unknown groundwater system is represented by several plausible conceptual models, and each alternative conceptual model is assigned a weight representing its plausibility. In the Bayesian framework, the posterior model weight is computed as the product of the model prior weight and the marginal likelihood (also termed model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. The nested sampling estimator (NSE) is a newly proposed algorithm for marginal likelihood estimation. The implementation of NSE comprises gradually searching the parameter space from low- to high-likelihood regions, an evolution carried out iteratively via a local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood functions. To improve the performance of NSE, it is feasible to integrate a more efficient and elaborate sampling algorithm, DREAMzs, into the local sampling. In addition, to overcome the computational burden of the large number of repeated model executions required for marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build surrogates for the original groundwater model.
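The basic NSE recursion is short enough to sketch; here the local sampling step is plain rejection from the prior, precisely the component the abstract proposes to strengthen with M-H or DREAMzs. This is a toy validation under our own naming, checked against a closed-form 1-D evidence.

```python
import numpy as np
from scipy.stats import norm

def nested_sampling(log_lik, prior_sample, n_live=100, n_iter=600, seed=0):
    """Minimal nested sampling estimate of the log-evidence. The live
    point with the lowest likelihood is replaced by rejection sampling
    from the prior subject to L > L_min (the local sampling step)."""
    rng = np.random.default_rng(seed)
    live = prior_sample(rng, n_live)
    live_ll = np.array([log_lik(p) for p in live])
    log_z, x_prev = -np.inf, 1.0
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_ll))
        x_i = np.exp(-i / n_live)                  # expected prior mass left
        log_z = np.logaddexp(log_z, live_ll[worst] + np.log(x_prev - x_i))
        x_prev = x_i
        while True:                                # rejection refresh
            cand = prior_sample(rng, 1)[0]
            if log_lik(cand) > live_ll[worst]:
                live[worst], live_ll[worst] = cand, log_lik(cand)
                break
    # add the contribution of the remaining live points
    return np.logaddexp(log_z, np.log(np.mean(np.exp(live_ll)) * x_prev))

# 1-D check: N(0,1) prior with N(1, 0.5^2) likelihood has known evidence
log_lik = lambda p: norm.logpdf(p[0], loc=1.0, scale=0.5)
prior_sample = lambda rng, n: rng.normal(0.0, 1.0, size=(n, 1))
print(nested_sampling(log_lik, prior_sample))    # estimate
print(norm.logpdf(1.0, 0.0, np.sqrt(1.25)))      # exact log-evidence
```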
NASA Astrophysics Data System (ADS)
Morse, Brad S.; Pohll, Greg; Huntington, Justin; Rodriguez Castillo, Ramiro
2003-06-01
In 1992, Mexican researchers discovered concentrations of arsenic in excess of World Health Organization (WHO) standards in several municipal wells in the Zimapan Valley of Mexico. This study describes a method to delineate a capture zone for one of the most highly contaminated wells to aid in future well siting. A stochastic approach was used to model the capture zone because of the high level of uncertainty in several input parameters. Two stochastic techniques were performed and compared: "standard" Monte Carlo analysis and the generalized likelihood uncertainty estimation (GLUE) methodology. The GLUE procedure differs from standard Monte Carlo analysis in that it incorporates a goodness of fit (termed a likelihood measure) in evaluating the model. This allows for more information (in this case, head data) to be used in the uncertainty analysis, resulting in smaller prediction uncertainty. Two likelihood measures are tested in this study to determine which is in better agreement with the observed heads. While the standard Monte Carlo approach does not aid in parameter estimation, the GLUE methodology indicates best-fit models when hydraulic conductivity is approximately 10^-6.5 m/s, with vertically isotropic conditions and large quantities of interbasin flow entering the basin. Probabilistic isochrones (capture zone boundaries) are then presented, and as predicted, the GLUE-derived capture zones are significantly smaller in area than those from the standard Monte Carlo approach.
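A schematic of the GLUE procedure on a toy model: Monte Carlo sampling from the prior, an informal likelihood measure, a behavioral cutoff, and likelihood-weighted predictions. The likelihood measure (inverse error variance) and threshold are illustrative choices of ours, not those of the study above.

```python
import numpy as np

def glue(simulator, prior_sampler, obs, n_samples=5000, keep_frac=0.1, seed=0):
    """GLUE sketch: sample parameter sets from the prior, score each
    with an informal likelihood measure (here inverse error variance),
    retain the 'behavioral' sets, and weight them by the measure."""
    rng = np.random.default_rng(seed)
    thetas = prior_sampler(rng, n_samples)
    sims = np.array([simulator(t) for t in thetas])
    L = 1.0 / np.mean((sims - obs) ** 2, axis=1)   # likelihood measure
    keep = L >= np.quantile(L, 1 - keep_frac)      # behavioral threshold
    w = L[keep] / L[keep].sum()
    return thetas[keep], w, sims[keep]

# Toy linear "head" model with two uncertain parameters
obs_x = np.linspace(0, 1, 20)
obs = 2.0 * obs_x + 0.5 + np.random.default_rng(1).normal(0, 0.1, size=20)
sim = lambda t: t[0] * obs_x + t[1]
prior = lambda rng, n: rng.uniform([0, -1], [4, 2], size=(n, 2))
thetas, w, sims = glue(sim, prior, obs)
print((thetas * w[:, None]).sum(axis=0))   # weighted parameter estimate
```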
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pražnikar, Jure; University of Primorska,; Turk, Dušan, E-mail: dusan.turk@ijs.si
2014-12-01
The maximum-likelihood free-kick target, which calculates model error estimates from the work set and a randomly displaced model, proved superior in the accuracy and consistency of refinement of crystal structures compared with the maximum-likelihood cross-validation target, which calculates error estimates from the test set and the unperturbed model. The refinement of a molecular model is a computational procedure by which the atomic model is fitted to the diffraction data. The commonly used target in the refinement of macromolecular structures is the maximum-likelihood (ML) function, which relies on the assessment of model errors. The current ML functions rely on cross-validation. They utilize phase-error estimates that are calculated from a small fraction of diffraction data, called the test set, that are not used to fit the model. An approach has been developed that uses the work set to calculate the phase-error estimates in the ML refinement from simulating the model errors via the random displacement of atomic coordinates. It is called ML free-kick refinement as it uses the ML formulation of the target function and is based on the idea of freeing the model from the model bias imposed by the chemical energy restraints used in refinement. This approach for the calculation of error estimates is superior to the cross-validation approach: it reduces the phase error and increases the accuracy of molecular models, is more robust, provides clearer maps and may use a smaller portion of data for the test set for the calculation of R_free or may leave it out completely.
A Composite Likelihood Inference in Latent Variable Models for Ordinal Longitudinal Responses
ERIC Educational Resources Information Center
Vasdekis, Vassilis G. S.; Cagnone, Silvia; Moustaki, Irini
2012-01-01
The paper proposes a composite likelihood estimation approach that uses bivariate instead of multivariate marginal probabilities for ordinal longitudinal responses using a latent variable model. The model considers time-dependent latent variables and item-specific random effects to be accountable for the interdependencies of the multivariate…
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.
Xie, Yanmei; Zhang, Biao
2017-04-20
Missing covariate data occur often in regression analysis and frequently arise in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with those of other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
NASA Astrophysics Data System (ADS)
Fenicia, Fabrizio; Reichert, Peter; Kavetski, Dmitri; Albert, Carlo
2016-04-01
The calibration of hydrological models based on signatures (e.g. Flow Duration Curves - FDCs) is often advocated as an alternative to model calibration based on the full time series of system responses (e.g. hydrographs). Signature-based calibration is motivated by various arguments. From a conceptual perspective, calibration on signatures is a way to filter out errors that are difficult to represent when calibrating on the full time series. Such errors may for example occur when observed and simulated hydrographs are shifted, either on the "time" axis (i.e. left or right) or on the "streamflow" axis (i.e. above or below). These shifts may be due to errors in the precipitation input (time or amount) and, if not properly accounted for in the likelihood function, may cause biased parameter estimates (e.g. estimated model parameters that do not reproduce the recession characteristics of a hydrograph). From a practical perspective, signature-based calibration is seen as a possible solution for making predictions in ungauged basins. Where streamflow data are not available, it may in fact be possible to reliably estimate streamflow signatures. Previous research has for example shown how FDCs can be reliably estimated at ungauged locations based on climatic and physiographic influence factors. Typically, the goal of signature-based calibration is not the prediction of the signatures themselves, but the prediction of the system responses. Ideally, the prediction of system responses should be accompanied by a reliable quantification of the associated uncertainties. Previous approaches for signature-based calibration, however, do not allow reliable estimates of streamflow predictive distributions. Here, we illustrate how the Bayesian approach can be employed to obtain reliable streamflow predictive distributions based on signatures. A case study is presented, where a hydrological model is calibrated on FDCs and additional signatures. We propose an approach where the likelihood function for the signatures is derived from the likelihood for streamflow (rather than using an "ad-hoc" likelihood for the signatures as done in previous approaches). This likelihood is not easily tractable analytically, and we therefore cannot apply "simple" MCMC methods. This numerical problem is solved using Approximate Bayesian Computation (ABC). Our results indicate that the proposed approach is suitable for producing reliable streamflow predictive distributions based on calibration to signature data. Moreover, our results provide indications of which signatures are more appropriate to represent the information content of the hydrograph.
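The full approach above derives a signature likelihood from the streamflow likelihood and solves it with ABC; a plain rejection-ABC sketch with an FDC-based distance conveys the mechanics. The toy "hydrological model" and all names below are our assumptions.

```python
import numpy as np

def fdc(flow, n_points=20):
    """Flow duration curve as exceedance quantiles of a flow series."""
    probs = np.linspace(0.05, 0.95, n_points)
    return np.quantile(flow, 1 - probs)

def abc_rejection(simulate, prior_sample, obs_signature, n=5000,
                  keep_frac=0.01, seed=0):
    """Rejection ABC: keep the parameter draws whose simulated
    signatures fall closest to the observed ones."""
    rng = np.random.default_rng(seed)
    thetas = prior_sample(rng, n)
    dists = np.array([np.linalg.norm(fdc(simulate(t, rng)) - obs_signature)
                      for t in thetas])
    return thetas[dists <= np.quantile(dists, keep_frac)]

# Toy "model": lognormal daily flows with an unknown log-mean parameter
simulate = lambda t, rng: rng.lognormal(t[0], 0.8, size=365)
prior = lambda rng, n: rng.uniform(-1, 2, size=(n, 1))
obs_sig = fdc(np.random.default_rng(2).lognormal(0.7, 0.8, size=365))
posterior = abc_rejection(simulate, prior, obs_sig)
print(posterior.mean(axis=0))   # should be near the true value 0.7
```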
Maintained Individual Data Distributed Likelihood Estimation (MIDDLE)
Boker, Steven M.; Brick, Timothy R.; Pritikin, Joshua N.; Wang, Yang; von Oertzen, Timo; Brown, Donald; Lach, John; Estabrook, Ryne; Hunter, Michael D.; Maes, Hermine H.; Neale, Michael C.
2015-01-01
Maintained Individual Data Distributed Likelihood Estimation (MIDDLE) is a novel paradigm for research in the behavioral, social, and health sciences. The MIDDLE approach is based on the seemingly-impossible idea that data can be privately maintained by participants and never revealed to researchers, while still enabling statistical models to be fit and scientific hypotheses tested. MIDDLE rests on the assumption that participant data should belong to, be controlled by, and remain in the possession of the participants themselves. Distributed likelihood estimation refers to fitting statistical models by sending an objective function and vector of parameters to each participants’ personal device (e.g., smartphone, tablet, computer), where the likelihood of that individual’s data is calculated locally. Only the likelihood value is returned to the central optimizer. The optimizer aggregates likelihood values from responding participants and chooses new vectors of parameters until the model converges. A MIDDLE study provides significantly greater privacy for participants, automatic management of opt-in and opt-out consent, lower cost for the researcher and funding institute, and faster determination of results. Furthermore, if a participant opts into several studies simultaneously and opts into data sharing, these studies automatically have access to individual-level longitudinal data linked across all studies. PMID:26717128
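The core MIDDLE loop is easy to mimic: each device exposes only a log-likelihood evaluation of its private data, and the central optimizer aggregates those scalars. A self-contained sketch with hypothetical class and function names (a toy Gaussian model stands in for the behavioral models in the abstract):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

class Device:
    """Stands in for a participant's device: the private data never
    leave the object; only a log-likelihood value is returned."""
    def __init__(self, data):
        self._data = data
    def loglik(self, params):
        mu, log_sd = params
        return norm.logpdf(self._data, mu, np.exp(log_sd)).sum()

def distributed_negloglik(params, devices):
    # The central optimizer only ever sees these scalar sums
    return -sum(d.loglik(params) for d in devices)

rng = np.random.default_rng(6)
devices = [Device(rng.normal(1.5, 2.0, size=rng.integers(20, 80)))
           for _ in range(50)]
fit = minimize(distributed_negloglik, x0=np.array([0.0, 0.0]),
               args=(devices,), method="Nelder-Mead")
print(fit.x[0], np.exp(fit.x[1]))   # approximately 1.5 and 2.0
```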
Modeling gene expression measurement error: a quasi-likelihood approach
Strimmer, Korbinian
2003-01-01
Background Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale). Results Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects. Conclusions The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression. PMID:12659637
NASA Astrophysics Data System (ADS)
Dang, H.; Wang, A. S.; Sussman, Marc S.; Siewerdsen, J. H.; Stayman, J. W.
2014-09-01
Sequential imaging studies are conducted in many clinical scenarios. Prior images from previous studies contain a great deal of patient-specific anatomical information and can be used in conjunction with subsequent imaging acquisitions to maintain image quality while enabling radiation dose reduction (e.g., through sparse angular sampling, reduction in fluence, etc.). However, patient motion between images in such sequences results in misregistration between the prior image and current anatomy. Existing prior-image-based approaches often include only a simple rigid registration step that can be insufficient for capturing complex anatomical motion, introducing detrimental effects in subsequent image reconstruction. In this work, we propose a joint framework that estimates the 3D deformation between an unregistered prior image and the current anatomy (based on a subsequent data acquisition) and reconstructs the current anatomical image using a model-based reconstruction approach that includes regularization based on the deformed prior image. This framework is referred to as deformable prior image registration, penalized-likelihood estimation (dPIRPLE). Central to this framework is the inclusion of a 3D B-spline-based free-form-deformation model into the joint registration-reconstruction objective function. The proposed framework is solved using a maximization strategy whereby alternating updates to the registration parameters and image estimates are applied, allowing for improvements in both the registration and reconstruction throughout the optimization process. Cadaver experiments were conducted on a cone-beam CT testbench emulating a lung nodule surveillance scenario. Superior reconstruction accuracy and image quality were demonstrated using the dPIRPLE algorithm as compared to more traditional reconstruction methods including filtered backprojection, penalized-likelihood estimation (PLE), prior image penalized-likelihood estimation (PIPLE) without registration, and prior image penalized-likelihood estimation with rigid registration of a prior image (PIRPLE) over a wide range of sampling sparsity and exposure levels.
Hyperspectral image reconstruction for x-ray fluorescence tomography
Gürsoy, Doǧa; Biçer, Tekin; Lanzirotti, Antonio; ...
2015-01-01
A penalized maximum-likelihood estimation is proposed to perform hyperspectral (spatio-spectral) image reconstruction for X-ray fluorescence tomography. The approach minimizes a Poisson-based negative log-likelihood of the observed photon counts, and uses a penalty term that has the effect of encouraging local continuity of model parameter estimates in both spatial and spectral dimensions simultaneously. The performance of the reconstruction method is demonstrated with experimental data acquired from a seed of Arabidopsis thaliana collected at the 13-ID-E microprobe beamline at the Advanced Photon Source. The resulting element distribution estimates with the proposed approach show significantly better reconstruction quality than the conventional analytical inversion approaches, and allow for a high data compression factor which can reduce data acquisition times remarkably. In particular, this technique provides the capability to tomographically reconstruct full energy dispersive spectra without introducing reconstruction artifacts that impact the interpretation of results.
Maximum Likelihood Reconstruction for Magnetic Resonance Fingerprinting
Zhao, Bo; Setsompop, Kawin; Ye, Huihui; Cauley, Stephen; Wald, Lawrence L.
2017-01-01
This paper introduces a statistical estimation framework for magnetic resonance (MR) fingerprinting, a recently proposed quantitative imaging paradigm. Within this framework, we present a maximum likelihood (ML) formalism to estimate multiple parameter maps directly from highly undersampled, noisy k-space data. A novel algorithm, based on variable splitting, the alternating direction method of multipliers, and the variable projection method, is developed to solve the resulting optimization problem. Representative results from both simulations and in vivo experiments demonstrate that the proposed approach yields significantly improved accuracy in parameter estimation, compared to the conventional MR fingerprinting reconstruction. Moreover, the proposed framework provides new theoretical insights into the conventional approach. We show analytically that the conventional approach is an approximation to the ML reconstruction; more precisely, it is exactly equivalent to the first iteration of the proposed algorithm for the ML reconstruction, provided that a gridding reconstruction is used as an initialization. PMID:26915119
Maximum Likelihood Reconstruction for Magnetic Resonance Fingerprinting.
Zhao, Bo; Setsompop, Kawin; Ye, Huihui; Cauley, Stephen F; Wald, Lawrence L
2016-08-01
This paper introduces a statistical estimation framework for magnetic resonance (MR) fingerprinting, a recently proposed quantitative imaging paradigm. Within this framework, we present a maximum likelihood (ML) formalism to estimate multiple MR tissue parameter maps directly from highly undersampled, noisy k-space data. A novel algorithm, based on variable splitting, the alternating direction method of multipliers, and the variable projection method, is developed to solve the resulting optimization problem. Representative results from both simulations and in vivo experiments demonstrate that the proposed approach yields significantly improved accuracy in parameter estimation, compared to the conventional MR fingerprinting reconstruction. Moreover, the proposed framework provides new theoretical insights into the conventional approach. We show analytically that the conventional approach is an approximation to the ML reconstruction; more precisely, it is exactly equivalent to the first iteration of the proposed algorithm for the ML reconstruction, provided that a gridding reconstruction is used as an initialization.
A composite likelihood approach for spatially correlated survival data
Paik, Jane; Ying, Zhiliang
2013-01-01
The aim of this paper is to provide a composite likelihood approach to handle spatially correlated survival data using pairwise joint distributions. With e-commerce data, a recent question of interest in marketing research has been to describe spatially clustered purchasing behavior and to assess whether geographic distance is the appropriate metric to describe purchasing dependence. We present a model for the dependence structure of time-to-event data subject to spatial dependence to characterize purchasing behavior from the motivating example from e-commerce data. We assume the Farlie-Gumbel-Morgenstern (FGM) distribution and then model the dependence parameter as a function of geographic and demographic pairwise distances. For estimation of the dependence parameters, we present pairwise composite likelihood equations. We prove that the resulting estimators exhibit key properties of consistency and asymptotic normality under certain regularity conditions in the increasing-domain framework of spatial asymptotic theory. PMID:24223450
A time series intervention analysis (TSIA) of dendrochronological data to infer the tree growth-climate-disturbance relations and forest disturbance history is described. Maximum likelihood is used to estimate the parameters of a structural time series model with components for ...
Robust Methods for Moderation Analysis with a Two-Level Regression Model.
Yang, Miao; Yuan, Ke-Hai
2016-01-01
Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
Development of advanced techniques for rotorcraft state estimation and parameter identification
NASA Technical Reports Server (NTRS)
Hall, W. E., Jr.; Bohn, J. G.; Vincent, J. H.
1980-01-01
An integrated methodology for rotorcraft system identification consists of rotorcraft mathematical modeling, three distinct data processing steps, and a technique for designing inputs to improve the identifiability of the data. These elements are as follows: (1) a Kalman filter smoother algorithm which estimates states and sensor errors from error corrupted data. Gust time histories and statistics may also be estimated; (2) a model structure estimation algorithm for isolating a model which adequately explains the data; (3) a maximum likelihood algorithm for estimating the parameters and estimates for the variance of these estimates; and (4) an input design algorithm, based on a maximum likelihood approach, which provides inputs to improve the accuracy of parameter estimates. Each step is discussed with examples to both flight and simulated data cases.
NASA Astrophysics Data System (ADS)
Cheng, Qin-Bo; Chen, Xi; Xu, Chong-Yu; Reinhardt-Imjela, Christian; Schulte, Achim
2014-11-01
In this study, the likelihood functions for uncertainty analysis of hydrological models are compared and improved through the following steps: (1) the equivalence between the Nash-Sutcliffe Efficiency coefficient (NSE) and the likelihood function with Gaussian independent and identically distributed residuals is proved; (2) a new method for estimating the Box-Cox transformation (BC) parameter is developed to more effectively eliminate the heteroscedasticity of model residuals; and (3) three likelihood functions (NSE, Generalized Error Distribution with BC (BC-GED), and Skew Generalized Error Distribution with BC (BC-SGED)) are applied to SWAT-WB-VSA (Soil and Water Assessment Tool - Water Balance - Variable Source Area) model calibration in the Baocun watershed, Eastern China. Performances of the calibrated models are compared using the observed river discharges and groundwater levels. The results show that the minimum variance constraint can effectively estimate the BC parameter. The form of the likelihood function significantly affects the calibrated parameters and the simulated results of the high- and low-flow components. SWAT-WB-VSA with the NSE approach simulates floods well but baseflow poorly, owing to the assumption of a Gaussian error distribution, under which large errors have low probability while small errors near zero are nearly equiprobable. By contrast, SWAT-WB-VSA with the BC-GED or BC-SGED approach mimics baseflow well, as confirmed by the groundwater level simulation. The assumption of skewness of the error distribution may be unnecessary, because all the results of the BC-SGED approach are nearly the same as those of the BC-GED approach.
Testing students' e-learning via Facebook through Bayesian structural equation modeling.
Salarzadeh Jenatabadi, Hashem; Moghavvemi, Sedigheh; Wan Mohamed Radzi, Che Wan Jasimah Bt; Babashamsi, Parastoo; Arashi, Mohammad
2017-01-01
Learning is an intentional activity, with several factors affecting students' intention to use new learning technology. Researchers have investigated technology acceptance in different contexts by developing various theories/models and testing them by a number of means. Although most theories/models developed have been examined through regression or structural equation modeling, Bayesian analysis offers more accurate data analysis results. To address this gap, the unified theory of acceptance and use of technology is re-examined in this study in the context of e-learning via Facebook using Bayesian analysis. The data (S1 Data) were collected from 170 students enrolled in a business statistics course at University of Malaya, Malaysia, and tested with the maximum likelihood and Bayesian approaches. The difference between the two methods' results indicates that performance expectancy and hedonic motivation are the strongest factors influencing the intention to use e-learning via Facebook. The Bayesian estimation model exhibited better data fit than the maximum likelihood estimator model. The results of the Bayesian and maximum likelihood estimator approaches are compared and the reasons for the result discrepancy are deliberated.
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling outcomes that assume values in (0, 1), such as proportions and patient-reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review of beta regression and zoib regression in their modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate via simulation studies the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacement of such values with numbers close to zero/one; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is in general computationally faster than the MCMC algorithms used in Bayesian inference, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm, especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
He, Ye; Lin, Huazhen; Tu, Dongsheng
2018-06-04
In this paper, we introduce a single-index threshold Cox proportional hazard model to select and combine biomarkers to identify patients who may be sensitive to a specific treatment. A penalized smoothed partial likelihood is proposed to estimate the parameters in the model. A simple, efficient, and unified algorithm is presented to maximize this likelihood function. The estimators based on this likelihood function are shown to be consistent and asymptotically normal. Under mild conditions, the proposed estimators also achieve the oracle property. The proposed approach is evaluated through simulation analyses and application to the analysis of data from two clinical trials, one involving patients with locally advanced or metastatic pancreatic cancer and one involving patients with resectable lung cancer. Copyright © 2018 John Wiley & Sons, Ltd.
Maximum Likelihood Dynamic Factor Modeling for Arbitrary "N" and "T" Using SEM
ERIC Educational Resources Information Center
Voelkle, Manuel C.; Oud, Johan H. L.; von Oertzen, Timo; Lindenberger, Ulman
2012-01-01
This article has 3 objectives that build on each other. First, we demonstrate how to obtain maximum likelihood estimates for dynamic factor models (the direct autoregressive factor score model) with arbitrary "T" and "N" by means of structural equation modeling (SEM) and compare the approach to existing methods. Second, we go beyond standard time…
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity among the predictors manifests in the information matrix. The maximum likelihood estimator suffers a serious setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach in handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
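A quick illustration with scikit-learn (assuming a version of 1.2 or later, where penalty=None is accepted): under near-perfect collinearity the unpenalized logistic fit is unstable, while the ridge (L2) penalty shrinks and stabilizes the coefficients. The high-leverage contamination studied above is not simulated here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two highly collinear predictors: plain ML gives unstable coefficients
rng = np.random.default_rng(7)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.01, size=300)       # near-perfect collinearity
X = np.column_stack([x1, x2])
y = rng.binomial(1, 1 / (1 + np.exp(-(x1 + x2))))

# penalty=None approximates plain maximum likelihood
mle = LogisticRegression(penalty=None, max_iter=5000).fit(X, y)
# The L2 (ridge) penalty with strength 1/C stabilizes the estimates
ridge = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
print(mle.coef_, ridge.coef_)
```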
Kendall, W.L.; Nichols, J.D.; Hines, J.E.
1997-01-01
Statistical inference for capture-recapture studies of open animal populations typically relies on the assumption that all emigration from the studied population is permanent. However, there are many instances in which this assumption is unlikely to be met. We define two general models for the process of temporary emigration, completely random and Markovian. We then consider effects of these two types of temporary emigration on Jolly-Seber (Seber 1982) estimators and on estimators arising from the full-likelihood approach of Kendall et al. (1995) to robust design data. Capture-recapture data arising from Pollock's (1982) robust design provide the basis for obtaining unbiased estimates of demographic parameters in the presence of temporary emigration and for estimating the probability of temporary emigration. We present a likelihood-based approach to dealing with temporary emigration that permits estimation under different models of temporary emigration and yields tests for completely random and Markovian emigration. In addition, we use the relationship between capture probability estimates based on closed and open models under completely random temporary emigration to derive three ad hoc estimators for the probability of temporary emigration, two of which should be especially useful in situations where capture probabilities are heterogeneous among individual animals. Ad hoc and full-likelihood estimators are illustrated for small mammal capture-recapture data sets. We believe that these models and estimators will be useful for testing hypotheses about the process of temporary emigration, for estimating demographic parameters in the presence of temporary emigration, and for estimating probabilities of temporary emigration. These latter estimates are frequently of ecological interest as indicators of animal movement and, in some sampling situations, as direct estimates of breeding probabilities and proportions.
Characterization, parameter estimation, and aircraft response statistics of atmospheric turbulence
NASA Technical Reports Server (NTRS)
Mark, W. D.
1981-01-01
A non-Gaussian three-component model of atmospheric turbulence is postulated that accounts for readily observable features of turbulence velocity records, their autocorrelation functions, and their spectra. Methods for computing probability density functions and mean exceedance rates of a generic aircraft response variable are developed using non-Gaussian turbulence characterizations readily extracted from velocity recordings. A maximum likelihood method is developed for optimal estimation of the integral scale and intensity of records possessing von Karman transverse or longitudinal spectra. Formulas for the variances of such parameter estimates are developed. The maximum likelihood and least-squares approaches are combined to yield a method for estimating the autocorrelation function parameters of a two-component model for turbulence.
Models and analysis for multivariate failure time data
NASA Astrophysics Data System (ADS)
Shih, Joanna Huang
The goal of this research is to develop and investigate models and analytic methods for multivariate failure time data. We compare models in terms of direct modeling of the margins, flexibility of the dependency structure, local vs. global measures of association, and ease of implementation. In particular, we study copula models, and models produced by right neutral cumulative hazard functions and right neutral hazard functions. We examine the changes of association over time for families of bivariate distributions induced from these models by displaying their density contour plots, conditional density plots, correlation curves of Doksum et al., and local cross ratios of Oakes. We know that bivariate distributions with the same margins may exhibit quite different dependency structures. In addition to modeling, we study estimation procedures. For copula models, we investigate three estimation procedures. The first procedure is full maximum likelihood. The second procedure is two-stage maximum likelihood: at stage 1, we estimate the parameters in the margins by maximizing the marginal likelihood; at stage 2, we estimate the dependency structure with the margins fixed at their estimates. The third procedure is two-stage partially parametric maximum likelihood, similar to the second procedure except that the margins are estimated by the Kaplan-Meier estimate. We derive asymptotic properties for these three estimation procedures and compare their efficiency by Monte-Carlo simulations and direct computations. For models produced by right neutral cumulative hazards and right neutral hazards, we derive the likelihood and investigate the properties of the maximum likelihood estimates. Finally, we develop goodness-of-fit tests for the dependency structure in the copula models. We derive a test statistic and its asymptotic properties based on the test of homogeneity of Zelterman and Chen (1988), and a graphical diagnostic procedure based on the empirical Bayes approach. We study the performance of these two methods using actual and computer-generated data.
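The two-stage copula estimation described above can be sketched for a Clayton copula with exponential margins: stage 1 fits the margins, stage 2 maximizes the copula likelihood with the fitted margins plugged in (the third procedure would substitute Kaplan-Meier margins). The simulation uses the standard conditional method for Clayton pairs; all names and values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

def clayton_loglik(theta, u, v):
    """Clayton copula density log-likelihood on uniform margins u, v."""
    return np.sum(np.log(theta + 1) - (theta + 1) * (np.log(u) + np.log(v))
                  - (2 + 1 / theta) * np.log(u**-theta + v**-theta - 1))

# Simulate Clayton(theta=2) pairs via the conditional method
rng = np.random.default_rng(9)
u = rng.uniform(size=1000)
w = rng.uniform(size=1000)
theta0 = 2.0
v = ((w ** (-theta0 / (1 + theta0)) - 1) * u ** -theta0 + 1) ** (-1 / theta0)
x = expon.ppf(u, scale=1.0)              # exponential margins
y = expon.ppf(v, scale=2.0)

# Stage 1: parametric margin fits (exponential MLE = sample mean)
u_hat = np.clip(1 - np.exp(-x / x.mean()), 1e-6, 1 - 1e-6)
v_hat = np.clip(1 - np.exp(-y / y.mean()), 1e-6, 1 - 1e-6)
# Stage 2: maximize the copula likelihood with margins held fixed
res = minimize_scalar(lambda t: -clayton_loglik(t, u_hat, v_hat),
                      bounds=(0.05, 10), method="bounded")
print(res.x)   # approximately 2
```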
Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2014-02-01
Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
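To make the idea of a simulation-based parametric likelihood approximation concrete, here is a minimal sketch, assuming a generic stochastic simulator `simulate` and a vector of observed summary statistics `observed_stats` (both hypothetical placeholders); the FORMIND-specific summaries and the surrounding MCMC sampler are not reproduced.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, simulate, observed_stats, n_reps=100, rng=None):
    """Parametric likelihood approximation: fit a Gaussian to the summary
    statistics of repeated stochastic simulations at `theta`, then evaluate
    the observed summaries under that Gaussian. `simulate(theta, rng)` must
    return a vector of summary statistics."""
    rng = np.random.default_rng(rng)
    sims = np.array([simulate(theta, rng) for _ in range(n_reps)])  # (n_reps, n_stats)
    mu = sims.mean(axis=0)
    cov = np.cov(sims, rowvar=False) + 1e-8 * np.eye(sims.shape[1])  # regularize
    return multivariate_normal(mu, cov).logpdf(observed_stats)

# This log-likelihood can then be placed inside a conventional
# Metropolis-Hastings sampler in place of an analytical likelihood.
```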
Wald Sequential Probability Ratio Test for Analysis of Orbital Conjunction Data
NASA Technical Reports Server (NTRS)
Carpenter, J. Russell; Markley, F. Landis; Gold, Dara
2013-01-01
We propose a Wald Sequential Probability Ratio Test for analysis of commonly available predictions associated with spacecraft conjunctions. Such predictions generally consist of a relative state and relative state error covariance at the time of closest approach, under the assumption that prediction errors are Gaussian. We show that under these circumstances, the likelihood ratio of the Wald test reduces to an especially simple form, involving the current best estimate of collision probability, and a similar estimate of collision probability that is based on prior assumptions about the likelihood of collision.
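As an illustration of the sequential testing machinery (not the paper's conjunction-specific likelihood ratio, which is not reproduced here), a generic Wald SPRT skeleton might look like the sketch below; `llr_increments` stands in for the stream of log-likelihood-ratio terms.

```python
import math

def wald_sprt(llr_increments, alpha=0.01, beta=0.01):
    """Generic Wald sequential probability ratio test. Accepts H1 when the
    cumulative log-likelihood ratio crosses log A, accepts H0 when it
    crosses log B, and otherwise keeps sampling."""
    log_A = math.log((1 - beta) / alpha)   # upper decision threshold
    log_B = math.log(beta / (1 - alpha))   # lower decision threshold
    llr = 0.0
    for k, inc in enumerate(llr_increments, start=1):
        llr += inc
        if llr >= log_A:
            return "accept H1", k
        if llr <= log_B:
            return "accept H0", k
    return "continue sampling", len(llr_increments)
```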
NASA Astrophysics Data System (ADS)
De Santis, Alberto; Dellepiane, Umberto; Lucidi, Stefano
2012-11-01
In this paper we investigate the estimation problem for a model of commodity prices. This model is a stochastic state-space dynamical model whose unknowns are the state variables and the system parameters. Data are represented by commodity spot prices; time series of futures contracts are very seldom available for free. Both the system joint likelihood function (of state variables and parameters) and the system marginal likelihood function (in which the state variables are eliminated) are addressed.
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Zeng, Lingzao; Chen, Cheng; Chen, Dingjiang; Wu, Laosheng
2015-01-01
In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameters identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from concentration measurements in identifying unknown parameters. In this approach, the sampling locations that give the maximum expected relative entropy are selected as the optimal design. After the sampling locations are determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport equation. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. It is shown that the methods can be used to assist in both single sampling location and monitoring network design for contaminant source identifications in groundwater.
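For intuition about selecting the design that maximizes expected relative entropy, here is a minimal sketch under a toy linear-Gaussian stand-in, for which the expected information gain of a design has a closed form; the linear model and function names are illustrative assumptions, not the study's contaminant transport model or surrogate.

```python
import numpy as np

def expected_info_gain(G, prior_cov, noise_var):
    """Expected relative entropy (information gain) of a design for the toy
    model y = G @ theta + noise, theta ~ N(0, prior_cov), noise ~ N(0, noise_var I):
    0.5 * logdet(I + G Sigma G' / sigma^2)."""
    m = G.shape[0]
    M = np.eye(m) + G @ prior_cov @ G.T / noise_var
    return 0.5 * np.linalg.slogdet(M)[1]

def best_design(candidate_Gs, prior_cov, noise_var):
    """Pick the candidate design (sensitivity matrix) with maximum expected gain."""
    gains = [expected_info_gain(G, prior_cov, noise_var) for G in candidate_Gs]
    return int(np.argmax(gains)), gains
```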
NASA Astrophysics Data System (ADS)
Ben Abdessalem, Anis; Dervilis, Nikolaos; Wagg, David; Worden, Keith
2018-01-01
This paper will introduce the use of the approximate Bayesian computation (ABC) algorithm for model selection and parameter estimation in structural dynamics. ABC is a likelihood-free method typically used when the likelihood function is either intractable or cannot be expressed in closed form. To circumvent the evaluation of the likelihood function, simulation from a forward model is at the core of the ABC algorithm. The algorithm offers the possibility to use different metrics and summary statistics representative of the data to carry out Bayesian inference. The efficacy of the algorithm in structural dynamics is demonstrated through three different illustrative examples of nonlinear system identification: cubic and cubic-quintic models, the Bouc-Wen model and the Duffing oscillator. The results obtained suggest that ABC is a promising alternative for dealing with model selection and parameter estimation issues, specifically for systems with complex behaviours.
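A minimal sketch of the basic ABC rejection scheme that such methods build on, assuming hypothetical `simulate`, `prior_sample`, and `distance` callables supplied by the user:

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, distance, eps,
                  n_draws=100_000, rng=None):
    """Basic ABC rejection: keep prior draws whose simulated data fall
    within eps of the observed data under the chosen distance/summary."""
    rng = np.random.default_rng(rng)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)  # approximate posterior sample
```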
Likelihood parameter estimation for calibrating a soil moisture model using radar backscatter
USDA-ARS?s Scientific Manuscript database
Assimilating soil moisture information contained in synthetic aperture radar imagery into land surface model predictions can be done using a calibration, or parameter estimation, approach. The presence of speckle, however, necessitates aggregating backscatter measurements over large land areas in or...
Zeng, Chan; Newcomer, Sophia R; Glanz, Jason M; Shoup, Jo Ann; Daley, Matthew F; Hambidge, Simon J; Xu, Stanley
2013-12-15
The self-controlled case series (SCCS) method is often used to examine the temporal association between vaccination and adverse events using only data from patients who experienced such events. Conditional Poisson regression models are used to estimate incidence rate ratios, and these models perform well with large or medium-sized case samples. However, in some vaccine safety studies, the adverse events studied are rare and the maximum likelihood estimates may be biased. Several bias correction methods have been examined in case-control studies using conditional logistic regression, but none of these methods have been evaluated in studies using the SCCS design. In this study, we used simulations to evaluate 2 bias correction approaches-the Firth penalized maximum likelihood method and Cordeiro and McCullagh's bias reduction after maximum likelihood estimation-with small sample sizes in studies using the SCCS design. The simulations showed that the bias under the SCCS design with a small number of cases can be large and is also sensitive to a short risk period. The Firth correction method provides finite and less biased estimates than the maximum likelihood method and Cordeiro and McCullagh's method. However, limitations still exist when the risk period in the SCCS design is short relative to the entire observation period.
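As a hedged illustration of the Firth idea (shown here for ordinary, unconditional Poisson regression rather than the conditional SCCS likelihood itself): for Poisson regression with the canonical log link, Firth's modified score is equivalent to replacing each response y_i by y_i + h_i/2, where h_i are the hat-matrix diagonals.

```python
import numpy as np

def firth_poisson(X, y, n_iter=50, tol=1e-8):
    """Firth-penalized Poisson regression via Newton/IRLS. For the log link,
    the Jeffreys-prior penalty adds half the hat diagonals to the responses."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        W = mu                                    # Poisson working weights
        XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
        # hat diagonals of W^(1/2) X (X'WX)^-1 X' W^(1/2)
        h = np.einsum('ij,jk,ik->i', X, XtWX_inv, X) * W
        step = XtWX_inv @ (X.T @ (y + 0.5 * h - mu))  # Firth-adjusted score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```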
Precision Parameter Estimation and Machine Learning
NASA Astrophysics Data System (ADS)
Wandelt, Benjamin D.
2008-12-01
I discuss the strategy of "Acceleration by Parallel Precomputation and Learning" (APPLe) that can vastly accelerate parameter estimation in high-dimensional parameter spaces with costly likelihood functions, using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov chain Monte Carlo techniques to efficiently explore a likelihood function, posterior distribution or χ2-surface. This strategy is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data using PICo; and the detailed calculation of cosmological helium and hydrogen recombination with RICO. Since the APPLe approach is designed to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the Cosmology@Home project.
GNSS Spoofing Detection and Mitigation Based on Maximum Likelihood Estimation.
Wang, Fei; Li, Hong; Lu, Mingquan
2017-06-30
Spoofing attacks are threatening the global navigation satellite system (GNSS). The maximum likelihood estimation (MLE)-based positioning technique is a direct positioning method originally developed for multipath rejection and weak signal processing. We find this method also has a potential ability for GNSS anti-spoofing since a spoofing attack that misleads the positioning and timing result will cause distortion to the MLE cost function. Based on the method, an estimation-cancellation approach is presented to detect spoofing attacks and recover the navigation solution. A statistic is derived for spoofing detection with the principle of the generalized likelihood ratio test (GLRT). Then, the MLE cost function is decomposed to further validate whether the navigation solution obtained by MLE-based positioning is formed by consistent signals. Both formulae and simulations are provided to evaluate the anti-spoofing performance. Experiments with recordings in real GNSS spoofing scenarios are also performed to validate the practicability of the approach. Results show that the method works even when the code phase differences between the spoofing and authentic signals are much less than one code chip, which can improve the availability of GNSS service greatly under spoofing attacks.
Instructor perceptions of the accident likelihood faced by recently trained glider pilots.
Jarvis, Steve; Harris, Don
2011-12-01
U.K. glider pilots with less than 10 h of solo flying time have been shown to have the highest accident rate and to be most vulnerable to accidents during the 'final approach' phase. Fifty-eight gliding instructors were asked to indicate the experience level they thought was associated with the highest accident rate and to give the reason behind their estimate. They were also asked to rank six flight phases by the relative probability of accidents to inexperienced pilots. The mean estimate of the accident peak was 296.3 h as pilot-in-command (SD = 337.9), with no instructor giving a figure of less than 10 h. Common reasons for these estimates were 'over-confidence', 'risk-taking', or 'complacency'. Despite the approach phase having the highest objective accident probability, instructors ranked it only fifth, indicating an underestimate of the danger it presents to newly trained pilots. The results suggest that instructors do not appreciate the high accident likelihood of early solo pilots or the main dangers they face. This has implications for the decisions made when sending pilots solo.
Li, Xiang; Kuk, Anthony Y C; Xu, Jinfeng
2014-12-10
Human biomonitoring of exposure to environmental chemicals is important. Individual monitoring is not viable because of low individual exposure level or insufficient volume of materials and the prohibitive cost of taking measurements from many subjects. Pooling of samples is an efficient and cost-effective way to collect data. Estimation is, however, complicated as individual values within each pool are not observed but are only known up to their average or weighted average. The distribution of such averages is intractable when the individual measurements are lognormally distributed, which is a common assumption. We propose to replace the intractable distribution of the pool averages by a Gaussian likelihood to obtain parameter estimates. If the pool size is large, this method produces statistically efficient estimates, but regardless of pool size, the method yields consistent estimates as the number of pools increases. An empirical Bayes (EB) Gaussian likelihood approach, as well as its Bayesian analog, is developed to pool information from various demographic groups by using a mixed-effect formulation. We also discuss methods to estimate the underlying mean-variance relationship and to select a good model for the means, which can be incorporated into the proposed EB or Bayes framework. By borrowing strength across groups, the EB estimator is more efficient than the individual group-specific estimator. Simulation results show that the EB Gaussian likelihood estimates outperform a previous method proposed for the National Health and Nutrition Examination Surveys with much smaller bias and better coverage in interval estimation, especially after correction of bias. Copyright © 2014 John Wiley & Sons, Ltd.
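A minimal sketch of the Gaussian-likelihood idea, assuming equal-weight pools: the average of p lognormal measurements is treated as approximately normal with the lognormal mean and 1/p times the lognormal variance. The parameterization and names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_pool_nll(params, pool_means, pool_sizes):
    """Negative Gaussian log-likelihood for pool averages of lognormal
    individual measurements: a pool of size p has mean m and variance v/p,
    where m and v are the lognormal mean and variance."""
    mu, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    m = np.exp(mu + sigma2 / 2)                          # lognormal mean
    v = (np.exp(sigma2) - 1) * np.exp(2 * mu + sigma2)   # lognormal variance
    var = v / np.asarray(pool_sizes)
    resid2 = (np.asarray(pool_means) - m) ** 2
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid2 / var)

# e.g. minimize(gaussian_pool_nll, x0=[0.0, 0.0], args=(pool_means, pool_sizes))
```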
Dziak, John J.; Bray, Bethany C.; Zhang, Jieting; Zhang, Minqiang; Lanza, Stephanie T.
2016-01-01
Several approaches are available for estimating the relationship of latent class membership to distal outcomes in latent profile analysis (LPA). A three-step approach is commonly used, but has problems with estimation bias and confidence interval coverage. Proposed improvements include the correction method of Bolck, Croon, and Hagenaars (BCH; 2004), Vermunt’s (2010) maximum likelihood (ML) approach, and the inclusive three-step approach of Bray, Lanza, & Tan (2015). These methods have been studied in the related case of latent class analysis (LCA) with categorical indicators, but not as well studied for LPA with continuous indicators. We investigated the performance of these approaches in LPA with normally distributed indicators, under different conditions of distal outcome distribution, class measurement quality, relative latent class size, and strength of association between latent class and the distal outcome. The modified BCH implemented in Latent GOLD had excellent performance. The maximum likelihood and inclusive approaches were not robust to violations of distributional assumptions. These findings broadly agree with and extend the results presented by Bakk and Vermunt (2016) in the context of LCA with categorical indicators. PMID:28630602
Kimura, Akatsuki; Celani, Antonio; Nagao, Hiromichi; Stasevich, Timothy; Nakamura, Kazuyuki
2015-01-01
Construction of quantitative models is a primary goal of quantitative biology, which aims to understand cellular and organismal phenomena in a quantitative manner. In this article, we introduce optimization procedures to search for parameters in a quantitative model that can reproduce experimental data. The aim of optimization is to minimize the sum of squared errors (SSE) in a prediction or to maximize likelihood. A (local) maximum of likelihood or (local) minimum of the SSE can efficiently be identified using gradient approaches. Addition of a stochastic process enables us to identify the global maximum/minimum without becoming trapped in local maxima/minima. Sampling approaches take advantage of increasing computational power to test numerous sets of parameters in order to determine the optimum set. By combining Bayesian inference with gradient or sampling approaches, we can estimate both the optimum parameters and the form of the likelihood function related to the parameters. Finally, we introduce four examples of research that utilize parameter optimization to obtain biological insights from quantified data: transcriptional regulation, bacterial chemotaxis, morphogenesis, and cell cycle regulation. With practical knowledge of parameter optimization, cell and developmental biologists can develop realistic models that reproduce their observations and thus, obtain mechanistic insights into phenomena of interest.
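A small illustration of the two strategies described above, using SciPy on a toy two-parameter model: a gradient-based local search, followed by a stochastic wrapper (basin hopping) that perturbs the starting point to reduce the risk of being trapped in local minima. The model is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize, basinhopping

def model(x, params):
    a, b = params
    return a * np.exp(-b * x)          # illustrative two-parameter model

def sse(params, x, y_obs):
    """Sum of squared errors between model predictions and data."""
    return np.sum((model(x, params) - y_obs) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y_obs = model(x, (2.0, 0.7)) + 0.05 * rng.standard_normal(x.size)

# Gradient-based local search: efficient, but may stop in a local minimum.
local = minimize(sse, x0=[1.0, 1.0], args=(x, y_obs), method="L-BFGS-B")
# Stochastic restarts around local searches help find the global minimum.
global_ = basinhopping(sse, x0=[1.0, 1.0],
                       minimizer_kwargs={"args": (x, y_obs)}, niter=100)
```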
Pritikin, Joshua N; Brick, Timothy R; Neale, Michael C
2018-04-01
A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, up to 13 ordinal variables can be included before integration time increases beyond 1 s per row. The method relies on the axiom of conditional probability to split apart the distribution of continuous and ordinal variables. Due to the symmetry of the axiom, two similar methods are available. A simulation study provides evidence that the two similar approaches offer equal accuracy. A further simulation is used to develop a heuristic to automatically select the most computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.
Bayesian experimental design for models with intractable likelihoods.
Drovandi, Christopher C; Pettitt, Anthony N
2013-12-01
In this paper we present a methodology for designing experiments for efficiently estimating the parameters of models with computationally intractable likelihoods. The approach combines a commonly used methodology for robust experimental design, based on Markov chain Monte Carlo sampling, with approximate Bayesian computation (ABC) to ensure that no likelihood evaluations are required. The utility function considered for precise parameter estimation is based upon the precision of the ABC posterior distribution, which we form efficiently via the ABC rejection algorithm based on pre-computed model simulations. Our focus is on stochastic models and, in particular, we investigate the methodology for Markov process models of epidemics and macroparasite population evolution. The macroparasite example involves a multivariate process and we assess the loss of information from not observing all variables. © 2013, The International Biometric Society.
Vehicle Sprung Mass Estimation for Rough Terrain
2011-03-01
... distributions are greater than zero. The multivariate polynomials are functions of the Legendre polynomials (Poularikas, 1999). ... developed methods based on polynomial chaos theory and on the maximum likelihood approach to estimate the most likely value of the vehicle sprung mass. The polynomial chaos estimator is compared to benchmark algorithms including recursive least squares, recursive total least squares, extended ...
The Least-Squares Estimation of Latent Trait Variables.
ERIC Educational Resources Information Center
Tatsuoka, Kikumi
This paper presents a new method for estimating a given latent trait variable by the least-squares approach. The beta weights are obtained recursively with the help of Fourier series and expressed as functions of item parameters of response curves. The values of the latent trait variable estimated by this method and by maximum likelihood method…
Robust inference in the negative binomial regression model with an application to falls data.
Aeberhard, William H; Cantoni, Eva; Heritier, Stephane
2014-12-01
A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimation methods are well known to be sensitive to model misspecification, which can take the form of patients falling much more often than expected in intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease, to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
MCMC multilocus lod scores: application of a new approach.
George, Andrew W; Wijsman, Ellen M; Thompson, Elizabeth A
2005-01-01
On extended pedigrees with extensive missing data, the calculation of multilocus likelihoods for linkage analysis is often beyond the computational bounds of exact methods. Growing interest therefore surrounds the implementation of Monte Carlo estimation methods. In this paper, we demonstrate the speed and accuracy of a new Markov chain Monte Carlo method for the estimation of linkage likelihoods through an analysis of real data from a study of early-onset Alzheimer's disease. For those data sets where comparison with exact analysis is possible, we achieved up to a 100-fold increase in speed. Our approach is implemented in the program lm_bayes within the framework of the freely available MORGAN 2.6 package for Monte Carlo genetic analysis (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).
Parameter Estimation and Model Selection for Indoor Environments Based on Sparse Observations
NASA Astrophysics Data System (ADS)
Dehbi, Y.; Loch-Dehbi, S.; Plümer, L.
2017-09-01
This paper presents a novel method for parameter estimation and model selection for the reconstruction of indoor environments based on sparse observations. While most approaches for the reconstruction of indoor models rely on dense observations, we predict scenes of the interior with high accuracy in the absence of indoor measurements. We use a model-based top-down approach and incorporate strong prior knowledge. The latter includes probability density functions for model parameters and sparse observations such as room areas and the building footprint. The floorplan model is characterized by linear and bilinear relations with discrete and continuous parameters. We focus on the stochastic estimation of model parameters based on a topological model derived by combinatorial reasoning in a first step. A Gauss-Markov model is applied for estimation and simulation of the model parameters. Symmetries are represented and exploited during the estimation process. Background knowledge as well as observations are incorporated in a maximum likelihood estimation, and model selection is performed with AIC/BIC. The likelihood is also used for the detection and correction of potential errors in the topological model. Estimation results are presented and discussed.
Maximum likelihood estimation for life distributions with competing failure modes
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1979-01-01
Systems that are placed on test at time zero, function for a period, and die at some random time were studied. Failure may be due to one of several causes or modes. The parameters of the life distribution may depend upon the levels of various stress variables the item is subjected to. Maximum likelihood estimation methods are discussed. Specific methods are reported for the smallest extreme-value distributions of life. Monte Carlo results indicate the methods to be promising. Under appropriate conditions, the location parameters are nearly unbiased, the scale parameter is slightly biased, and the asymptotic covariances are rapidly approached.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Fulton, Kara A.; Liu, Danping; Haynie, Denise L.; Albert, Paul S.
2016-01-01
The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gaussian–Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored. PMID:26937263
Ng, S K; McLachlan, G J
2003-04-15
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright 2003 John Wiley & Sons, Ltd.
Hudson, H M; Ma, J; Green, P
1994-01-01
Many algorithms for medical image reconstruction adopt versions of the expectation-maximization (EM) algorithm. In this approach, parameter estimates are obtained which maximize a complete data likelihood or penalized likelihood, in each iteration. Implicitly (and sometimes explicitly) penalized algorithms require smoothing of the current reconstruction in the image domain as part of their iteration scheme. In this paper, we discuss alternatives to EM which adapt Fisher's method of scoring (FS) and other methods for direct maximization of the incomplete data likelihood. Jacobi and Gauss-Seidel methods for non-linear optimization provide efficient algorithms applying FS in tomography. One approach uses smoothed projection data in its iterations. We investigate the convergence of Jacobi and Gauss-Seidel algorithms with clinical tomographic projection data.
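For reference, the classic EM iteration for emission tomography (MLEM), to which scoring-based methods are developed as alternatives, can be sketched as follows for a dense system matrix; this is the generic textbook form, not the authors' Jacobi or Gauss-Seidel schemes.

```python
import numpy as np

def mlem(A, y, n_iter=50):
    """Classic MLEM iteration for emission tomography. A is the system
    matrix (detectors x voxels), y the measured counts. Each iteration
    multiplies the current image by back-projected measurement ratios."""
    lam = np.ones(A.shape[1])            # initial image estimate
    sens = A.sum(axis=0)                 # detector sensitivity (column sums)
    for _ in range(n_iter):
        proj = A @ lam                   # forward projection
        lam *= (A.T @ (y / np.maximum(proj, 1e-12))) / np.maximum(sens, 1e-12)
    return lam
```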
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
Sources of Biased Inference in Alcohol and Drug Services Research: An Instrumental Variable Approach
Schmidt, Laura A.; Tam, Tammy W.; Larson, Mary Jo
2012-01-01
Objective: This study examined the potential for biased inference due to endogeneity when using standard approaches for modeling the utilization of alcohol and drug treatment. Method: Results from standard regression analysis were compared with those that controlled for endogeneity using instrumental variables estimation. Comparable models predicted the likelihood of receiving alcohol treatment based on the widely used Aday and Andersen medical care–seeking model. Data were from the National Epidemiologic Survey on Alcohol and Related Conditions and included a representative sample of adults in households and group quarters throughout the contiguous United States. Results: Findings suggested that standard approaches for modeling treatment utilization are prone to bias because of uncontrolled reverse causation and omitted variables. Compared with instrumental variables estimation, standard regression analyses produced downwardly biased estimates of the impact of alcohol problem severity on the likelihood of receiving care. Conclusions: Standard approaches for modeling service utilization are prone to underestimating the true effects of problem severity on service use. Biased inference could lead to inaccurate policy recommendations, for example, by suggesting that people with milder forms of substance use disorder are more likely to receive care than is actually the case. PMID:22152672
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented using the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.
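A minimal sketch of the GLUE side of such a comparison, using the Nash-Sutcliffe efficiency as the informal likelihood and a behavioural threshold; the simulator and prior sampler are hypothetical placeholders, and the threshold value is an illustrative assumption.

```python
import numpy as np

def glue(simulate, prior_sample, q_obs, n_draws=10_000, threshold=0.7, rng=None):
    """Minimal GLUE: sample parameters, score each run with an informal
    likelihood (Nash-Sutcliffe efficiency), keep 'behavioural' sets above
    the threshold, and weight them by their scores."""
    rng = np.random.default_rng(rng)
    thetas, scores, sims = [], [], []
    denom = np.sum((q_obs - q_obs.mean()) ** 2)
    for _ in range(n_draws):
        theta = prior_sample(rng)
        q_sim = simulate(theta)
        nse = 1.0 - np.sum((q_obs - q_sim) ** 2) / denom
        if nse > threshold:                  # behavioural run
            thetas.append(theta)
            scores.append(nse)
            sims.append(q_sim)
    w = np.array(scores)
    w /= w.sum()                             # informal likelihood weights
    return np.array(thetas), w, np.array(sims)
```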
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis.
Program for Weibull Analysis of Fatigue Data
NASA Technical Reports Server (NTRS)
Krantz, Timothy L.
2005-01-01
A Fortran computer program has been written for performing statistical analyses of fatigue-test data that are assumed to be adequately represented by a two-parameter Weibull distribution. This program calculates the following: (1) Maximum-likelihood estimates of the Weibull-distribution parameters; (2) Data for contour plots of relative likelihood for two parameters; (3) Data for contour plots of joint confidence regions; (4) Data for the profile likelihood of the Weibull-distribution parameters; (5) Data for the profile likelihood of any percentile of the distribution; and (6) Likelihood-based confidence intervals for parameters and/or percentiles of the distribution. The program can account for tests that are suspended without failure (the statistical term for such suspension of tests is "censoring"). The analytical approach followed in this program is valid for type-I censoring, which is the removal of unfailed units at pre-specified times. Confidence regions and intervals are calculated by use of the likelihood-ratio method.
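A hedged sketch of two of the listed computations, censored Weibull maximum-likelihood fitting and a likelihood-ratio (profile) confidence interval for the shape parameter, using SciPy rather than the original Fortran:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def weibull_nll(params, t, failed):
    """Negative log-likelihood for type-I censored Weibull data:
    log-pdf for failures, log-survival for censored (suspended) units."""
    log_k, log_lam = params
    k, lam = np.exp(log_k), np.exp(log_lam)
    z = t / lam
    logpdf = np.log(k / lam) + (k - 1) * np.log(z) - z ** k
    logsf = -z ** k
    return -np.sum(np.where(failed, logpdf, logsf))

def fit_weibull(t, failed):
    res = minimize(weibull_nll, x0=[0.0, np.log(t.mean())], args=(t, failed))
    return np.exp(res.x), -res.fun      # (shape, scale), max log-likelihood

def shape_profile_ci(t, failed, ll_max, grid, level=0.95):
    """Likelihood-ratio CI for the shape: keep shape values whose profile
    log-likelihood is within chi2/2 of the maximum (grid must span the CI)."""
    cut = chi2.ppf(level, df=1) / 2
    keep = []
    for k in grid:
        prof = minimize(lambda s: weibull_nll([np.log(k), s[0]], t, failed),
                        x0=[np.log(t.mean())])
        if ll_max - (-prof.fun) <= cut:
            keep.append(k)
    return min(keep), max(keep)
```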
USDA-ARS?s Scientific Manuscript database
Data assimilation and regression are two commonly used methods for predicting agricultural yield from remote sensing observations. Data assimilation is a generative approach because it requires explicit approximations of the Bayesian prior and likelihood to compute the probability density function...
A Statistical Approach to Passive Target Tracking.
1981-04-01
... a fixed heading of 90 degrees. [Footnote 7: F. A. Graybill, An Introduction to Linear Statistical Models, Vol. 1, New York: John Wiley & Sons, Inc. (1961).] ... likelihood estimators. The adjustment for a changing error variance is easy using the linear model approach; i.e., use weighted ...
Joint sparsity based heterogeneous data-level fusion for target detection and estimation
NASA Astrophysics Data System (ADS)
Niu, Ruixin; Zulch, Peter; Distasio, Marcello; Blasch, Erik; Shen, Dan; Chen, Genshe
2017-05-01
Typical surveillance systems employ decision- or feature-level fusion approaches to integrate heterogeneous sensor data, which are sub-optimal and incur information loss. In this paper, we investigate data-level heterogeneous sensor fusion. Since the sensors monitor the common targets of interest, whose states can be determined by only a few parameters, it is reasonable to assume that the measurement domain has a low intrinsic dimensionality. For heterogeneous sensor data, we develop a joint-sparse data-level fusion (JSDLF) approach based on the emerging joint sparse signal recovery techniques by discretizing the target state space. This approach is applied to fuse signals from multiple distributed radio frequency (RF) signal sensors and a video camera for joint target detection and state estimation. The JSDLF approach is data-driven and requires minimum prior information, since there is no need to know the time-varying RF signal amplitudes, or the image intensity of the targets. It can handle non-linearity in the sensor data due to state space discretization and the use of frequency/pixel selection matrices. Furthermore, for a multi-target case with J targets, the JSDLF approach only requires discretization in a single-target state space, instead of discretization in a J-target state space, as in the case of the generalized likelihood ratio test (GLRT) or the maximum likelihood estimator (MLE). Numerical examples are provided to demonstrate that the proposed JSDLF approach achieves excellent performance with near real-time accurate target position and velocity estimates.
ERIC Educational Resources Information Center
DeSarbo, Wayne S.; Park, Joonwook; Scott, Crystal J.
2008-01-01
A cyclical conditional maximum likelihood estimation procedure is developed for the multidimensional unfolding of two- or three-way dominance data (e.g., preference, choice, consideration) measured on ordered successive category rating scales. The technical description of the proposed model and estimation procedure are discussed, as well as the…
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
Zou, W; Ouyang, H
2016-02-01
We propose a multiple estimation adjustment (MEA) method to correct effect overestimation due to selection bias from a hypothesis-generating study (HGS) in pharmacogenetics. MEA uses a hierarchical Bayesian approach to model individual effect estimates from maximum likelihood estimation (MLE) in a region jointly and shrinks them toward the regional effect. Unlike many methods that model a fixed selection scheme, MEA capitalizes on local multiplicity independent of selection. We compared mean square errors (MSEs) in simulated HGSs from naive MLE, MEA and a conditional likelihood adjustment (CLA) method that models threshold selection bias. We observed that MEA effectively reduced the MSE of MLE on null effects with or without selection, and had a clear advantage over CLA on extreme MLE estimates from null effects under lenient threshold selection in small samples, which are common among 'top' associations from a pharmacogenetics HGS.
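As a simple stand-in for the hierarchical shrinkage described above, an empirical-Bayes version might look like this; the normal-normal model and the moment estimate of the between-marker variance are illustrative assumptions, not the authors' full Bayesian fit.

```python
import numpy as np

def eb_shrink(beta_hat, se):
    """Empirical-Bayes shrinkage of per-marker MLEs toward the regional
    effect under beta_k ~ N(theta, tau^2), with tau^2 estimated by a
    method-of-moments rule."""
    beta_hat, se = np.asarray(beta_hat), np.asarray(se)
    theta = np.average(beta_hat, weights=1 / se**2)        # regional effect
    tau2 = max(np.var(beta_hat, ddof=1) - np.mean(se**2), 1e-12)
    w = tau2 / (tau2 + se**2)                              # shrinkage weights
    return theta + w * (beta_hat - theta)                  # shrunken estimates
```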
A Game Theoretical Approach to Hacktivism: Is Attack Likelihood a Product of Risks and Payoffs?
Bodford, Jessica E; Kwan, Virginia S Y
2018-02-01
The current study examines hacktivism (i.e., hacking to convey a moral, ethical, or social justice message) through a general game theoretic framework-that is, as a product of costs and benefits. Given the inherent risk of carrying out a hacktivist attack (e.g., legal action, imprisonment), it would be rational for the user to weigh these risks against perceived benefits of carrying out the attack. As such, we examined computer science students' estimations of risks, payoffs, and attack likelihood through a game theoretic design. Furthermore, this study aims at constructing a descriptive profile of potential hacktivists, exploring two predicted covariates of attack decision making, namely, peer prevalence of hacking and sex differences. Contrary to expectations, results suggest that participants' estimations of attack likelihood stemmed solely from expected payoffs, rather than subjective risks. Peer prevalence significantly predicted increased payoffs and attack likelihood, suggesting an underlying descriptive norm in social networks. Notably, we observed no sex differences in the decision to attack, nor in the factors predicting attack likelihood. Implications for policymakers and the understanding and prevention of hacktivism are discussed, as are the possible ramifications of widely communicated payoffs over potential risks in hacking communities.
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2008-01-01
The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…
NASA Astrophysics Data System (ADS)
Zeng, X.
2015-12-01
A large number of model executions are required to obtain alternative conceptual models' predictions and their posterior probabilities in Bayesian model averaging (BMA). The posterior model probability is estimated through a model's marginal likelihood and prior probability. The heavy computational burden hinders the implementation of BMA prediction, especially for elaborate marginal likelihood estimators. To overcome the computational burden of BMA, an adaptive sparse grid (SG) stochastic collocation method is used to build surrogates for alternative conceptual models through a numerical experiment on a synthetic groundwater model. BMA predictions depend on model posterior weights (or marginal likelihoods), and this study also evaluated four marginal likelihood estimators, including the arithmetic mean estimator (AME), harmonic mean estimator (HME), stabilized harmonic mean estimator (SHME), and thermodynamic integration estimator (TIE). The results demonstrate that TIE is accurate in estimating conceptual models' marginal likelihoods, and that BMA-TIE has better predictive performance than the other BMA predictions. TIE is highly stable in estimating a conceptual model's marginal likelihood: repeated estimates of a model's marginal likelihood by TIE have significantly less variability than those from the other estimators. In addition, the SG surrogates are efficient in facilitating BMA predictions, especially for BMA-TIE. The number of model executions needed for building surrogates is 4.13%, 6.89%, 3.44%, and 0.43% of the required model executions of BMA-AME, BMA-HME, BMA-SHME, and BMA-TIE, respectively.
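A toy one-dimensional check of the thermodynamic integration estimator (TIE), using numerical integration over power posteriors in place of MCMC; the conjugate normal example is an illustrative assumption chosen because its marginal likelihood is known analytically.

```python
import numpy as np
from scipy.integrate import trapezoid

# TIE: log m = integral over beta in [0,1] of E_{p_beta}[log L],
# where p_beta is proportional to prior * L^beta (the power posterior).
mu_grid = np.linspace(-10, 10, 4001)
prior = np.exp(-0.5 * mu_grid**2) / np.sqrt(2 * np.pi)       # N(0,1) prior
y = 1.3                                                      # one observation
loglik = -0.5 * (y - mu_grid)**2 - 0.5 * np.log(2 * np.pi)   # N(mu,1) likelihood

betas = np.linspace(0, 1, 21)
e_loglik = []
for b in betas:
    w = prior * np.exp(b * loglik)
    w /= trapezoid(w, mu_grid)               # normalized power-posterior density
    e_loglik.append(trapezoid(w * loglik, mu_grid))
log_m_tie = trapezoid(e_loglik, betas)

# Analytic check: marginally y ~ N(0, 2) under this prior and likelihood.
log_m_true = -0.5 * (np.log(2 * np.pi * 2) + y**2 / 2)
```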
Establishment of a center of excellence for applied mathematical and statistical research
NASA Technical Reports Server (NTRS)
Woodward, W. A.; Gray, H. L.
1983-01-01
The state of the art was assessed with regard to efforts in support of the crop production estimation problem, and alternative generic proportion estimation techniques were investigated. Topics covered include modeling the greenness profile (Badhwar's model), parameter estimation using mixture models such as CLASSY, and minimum distance estimation as an alternative to maximum likelihood estimation. Approaches to the problem of obtaining proportion estimates when the underlying distributions are asymmetric are examined, including the properties of the Weibull distribution.
Using genetic data to estimate diffusion rates in heterogeneous landscapes.
Roques, L; Walker, E; Franck, P; Soubeyrand, S; Klein, E K
2016-08-01
Having a precise knowledge of the dispersal ability of a population in a heterogeneous environment is of critical importance in agroecology and conservation biology as it can provide management tools to limit the effects of pests or to increase the survival of endangered species. In this paper, we propose a mechanistic-statistical method to estimate space-dependent diffusion parameters of spatially-explicit models based on stochastic differential equations, using genetic data. Dividing the total population into subpopulations corresponding to different habitat patches with known allele frequencies, the expected proportions of individuals from each subpopulation at each position is computed by solving a system of reaction-diffusion equations. Modelling the capture and genotyping of the individuals with a statistical approach, we derive a numerically tractable formula for the likelihood function associated with the diffusion parameters. In a simulated environment made of three types of regions, each associated with a different diffusion coefficient, we successfully estimate the diffusion parameters with a maximum-likelihood approach. Although higher genetic differentiation among subpopulations leads to more accurate estimations, once a certain level of differentiation has been reached, the finite size of the genotyped population becomes the limiting factor for accurate estimation.
1990-11-01
$(Q + aa')^{-1} = Q^{-1} - \frac{Q^{-1}aa'Q^{-1}}{1 + a'Q^{-1}a}$. This is a simple case of a general formula called Woodbury's formula by some authors; see, for example, Phadke and ... [Contents fragments: 2. The First-Order Moving Average Model; 3. Some Approaches to the Iterative ...] ... the approximate likelihood function in some time series models. Useful suggestions have been the Cholesky decomposition of the covariance matrix and ...
Elghafghuf, Adel; Dufour, Simon; Reyher, Kristen; Dohoo, Ian; Stryhn, Henrik
2014-12-01
Mastitis is a complex disease affecting dairy cows and is considered to be the most costly disease of dairy herds. The hazard of mastitis is a function of many factors, both managerial and environmental, making its control a difficult issue to milk producers. Observational studies of clinical mastitis (CM) often generate datasets with a number of characteristics which influence the analysis of those data: the outcome of interest may be the time to occurrence of a case of mastitis, predictors may change over time (time-dependent predictors), the effects of factors may change over time (time-dependent effects), there are usually multiple hierarchical levels, and datasets may be very large. Analysis of such data often requires expansion of the data into the counting-process format - leading to larger datasets - thus complicating the analysis and requiring excessive computing time. In this study, a nested frailty Cox model with time-dependent predictors and effects was applied to Canadian Bovine Mastitis Research Network data in which 10,831 lactations of 8035 cows from 69 herds were followed through lactation until the first occurrence of CM. The model was fit to the data as a Poisson model with nested normally distributed random effects at the cow and herd levels. Risk factors associated with the hazard of CM during the lactation were identified, such as parity, calving season, herd somatic cell score, pasture access, fore-stripping, and proportion of treated cases of CM in a herd. The analysis showed that most of the predictors had a strong effect early in lactation and also demonstrated substantial variation in the baseline hazard among cows and between herds. A small simulation study for a setting similar to the real data was conducted to evaluate the Poisson maximum likelihood estimation approach with both Gaussian quadrature method and Laplace approximation. Further, the performance of the two methods was compared with the performance of a widely used estimation approach for frailty Cox models based on the penalized partial likelihood. The simulation study showed good performance for the Poisson maximum likelihood approach with Gaussian quadrature and biased variance component estimates for both the Poisson maximum likelihood with Laplace approximation and penalized partial likelihood approaches. Copyright © 2014. Published by Elsevier B.V.
Determining the accuracy of maximum likelihood parameter estimates with colored residuals
NASA Technical Reports Server (NTRS)
Morelli, Eugene A.; Klein, Vladislav
1994-01-01
An important part of building high fidelity mathematical models based on measured data is calculating the accuracy associated with statistical estimates of the model parameters. Indeed, without some idea of the accuracy of parameter estimates, the estimates themselves have limited value. In this work, an expression based on theoretical analysis was developed to properly compute parameter accuracy measures for maximum likelihood estimates with colored residuals. This result is important because experience from the analysis of measured data reveals that the residuals from maximum likelihood estimation are almost always colored. The calculations involved can be appended to conventional maximum likelihood estimation algorithms. Simulated data runs were used to show that the parameter accuracy measures computed with this technique accurately reflect the quality of the parameter estimates from maximum likelihood estimation without the need for analysis of the output residuals in the frequency domain or heuristically determined multiplication factors. The result is general, although the application studied here is maximum likelihood estimation of aerodynamic model parameters from flight test data.
A quantum framework for likelihood ratios
NASA Astrophysics Data System (ADS)
Bond, Rachael L.; He, Yang-Hui; Ormerod, Thomas C.
The ability to calculate precise likelihood ratios is fundamental to science, from Quantum Information Theory through to Quantum State Estimation. However, there is no assumption-free statistical methodology to achieve this. For instance, in the absence of data relating to covariate overlap, the widely used Bayes’ theorem either defaults to the marginal probability driven “naive Bayes’ classifier”, or requires the use of compensatory expectation-maximization techniques. This paper takes an information-theoretic approach in developing a new statistical formula for the calculation of likelihood ratios based on the principles of quantum entanglement, and demonstrates that Bayes’ theorem is a special case of a more general quantum mechanical expression.
Multilevel modeling of single-case data: A comparison of maximum likelihood and Bayesian estimation.
Moeyaert, Mariola; Rindskopf, David; Onghena, Patrick; Van den Noortgate, Wim
2017-12-01
The focus of this article is to describe Bayesian estimation, including construction of prior distributions, and to compare parameter recovery under the Bayesian framework (using weakly informative priors) and the maximum likelihood (ML) framework in the context of multilevel modeling of single-case experimental data. Bayesian estimation results were found similar to ML estimation results in terms of the treatment effect estimates, regardless of the functional form and degree of information included in the prior specification in the Bayesian framework. In terms of the variance component estimates, both the ML and Bayesian estimation procedures result in biased and less precise variance estimates when the number of participants is small (i.e., 3). By increasing the number of participants to 5 or 7, the relative bias is close to 5% and more precise estimates are obtained for all approaches, except for the inverse-Wishart prior using the identity matrix. When a more informative prior was added, more precise estimates for the fixed effects and random effects were obtained, even when only 3 participants were included. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
The Hypothesis-Driven Physical Examination.
Garibaldi, Brian T; Olson, Andrew P J
2018-05-01
The physical examination remains a vital part of the clinical encounter. However, physical examination skills have declined in recent years, in part because of decreased time at the bedside. Many clinicians question the relevance of physical examinations in the age of technology. A hypothesis-driven approach to teaching and practicing the physical examination emphasizes the performance of maneuvers that can alter the likelihood of disease. Likelihood ratios are diagnostic weights that allow clinicians to estimate the post-test probability of disease. This hypothesis-driven approach to the physical examination increases its value and efficiency, while preserving its cultural role in the patient-physician relationship.
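The arithmetic behind likelihood ratios is the odds form of Bayes' rule: post-test odds equal pre-test odds times the likelihood ratio. A minimal sketch with illustrative numbers only:

    def post_test_probability(pre_test_prob, likelihood_ratio):
        # Bayes' rule in odds form: posterior odds = prior odds * LR
        pre_odds = pre_test_prob / (1.0 - pre_test_prob)
        post_odds = pre_odds * likelihood_ratio
        return post_odds / (1.0 + post_odds)

    # Example: a finding with LR+ = 8 raises a 20% pre-test probability to ~67%.
    print(post_test_probability(0.20, 8.0))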
Markov Chain Monte Carlo Estimation of Item Parameters for the Generalized Graded Unfolding Model
ERIC Educational Resources Information Center
de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S.
2006-01-01
The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…
On the existence of maximum likelihood estimates for presence-only data
Hefley, Trevor J.; Hooten, Mevin B.
2015-01-01
It is important to identify conditions for which maximum likelihood estimates are unlikely to be identifiable from presence-only data. In data sets where the maximum likelihood estimates do not exist, penalized likelihood and Bayesian methods will produce coefficient estimates, but these are sensitive to the choice of estimation procedure and prior or penalty term. When sample size is small or it is thought that habitat preferences are strong, we propose a suite of estimation procedures researchers can consider using.
Maulidiani; Rudiyanto; Abas, Faridah; Ismail, Intan Safinar; Lajis, Nordin H
2018-06-01
Optimization is an important aspect of natural product extraction. Herein, an alternative approach to optimizing extraction is proposed, namely, Generalized Likelihood Uncertainty Estimation (GLUE). The approach combines Latin hypercube sampling, the feasible ranges of the independent variables, Monte Carlo simulation, and threshold criteria for the response variables. The GLUE method is tested on three different techniques, including ultrasound-, microwave-, and supercritical CO2-assisted extractions, using data from previously published reports. The study found that this method can: provide more information on the combined effects of the independent variables on the response variables in dotty plots; deal with an unlimited number of independent and response variables; accommodate multiple combined threshold criteria, which may be chosen to suit the target of the investigation; and provide a range of values, with their distribution, for the optimization.
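A minimal sketch of the GLUE workflow described above, assuming a placeholder response model and illustrative variable ranges; the real extraction model, feasible ranges, and threshold criteria come from the experiment at hand.

    import numpy as np
    from scipy.stats import qmc

    def extraction_yield(temp, time, solvent_ratio):
        # Hypothetical response model standing in for the extraction process
        return 0.4 * temp + 0.3 * time + 20.0 * solvent_ratio

    lower = np.array([40.0, 10.0, 0.2])   # assumed feasible ranges
    upper = np.array([80.0, 60.0, 1.0])

    sampler = qmc.LatinHypercube(d=3, seed=0)          # Latin hypercube sampling
    params = qmc.scale(sampler.random(n=10_000), lower, upper)
    yields = np.array([extraction_yield(*p) for p in params])

    # Threshold criterion on the response variable: keep the "behavioural" sets
    behavioural = params[yields >= np.quantile(yields, 0.95)]
    print(behavioural.min(axis=0))   # retained parameter ranges...
    print(behavioural.max(axis=0))   # ...summarize the optimization region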
Yap, John Stephen; Fan, Jianqing; Wu, Rongling
2009-12-01
Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.
Nichols, James D.; Hines, James E.
2002-01-01
We first consider the estimation of the finite rate of population increase or population growth rate, λi, using capture-recapture data from open populations. We review estimation and modelling of λi under three main approaches to modelling open-population data: the classic approach of Jolly (1965) and Seber (1965), the superpopulation approach of Crosbie & Manly (1985) and Schwarz & Arnason (1996), and the temporal symmetry approach of Pradel (1996). Next, we consider the contributions of different demographic components to λi using a probabilistic approach based on the composition of the population at time i + 1 (Nichols et al., 2000b). The parameters of interest are identical to the seniority parameters, γi, of Pradel (1996). We review estimation of γi under the classic, superpopulation, and temporal symmetry approaches. We then compare these direct estimation approaches for λi and γi with analogues computed using projection matrix asymptotics. We also discuss various extensions of the estimation approaches to multistate applications and to joint likelihoods involving multiple data types.
Script-theory virtual case: A novel tool for education and research.
Hayward, Jake; Cheung, Amandy; Velji, Alkarim; Altarejos, Jenny; Gill, Peter; Scarfe, Andrew; Lewis, Melanie
2016-11-01
The script theory of diagnostic reasoning proposes that clinicians evaluate cases in the context of an "illness script," iteratively testing internal hypotheses against new information until they reach a diagnosis. We present a novel tool for teaching diagnostic reasoning to undergraduate medical students based on an adaptation of script theory. We developed a virtual patient case that used clinically authentic audio and video, interactive three-dimensional (3D) body images, and a simulated electronic medical record. Next, we used interactive slide bars to record respondents' likelihood estimates of diagnostic possibilities at various stages of the case. Responses were dynamically compared to data from expert clinicians and peers. Comparative frequency distributions were presented to the learner, and final diagnostic likelihood estimates were analyzed. Detailed student feedback was collected. Over two academic years, 322 students participated. Student diagnostic likelihood estimates were similar year to year but were consistently different from expert clinician estimates. Student feedback was overwhelmingly positive: students found the case novel, innovative, clinically authentic, and a valuable learning experience. We demonstrate the successful implementation of a novel approach to teaching diagnostic reasoning. Future study may delineate the reasoning processes associated with differences between novice and expert responses.
Inverse Ising problem in continuous time: A latent variable approach
NASA Astrophysics Data System (ADS)
Donner, Christian; Opper, Manfred
2017-12-01
We consider the inverse Ising problem: the inference of network couplings from observed spin trajectories for a model with continuous time Glauber dynamics. By introducing two sets of auxiliary latent random variables we render the likelihood into a form which allows for simple iterative inference algorithms with analytical updates. The variables are (1) Poisson variables to linearize an exponential term which is typical for point process likelihoods and (2) Pólya-Gamma variables, which make the likelihood quadratic in the coupling parameters. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimate of network parameters. Using a third set of latent variables we extend the EM algorithm to sparse couplings via L1 regularization. Finally, we develop an efficient approximate Bayesian inference algorithm using a variational approach. We demonstrate the performance of our algorithms on data simulated from an Ising model. For data simulated from a more biologically plausible network with spiking neurons, we show that the Ising model captures the low-order statistics of the data well and how the Ising couplings are related to the underlying synaptic structure of the simulated network.
NASA Astrophysics Data System (ADS)
Zhou, X.; Albertson, J. D.
2016-12-01
Natural gas is considered a bridge fuel toward clean energy because of its potentially lower greenhouse gas emissions compared with other fossil fuels. Despite numerous efforts, an efficient and cost-effective approach to monitoring fugitive methane emissions along the natural gas production-supply chain has not yet been developed. Recently, mobile methane measurement has been introduced, which applies a Bayesian approach to probabilistically infer methane emission rates and update estimates recursively as new measurements become available. However, the likelihood function, especially the error term, which determines the shape of the estimate uncertainty, has not been rigorously defined and evaluated with field data. To address this issue, we performed a series of near-source (< 30 m) controlled methane release experiments using a specialized vehicle mounted with fast-response methane analyzers and a GPS unit. Methane concentrations were measured at two different heights along mobile traversals downwind of the sources, and concurrent wind and temperature data were recorded by nearby 3-D sonic anemometers. With known methane release rates, the measurements were used to determine the functional form and the parameterization of the likelihood function in the Bayesian inference scheme under different meteorological conditions.
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples
Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David
2009-01-01
Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study.
NASA Astrophysics Data System (ADS)
Pachhai, S.; Masters, G.; Laske, G.
2017-12-01
Earth's normal-mode spectra are crucial to studying the long wavelength structure of the Earth. Such observations have been used extensively to estimate "splitting coefficients" which, in turn, can be used to determine the three-dimensional velocity and density structure. Most past studies apply a non-linear iterative inversion to estimate the splitting coefficients which requires that the earthquake source is known. However, it is challenging to know the source details, particularly for the large events used in normal-mode analyses. Additionally, the final solution of the non-linear inversion can depend on the choice of damping parameter and starting model. To circumvent the need to know the source, a two-step linear inversion has been developed and successfully applied to many mantle and core sensitive modes. The first step takes combinations of the data from a single event to produce spectra known as "receiver strips". The autoregressive nature of the receiver strips can then be used to estimate the structure coefficients without the need to know the source. Based on this approach, we recently employed a neighborhood algorithm to measure the splitting coefficients for an isolated inner-core sensitive mode (13S2). This approach explores the parameter space efficiently without any need of regularization and finds the structure coefficients which best fit the observed strips. Here, we implement a Bayesian approach to data collected for earthquakes from early 2000 onward. This approach combines the data (through the likelihood) and prior information to provide rigorous parameter values and their uncertainties for both isolated and coupled modes. The likelihood function is derived from the inferred errors of the receiver strips, which allows us to retrieve proper uncertainties. Finally, we apply model selection criteria that balance the trade-offs between fit (likelihood) and model complexity to investigate the degree and type of structure (elastic and anelastic) required to explain the data.
Quantitative PET Imaging in Drug Development: Estimation of Target Occupancy.
Naganawa, Mika; Gallezot, Jean-Dominique; Rossano, Samantha; Carson, Richard E
2017-12-11
Positron emission tomography, an imaging tool using radiolabeled tracers in humans and preclinical species, has been widely used in recent years in drug development, particularly in the central nervous system. One important goal of PET in drug development is assessing the occupancy of various molecular targets (e.g., receptors, transporters, enzymes) by exogenous drugs. The current linear mathematical approaches used to determine occupancy using PET imaging experiments are presented. These algorithms use results from multiple regions with different target content in two scans, a baseline (pre-drug) scan and a post-drug scan. New mathematical estimation approaches to determine target occupancy, using maximum likelihood, are presented. A major challenge in these methods is the proper definition of the covariance matrix of the regional binding measures, accounting for different variance of the individual regional measures and their nonzero covariance, factors that have been ignored by conventional methods. The novel methods are compared to standard methods using simulation and real human occupancy data. The simulation data showed the expected reduction in variance and bias using the proper maximum likelihood methods, when the assumptions of the estimation method matched those in simulation. Between-method differences for data from human occupancy studies were less obvious, in part due to small dataset sizes. These maximum likelihood methods form the basis for development of improved PET covariance models, in order to minimize bias and variance in PET occupancy studies.
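For context, a common linear approach in this setting is the occupancy (Lassen) plot, which regresses the scan-to-scan change in distribution volume on the baseline value. The sketch below uses made-up regional values and ordinary least squares, ignoring exactly the regional covariance structure that the paper's maximum likelihood methods are designed to handle.

    import numpy as np

    # Hypothetical regional total distribution volumes (V_T), illustrative only
    vt_base = np.array([3.2, 4.1, 5.0, 2.8, 6.3])   # baseline scan
    vt_drug = np.array([2.1, 2.6, 3.1, 1.9, 3.8])   # post-drug scan

    # Occupancy plot: V_T,base - V_T,drug = occ * (V_T,base - V_ND),
    # so a straight-line fit gives occupancy (slope) and V_ND (-intercept/slope)
    occ, intercept = np.polyfit(vt_base, vt_base - vt_drug, 1)
    v_nd = -intercept / occ
    print(f"occupancy ~ {occ:.2f}, V_ND ~ {v_nd:.2f}")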
Aralis, Hilary; Brookmeyer, Ron
2017-01-01
Multistate models provide an important method for analyzing a wide range of life history processes including disease progression and patient recovery following medical intervention. Panel data consisting of the states occupied by an individual at a series of discrete time points are often used to estimate transition intensities of the underlying continuous-time process. When transition intensities depend on the time elapsed in the current state and back transitions between states are possible, this intermittent observation process presents difficulties in estimation due to intractability of the likelihood function. In this manuscript, we present an iterative stochastic expectation-maximization algorithm that relies on a simulation-based approximation to the likelihood function and implement this algorithm using rejection sampling. In a simulation study, we demonstrate the feasibility and performance of the proposed procedure. We then demonstrate application of the algorithm to a study of dementia, the Nun Study, consisting of intermittently-observed elderly subjects in one of four possible states corresponding to intact cognition, impaired cognition, dementia, and death. We show that the proposed stochastic expectation-maximization algorithm substantially reduces bias in model parameter estimates compared to an alternative approach used in the literature, minimal path estimation. We conclude that in estimating intermittently observed semi-Markov models, the proposed approach is a computationally feasible and accurate estimation procedure that leads to substantial improvements in back transition estimates.
Deterministic quantum annealing expectation-maximization algorithm
NASA Astrophysics Data System (ADS)
Miyahara, Hideyuki; Tsumura, Koji; Sughiyama, Yuki
2017-11-01
Maximum likelihood estimation (MLE) is one of the most important methods in machine learning, and the expectation-maximization (EM) algorithm is often used to obtain maximum likelihood estimates. However, EM depends heavily on its initial configuration and can fail to find the global optimum. On the other hand, in the field of physics, quantum annealing (QA) was proposed as a novel optimization approach. Motivated by QA, we propose a quantum annealing extension of EM, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. We also discuss its advantage in terms of the path integral formulation. Furthermore, through numerical simulations, we illustrate how DQAEM works in MLE and show that it moderates the problem of local optima in EM.
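The annealing idea can be illustrated on classical EM for a one-dimensional Gaussian mixture: posterior responsibilities are tempered by an inverse temperature beta that is raised gradually to one, flattening the likelihood surface early on. This is a sketch of deterministic annealing EM only, not the quantum variant proposed in the paper.

    import numpy as np

    def daem_gmm_1d(x, k=2, betas=(0.2, 0.5, 0.8, 1.0), iters=50, seed=0):
        rng = np.random.default_rng(seed)
        pi = np.full(k, 1.0 / k)
        mu = rng.choice(x, k, replace=False)
        var = np.full(k, np.var(x))
        for beta in betas:                      # anneal beta -> 1 (plain EM)
            for _ in range(iters):
                # E-step with tempered responsibilities r_ik ~ p_ik**beta
                log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                         - 0.5 * (x[:, None] - mu) ** 2 / var)
                w = np.exp(beta * (log_p - log_p.max(axis=1, keepdims=True)))
                r = w / w.sum(axis=1, keepdims=True)
                # M-step: standard weighted updates
                nk = r.sum(axis=0)
                pi, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
                var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        return pi, mu, var

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])
    print(daem_gmm_1d(x))   # recovers the two components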
Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.
Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R
2018-05-26
Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes the data are representative of the desired population, and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non-ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations, we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design. Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occupancy estimates for some bat species. Using simulations, we show that the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study, designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases the spatial extent of statistical inferences. Our results suggest that ignoring the mechanism by which locations were selected for data collection (e.g., the sampling design) could result in erroneous model-based conclusions. Therefore, to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information, in addition to the data themselves, must be available to analysts. Details for constructing the weights used in estimation and code for implementation are provided.
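A minimal sketch of the weighting idea: each site's occupancy-model likelihood contribution is multiplied by its survey weight on the log scale. The simple psi/p model, simulated detection histories, and weights below are illustrative, not the paper's bat analysis.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import comb, expit

    def neg_pseudo_loglik(theta, y, J, w):
        psi, p = expit(theta)   # logits -> probabilities
        # Site likelihood: occupied and detected y times, or unoccupied (y = 0)
        lik = psi * comb(J, y) * p**y * (1 - p)**(J - y) + (1 - psi) * (y == 0)
        return -np.sum(w * np.log(lik))   # design weights enter here

    rng = np.random.default_rng(0)
    n, J = 200, 4
    z = rng.random(n) < 0.6                           # latent occupancy states
    y = np.where(z, rng.binomial(J, 0.3, size=n), 0)  # detection histories
    w = rng.choice([1.0, 3.0], size=n)                # hypothetical survey weights

    fit = minimize(neg_pseudo_loglik, x0=np.zeros(2), args=(y, J, w))
    print(expit(fit.x))   # design-weighted estimates of (psi, p)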
Measurement of CIB power spectra with CAM-SPEC from Planck HFI maps
NASA Astrophysics Data System (ADS)
Mak, Suet Ying; Challinor, Anthony; Efstathiou, George; Lagache, Guilaine
2015-08-01
We present new measurements of the cosmic infrared background (CIB) anisotropies and its first likelihood using Planck HFI data at 353, 545, and 857 GHz. The measurements are based on cross-frequency power spectra and a likelihood analysis using the CAM-SPEC package, rather than map-based template removal of foregrounds as done in previous Planck CIB analyses. We construct the likelihood of the CIB temperature fluctuations, an extension of the CAM-SPEC likelihood used in CMB analysis to higher frequencies, and use it to derive the best estimate of the CIB power spectrum over three decades in multipole moment, l, covering 50 ≤ l ≤ 2500. We adopt parametric models of the CIB and foreground contaminants (Galactic cirrus, infrared point sources, and cosmic microwave background anisotropies), and calibrate the dataset uniformly across frequencies with known Planck beam and noise properties in the likelihood construction. We validate our likelihood through simulations and an extensive suite of consistency tests, and assess the impact of instrumental and data-selection effects on the final CIB power spectrum constraints. Two approaches are developed for interpreting the CIB power spectrum. The first is based on a simple parametric model of the cross-frequency power using amplitudes, correlation coefficients, and a known multipole dependence. The second is based on physical models for galaxy clustering and the evolution of the infrared emission of galaxies. The new approaches fit all auto- and cross-power spectra very well, with a best fit of χ²ν = 1.04 (parametric model). Using the best foreground solution, we find that the cleaned CIB power spectra are in good agreement with previous Planck and Herschel measurements.
Quantifying the uncertainty in heritability.
Furlotte, Nicholas A; Heckerman, David; Lippert, Christoph
2014-05-01
The use of mixed models to determine narrow-sense heritability and related quantities such as SNP heritability has received much recent attention. Less attention has been paid to the inherent variability in these estimates. One approach for quantifying variability in estimates of heritability is a frequentist approach, in which heritability is estimated using maximum likelihood and its variance is quantified through an asymptotic normal approximation. An alternative approach is to quantify the uncertainty in heritability through its Bayesian posterior distribution. In this paper, we develop the latter approach, make it computationally efficient and compare it to the frequentist approach. We show theoretically that, for a sufficiently large sample size and intermediate values of heritability, the two approaches provide similar results. Using the Atherosclerosis Risk in Communities cohort, we show empirically that the two approaches can give different results and that the variance/uncertainty can remain large.
Alam, M S; Bognar, J G; Cain, S; Yasuda, B J
1998-03-10
During the process of microscanning, a controlled vibrating mirror is typically used to produce subpixel shifts in a sequence of forward-looking infrared (FLIR) images. If the FLIR is mounted on a moving platform, such as an aircraft, uncontrolled random vibrations associated with the platform can be used to generate the shifts. Iterative techniques such as the expectation-maximization (EM) approach by means of the maximum-likelihood algorithm can be used to generate high-resolution images from multiple randomly shifted aliased frames. In the maximum-likelihood approach, the data are considered to be Poisson random variables, and an EM algorithm is developed that iteratively estimates an unaliased image, compensated for known imager-system blur, while simultaneously estimating the translational shifts. Although this algorithm yields high-resolution images from a sequence of randomly shifted frames, it requires significant computation time and cannot be implemented in real time on currently available high-performance processors. In the iterative approach, the image shifts are recalculated by evaluating a cost function that compares the shifted and interlaced data frames with the corresponding values in the algorithm's latest estimate of the high-resolution image. We present a registration algorithm that estimates the shifts in one step. The shift parameters provided by the new algorithm are accurate enough to eliminate the need for iterative recalculation of translational shifts. Using this shift information, we apply a simplified version of the EM algorithm to estimate a high-resolution image from a given sequence of video frames. The proposed modified EM algorithm significantly reduces the computational burden compared with the original EM algorithm, making it more attractive for practical implementation. Both simulation and experimental results are presented to verify the effectiveness of the proposed technique.
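A one-step shift estimate of this general kind can be obtained, for example, by phase correlation; the sketch below recovers integer translations, and subpixel refinement, as microscanning requires, would interpolate around the correlation peak. It illustrates one-step registration, not the authors' specific algorithm.

    import numpy as np

    def estimate_shift(ref, frame):
        # Phase correlation: the normalized cross-power spectrum has a
        # delta-function peak at the (row, col) translation of `frame`
        cross = np.conj(np.fft.fft2(ref)) * np.fft.fft2(frame)
        corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12))
        peak = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
        return tuple(p if p <= s // 2 else p - s   # map to signed shifts
                     for p, s in zip(peak, corr.shape))

    rng = np.random.default_rng(0)
    ref = rng.random((64, 64))
    frame = np.roll(ref, shift=(3, -5), axis=(0, 1))
    print(estimate_shift(ref, frame))   # (3, -5)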
NASA Astrophysics Data System (ADS)
Bakoban, Rana A.
2017-08-01
The coefficient of variation [CV] has several applications in applied statistics. In this paper, we adopt Bayesian and non-Bayesian approaches to the estimation of the CV under type-II censored data from the extension exponential distribution [EED]. Point and interval estimates of the CV are obtained using both maximum likelihood and parametric bootstrap techniques. A Bayesian approach using the MCMC method is also presented. A real data set is presented and analyzed, and the results are used to assess the theoretical findings.
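As a simple companion to the parametric bootstrap used in the paper (which would resample from the fitted EED), a nonparametric percentile bootstrap for the CV can be sketched as follows; the data are illustrative.

    import numpy as np

    def cv_bootstrap_ci(x, n_boot=10_000, alpha=0.05, seed=0):
        # Percentile bootstrap interval for CV = s / mean
        rng = np.random.default_rng(seed)
        idx = rng.integers(0, len(x), size=(n_boot, len(x)))
        boot = np.asarray(x)[idx]
        cvs = boot.std(axis=1, ddof=1) / boot.mean(axis=1)
        point = np.std(x, ddof=1) / np.mean(x)
        return point, tuple(np.quantile(cvs, [alpha / 2, 1 - alpha / 2]))

    x = np.random.default_rng(1).exponential(scale=2.0, size=80)
    print(cv_bootstrap_ci(x))   # point estimate and 95% interval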
NASA Astrophysics Data System (ADS)
Nourali, Mahrouz; Ghahraman, Bijan; Pourreza-Bilondi, Mohsen; Davary, Kamran
2016-09-01
In the present study, DREAM(ZS), Differential Evolution Adaptive Metropolis combined with both formal and informal likelihood functions, is used to investigate the uncertainty of the parameters of the HEC-HMS model in the Tamar watershed, Golestan province, Iran. To assess the uncertainty of the 24 parameters used in HMS, three flood events were used for calibration and one flood event was used to validate the posterior distributions. Moreover, the performance of seven different likelihood functions (L1-L7) was assessed by means of the DREAM(ZS) approach. Four likelihood functions (L1-L4), the Nash-Sutcliffe (NS) efficiency, normalized absolute error (NAE), index of agreement (IOA), and Chiew-McMahon efficiency (CM), are considered informal, whereas the remaining three (L5-L7) are formal. L5 builds on the relationship between traditional least squares fitting and Bayesian inference, and L6 is a heteroscedastic maximum likelihood error (HMLE) estimator. Finally, in likelihood function L7, the serial dependence of residual errors is accounted for using a first-order autoregressive (AR) model of the residuals. According to the results, the sensitivities of the parameters depend strongly on the likelihood function and vary across likelihood functions. Most of the parameters were better defined by the formal likelihood functions L5 and L7 and showed a high sensitivity to model performance. Posterior cumulative distributions corresponding to the informal likelihood functions L1, L2, L3, and L4 and the formal likelihood function L6 are approximately the same for most of the sub-basins, and these likelihood functions have an almost identical effect on parameter sensitivity. The 95% total prediction uncertainty bounds bracketed most of the observed data. Considering all the statistical indicators and criteria of uncertainty assessment, including RMSE, KGE, NS, P-factor and R-factor, the results showed that the DREAM(ZS) algorithm performed better under the formal likelihood functions L5 and L7, but likelihood function L5 may result in biased and unreliable parameter estimates due to violation of the residual-error assumptions. Thus, likelihood function L7 provides credible posterior distributions of the model parameters and can therefore be employed in further applications.
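To make the informal/formal distinction concrete, the sketch below computes the Nash-Sutcliffe efficiency (the basis of an informal measure such as L1) and a Gaussian least-squares log-likelihood (the idea behind a formal function such as L5) for the same residuals; the discharge values are illustrative.

    import numpy as np

    def nash_sutcliffe(obs, sim):
        # NS = 1 - SSE / variance of the observations (1 is a perfect fit)
        return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

    def gaussian_loglik(obs, sim):
        # iid Gaussian residuals with the error variance profiled at its ML value
        res = obs - sim
        n, sigma2 = len(res), np.mean(res ** 2)
        return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

    obs = np.array([10.0, 14.0, 30.0, 22.0, 12.0])   # observed discharge
    sim = np.array([11.0, 13.5, 27.0, 24.0, 11.0])   # simulated discharge
    print(nash_sutcliffe(obs, sim), gaussian_loglik(obs, sim))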
Chan, Aaron C.; Srinivasan, Vivek J.
2013-01-01
In optical coherence tomography (OCT) and ultrasound, unbiased Doppler frequency estimators with low variance are desirable for blood velocity estimation. Hardware improvements in OCT mean that ever higher acquisition rates are possible, which should also, in principle, improve estimation performance. Paradoxically, however, the widely used Kasai autocorrelation estimator's performance worsens with increasing acquisition rate. We propose that parametric estimators based on accurate models of noise statistics can offer better performance. We derive a maximum likelihood estimator (MLE) based on a simple additive white Gaussian noise model and show that it can outperform the Kasai autocorrelation estimator. In addition, we derive the Cramér-Rao lower bound (CRLB) and show that the variance of the MLE approaches the CRLB for moderate data lengths and noise levels. We note that the MLE performance improves with longer acquisition time and remains constant or improves with higher acquisition rates. These qualities may make it a preferred technique as OCT imaging speed continues to improve. Finally, our work motivates the development of more general parametric estimators based on statistical models of decorrelation noise.
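For reference, the Kasai lag-one autocorrelation estimator that the proposed MLE is compared against takes only a few lines; the synthetic tone below is illustrative.

    import numpy as np

    def kasai_doppler(x, dt):
        # Doppler frequency from the phase of the lag-one autocorrelation
        r1 = np.sum(np.conj(x[:-1]) * x[1:])
        return np.angle(r1) / (2 * np.pi * dt)

    fs, f0, n = 100e3, 5e3, 256              # 5 kHz tone sampled at 100 kHz
    t = np.arange(n) / fs
    rng = np.random.default_rng(0)
    x = np.exp(2j * np.pi * f0 * t) + 0.3 * (rng.normal(size=n) + 1j * rng.normal(size=n))
    print(kasai_doppler(x, 1 / fs))          # close to 5000 Hz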
Kanyangarara, Mufaro; Munos, Melinda K; Walker, Neff
2017-12-01
Utilization of antenatal care (ANC) services has increased over the past two decades. Continued gains in maternal and newborn health will require an understanding of both access and quality of ANC services. We linked health facility and household survey data to examine the quality of service provision for five ANC interventions across health facilities in sub-Saharan Africa. Using data from 20 nationally representative health facility assessments - the Service Provision Assessment (SPA) and the Service Availability and Readiness Assessment (SARA), we estimated facility level readiness to deliver five ANC interventions: tetanus toxoid vaccine for pregnant women, intermittent preventive treatment for malaria in pregnancy (IPTp), syphilis detection and treatment in pregnancy, iron supplementation and hypertensive disease case management. Facility level indicators were stratified by health facility type, managing authority and location, then linked to estimates of ANC utilization in that stratum from the corresponding Demographic and Health Surveys (DHS) to generate population level estimates of the 'likelihood of appropriate care'. Finally, the association between estimates of the 'likelihood of appropriate care' from the linking approach and estimates of coverage levels from the DHS were assessed. A total of 10 534 health facilities were surveyed in the 20 health facility assessments, of which 8742 reported offering ANC services and were included in the analysis. Health facility readiness to deliver IPTp, iron supplementation, and tetanus toxoid vaccination was higher (median: 84.1%, 84.9% and 82.8% respectively) than readiness to deliver hypertensive disease case management and syphilis detection and treatment (median: 23.0% and 19.9% respectively). Coverage of at least 4 ANC visits ranged from 24.8% to 75.8%. Estimates of the likelihood of appropriate care derived from linking health facility and household survey data showed marked gaps for all interventions, particularly hypertensive disease case management and syphilis detection and treatment. There was fairly good concordance between our estimates of high likelihood of appropriate care and DHS estimates of coverage for iron supplementation, IPTp, and tetanus toxoid vaccination. Linking household surveys to health facility assessments revealed marked gaps in population-level coverage of quality ANC interventions and underscored the need for a double-pronged approach to increase ANC utilization and improve the quality of ANC services.
Ye, Xin; Garikapati, Venu M.; You, Daehyun; ...
2017-11-08
Most multinomial choice models (e.g., the multinomial logit model) adopted in practice assume an extreme-value Gumbel distribution for the random components (error terms) of utility functions. This distributional assumption offers a closed-form likelihood expression when the utility maximization principle is applied to model choice behaviors. As a result, model coefficients can be easily estimated using the standard maximum likelihood estimation method. However, maximum likelihood estimators are consistent and efficient only if distributional assumptions on the random error terms are valid. It is therefore critical to test the validity of underlying distributional assumptions on the error terms that form the basis of parameter estimation and policy evaluation. In this paper, a practical yet statistically rigorous method is proposed to test the validity of the distributional assumption on the random components of utility functions in both the multinomial logit (MNL) model and the multiple discrete-continuous extreme value (MDCEV) model. Based on a semi-nonparametric approach, a closed-form likelihood function that nests the MNL or MDCEV model being tested is derived. The proposed method allows traditional likelihood ratio tests to be used to test violations of the standard Gumbel distribution assumption. Simulation experiments are conducted to demonstrate that the proposed test yields acceptable Type-I and Type-II error probabilities at commonly available sample sizes. The test is then applied to three real-world discrete and discrete-continuous choice models. For all three models, the proposed test rejects the validity of the standard Gumbel distribution in most utility functions, calling for the development of robust choice models that overcome the adverse effects of violations of distributional assumptions on the error terms in random utility functions.
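The testing machinery itself is the classical likelihood ratio test between the nested models, with the restricted model being the MNL or MDCEV under Gumbel errors and the general one its semi-nonparametric extension. A sketch with hypothetical fitted log-likelihoods, not values from the paper:

    from scipy.stats import chi2

    def likelihood_ratio_test(loglik_restricted, loglik_general, df):
        # df = number of extra semi-nonparametric distributional parameters
        stat = 2.0 * (loglik_general - loglik_restricted)
        return stat, chi2.sf(stat, df)

    stat, p = likelihood_ratio_test(-2204.7, -2189.3, df=4)   # illustrative values
    print(f"LR = {stat:.1f}, p = {p:.2g}")   # small p rejects the Gumbel restriction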
Fast Component Pursuit for Large-Scale Inverse Covariance Estimation.
Han, Lei; Zhang, Yu; Zhang, Tong
2016-08-01
The maximum likelihood estimation (MLE) for the Gaussian graphical model, also known as the inverse covariance estimation problem, has gained increasing interest recently. Most existing works assume that inverse covariance estimators contain sparse structure and then construct models with ℓ1 regularization. In this paper, in contrast to existing works, we study the inverse covariance estimation problem from another perspective by efficiently modeling the low-rank structure of the inverse covariance, which is assumed to be a combination of a low-rank part and a diagonal matrix. One motivation for this assumption is that low-rank structure is common in many applications, including climate and financial analysis; another is that it reduces the computational complexity of computing the inverse. Specifically, we propose an efficient COmponent Pursuit (COP) method to obtain the low-rank part, where each component can be sparse. For optimization, the COP method greedily learns a rank-one component in each iteration by maximizing the log-likelihood. Moreover, the COP algorithm enjoys several appealing properties, including the existence of an efficient solution in each iteration and a theoretical guarantee on the convergence of this greedy approach. Experiments on large-scale synthetic and real-world datasets with up to thousands of millions of variables show that the COP method is faster than state-of-the-art techniques for the inverse covariance estimation problem while achieving comparable log-likelihood on test data.
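The computational payoff of the low-rank-plus-diagonal form can be sketched with the matrix determinant lemma: with precision diag(d) + VVᵀ of rank r, the Gaussian log-likelihood is evaluated in O(pr²) rather than O(p³). The dimensions and data below are illustrative, and this is the structural idea only, not the COP updates.

    import numpy as np

    def avg_loglik_lowrank_precision(X, d, V):
        # Precision modeled as diag(d) + V @ V.T (low-rank part V is p x r)
        n, p = X.shape
        # log det(diag(d) + V V^T) = log det(I_r + V^T D^{-1} V) + sum(log d)
        M = np.eye(V.shape[1]) + V.T @ (V / d[:, None])
        logdet = np.linalg.slogdet(M)[1] + np.sum(np.log(d))
        quad = np.einsum('ij,j,ij->', X, d, X) + np.sum((X @ V) ** 2)
        return 0.5 * (n * logdet - quad - n * p * np.log(2 * np.pi)) / n

    rng = np.random.default_rng(0)
    p, r, n = 500, 3, 1000
    d = rng.uniform(0.5, 2.0, size=p)          # diagonal part
    V = rng.normal(scale=0.1, size=(p, r))     # low-rank part
    X = rng.normal(size=(n, p))                # placeholder centered data
    print(avg_loglik_lowrank_precision(X, d, V))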
GRID-BASED EXPLORATION OF COSMOLOGICAL PARAMETER SPACE WITH SNAKE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mikkelsen, K.; Næss, S. K.; Eriksen, H. K., E-mail: kristin.mikkelsen@astro.uio.no
2013-11-10
We present a fully parallelized grid-based parameter estimation algorithm for investigating multidimensional likelihoods called Snake, and apply it to cosmological parameter estimation. The basic idea is to map out the likelihood grid-cell by grid-cell according to decreasing likelihood, and stop when a certain threshold has been reached. This approach improves vastly on the 'curse of dimensionality' problem plaguing standard grid-based parameter estimation simply by disregarding grid cells with negligible likelihood. The main advantages of this method compared to standard Metropolis-Hastings Markov Chain Monte Carlo methods include (1) trivial extraction of arbitrary conditional distributions; (2) direct access to Bayesian evidences; (3) better sampling of the tails of the distribution; and (4) nearly perfect parallelization scaling. The main disadvantage is, as in the case of brute-force grid-based evaluation, a dependency on the number of parameters, N_par. One of the main goals of the present paper is to determine how large N_par can be while still maintaining reasonable computational efficiency; we find that N_par = 12 is well within the capabilities of the method. The performance of the code is tested by comparing cosmological parameters estimated using Snake and the WMAP-7 data with those obtained using CosmoMC, the current standard code in the field. We find fully consistent results, with similar computational expenses, but shorter wall time due to the perfect parallelization scheme.
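The core strategy, expanding grid cells in order of decreasing likelihood and discarding cells that fall a fixed amount below the best value, can be sketched with a priority queue. This illustrates the idea on a toy two-dimensional likelihood; it is not the Snake code.

    import heapq
    import numpy as np

    def snake_explore(loglike, start, step, threshold):
        start_ll = loglike(np.array(start) * step)
        frontier, visited = [(-start_ll, start)], {start}
        best, cells = start_ll, {}
        while frontier:
            neg_ll, cell = heapq.heappop(frontier)   # highest likelihood first
            ll = -neg_ll
            best = max(best, ll)
            if ll < best - threshold:
                continue                             # negligible; do not expand
            cells[cell] = ll
            for dim in range(len(cell)):             # push the unvisited neighbors
                for delta in (-1, 1):
                    nxt = list(cell); nxt[dim] += delta; nxt = tuple(nxt)
                    if nxt not in visited:
                        visited.add(nxt)
                        heapq.heappush(frontier, (-loglike(np.array(nxt) * step), nxt))
        return cells

    loglike = lambda theta: -0.5 * np.sum(theta ** 2)   # toy Gaussian log-likelihood
    mapped = snake_explore(loglike, start=(0, 0), step=0.1, threshold=8.0)
    print(len(mapped), "cells mapped")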
Galka, Andreas; Siniatchkin, Michael; Stephani, Ulrich; Groening, Kristina; Wolff, Stephan; Bosch-Bayard, Jorge; Ozaki, Tohru
2010-12-01
The analysis of time series obtained by functional magnetic resonance imaging (fMRI) may be approached by fitting predictive parametric models, such as nearest-neighbor autoregressive models with exogenous input (NNARX). As a part of the modeling procedure, it is possible to apply instantaneous linear transformations to the data. Spatial smoothing, a common preprocessing step, may be interpreted as such a transformation. The autoregressive parameters may be constrained, such that they provide a response behavior that corresponds to the canonical haemodynamic response function (HRF). We present an algorithm for estimating the parameters of the linear transformations and of the HRF within a rigorous maximum-likelihood framework. Using this approach, an optimal amount of both the spatial smoothing and the HRF can be estimated simultaneously for a given fMRI data set. An example from a motor-task experiment is discussed. It is found that, for this data set, weak, but non-zero, spatial smoothing is optimal. Furthermore, it is demonstrated that activated regions can be estimated within the maximum-likelihood framework.
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.
2013-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, particularly in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation, and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise, because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of the estimating equations for Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations, and kriging variances. Apart from presenting our modelling framework, we present selected simulation results by which we explored the properties of the new method. This is complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.
Empirical Bayes Approaches to Multivariate Fuzzy Partitions.
ERIC Educational Resources Information Center
Woodbury, Max A.; Manton, Kenneth G.
1991-01-01
An empirical Bayes-maximum likelihood estimation procedure is presented for the application of fuzzy partition models in describing high dimensional discrete response data. The model describes individuals in terms of partial membership in multiple latent categories that represent bounded discrete spaces.
NASA Astrophysics Data System (ADS)
Pickard, William F.
2004-10-01
The classical PERT inverse statistics problem requires estimation of the mean, m̄, and standard deviation, s, of a unimodal distribution given estimates of its mode, m, and of the smallest, a, and largest, b, values likely to be encountered. After placing the problem in historical perspective and showing that it is ill-posed because it is underdetermined, this paper offers an approach to resolve the ill-posedness: (a) by interpreting a and b as modes of order statistic distributions; (b) by requiring also an estimate of the number of samples, N, considered in estimating the set {m, a, b}; and (c) by maximizing a suitable likelihood, having made the traditional assumption that the underlying distribution is beta. Exact formulae relating the four parameters of the beta distribution to {m, a, b, N} and the assumed likelihood function are then used to compute the four underlying parameters of the beta distribution; and from them, m̄ and s are computed using exact formulae.
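For comparison, the classical PERT treatment replaces the full beta fit with simple moment formulas from the three elicited values; these are the approximations that the maximum-likelihood approach above refines by also using N.

    def pert_moments(a, m, b):
        # Classical PERT approximations: mean = (a + 4m + b) / 6, sd = (b - a) / 6
        return (a + 4.0 * m + b) / 6.0, (b - a) / 6.0

    print(pert_moments(a=2.0, m=5.0, b=14.0))   # (6.0, 2.0)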
Maximum likelihood techniques applied to quasi-elastic light scattering
NASA Technical Reports Server (NTRS)
Edwards, Robert V.
1992-01-01
An automatic procedure is needed for reliably estimating the quality of particle-size measurements from QELS (quasi-elastic light scattering). Obtaining the measurement itself, before any error estimates can be made, is a problem because the particle size is inferred very indirectly from a signal generated by the motion of particles in the system and requires the solution of an inverse problem. The eigenvalue structure of the transform that generates the signal is such that an arbitrarily small amount of noise can obliterate parts of any practical inversion spectrum. This project uses maximum likelihood estimation (MLE) as a framework to generate a theory and a functioning set of software that oversees the measurement process and extracts the particle size information, while at the same time providing error estimates for those measurements. The theory involved verifying a correct form of the covariance matrix for the noise on the measurement and then estimating particle size parameters using a modified histogram approach.
NASA Technical Reports Server (NTRS)
Walker, H. F.
1976-01-01
Likelihood equations determined by the two types of samples, which are necessary conditions for a maximum-likelihood estimate, are considered. These equations suggest certain successive-approximation iterative procedures for obtaining maximum likelihood estimates. The procedures, which are generalized steepest ascent (deflected gradient) procedures, contain those of Hosmer as a special case.
Estimating hazard ratios in cohort data with missing disease information due to death.
Binder, Nadine; Herrnböck, Anne-Sophie; Schumacher, Martin
2017-03-01
In clinical and epidemiological studies information on the primary outcome of interest, that is, the disease status, is usually collected at a limited number of follow-up visits. The disease status can often only be retrieved retrospectively in individuals who are alive at follow-up, but will be missing for those who died before. Right-censoring the death cases at the last visit (ad-hoc analysis) yields biased hazard ratio estimates of a potential risk factor, and the bias can be substantial and occur in either direction. In this work, we investigate three different approaches that use the same likelihood contributions derived from an illness-death multistate model in order to more adequately estimate the hazard ratio by including the death cases into the analysis: a parametric approach, a penalized likelihood approach, and an imputation-based approach. We investigate to which extent these approaches allow for an unbiased regression analysis by evaluating their performance in simulation studies and on a real data example. In doing so, we use the full cohort with complete illness-death data as reference and artificially induce missing information due to death by setting discrete follow-up visits. Compared to an ad-hoc analysis, all considered approaches provide less biased or even unbiased results, depending on the situation studied. In the real data example, the parametric approach is seen to be too restrictive, whereas the imputation-based approach could almost reconstruct the original event history information.
Statistical field estimators for multiscale simulations.
Eapen, Jacob; Li, Ju; Yip, Sidney
2005-11-01
We present a systematic approach for generating smooth and accurate fields from particle simulation data using the notions of statistical inference. As an extension to a parametric representation based on the maximum likelihood technique previously developed for velocity and temperature fields, a nonparametric estimator based on the principle of maximum entropy is proposed for particle density and stress fields. Both estimators are applied to represent molecular dynamics data on shear-driven flow in an enclosure which exhibits a high degree of nonlinear characteristics. We show that the present density estimator is a significant improvement over ad hoc bin averaging and is also free of systematic boundary artifacts that appear in the method of smoothing kernel estimates. Similarly, the velocity fields generated by the maximum likelihood estimator do not show any edge effects that can be erroneously interpreted as slip at the wall. For low Reynolds numbers, the velocity fields and streamlines generated by the present estimator are benchmarked against Newtonian continuum calculations. For shear velocities that are a significant fraction of the thermal speed, we observe a form of shear localization that is induced by the confining boundary.
NASA Technical Reports Server (NTRS)
Rodriguez, G.; Scheid, R. E., Jr.
1986-01-01
This paper outlines methods for modeling, identification and estimation for static determination of flexible structures. The shape estimation schemes are based on structural models specified by (possibly interconnected) elliptic partial differential equations. The identification techniques provide approximate knowledge of parameters in elliptic systems. The techniques are based on the method of maximum-likelihood that finds parameter values such that the likelihood functional associated with the system model is maximized. The estimation methods are obtained by means of a function-space approach that seeks to obtain the conditional mean of the state given the data and a white noise characterization of model errors. The solutions are obtained in a batch-processing mode in which all the data is processed simultaneously. After methods for computing the optimal estimates are developed, an analysis of the second-order statistics of the estimates and of the related estimation error is conducted. In addition to outlining the above theoretical results, the paper presents typical flexible structure simulations illustrating performance of the shape determination methods.
Guindon, Stéphane; Dufayard, Jean-François; Lefort, Vincent; Anisimova, Maria; Hordijk, Wim; Gascuel, Olivier
2010-05-01
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
Distributed multimodal data fusion for large scale wireless sensor networks
NASA Astrophysics Data System (ADS)
Ertin, Emre
2006-05-01
Sensor network technology has enabled new surveillance systems where sensor nodes equipped with processing and communication capabilities can collaboratively detect, classify and track targets of interest over a large surveillance area. In this paper we study distributed fusion of multimodal sensor data for extracting target information from a large scale sensor network. Optimal tracking, classification, and reporting of threat events require joint consideration of multiple sensor modalities. Multiple sensor modalities improve tracking by reducing the uncertainty in the track estimates as well as resolving track-sensor data association problems. Our approach to solving the fusion problem with a large number of multimodal sensors is the construction of likelihood maps. The likelihood maps provide summary data for the solution of the detection, tracking and classification problem. The likelihood map presents the sensory information in a format that is easy for decision makers to interpret and is suitable for fusion with spatial prior information such as maps and imaging data from stand-off imaging sensors. We follow a statistical approach to combine sensor data at different levels of uncertainty and resolution. The likelihood map transforms each sensor data stream into a spatio-temporal likelihood map ideally suited for fusion with imaging sensor outputs and prior geographic information about the scene. We also discuss distributed computation of the likelihood map using a gossip-based algorithm and present simulation results.
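A minimal sketch of the likelihood-map construction, under an assumed measurement model of our own (amplitude decaying as 1/(1+d²) with Gaussian noise): each sensor contributes an additive log-likelihood term over a position grid, which is what makes distributed (e.g. gossip-based) accumulation natural.

```python
# Build a spatial log-likelihood map over candidate target positions from
# amplitude readings at known sensor locations (toy measurement model).
import numpy as np

rng = np.random.default_rng(1)
sensors = rng.uniform(0, 10, size=(20, 2))          # known sensor positions
source = np.array([6.0, 3.5])                       # true (unknown) target
d_true = np.linalg.norm(sensors - source, axis=1)
obs = 1.0 / (1.0 + d_true**2) + rng.normal(0, 0.02, size=20)

gx, gy = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
loglik = np.zeros_like(gx)
for s, z in zip(sensors, obs):
    d = np.hypot(gx - s[0], gy - s[1])
    mu = 1.0 / (1.0 + d**2)
    loglik += -0.5 * ((z - mu) / 0.02) ** 2         # per-sensor Gaussian term

# The map is a sum of per-sensor terms, so nodes can accumulate it
# distributively (e.g. by gossip averaging) before a final argmax:
ix = np.unravel_index(np.argmax(loglik), loglik.shape)
print("estimated source:", gx[ix], gy[ix])
```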
The Inverse Problem for Confined Aquifer Flow: Identification and Estimation With Extensions
NASA Astrophysics Data System (ADS)
Loaiciga, Hugo A.; Mariño, Miguel A.
1987-01-01
The contributions of this work are twofold. First, a methodology for estimating the elements of parameter matrices in the governing equation of flow in a confined aquifer is developed. The estimation techniques for the distributed-parameter inverse problem pertain to linear least squares and generalized least squares methods. The linear relationship among the known heads and unknown parameters of the flow equation provides the background for developing criteria for determining the identifiability status of unknown parameters. Under conditions of exact or overidentification it is possible to develop statistically consistent parameter estimators and their asymptotic distributions. The estimation techniques, namely, two-stage least squares and three-stage least squares, are applied to a specific groundwater inverse problem and compared between themselves and with an ordinary least squares estimator. The three-stage estimator provides the closest approximation to the actual parameter values, but it also shows relatively large standard errors as compared to the ordinary and two-stage estimators. The estimation techniques provide the parameter matrices required to simulate the unsteady groundwater flow equation. Second, a nonlinear maximum likelihood estimation approach to the inverse problem is presented. The statistical properties of maximum likelihood estimators are derived, and a procedure to construct confidence intervals and do hypothesis testing is given. The relative merits of the linear and maximum likelihood estimators are analyzed. Other topics relevant to the identification and estimation methodologies, i.e., a continuous-time solution to the flow equation, coping with noise-corrupted head measurements, and extension of the developed theory to nonlinear cases, are also discussed. A simulation study is used to evaluate the methods developed in this study.
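For readers unfamiliar with the instrumental-variable machinery, the sketch below shows generic two-stage least squares on a toy regression with an endogenous regressor; it is not the aquifer flow formulation itself, and all names are ours.

```python
# Generic two-stage least squares (2SLS): instrument Z, endogenous
# regressor x, outcome y. OLS is biased because x is correlated with the
# unobserved confounder u; 2SLS recovers the true coefficient.
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=(n, 2))                     # instruments
u = rng.normal(size=n)                          # unobserved confounder
x = z @ np.array([1.0, -0.5]) + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)      # true coefficient on x is 2

X = np.column_stack([np.ones(n), x])
Zm = np.column_stack([np.ones(n), z])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]    # biased upward

x_hat = Zm @ np.linalg.lstsq(Zm, x, rcond=None)[0]  # stage 1: project x on Z
Xh = np.column_stack([np.ones(n), x_hat])
b_2sls = np.linalg.lstsq(Xh, y, rcond=None)[0]      # stage 2

print("OLS slope:", b_ols[1], " 2SLS slope:", b_2sls[1])
```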
Lapidus, Nathanael; Chevret, Sylvie; Resche-Rigon, Matthieu
2014-12-30
Agreement between two assays is usually based on the concordance correlation coefficient (CCC), estimated from the means, standard deviations, and correlation coefficient of these assays. However, such data will often suffer from left-censoring because of the lower limits of detection of these assays. To handle such data, we propose to extend a multiple imputation approach by chained equations (MICE) developed in the closely related setting of a single left-censored assay. The performance of this two-step approach is compared with that of a previously published maximum likelihood estimation through a simulation study. Results show close estimates of the CCC by both methods, although the coverage is improved by our MICE proposal. An application to cytomegalovirus quantification data is provided. Copyright © 2014 John Wiley & Sons, Ltd.
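For reference, the CCC itself is a one-line computation once complete paired data are available; the abstract's contribution lies in handling the left-censored values (e.g. via MICE) before this step. A minimal sketch:

```python
# Lin's concordance correlation coefficient for complete paired data.
import numpy as np

def ccc(x, y):
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                   # population (1/n) variances
    sxy = ((x - mx) * (y - my)).mean()
    return 2 * sxy / (vx + vy + (mx - my) ** 2)

rng = np.random.default_rng(3)
a = rng.normal(5, 1, 200)
b = a + rng.normal(0.2, 0.5, 200)               # second assay: slight bias + noise
print(ccc(a, b))
```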
Quantifying the uncertainty in heritability
Furlotte, Nicholas A; Heckerman, David; Lippert, Christoph
2014-01-01
The use of mixed models to determine narrow-sense heritability and related quantities such as SNP heritability has received much recent attention. Less attention has been paid to the inherent variability in these estimates. One approach for quantifying variability in estimates of heritability is a frequentist approach, in which heritability is estimated using maximum likelihood and its variance is quantified through an asymptotic normal approximation. An alternative approach is to quantify the uncertainty in heritability through its Bayesian posterior distribution. In this paper, we develop the latter approach, make it computationally efficient and compare it to the frequentist approach. We show theoretically that, for a sufficiently large sample size and intermediate values of heritability, the two approaches provide similar results. Using the Atherosclerosis Risk in Communities cohort, we show empirically that the two approaches can give different results and that the variance/uncertainty can remain large. PMID:24670270
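A minimal sketch of the Bayesian route under a simplified variance-components model y ~ N(0, h²K + (1-h²)I) with unit total variance: one eigendecomposition of the kinship matrix K makes a grid posterior over h² cheap. The toy kinship and flat prior are our assumptions, not the paper's implementation.

```python
# Grid posterior over heritability h2 in y ~ N(0, h2*K + (1-h2)*I).
# After rotating by the eigenvectors of K, the covariance is diagonal,
# so each grid point costs O(n) rather than O(n^3).
import numpy as np

rng = np.random.default_rng(4)
n = 300
G = rng.normal(size=(n, 400))
K = G @ G.T / 400                                # toy "kinship" matrix
w, U = np.linalg.eigh(K)

h2_true = 0.5
L = np.linalg.cholesky(h2_true * K + (1 - h2_true) * np.eye(n) + 1e-9 * np.eye(n))
y = L @ rng.normal(size=n)                       # simulated phenotypes

yt = U.T @ y                                     # rotate once; V is diagonal there
h2_grid = np.linspace(0.01, 0.99, 99)
loglik = np.array([
    -0.5 * (np.sum(np.log(h2 * w + 1 - h2)) + np.sum(yt**2 / (h2 * w + 1 - h2)))
    for h2 in h2_grid
])
post = np.exp(loglik - loglik.max())
post /= post.sum()                               # flat-prior posterior on the grid
print("posterior mean h2:", (h2_grid * post).sum())
```

The full posterior, rather than a point estimate plus an asymptotic variance, is what distinguishes this route from the frequentist approach discussed above.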
A parametric method for determining the number of signals in narrow-band direction finding
NASA Astrophysics Data System (ADS)
Wu, Qiang; Fuhrmann, Daniel R.
1991-08-01
A novel and more accurate method to determine the number of signals in the multisource direction finding problem is developed. The information-theoretic criteria of Yin and Krishnaiah (1988) are applied to a set of quantities which are evaluated from the log-likelihood function. Based on proven asymptotic properties of the maximum likelihood estimation, these quantities have the properties required by the criteria. Since the information-theoretic criteria use these quantities instead of the eigenvalues of the estimated correlation matrix, this approach possesses the advantage of not requiring a subjective threshold, and also provides higher performance than when eigenvalues are used. Simulation results are presented and compared to those obtained from the nonparametric method given by Wax and Kailath (1985).
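For contrast, the eigenvalue-based baseline referenced above (Wax and Kailath 1985) can be sketched in a few lines; the paper's parametric method replaces these eigenvalues with quantities derived from the log-likelihood function.

```python
# Wax-Kailath MDL order selection from the eigenvalues of the sample
# covariance of array snapshots (the nonparametric baseline).
import numpy as np

def mdl_num_signals(R_hat, N):
    """R_hat: p x p sample covariance estimated from N snapshots."""
    lam = np.sort(np.linalg.eigvalsh(R_hat))[::-1]
    p = len(lam)
    scores = []
    for k in range(p):
        tail = lam[k:]                           # presumed noise eigenvalues
        gm = np.exp(np.mean(np.log(tail)))       # geometric mean
        am = np.mean(tail)                       # arithmetic mean
        scores.append(-N * (p - k) * np.log(gm / am)
                      + 0.5 * k * (2 * p - k) * np.log(N))
    return int(np.argmin(scores))

rng = np.random.default_rng(5)
p, N, q = 8, 500, 2                              # 8 sensors, 2 true sources
A = rng.normal(size=(p, q))
X = A @ rng.normal(size=(q, N)) + 0.3 * rng.normal(size=(p, N))
print(mdl_num_signals(X @ X.T / N, N))           # expected output: 2
```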
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model
Xu, Bo; Yang, Ziheng
2016-01-01
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902
Decker, Anna L.; Hubbard, Alan; Crespi, Catherine M.; Seto, Edmund Y.W.; Wang, May C.
2015-01-01
While child and adolescent obesity is a serious public health concern, few studies have utilized parameters based on the causal inference literature to examine the potential impacts of early intervention. The purpose of this analysis was to estimate the causal effects of early interventions to improve physical activity and diet during adolescence on body mass index (BMI), a measure of adiposity, using improved techniques. The most widespread statistical method in studies of child and adolescent obesity is multi-variable regression, with the parameter of interest being the coefficient on the variable of interest. This approach does not appropriately adjust for time-dependent confounding, and the modeling assumptions may not always be met. An alternative parameter to estimate is one motivated by the causal inference literature, which can be interpreted as the mean change in the outcome under interventions to set the exposure of interest. The underlying data-generating distribution, upon which the estimator is based, can be estimated via a parametric or semi-parametric approach. Using data from the National Heart, Lung, and Blood Institute Growth and Health Study, a 10-year prospective cohort study of adolescent girls, we estimated the longitudinal impact of physical activity and diet interventions on 10-year BMI z-scores via a parameter motivated by the causal inference literature, using both parametric and semi-parametric estimation approaches. The parameters of interest were estimated with a recently released R package, ltmle, for estimating means based upon general longitudinal treatment regimes. We found that early, sustained intervention on total calories had a greater impact than a physical activity intervention or non-sustained interventions. Multivariable linear regression yielded inflated effect estimates compared to estimates based on targeted maximum-likelihood estimation and data-adaptive super learning. Our analysis demonstrates that sophisticated, optimal semiparametric estimation of longitudinal treatment-specific means via ltmle provides an incredibly powerful, yet easy-to-use tool, removing impediments for putting theory into practice. PMID:26046009
Exploiting Non-sequence Data in Dynamic Model Learning
2013-10-01
For our experiments here and in Section 3.5, we implement the proposed algorithms in MATLAB and use the maximum directed spanning tree solver... embarrassingly parallelizable, whereas PM's maximum directed spanning tree procedure is harder to parallelize. In this experiment, our MATLAB... some estimation problems, this approach is able to give unique and consistent estimates while the maximum-likelihood method gets entangled in
The Educational Consequences of Teen Childbearing
Kane, Jennifer B.; Morgan, S. Philip; Harris, Kathleen Mullan; Guilkey, David K.
2013-01-01
A huge literature shows that teen mothers face a variety of detriments across the life course, including truncated educational attainment. To what extent is this association causal? The estimated effects of teen motherhood on schooling vary widely, ranging from no discernible difference to 2.6 fewer years among teen mothers. The magnitude of educational consequences is therefore uncertain, despite voluminous policy and prevention efforts that rest on the assumption of a negative and presumably causal effect. This study adjudicates between two potential sources of inconsistency in the literature—methodological differences or cohort differences—by using a single, high-quality data source: namely, The National Longitudinal Study of Adolescent Health. We replicate analyses across four different statistical strategies: ordinary least squares regression; propensity score matching; and parametric and semiparametric maximum likelihood estimation. Results demonstrate educational consequences of teen childbearing, with estimated effects between 0.7 and 1.9 fewer years of schooling among teen mothers. We select our preferred estimate (0.7), derived from semiparametric maximum likelihood estimation, on the basis of weighing the strengths and limitations of each approach. Based on the range of estimated effects observed in our study, we speculate that variable statistical methods are the likely source of inconsistency in the past. We conclude by discussing implications for future research and policy, and recommend that future studies employ a similar multimethod approach to evaluate findings. PMID:24078155
Balzer, Laura B; Zheng, Wenjing; van der Laan, Mark J; Petersen, Maya L
2018-01-01
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Erin A.; Robinson, Sean M.; Anderson, Kevin K.
2015-01-19
Here we present a novel technique for the localization of radiological sources in urban or rural environments from an aerial platform. The technique is based on a Bayesian approach to localization, in which measured count rates in a time series are compared with predicted count rates from a series of pre-calculated test sources to define likelihood. Furthermore, this technique is expanded by using a localized treatment with a limited field of view (FOV), coupled with a likelihood ratio reevaluation, allowing for real-time computation on commodity hardware for arbitrarily complex detector models and terrain. In particular, detectors with inherent asymmetry of response (such as those employing internal collimation or self-shielding for enhanced directional awareness) are leveraged by this approach to provide improved localization. Our results from the localization technique are shown for simulated flight data using monolithic as well as directionally-aware detector models, and the capability of the methodology to locate radioisotopes is estimated for several test cases. This localization technique is shown to facilitate urban search by allowing quick and adaptive estimates of source location, in many cases from a single flyover near a source. In particular, this method represents a significant advancement from earlier methods like full-field Bayesian likelihood, which is not generally fast enough to allow for broad-field search in real time, and highest-net-counts estimation, which has a localization error that depends strongly on flight path and cannot generally operate without exhaustive search.
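A stripped-down version of the grid-based Bayesian idea, assuming a flat terrain, an isotropic detector, and a simple inverse-square count-rate model (all our simplifications):

```python
# Poisson count likelihood along a flight path, accumulated over a grid of
# pre-computed candidate source positions; the argmax is the MAP location
# under a flat prior.
import numpy as np

rng = np.random.default_rng(6)
path = np.column_stack([np.linspace(0, 100, 60), np.full(60, 50.0)])  # flight line
src, strength, bg = np.array([70.0, 40.0]), 5e4, 20.0

d2 = np.sum((path - src) ** 2, axis=1) + 100.0   # + altitude^2 term
counts = rng.poisson(bg + strength / d2)         # measured count series

gx, gy = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 100, 101))
logpost = np.zeros_like(gx)
for (px, py), c in zip(path, counts):
    r2 = (gx - px) ** 2 + (gy - py) ** 2 + 100.0
    mu = bg + strength / r2                      # predicted rate per grid cell
    logpost += c * np.log(mu) - mu               # Poisson log-likelihood kernel

ix = np.unravel_index(np.argmax(logpost), logpost.shape)
print("MAP source location:", gx[ix], gy[ix])
```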
Maximum-likelihood estimation of recent shared ancestry (ERSA).
Huff, Chad D; Witherspoon, David J; Simonson, Tatum S; Xing, Jinchuan; Watkins, W Scott; Zhang, Yuhua; Tuohy, Therese M; Neklason, Deborah W; Burt, Randall W; Guthery, Stephen L; Woodward, Scott R; Jorde, Lynn B
2011-05-01
Accurate estimation of recent shared ancestry is important for genetics, evolution, medicine, conservation biology, and forensics. Established methods estimate kinship accurately for first-degree through third-degree relatives. We demonstrate that chromosomal segments shared by two individuals due to identity by descent (IBD) provide much additional information about shared ancestry. We developed a maximum-likelihood method for the estimation of recent shared ancestry (ERSA) from the number and lengths of IBD segments derived from high-density SNP or whole-genome sequence data. We used ERSA to estimate relationships from SNP genotypes in 169 individuals from three large, well-defined human pedigrees. ERSA is accurate to within one degree of relationship for 97% of first-degree through fifth-degree relatives and 80% of sixth-degree and seventh-degree relatives. We demonstrate that ERSA's statistical power approaches the maximum theoretical limit imposed by the fact that distant relatives frequently share no DNA through a common ancestor. ERSA greatly expands the range of relationships that can be estimated from genetic data and is implemented in a freely available software package.
A Comparative Study of Co-Channel Interference Suppression Techniques
NASA Technical Reports Server (NTRS)
Hamkins, Jon; Satorius, Ed; Paparisto, Gent; Polydoros, Andreas
1997-01-01
We describe three methods of combating co-channel interference (CCI): a cross-coupled phase-locked loop (CCPLL), a phase-tracking circuit (PTC), and joint Viterbi estimation based on the maximum likelihood principle. In the case of co-channel FM-modulated voice signals, the CCPLL and PTC methods typically outperform the maximum likelihood estimators when the modulation parameters are dissimilar. However, as the modulation parameters become identical, joint Viterbi estimation provides a more robust estimate of the co-channel signals and does not suffer as much from "signal switching", which especially plagues the CCPLL approach. Good performance for the PTC requires both dissimilar modulation parameters and a priori knowledge of the co-channel signal amplitudes. The CCPLL and joint Viterbi estimators, on the other hand, incorporate accurate amplitude estimates. In addition, application of the joint Viterbi algorithm to demodulating co-channel digital (BPSK) signals in a multipath environment is also discussed. It is shown in this case that if the interference is sufficiently small, a single trellis model is most effective in demodulating the co-channel signals.
Wang, Shijun; Liu, Peter; Turkbey, Baris; Choyke, Peter; Pinto, Peter; Summers, Ronald M
2012-01-01
In this paper, we propose a new pharmacokinetic model for parameter estimation of dynamic contrast-enhanced (DCE) MRI by using Gaussian process inference. Our model is based on the Tofts dual-compartment model for the description of tracer kinetics and the observed time series from DCE-MRI is treated as a Gaussian stochastic process. The parameter estimation is done through a maximum likelihood approach and we propose a variant of the coordinate descent method to solve this likelihood maximization problem. The new model was shown to outperform a baseline method on simulated data. Parametric maps generated on prostate DCE data with the new model also provided better enhancement of tumors, lower intensity on false positives, and better boundary delineation when compared with the baseline method. New statistical parameter maps from the process model were also found to be informative, particularly when paired with the PK parameter maps.
Hühn, M
1995-05-01
Some approaches to molecular marker-assisted linkage detection for a dominant disease-resistance trait based on a segregating F2 population are discussed. Analysis of two-point linkage is carried out by the traditional measure of maximum lod score. It depends on (1) the maximum-likelihood estimate of the recombination fraction between the marker and the disease-resistance gene locus, (2) the observed absolute frequencies, and (3) the unknown number of tested individuals. If one replaces the absolute frequencies by expressions depending on the unknown sample size and the maximum-likelihood estimate of recombination value, the conventional rule for significant linkage (maximum lod score exceeds a given linkage threshold) can be resolved for the sample size. For each sub-population used for linkage analysis [susceptible (= recessive) individuals, resistant (= dominant) individuals, complete F2] this approach gives a lower bound for the necessary number of individuals required for the detection of significant two-point linkage by the lod-score method.
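The sample-size logic for the susceptible (recessive) subpopulation can be reproduced in miniature: among aa individuals the marker genotypes occur with probabilities ((1-θ)², 2θ(1-θ), θ²) in coupling phase, so the expected per-individual lod score gives the lower bound n ≥ Z_threshold / E[lod]. A sketch, with the conventional threshold of 3 assumed:

```python
# Lower bound on the number of recessive F2 individuals needed for
# significant two-point linkage at recombination fraction t.
import numpy as np

def min_sample_size(t, z_threshold=3.0):
    p_t = np.array([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])
    p_null = np.array([0.25, 0.5, 0.25])         # unlinked (t = 1/2) frequencies
    e_lod = np.sum(p_t * np.log10(p_t / p_null)) # expected lod per individual
    return np.ceil(z_threshold / e_lod)

for t in (0.05, 0.1, 0.2):
    print(t, min_sample_size(t))
```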
Polar bears in the Beaufort Sea: A 30-year mark-recapture case history
Amstrup, Steven C.; McDonald, T.L.; Stirling, I.
2001-01-01
Knowledge of population size and trend is necessary to manage anthropogenic risks to polar bears (Ursus maritimus). Despite capturing over 1,025 females between 1967 and 1998, previously calculated estimates of the size of the southern Beaufort Sea (SBS) population have been unreliable. We improved estimates of numbers of polar bears by modeling heterogeneity in capture probability with covariates. Important covariates referred to the year of the study, age of the bear, capture effort, and geographic location. Our choice of best approximating model was based on the inverse relationship between variance in parameter estimates and likelihood of the fit and suggested a growth from ≈ 500 to over 1,000 females during this study. The mean coefficient of variation on estimates for the last decade of the study was 0.16—the smallest yet derived. A similar model selection approach is recommended for other projects where a best model is not identified by likelihood criteria alone.
Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures
Moore, Brian R.; Höhna, Sebastian; May, Michael R.; Rannala, Bruce; Huelsenbeck, John P.
2016-01-01
Bayesian analysis of macroevolutionary mixtures (BAMM) has recently taken the study of lineage diversification by storm. BAMM estimates the diversification-rate parameters (speciation and extinction) for every branch of a study phylogeny and infers the number and location of diversification-rate shifts across branches of a tree. Our evaluation of BAMM reveals two major theoretical errors: (i) the likelihood function (which estimates the model parameters from the data) is incorrect, and (ii) the compound Poisson process prior model (which describes the prior distribution of diversification-rate shifts across branches) is incoherent. Using simulation, we demonstrate that these theoretical issues cause statistical pathologies; posterior estimates of the number of diversification-rate shifts are strongly influenced by the assumed prior, and estimates of diversification-rate parameters are unreliable. Moreover, the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM. PMID:27512038
Campbell, D A; Chkrebtii, O
2013-12-01
Statistical inference for biochemical models often faces a variety of characteristic challenges. In this paper we examine state and parameter estimation for the JAK-STAT intracellular signalling mechanism, which exemplifies the implementation intricacies common in many biochemical inference problems. We introduce an extension to the Generalized Smoothing approach for estimating delay differential equation models, addressing selection of complexity parameters, choice of the basis system, and appropriate optimization strategies. Motivated by the JAK-STAT system, we further extend the generalized smoothing approach to consider a nonlinear observation process with additional unknown parameters, and highlight how the approach handles unobserved states and unevenly spaced observations. The methodology developed is generally applicable to problems of estimation for differential equation models with delays, unobserved states, nonlinear observation processes, and partially observed histories. Crown Copyright © 2013. Published by Elsevier Inc. All rights reserved.
Ding, Jieli; Zhou, Haibo; Liu, Yanyan; Cai, Jianwen; Longnecker, Matthew P.
2014-01-01
Motivated by the need from our on-going environmental study in the Norwegian Mother and Child Cohort (MoBa) study, we consider an outcome-dependent sampling (ODS) scheme for failure-time data with censoring. Like the case-cohort design, the ODS design enriches the observed sample by selectively including certain failure subjects. We present an estimated maximum semiparametric empirical likelihood estimation (EMSELE) under the proportional hazards model framework. The asymptotic properties of the proposed estimator were derived. Simulation studies were conducted to evaluate the small-sample performance of our proposed method. Our analyses show that the proposed estimator and design is more efficient than the current default approach and other competing approaches. Applying the proposed approach with the data set from the MoBa study, we found a significant effect of an environmental contaminant on fecundability. PMID:24812419
A Bayesian approach to parameter and reliability estimation in the Poisson distribution.
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1972-01-01
For life testing procedures, a Bayesian analysis is developed with respect to a random intensity parameter in the Poisson distribution. Bayes estimators are derived for the Poisson parameter and the reliability function based on uniform and gamma prior distributions of that parameter. A Monte Carlo procedure is implemented to make possible an empirical mean-squared error comparison between Bayes and existing minimum variance unbiased, as well as maximum likelihood, estimators. As expected, the Bayes estimators have mean-squared errors that are appreciably smaller than those of the other two.
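The comparison is easy to rerun in miniature; the sketch below draws true intensities from the gamma prior, so the Bayes posterior mean should (and does) beat the maximum-likelihood sample mean in mean-squared error:

```python
# Monte Carlo MSE comparison: gamma(a, b) prior on the Poisson intensity,
# Bayes posterior-mean estimator vs. the ML estimator (the sample mean).
import numpy as np

rng = np.random.default_rng(7)
a, b, n, reps = 2.0, 1.0, 10, 20000
lam = rng.gamma(a, 1 / b, size=reps)             # true intensities from the prior
x = rng.poisson(lam[:, None], size=(reps, n))

mle = x.mean(axis=1)
bayes = (a + x.sum(axis=1)) / (b + n)            # posterior mean under gamma prior

print("MSE MLE  :", np.mean((mle - lam) ** 2))
print("MSE Bayes:", np.mean((bayes - lam) ** 2))  # smaller, as the abstract reports
```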
A Global Carbon Assimilation System using a modified EnKF assimilation method
NASA Astrophysics Data System (ADS)
Zhang, S.; Zheng, X.; Chen, Z.; Dan, B.; Chen, J. M.; Yi, X.; Wang, L.; Wu, G.
2014-10-01
A Global Carbon Assimilation System based on the ensemble Kalman filter (GCAS-EK) is developed for assimilating atmospheric CO2 abundance data into an ecosystem model to simultaneously estimate the surface carbon fluxes and atmospheric CO2 distribution. This assimilation approach is based on the ensemble Kalman filter (EnKF), but with several new developments, including the use of analysis states to iteratively estimate ensemble forecast errors, and a maximum likelihood estimation of the inflation factors of the forecast and observation errors. The proposed assimilation approach is tested in observing system simulation experiments and then used to estimate the terrestrial ecosystem carbon fluxes and atmospheric CO2 distributions from 2002 to 2008. The results show that this assimilation approach can effectively reduce the biases and uncertainties of the carbon fluxes simulated by the ecosystem model.
Improving RNA-Seq expression estimates by correcting for fragment bias
2011-01-01
The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973
Evaluation of Dynamic Coastal Response to Sea-level Rise Modifies Inundation Likelihood
NASA Technical Reports Server (NTRS)
Lentz, Erika E.; Thieler, E. Robert; Plant, Nathaniel G.; Stippa, Sawyer R.; Horton, Radley M.; Gesch, Dean B.
2016-01-01
Sea-level rise (SLR) poses a range of threats to natural and built environments, making assessments of SLR-induced hazards essential for informed decision making. We develop a probabilistic model that evaluates the likelihood that an area will inundate (flood) or dynamically respond (adapt) to SLR. The broad-area applicability of the approach is demonstrated by producing 30x30m resolution predictions for more than 38,000 sq km of diverse coastal landscape in the northeastern United States. Probabilistic SLR projections, coastal elevation and vertical land movement are used to estimate likely future inundation levels. Then, conditioned on future inundation levels and the current land-cover type, we evaluate the likelihood of dynamic response versus inundation. We find that nearly 70% of this coastal landscape has some capacity to respond dynamically to SLR, and we show that inundation models over-predict land likely to submerge. This approach is well suited to guiding coastal resource management decisions that weigh future SLR impacts and uncertainty against ecological targets and economic constraints.
Nestler, Steffen
2014-05-01
Parameters in structural equation models are typically estimated using the maximum likelihood (ML) approach. Bollen (1996) proposed an alternative non-iterative, equation-by-equation estimator that uses instrumental variables. Although this two-stage least squares/instrumental variables (2SLS/IV) estimator has good statistical properties, one problem with its application is that parameter equality constraints cannot be imposed. This paper presents a mathematical solution to this problem that is based on an extension of the 2SLS/IV approach to a system of equations. We present an example in which our approach was used to examine strong longitudinal measurement invariance. We also investigated the new approach in a simulation study that compared it with ML in the examination of the equality of two latent regression coefficients and strong measurement invariance. Overall, the results show that the suggested approach is a useful extension of the original 2SLS/IV estimator and allows for the effective handling of equality constraints in structural equation models. © 2013 The British Psychological Society.
Cham, Heining; West, Stephen G.; Ma, Yue; Aiken, Leona S.
2012-01-01
A Monte Carlo simulation was conducted to investigate the robustness of four latent variable interaction modeling approaches (Constrained Product Indicator [CPI], Generalized Appended Product Indicator [GAPI], Unconstrained Product Indicator [UPI], and Latent Moderated Structural Equations [LMS]) under high degrees of non-normality of the observed exogenous variables. Results showed that the CPI and LMS approaches yielded biased estimates of the interaction effect when the exogenous variables were highly non-normal. When the violation of non-normality was not severe (normal; symmetric with excess kurtosis < 1), the LMS approach yielded the most efficient estimates of the latent interaction effect with the highest statistical power. In highly non-normal conditions, the GAPI and UPI approaches with ML estimation yielded unbiased latent interaction effect estimates, with acceptable actual Type-I error rates for both the Wald and likelihood ratio tests of interaction effect at N ≥ 500. An empirical example illustrated the use of the four approaches in testing a latent variable interaction between academic self-efficacy and positive family role models in the prediction of academic performance. PMID:23457417
ERIC Educational Resources Information Center
Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.
2016-01-01
The aim of this study is to determine the difference in variance between maximum likelihood and expected a posteriori estimation methods as a function of the number of items in an aptitude test. The variance reflects the accuracy achieved by both the maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…
Chen, Yong; Liu, Yulun; Ning, Jing; Cormier, Janice; Chu, Haitao
2014-01-01
Systematic reviews of diagnostic tests often involve a mixture of case-control and cohort studies. The standard methods for evaluating diagnostic accuracy only focus on sensitivity and specificity and ignore the information on disease prevalence contained in cohort studies. Consequently, such methods cannot provide estimates of measures related to disease prevalence, such as population averaged or overall positive and negative predictive values, which reflect the clinical utility of a diagnostic test. In this paper, we propose a hybrid approach that jointly models the disease prevalence along with the diagnostic test sensitivity and specificity in cohort studies, and the sensitivity and specificity in case-control studies. In order to overcome the potential computational difficulties in the standard full likelihood inference of the proposed hybrid model, we propose an alternative inference procedure based on the composite likelihood. Such composite likelihood based inference does not suffer computational problems and maintains high relative efficiency. In addition, it is more robust to model mis-specifications compared to the standard full likelihood inference. We apply our approach to a review of the performance of contemporary diagnostic imaging modalities for detecting metastases in patients with melanoma. PMID:25897179
PoMo: An Allele Frequency-Based Approach for Species Tree Estimation
De Maio, Nicola; Schrempf, Dominik; Kosiol, Carolin
2015-01-01
Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches. PMID:26209413
A Comparison of a Bayesian and a Maximum Likelihood Tailored Testing Procedure.
ERIC Educational Resources Information Center
McKinley, Robert L.; Reckase, Mark D.
A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…
On non-parametric maximum likelihood estimation of the bivariate survivor function.
Prentice, R L
The likelihood function for the bivariate survivor function F, under independent censorship, is maximized to obtain a non-parametric maximum likelihood estimator F̂. F̂ may or may not be unique depending on the configuration of singly- and doubly-censored pairs. The likelihood function can be maximized by placing all mass on the grid formed by the uncensored failure times, or half lines beyond the failure time grid, or in the upper right quadrant beyond the grid. By accumulating the mass along lines (or regions) where the likelihood is flat, one obtains a partially maximized likelihood as a function of parameters that can be uniquely estimated. The score equations corresponding to these point mass parameters are derived, using a Lagrange multiplier technique to ensure unit total mass, and a modified Newton procedure is used to calculate the parameter estimates in some limited simulation studies. Some considerations for the further development of non-parametric bivariate survivor function estimators are briefly described.
NASA Astrophysics Data System (ADS)
Perlovsky, Leonid I.; Webb, Virgil H.; Bradley, Scott R.; Hansen, Christopher A.
1998-07-01
An advanced detection and tracking system is being developed for the U.S. Navy's Relocatable Over-the-Horizon Radar (ROTHR) to provide improved tracking performance against small aircraft typically used in drug-smuggling activities. The development is based on the Maximum Likelihood Adaptive Neural System (MLANS), a model-based neural network that combines advantages of neural network and model-based algorithmic approaches. The objective of the MLANS tracker development effort is to address user requirements for increased detection and tracking capability in clutter and improved track position, heading, and speed accuracy. The MLANS tracker is expected to outperform other approaches to detection and tracking for the following reasons. It incorporates adaptive internal models of target return signals, target tracks and maneuvers, and clutter signals, which leads to concurrent clutter suppression, detection, and tracking (track-before-detect). It is not combinatorial and thus does not require any thresholding or peak picking and can track in low signal-to-noise conditions. It incorporates superresolution spectrum estimation techniques exceeding the performance of conventional maximum likelihood and maximum entropy methods. The unique spectrum estimation method is based on the Einsteinian interpretation of the ROTHR received energy spectrum as a probability density of signal frequency. The MLANS neural architecture and learning mechanism are founded on spectrum models and maximization of the "Einsteinian" likelihood, allowing knowledge of the physical behavior of both targets and clutter to be injected into the tracker algorithms. The paper describes the addressed requirements and expected improvements, theoretical foundations, engineering methodology, and results of the development effort to date.
Robust and efficient estimation with weighted composite quantile regression
NASA Astrophysics Data System (ADS)
Jiang, Xuejun; Li, Jingzhi; Xia, Tian; Yan, Wanfeng
2016-09-01
In this paper we introduce a weighted composite quantile regression (CQR) estimation approach and study its application in nonlinear models such as exponential models and ARCH-type models. The weighted CQR is augmented by using a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators share robustness from quantile regression and achieve nearly the same efficiency as the oracle maximum likelihood estimator (MLE) for a variety of error distributions including the normal, mixed-normal, Student's t, Cauchy distributions, etc. We also suggest an algorithm for the fast implementation of the proposed methodology. Simulations are carried out to compare the performance of different estimators, and the proposed approach is used to analyze the daily S&P 500 Composite index, which verifies the effectiveness and efficiency of our theoretical results.
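A bare-bones unweighted CQR fit for a linear model conveys the core construction (one shared slope, one intercept per quantile level); the heavy-tailed-error setting below is our toy example, and the paper's method additionally applies a data-driven weighting across quantile levels.

```python
# Composite quantile regression for y = b*x + e: minimize the sum of check
# losses over several quantile levels with a shared slope. Nelder-Mead is
# crude but adequate for this 6-parameter illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 400
x = rng.normal(size=n)
y = 1.5 * x + rng.standard_t(df=2, size=n)       # heavy-tailed errors

taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

def check(u, tau):                               # quantile check loss
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def cqr_loss(theta):
    b, a = theta[0], theta[1:]                   # shared slope, per-tau intercepts
    return sum(check(y - a[k] - b * x, t).sum() for k, t in enumerate(taus))

res = minimize(cqr_loss, x0=np.zeros(1 + len(taus)), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
print("CQR slope estimate:", res.x[0])           # close to 1.5 despite outliers
```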
Building unbiased estimators from non-Gaussian likelihoods with application to shear estimation
Madhavacheril, Mathew S.; McDonald, Patrick; Sehgal, Neelima; ...
2015-01-15
We develop a general framework for generating estimators of a given quantity which are unbiased to a given order in the difference between the true value of the underlying quantity and the fiducial position in theory space around which we expand the likelihood. We apply this formalism to rederive the optimal quadratic estimator and show how the replacement of the second derivative matrix with the Fisher matrix is a generic way of creating an unbiased estimator (assuming choice of the fiducial model is independent of data). Next we apply the approach to estimation of shear lensing, closely following the work of Bernstein and Armstrong (2014). Our first-order estimator reduces to their estimator in the limit of zero shear, but it also naturally allows for the case of non-constant shear and the easy calculation of correlation functions or power spectra using standard methods. Both our first-order estimator and Bernstein and Armstrong's estimator exhibit a bias which is quadratic in true shear. Our third-order estimator is, at least in the realm of the toy problem of Bernstein and Armstrong, unbiased to 0.1% in relative shear errors Δg/g for shears up to |g| = 0.2.
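The first-order construction can be demonstrated on a one-parameter toy problem of our own (an exponential rate rather than shear): expand the log-likelihood around a fiducial value and correct by the inverse Fisher information times the score. The resulting bias is quadratic in the offset from the fiducial point, mirroring the quadratic-in-shear bias noted above.

```python
# One-step estimator lam_hat = lam0 + score(lam0) / Fisher(lam0) for an
# exponential rate. Unbiased to first order in (lam_true - lam0); the
# residual bias grows quadratically with the offset.
import numpy as np

rng = np.random.default_rng(9)
lam0, n, reps = 1.0, 50, 200000                  # fiducial rate, sample size

for lam_true in (1.0, 1.05, 1.2):
    x = rng.exponential(1 / lam_true, size=(reps, n))
    score = n / lam0 - x.sum(axis=1)             # dlogL/dlam at the fiducial
    fisher = n / lam0**2
    lam_hat = lam0 + score / fisher              # one-step estimator
    print(lam_true, "mean estimate:", lam_hat.mean())
```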
Silverman, Merav H.; Jedd, Kelly; Luciana, Monica
2015-01-01
Behavioral responses to, and the neural processing of, rewards change dramatically during adolescence and may contribute to observed increases in risk-taking during this developmental period. Functional MRI (fMRI) studies suggest differences between adolescents and adults in neural activation during reward processing, but findings are contradictory, and effects have been found in non-predicted directions. The current study uses an activation likelihood estimation (ALE) approach for quantitative meta-analysis of functional neuroimaging studies to: 1) confirm the network of brain regions involved in adolescents’ reward processing, 2) identify regions involved in specific stages (anticipation, outcome) and valence (positive, negative) of reward processing, and 3) identify differences in activation likelihood between adolescent and adult reward-related brain activation. Results reveal a subcortical network of brain regions involved in adolescent reward processing similar to that found in adults with major hubs including the ventral and dorsal striatum, insula, and posterior cingulate cortex (PCC). Contrast analyses find that adolescents exhibit greater likelihood of activation in the insula while processing anticipation relative to outcome and greater likelihood of activation in the putamen and amygdala during outcome relative to anticipation. While processing positive compared to negative valence, adolescents show increased likelihood for activation in the posterior cingulate cortex (PCC) and ventral striatum. Contrasting adolescent reward processing with the existing ALE of adult reward processing (Liu et al., 2011) reveals increased likelihood for activation in limbic, frontolimbic, and striatal regions in adolescents compared with adults. Unlike adolescents, adults also activate executive control regions of the frontal and parietal lobes. These findings support hypothesized elevations in motivated activity during adolescence. PMID:26254587
Bias correction for estimated QTL effects using the penalized maximum likelihood method.
Zhang, J; Yue, C; Zhang, Y-M
2012-04-01
A penalized maximum likelihood method has been proposed as an important approach to the detection of epistatic quantitative trait loci (QTL). However, this approach is not optimal in two special situations: (1) closely linked QTL with effects in opposite directions and (2) small-effect QTL, because the method produces downwardly biased estimates of QTL effects. The present study aims to correct the bias by using correction coefficients and shifting from the use of a uniform prior on the variance parameter of a QTL effect to that of a scaled inverse chi-square prior. The results of Monte Carlo simulation experiments show that the improved method increases the power from 25 to 88% in the detection of two closely linked QTL of equal size in opposite directions and from 60 to 80% in the identification of QTL with small effects (0.5% of the total phenotypic variance). We used the improved method to detect QTL responsible for the barley kernel weight trait using 145 doubled haploid lines developed in the North American Barley Genome Mapping Project. Application of the proposed method to other shrinkage estimation of QTL effects is discussed.
Yang, Huan; Meijer, Hil G E; Buitenweg, Jan R; van Gils, Stephan A
2016-01-01
Healthy or pathological states of nociceptive subsystems determine different stimulus-response relations measured from quantitative sensory testing. In turn, stimulus-response measurements may be used to assess these states. In a recently developed computational model, six model parameters characterize activation of nerve endings and spinal neurons. However, both model nonlinearity and the limited information in yes-no detection responses to electrocutaneous stimuli make it challenging to estimate the model parameters. Here, we address whether and how these difficulties can be overcome for reliable parameter estimation. First, we fit the computational model to experimental stimulus-response pairs by maximizing the likelihood. To evaluate the balance between model fit and complexity, i.e., the number of model parameters, we evaluate the Bayesian Information Criterion. We find that the computational model is better than a conventional logistic model regarding this balance. Second, our theoretical analysis suggests varying the pulse width among applied stimuli as a necessary condition to prevent structural non-identifiability. In addition, the numerically implemented profile likelihood approach reveals structural and practical non-identifiability. Our model-based approach, with its integration of psychophysical measurements, can be useful for a reliable assessment of states of the nociceptive system.
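The profile-likelihood diagnostic mentioned above is easy to illustrate on a two-parameter stand-in for the full six-parameter model; a sketch with a logistic yes-no detection model of our own choosing:

```python
# Profile likelihood for the slope s of a logistic detection model
# P(detect | I) = expit((I - a) / s): re-optimize the threshold a at each
# fixed s. A flat profile would signal non-identifiability.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

rng = np.random.default_rng(10)
I = rng.uniform(0.5, 3.0, size=300)              # stimulus amplitudes
a_true, s_true = 1.5, 0.3
y = rng.random(300) < expit((I - a_true) / s_true)  # simulated yes/no responses

def nll(a, s):
    p = np.clip(expit((I - a) / s), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

for s in (0.15, 0.3, 0.6):
    prof = minimize_scalar(lambda a: nll(a, s), bounds=(0.5, 3.0), method="bounded")
    print("s =", s, " profile NLL =", round(prof.fun, 2))  # minimized over a
```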
An empirical Bayes approach for the Poisson life distribution.
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1973-01-01
A smooth empirical Bayes estimator is derived for the intensity parameter (hazard rate) in the Poisson distribution as used in life testing. The reliability function is also estimated either by using the empirical Bayes estimate of the parameter, or by obtaining the expectation of the reliability function. The behavior of the empirical Bayes procedure is studied through Monte Carlo simulation in which estimates of mean-squared errors of the empirical Bayes estimators are compared with those of conventional estimators such as minimum variance unbiased or maximum likelihood. Results indicate a significant reduction in mean-squared error of the empirical Bayes estimators over the conventional variety.
ERIC Educational Resources Information Center
Prevost, A. Toby; Mason, Dan; Griffin, Simon; Kinmonth, Ann-Louise; Sutton, Stephen; Spiegelhalter, David
2007-01-01
Practical meta-analysis of correlation matrices generally ignores covariances (and hence correlations) between correlation estimates. The authors consider various methods for allowing for covariances, including generalized least squares, maximum marginal likelihood, and Bayesian approaches, illustrated using a 6-dimensional response in a series of…
The recursive maximum likelihood proportion estimator: User's guide and test results
NASA Technical Reports Server (NTRS)
Vanrooy, D. L.
1976-01-01
Implementation of the recursive maximum likelihood proportion estimator is described. A user's guide to the programs as they currently exist on the IBM 360/67 at LARS, Purdue, is included, and test results on LANDSAT data are described. On Hill County data, the algorithm yields results comparable to the standard maximum likelihood proportion estimator.
Bayesian estimation of the transmissivity spatial structure from pumping test data
NASA Astrophysics Data System (ADS)
Demir, Mehmet Taner; Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier
2017-06-01
Estimating the statistical parameters (mean, variance, and integral scale) that define the spatial structure of the transmissivity or hydraulic conductivity fields is a fundamental step for the accurate prediction of subsurface flow and contaminant transport. In practice, the determination of the spatial structure is a challenge because of spatial heterogeneity and data scarcity. In this paper, we describe a novel approach that uses time-drawdown data from multiple pumping tests to determine the transmissivity statistical spatial structure. The method builds on the pumping test interpretation procedure of Copty et al. (2011) (Continuous Derivation method, CD), which uses the time-drawdown data and its time derivative to estimate apparent transmissivity values as a function of radial distance from the pumping well. A Bayesian approach is then used to infer the statistical parameters of the transmissivity field by combining prior information about the parameters and the likelihood function expressed in terms of radially-dependent apparent transmissivities determined from pumping tests. A major advantage of the proposed Bayesian approach is that the likelihood function is readily determined from randomly generated multiple realizations of the transmissivity field, without the need to solve the groundwater flow equation. Applying the method to synthetically-generated pumping test data, we demonstrate that, through a relatively simple procedure, information on the spatial structure of the transmissivity may be inferred from pumping test data. It is also shown that the prior parameter distribution has a significant influence on the estimation procedure, given the non-uniqueness of the estimation problem. Results also indicate that the reliability of the estimated transmissivity statistical parameters increases with the number of available pumping tests.
Validation of a heteroscedastic hazards regression model.
Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin
2002-03-01
A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing-hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. Its by-product, a model-checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial.
Modelling ultrasound guided wave propagation for plate thickness measurement
NASA Astrophysics Data System (ADS)
Malladi, Rakesh; Dabak, Anand; Murthy, Nitish Krishna
2014-03-01
Structural health monitoring refers to monitoring the health of plate-like walls of large reactors, pipelines, and other structures in terms of corrosion detection and thickness estimation. The objective of this work is modeling the ultrasonic guided waves generated in a plate. A piezoelectric transducer is excited by an input pulse to generate ultrasonic guided Lamb waves in the plate, which are received by another piezoelectric transducer. In contrast with existing methods, we develop a mathematical model of the direct component of the signal (DCS) recorded at the terminals of the receiving transducer. The DCS model uses a maximum likelihood technique to estimate the different parameters, namely the time delay of the signal due to the transducer delay and the amplitude scaling of all the Lamb-wave modes due to attenuation, while taking into account the spreading of the received signal in time due to dispersion. The maximum likelihood estimate minimizes the energy difference between the experimental signal and the DCS model-generated signal. We demonstrate that the DCS model matches closely with experimentally recorded signals and show that it can be used to estimate the thickness of the plate. The main idea of the thickness estimation algorithm is to generate a bank of DCS model-generated signals, each corresponding to a different thickness of the plate, and then find the closest match among these signals to the received signal, resulting in an estimate of the thickness of the plate. Our approach therefore provides a complementary suite of analytics to existing thickness monitoring approaches.
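The matching step of the thickness-estimation algorithm in miniature, with synthetic Gaussian-windowed tones standing in for the DCS model-generated bank (velocity, geometry, and noise level are our assumptions):

```python
# Template-bank matching: pick the candidate thickness whose model-generated
# waveform minimizes the residual energy against the received signal.
import numpy as np

fs, f0 = 1e7, 5e5                                # sample rate, tone frequency (Hz)
t = np.arange(0, 2e-4, 1 / fs)

def echo(delay):                                 # toy received pulse at a delay
    return np.exp(-((t - delay) * 2e4) ** 2) * np.sin(2 * np.pi * f0 * (t - delay))

thicknesses = np.linspace(5e-3, 15e-3, 101)      # candidate plate thicknesses (m)
c = 3000.0                                       # assumed group velocity (m/s)
bank = np.array([echo(2 * h / c) for h in thicknesses])

rng = np.random.default_rng(11)
received = echo(2 * 9.3e-3 / c) + 0.05 * rng.normal(size=t.size)

residual_energy = np.sum((bank - received) ** 2, axis=1)
print("estimated thickness:", thicknesses[np.argmin(residual_energy)])
```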
ERIC Educational Resources Information Center
Chung, Hwan; Anthony, James C.
2013-01-01
This article presents a multiple-group latent class-profile analysis (LCPA) by taking a Bayesian approach in which a Markov chain Monte Carlo simulation is employed to achieve more robust estimates for latent growth patterns. This article describes and addresses a label-switching problem that involves the LCPA likelihood function, which has…
Deterministic annealing for density estimation by multivariate normal mixtures
NASA Astrophysics Data System (ADS)
Kloppenburg, Martin; Tavan, Paul
1997-03-01
An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.
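A sketch of the deterministic-annealing EM idea for a one-dimensional two-component mixture: responsibilities are computed from tempered posteriors (exponent 1/T) and T is lowered toward 1, which smooths the likelihood surface early on and reduces sensitivity to initialization. The paper's soft-constraint derivation is more principled than this minimal variant.

```python
# Deterministic-annealing EM for a 1-D two-component normal mixture.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(12)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 300)])

mu, sig, w = np.array([-0.1, 0.1]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for T in [8.0, 4.0, 2.0, 1.0]:                   # annealing schedule
    for _ in range(50):                          # EM sweeps at temperature T
        logp = np.log(w)[:, None] + norm.logpdf(x[None, :], mu[:, None], sig[:, None])
        logp = logp / T                          # tempered component posteriors
        r = np.exp(logp - logp.max(axis=0))
        r /= r.sum(axis=0)                       # responsibilities (E-step)
        nk = r.sum(axis=1)
        mu = (r @ x) / nk                        # M-step updates
        sig = np.sqrt((r * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)
        w = nk / nk.sum()
print("means:", mu, "sigmas:", sig, "weights:", w)
```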
Development of advanced acreage estimation methods
NASA Technical Reports Server (NTRS)
Guseman, L. F., Jr. (Principal Investigator)
1980-01-01
The use of the AMOEBA clustering/classification algorithm was investigated as a basis for both a color display generation technique and a maximum likelihood proportion estimation procedure. An approach to analyzing large data reduction systems was formulated, and an exploratory empirical study of spatial correlation in LANDSAT data was also carried out. Topics addressed include: (1) development of multi-image color displays; (2) spectral-spatial classification algorithm development; (3) spatial correlation studies; and (4) evaluation of data systems.
A unifying framework for marginalized random intercept models of correlated binary outcomes
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian M.
2013-01-01
We demonstrate that many current approaches to marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the copula-based models described herein. These general copula models of underlying latent threshold random variables yield likelihood-based models for marginal fixed-effects estimation and interpretation in the analysis of correlated binary data with exchangeable correlation structures. Moreover, we propose a nomenclature and a set of model relationships that substantially elucidate the complex area of marginalized random intercept models for binary data. A diverse collection of didactic mathematical and numerical examples is given to illustrate the concepts. PMID:25342871
Huang, Chiung-Yu; Qin, Jing
2013-01-01
The Canadian Study of Health and Aging (CSHA) employed a prevalent cohort design to study survival after onset of dementia, where patients with dementia were sampled and the onset time of dementia was determined retrospectively. The prevalent cohort sampling scheme favors individuals who survive longer. Thus, the observed survival times are subject to length bias. In recent years, there has been a rising interest in developing estimation procedures for prevalent cohort survival data that not only account for length bias but also actually exploit the incidence distribution of the disease to improve efficiency. This article considers semiparametric estimation of the Cox model for the time from dementia onset to death under a stationarity assumption with respect to the disease incidence. Under the stationarity condition, the semiparametric maximum likelihood estimation is expected to be fully efficient yet difficult to perform for statistical practitioners, as the likelihood depends on the baseline hazard function in a complicated way. Moreover, the asymptotic properties of the semiparametric maximum likelihood estimator are not well-studied. Motivated by the composite likelihood method (Besag 1974), we develop a composite partial likelihood method that retains the simplicity of the popular partial likelihood estimator and can be easily performed using standard statistical software. When applied to the CSHA data, the proposed method estimates a significant difference in survival between the vascular dementia group and the possible Alzheimer’s disease group, while the partial likelihood method for left-truncated and right-censored data yields a greater standard error and a 95% confidence interval covering 0, thus highlighting the practical value of employing a more efficient methodology. To check the assumption of stable disease for the CSHA data, we also present new graphical and numerical tests in the article. The R code used to obtain the maximum composite partial likelihood estimator for the CSHA data is available in the online Supplementary Material, posted on the journal web site. PMID:24000265
NASA Astrophysics Data System (ADS)
Huang, Jinxin; Yuan, Qun; Tankam, Patrice; Clarkson, Eric; Kupinski, Matthew; Hindman, Holly B.; Aquavella, James V.; Rolland, Jannick P.
2015-03-01
In biophotonics imaging, one important and quantitative task is layer-thickness estimation. In this study, we investigate the approach of combining optical coherence tomography (OCT) and a maximum-likelihood (ML) estimator for layer-thickness estimation in the context of tear film imaging. The motivation of this study is to extend our understanding of tear film dynamics, which is the prerequisite to advancing the management of Dry Eye Disease, through the simultaneous estimation of the thickness of the tear film lipid and aqueous layers. The estimator takes into account the different statistical processes associated with the imaging chain. We theoretically investigated the impact of key system parameters, such as the axial point spread function (PSF) and various sources of noise, on measurement uncertainty. Simulations show that an OCT system with a 1 μm axial PSF (FWHM) allows unbiased estimates down to nanometers with nanometer precision. In our implementation, we built a customized Fourier-domain OCT system that operates in the 600 to 1000 nm spectral window and achieves a 0.93 μm axial PSF in corneal epithelium. We then validated the theoretical framework with physical phantoms made of custom optical coatings, with layer thicknesses from tens of nanometers to microns. Results demonstrate unbiased nanometer-class thickness estimates in three different physical phantoms.
Bias Correction for the Maximum Likelihood Estimate of Ability. Research Report. ETS RR-05-15
ERIC Educational Resources Information Center
Zhang, Jinming
2005-01-01
Lord's bias function and the weighted likelihood estimation method are effective in reducing the bias of the maximum likelihood estimate of an examinee's ability under the assumption that the true item parameters are known. This paper presents simulation studies to determine the effectiveness of these two methods in reducing the bias when the item…
Fast maximum likelihood estimation of mutation rates using a birth-death process.
Wu, Xiaowei; Zhu, Hongxiao
2015-02-07
Since fluctuation analysis was first introduced by Luria and Delbrück in 1943, it has been widely used to make inferences about spontaneous mutation rates in cultured cells. Under certain model assumptions, the probability distribution of the number of mutants that appear in a fluctuation experiment can be derived explicitly, which provides the basis of mutation rate estimation. It has been shown that, among various existing estimators, the maximum likelihood estimator usually demonstrates desirable properties such as consistency and lower mean squared error. However, its application to real experimental data is often hindered by slow computation of the likelihood due to the recursive form of the mutant-count distribution. We propose a fast maximum likelihood estimator of mutation rates, MLE-BD, based on a birth-death process model with a non-differential growth assumption. Simulation studies demonstrate that, compared with the conventional maximum likelihood estimator derived from the Luria-Delbrück distribution, MLE-BD achieves a substantial improvement in computational speed and is applicable to arbitrarily large numbers of mutants, while retaining good point-estimation accuracy. Published by Elsevier Ltd.
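For context, the conventional baseline might look like the sketch below: a grid-search MLE using one widely cited recursion for the Luria-Delbrück mutant-count distribution (the Ma-Sandri-Sarkar form). The O(n²) cost of the recursion for n mutants is exactly the bottleneck that MLE-BD is designed to avoid; this illustrates the baseline, not MLE-BD itself.

```python
import numpy as np

def ld_pmf(m, n_max):
    """Luria-Delbrueck mutant-count probabilities p_0..p_{n_max} via the
    Ma-Sandri-Sarkar recursion; m is the expected number of mutations."""
    p = np.zeros(n_max + 1)
    p[0] = np.exp(-m)
    for n in range(1, n_max + 1):
        i = np.arange(n)
        p[n] = (m / n) * np.sum(p[i] / ((n - i) * (n - i + 1.0)))
    return p

def ld_mle(counts, m_grid=np.linspace(0.1, 10, 200)):
    """Grid-search MLE of m; slow for large mutant counts because
    ld_pmf is quadratic in the largest observed count."""
    n_max = max(counts)
    loglik = [np.sum(np.log(ld_pmf(m, n_max)[list(counts)])) for m in m_grid]
    return m_grid[int(np.argmax(loglik))]

print(ld_mle([0, 1, 3, 0, 7, 2, 1, 0, 5, 2]))
```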
Estimating the rate of biological introductions: Lessepsian fishes in the Mediterranean.
Belmaker, Jonathan; Brokovich, Eran; China, Victor; Golani, Daniel; Kiflawi, Moshe
2009-04-01
Sampling issues preclude the direct use of the discovery rate of exotic species as a robust estimate of their rate of introduction. Recently, a method was advanced that allows maximum-likelihood estimation of both the observational probability and the introduction rate from the discovery record. Here, we propose an alternative approach that utilizes the discovery record of native species to control for sampling effort. Implemented in a Bayesian framework using Markov chain Monte Carlo simulations, the approach provides estimates of the rate of introduction of the exotic species, and of additional parameters such as the size of the species pool from which they are drawn. We illustrate the approach using Red Sea fishes recorded in the eastern Mediterranean, after crossing the Suez Canal, and show that the two approaches may lead to different conclusions. The analytical framework is highly flexible and could provide a basis for easy modification to other systems for which first-sighting data on native and introduced species are available.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.
2013-12-01
In this study, an efficient full Bayesian approach is developed for optimal sampling-well location design and source parameter identification of groundwater contaminants. An information measure, the relative entropy, is employed to quantify the information gained from indirect concentration measurements in identifying unknown source parameters such as the release time, strength, and location. In this approach, the sampling location that gives the maximum relative entropy is selected as the optimal one. Once the sampling location is determined, a Bayesian approach based on Markov chain Monte Carlo (MCMC) is used to estimate the unknown source parameters. In both the design and the estimation, the contaminant transport equation must be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on an adaptive sparse grid is used to construct a surrogate for the contaminant transport model. The approximate likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. Compared with traditional optimal design, which is based on a Gaussian linear assumption, the method developed in this study can cope with arbitrary nonlinearity. It can be used to assist in groundwater monitoring network design and in the identification of unknown contaminant sources. [Figure captions: contours of the expected information gain, whose maximum marks the optimal observation location; posterior marginal densities of the unknown parameters at the designed location compared with seven randomly chosen locations, with true values marked by vertical lines, showing better estimation at the designed location.]
Stram, Daniel O; Leigh Pearce, Celeste; Bretsky, Phillip; Freedman, Matthew; Hirschhorn, Joel N; Altshuler, David; Kolonel, Laurence N; Henderson, Brian E; Thomas, Duncan C
2003-01-01
The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series in which a number of candidate genes are investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of estimating haplotype-specific risks for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have recently been described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, which may be used for more complex tasks, such as providing confidence intervals and tests of equivalence of haplotype-specific risks in two or more separate populations. In order to do so we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection; and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study) we compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both null and alternative hypotheses. We conclude that so long as haplotypes are well predicted by SNP genotypes (we use the Rh2 criterion of Stram et al. [1]) the differences between the three methods are very small, and in particular that the single-imputation method may be expected to work extremely well. Copyright 2003 S. Karger AG, Basel
An alternative method to measure the likelihood of a financial crisis in an emerging market
NASA Astrophysics Data System (ADS)
Özlale, Ümit; Metin-Özcan, Kıvılcım
2007-07-01
This paper utilizes an early warning system to measure the likelihood of a financial crisis in an emerging market economy. We introduce a methodology with which we can both obtain a likelihood series and analyze the time-varying effects of several macroeconomic variables on this likelihood. Since the issue is analyzed in a nonlinear state-space framework, the extended Kalman filter emerges as the optimal estimation algorithm. Taking the Turkish economy as our laboratory, the results indicate that both the derived likelihood measure and the estimated time-varying parameters are meaningful and can successfully explain the path that the Turkish economy followed between 2000 and 2006. The estimated parameters also suggest that an overvalued domestic currency, a current account deficit, and an increase in default risk all raise the likelihood of an economic crisis. Overall, the findings suggest that the estimation methodology introduced in this paper can also be applied to other emerging market economies.
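The estimation algorithm named here is the extended Kalman filter. A generic predict/update step, with user-supplied transition and observation functions and their Jacobians, is sketched below; all function names are placeholders rather than the authors' specification.

```python
import numpy as np

def ekf_step(x, P, z, f, h, F, H, Q, R):
    """One predict/update cycle of an extended Kalman filter.
    f, h : state-transition and observation functions
    F, H : functions returning their Jacobians at a given state
    Q, R : process- and observation-noise covariances
    """
    # Predict: propagate the state and linearize the dynamics
    x_pred = f(x)
    F_k = F(x)
    P_pred = F_k @ P @ F_k.T + Q
    # Update: fold in the new observation z
    H_k = H(x_pred)
    S = H_k @ P_pred @ H_k.T + R          # innovation covariance
    K = P_pred @ H_k.T @ np.linalg.inv(S) # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
    return x_new, P_new
```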
Maximum likelihood estimation of signal-to-noise ratio and combiner weight
NASA Technical Reports Server (NTRS)
Kalson, S.; Dolinar, S. J.
1986-01-01
An algorithm for estimating the signal-to-noise ratio and combiner weight parameters of a discrete time series is presented. The algorithm is based on the joint maximum-likelihood estimate of the signal and noise power. The discrete time series are the sufficient statistics obtained from matched filtering of a biphase-modulated signal in additive white Gaussian noise, before maximum-likelihood decoding is performed.
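A minimal decision-directed version of such an estimator, assuming the antipodal symbols are known, is sketched below; the algorithm in the report may differ in detail.

```python
import numpy as np

def ml_snr(y, d):
    """Joint ML estimate of signal amplitude and noise power for
    antipodal (biphase) symbols d in {-1, +1} observed in AWGN,
    y_i = mu * d_i + n_i. Returns the SNR estimate mu^2 / sigma^2."""
    mu = np.mean(d * y)                  # ML signal-amplitude estimate
    sigma2 = np.mean((y - mu * d) ** 2)  # ML noise-power estimate
    return mu ** 2 / sigma2

rng = np.random.default_rng(1)
d = rng.choice([-1.0, 1.0], size=10000)
y = 2.0 * d + rng.normal(scale=1.0, size=d.size)  # true SNR = 4
print(ml_snr(y, d))
```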
Wang, Chaolong; Schroeder, Kari B.; Rosenberg, Noah A.
2012-01-01
Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, in which one or both allelic copies at a locus fail to be amplified by the polymerase chain reaction. Especially for samples with poor DNA quality, this problem causes a downward bias in estimates of observed heterozygosity and an upward bias in estimates of inbreeding, owing to mistaken classifications of heterozygotes as homozygotes when one of the two copies drops out. One general approach for avoiding allelic dropout involves repeated genotyping of homozygous loci to minimize the effects of experimental error. Existing computational alternatives often require replicate genotyping as well. These approaches, however, are costly and are suitable only when enough DNA is available for repeated genotyping. In this study, we propose a maximum-likelihood approach together with an expectation-maximization algorithm to jointly estimate allelic dropout rates and allele frequencies when only one set of nonreplicated genotypes is available. Our method considers estimates of allelic dropout caused by both sample-specific factors and locus-specific factors, and it allows for deviation from Hardy–Weinberg equilibrium owing to inbreeding. Using the estimated parameters, we correct the bias in the estimation of observed heterozygosity through the use of multiple imputations of alleles in cases where dropout might have occurred. With simulated data, we show that our method can (1) effectively reproduce patterns of missing data and heterozygosity observed in real data; (2) correctly estimate model parameters, including sample-specific dropout rates, locus-specific dropout rates, and the inbreeding coefficient; and (3) successfully correct the downward bias in estimating the observed heterozygosity. We find that our method is fairly robust to violations of model assumptions caused by population structure and by genotyping errors from sources other than allelic dropout. Because the data sets imputed under our model can be investigated in additional subsequent analyses, our method will be useful for preparing data for applications in diverse contexts in population genetics and molecular ecology. PMID:22851645
Group Comparisons in the Presence of Missing Data Using Latent Variable Modeling Techniques
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2010-01-01
A latent variable modeling approach for examining population similarities and differences in observed variable relationship and mean indexes in incomplete data sets is discussed. The method is based on the full information maximum likelihood procedure of model fitting and parameter estimation. The procedure can be employed to test group identities…
Semiparametric Item Response Functions in the Context of Guessing
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2016-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Statistical Signal Processing and the Motor Cortex
Brockwell, A.E.; Kass, R.E.; Schwartz, A.B.
2011-01-01
Over the past few decades, developments in technology have significantly improved the ability to measure activity in the brain. This has spurred a great deal of research into brain function and its relation to external stimuli, and has important implications in medicine and other fields. As a result of improved understanding of brain function, it is now possible to build devices that provide direct interfaces between the brain and the external world. We describe some of the current understanding of function of the motor cortex region. We then discuss a typical likelihood-based state-space model and filtering based approach to address the problems associated with building a motor cortical-controlled cursor or robotic prosthetic device. As a variation on previous work using this approach, we introduce the idea of using Markov chain Monte Carlo methods for parameter estimation in this context. By doing this instead of performing maximum likelihood estimation, it is possible to expand the range of possible models that can be explored, at a cost in terms of computational load. We demonstrate results obtained applying this methodology to experimental data gathered from a monkey. PMID:21765538
Investigating the Impact of Uncertainty about Item Parameters on Ability Estimation
ERIC Educational Resources Information Center
Zhang, Jinming; Xie, Minge; Song, Xiaolan; Lu, Ting
2011-01-01
Asymptotic expansions of the maximum likelihood estimator (MLE) and weighted likelihood estimator (WLE) of an examinee's ability are derived while item parameter estimators are treated as covariates measured with error. The asymptotic formulae present the amount of bias of the ability estimators due to the uncertainty of item parameter estimators.…
Maximum likelihood-based analysis of single-molecule photon arrival trajectories
NASA Astrophysics Data System (ADS)
Hajdziona, Marta; Molski, Andrzej
2011-02-01
In this work we explore the statistical properties of the maximum likelihood-based analysis of one-color photon arrival trajectories. This approach does not involve binning and, therefore, all of the information contained in an observed photon trajectory is used. We study the accuracy and precision of parameter estimates and the efficiency of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in selecting the true kinetic model. We focus on the low-excitation regime, where photon trajectories can be modeled as realizations of Markov-modulated Poisson processes. The number of observed photons is the key parameter in determining model selection and parameter estimation. For example, the BIC can select the true three-state model from competing two-, three-, and four-state kinetic models even for relatively short trajectories made up of 2 × 10³ photons. When the intensity levels are well separated and 10⁴ photons are observed, the two-state model parameters can be estimated with about 10% precision and those for a three-state model with about 20% precision.
Object recognition and localization from 3D point clouds by maximum-likelihood estimation
NASA Astrophysics Data System (ADS)
Dantanarayana, Harshana G.; Huntley, Jonathan M.
2017-08-01
We present an algorithm based on maximum-likelihood analysis for the automated recognition of objects, and estimation of their pose, from 3D point clouds. Surfaces segmented from depth images are used as the features, unlike 'interest point'-based algorithms, which normally discard such data. Compared to the 6D Hough transform, the method has negligible memory requirements, and it is computationally efficient compared to iterative closest point algorithms. The same method is applicable to both the initial recognition/pose estimation problem and subsequent pose refinement, through an appropriate choice of the dispersion of the probability density functions. This single unified approach therefore avoids the usual requirement for different algorithms for these two tasks. In addition to the theoretical description, a simple 2-degrees-of-freedom (d.f.) example is given, followed by a full 6-d.f. analysis of 3D point cloud data from a cluttered scene acquired by a projected fringe-based scanner, which demonstrated an RMS alignment error as low as 0.3 mm.
Logistic regression for circular data
NASA Astrophysics Data System (ADS)
Al-Daffaie, Kadhem; Khan, Shahjahan
2017-05-01
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
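A common way to implement the linear-circular approach is to represent the angle by its cosine and sine and fit an ordinary logistic regression on those components, as in the hedged sketch below (Python/statsmodels rather than the R code used in the paper; the simulated data are illustrative, not the Toowoomba records).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)             # circular predictor
eta = -0.5 + 1.2 * np.cos(theta) + 0.8 * np.sin(theta)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))    # binary response

# Linear-circular embedding: the angle enters via its cosine and sine
X = sm.add_constant(np.column_stack([np.cos(theta), np.sin(theta)]))
fit = sm.Logit(y, X).fit(disp=0)                   # Newton-Raphson by default
print(fit.params)   # (intercept, cosine, sine) coefficient estimates
```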
Regularity of a renewal process estimated from binary data.
Rice, John D; Strawderman, Robert L; Johnson, Brent A
2017-10-09
Assessment of the regularity of a sequence of events over time is important for clinical decision-making as well as informing public health policy. Our motivating example involves determining the effect of an intervention on the regularity of HIV self-testing behavior among high-risk individuals when exact self-testing times are not recorded. Assuming that these unobserved testing times follow a renewal process, the goals of this work are to develop suitable methods for estimating its distributional parameters when only the presence or absence of at least one event per subject in each of several observation windows is recorded. We propose two approaches to estimation and inference: a likelihood-based discrete survival model using only time to first event; and a potentially more efficient quasi-likelihood approach based on the forward recurrence time distribution using all available data. Regularity is quantified and estimated by the coefficient of variation (CV) of the interevent time distribution. Focusing on the gamma renewal process, where the shape parameter of the corresponding interevent time distribution has a monotone relationship with its CV, we conduct simulation studies to evaluate the performance of the proposed methods. We then apply them to our motivating example, concluding that the use of text message reminders significantly improves the regularity of self-testing, but not its frequency. A discussion on interesting directions for further research is provided. © 2017, The International Biometric Society.
Depaoli, Sarah
2013-06-01
Growth mixture modeling (GMM) represents a technique that is designed to capture change over time for unobserved subgroups (or latent classes) that exhibit qualitatively different patterns of growth. The aim of the current article was to explore the impact of latent class separation (i.e., how similar growth trajectories are across latent classes) on GMM performance. Several estimation conditions were compared: maximum likelihood via the expectation maximization (EM) algorithm and the Bayesian framework implementing diffuse priors, "accurate" informative priors, weakly informative priors, data-driven informative priors, priors reflecting partial-knowledge of parameters, and "inaccurate" (but informative) priors. The main goal was to provide insight about the optimal estimation condition under different degrees of latent class separation for GMM. Results indicated that optimal parameter recovery was obtained though the Bayesian approach using "accurate" informative priors, and partial-knowledge priors showed promise for the recovery of the growth trajectory parameters. Maximum likelihood and the remaining Bayesian estimation conditions yielded poor parameter recovery for the latent class proportions and the growth trajectories. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
NASA Technical Reports Server (NTRS)
Lai, Jonathan Y.
1994-01-01
This dissertation focuses on the signal processing problems associated with the detection of hazardous windshear using airborne Doppler radar when weak weather returns are present alongside strong clutter returns. In light of the frequent inadequacy of spectral-processing-oriented clutter suppression methods, we model the clutter signal as multiple sinusoids plus Gaussian noise, and propose adaptive filtering approaches that better capture the temporal characteristics of the signal process. This idea leads to two research topics in signal processing: (1) signal modeling and parameter estimation, and (2) adaptive filtering in this particular signal environment. A high-resolution, low-SNR-threshold maximum likelihood (ML) frequency estimation and signal modeling algorithm is devised and proves capable of delineating both the spectral and the temporal nature of the clutter return. Furthermore, the performance of the least mean square (LMS)-based adaptive filter under the proposed signal model is investigated, and promising simulation results testify to its potential for clutter rejection, leading to more accurate windspeed estimation and thus a better assessment of the windshear hazard.
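A bare-bones LMS linear predictor of the kind alluded to is sketched below: it treats the strongly correlated (sinusoidal) clutter as the predictable part of the signal and keeps the prediction error as the clutter-suppressed output. The filter order and step size are illustrative, not the dissertation's settings.

```python
import numpy as np

def lms_clutter_filter(x, order=8, mu=1e-3):
    """Adaptive linear prediction with the LMS algorithm. The filter
    learns to predict the narrowband clutter from past samples; the
    prediction error retains the weaker, less predictable weather return."""
    w = np.zeros(order)
    err = np.zeros(x.size)
    for n in range(order, x.size):
        past = x[n - order:n][::-1]   # most recent sample first
        pred = w @ past               # predicted clutter sample
        err[n] = x[n] - pred          # residual = clutter-suppressed output
        w += 2 * mu * err[n] * past   # LMS weight update
    return err
```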
Using optimal transport theory to estimate transition probabilities in metapopulation dynamics
Nichols, Jonathan M.; Spendelow, Jeffrey A.; Nichols, James D.
2017-01-01
This work considers the estimation of transition probabilities associated with populations moving among multiple spatial locations based on numbers of individuals at each location at two points in time. The problem is generally underdetermined as there exists an extremely large number of ways in which individuals can move from one set of locations to another. A unique solution therefore requires a constraint. The theory of optimal transport provides such a constraint in the form of a cost function, to be minimized in expectation over the space of possible transition matrices. We demonstrate the optimal transport approach on marked bird data and compare to the probabilities obtained via maximum likelihood estimation based on marked individuals. It is shown that by choosing the squared Euclidean distance as the cost, the estimated transition probabilities compare favorably to those obtained via maximum likelihood with marked individuals. Other implications of this cost are discussed, including the ability to accurately interpolate the population's spatial distribution at unobserved points in time and the more general relationship between the cost and minimum transport energy.
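Under the squared-Euclidean cost, the estimation problem reduces to a small linear program over transport plans. A sketch using SciPy's linprog is given below; it assumes the two count vectors sum to the same total, and the coordinate arrays are hypothetical inputs, not the marked-bird data.

```python
import numpy as np
from scipy.optimize import linprog

def ot_transition(counts1, counts2, coords1, coords2):
    """Estimate transition probabilities between locations from counts at
    two time points by minimizing total squared-Euclidean transport cost,
    subject to the marginal (row/column sum) constraints."""
    n, m = len(counts1), len(counts2)
    # Squared Euclidean cost between every pair of locations
    C = ((coords1[:, None, :] - coords2[None, :, :]) ** 2).sum(-1)
    A_eq, b_eq = [], []
    for i in range(n):                  # row sums: individuals leaving i
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1
        A_eq.append(row); b_eq.append(counts1[i])
    for j in range(m):                  # column sums: individuals arriving at j
        col = np.zeros(n * m); col[j::m] = 1
        A_eq.append(col); b_eq.append(counts2[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    plan = res.x.reshape(n, m)
    return plan / plan.sum(axis=1, keepdims=True)   # rows -> transition probs

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(ot_transition([10, 5, 5], [8, 7, 5], coords, coords))
```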
NASA Technical Reports Server (NTRS)
Switzer, Eric Ryan; Watts, Duncan J.
2016-01-01
The B-mode polarization of the cosmic microwave background provides a unique window into tensor perturbations from inflationary gravitational waves. Survey effects complicate the estimation and description of the power spectrum on the largest angular scales. The pixel-space likelihood yields parameter distributions without the power spectrum as an intermediate step, but it does not have the large suite of tests available to power spectral methods. Searches for primordial B-modes must rigorously reject and rule out contamination. Many forms of contamination vary or are uncorrelated across epochs, frequencies, surveys, or other data treatment subsets. The cross power and the power spectrum of the difference of subset maps provide approaches to reject and isolate excess variance. We develop an analogous joint pixel-space likelihood. Contamination not modeled in the likelihood produces parameter-dependent bias and complicates the interpretation of the difference map. We describe a null test that consistently weights the difference map. Excess variance should either be explicitly modeled in the covariance or be removed through reprocessing the data.
Time-series analyses of air pollution and mortality in the United States: a subsampling approach.
Moolgavkar, Suresh H; McClellan, Roger O; Dewanji, Anup; Turim, Jay; Luebeck, E Georg; Edwards, Melanie
2013-01-01
Hierarchical Bayesian methods have been used in previous papers to estimate national mean effects of air pollutants on daily deaths in time-series analyses. We obtained maximum likelihood estimates of the common national effects of the criteria pollutants on mortality based on time-series data from up to 108 metropolitan areas in the United States. We used a subsampling bootstrap procedure to obtain the maximum likelihood estimates and confidence bounds for common national effects of the criteria pollutants, as measured by the percentage increase in daily mortality associated with a unit increase in daily 24-hr mean pollutant concentration on the previous day, while controlling for weather and temporal trends. We considered five pollutants [PM10, ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), and sulfur dioxide (SO2)] in single- and multipollutant analyses. Flexible ambient concentration-response models for the pollutant effects were considered as well. We performed limited sensitivity analyses with different degrees of freedom for time trends. In single-pollutant models, we observed significant associations of daily deaths with all pollutants. The O3 coefficient was highly sensitive to the degree of smoothing of time trends. Among the gases, SO2 and NO2 were most strongly associated with mortality. The flexible ambient concentration-response curve for O3 showed evidence of nonlinearity and a threshold at about 30 ppb. Differences between the results of our analyses and those reported from using the Bayesian approach suggest that estimates of the quantitative impact of pollutants depend on the choice of statistical approach, although the results are not directly comparable because they are based on different data. In addition, the estimate of the O3-mortality coefficient depends on the amount of smoothing of time trends.
Spatial dependence of extreme rainfall
NASA Astrophysics Data System (ADS)
Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Satari, Siti Zanariah; Azman, Muhammad Az-zuhri
2017-05-01
This study aims to model the spatial extreme daily rainfall process using max-stable models, which capture the dependence structure of the spatial properties of extreme rainfall. Three max-stable models are considered, namely the Smith, Schlather, and Brown-Resnick models. The methods are applied to 12 selected rainfall stations in Kelantan, Malaysia. Most of the extreme rainfall occurs during the wet season, from October to December, of 1971 to 2012; this period is chosen to ensure that enough data are available to satisfy the assumption of stationarity. The dependence parameters, including the range and smoothness, are estimated using a composite likelihood approach. A bootstrap approach is then applied to generate synthetic extreme rainfall data for all models using the estimated dependence parameters. The goodness of fit between the observed extreme rainfall and the synthetic data is assessed using the composite likelihood information criterion (CLIC). Results show that the Schlather model fits best, followed by the Brown-Resnick and Smith models, based on the smallest CLIC value. The max-stable models are thus suitable for modelling extreme rainfall in Kelantan. Studying spatial dependence in extreme rainfall modelling is important for reducing the uncertainties of the point estimates of the tail index: if the spatial dependence is ignored and each station is estimated individually, the uncertainties will be large, and when joint return levels are of interest, taking the spatial dependence properties into account will improve the estimation process.
Maximum likelihood estimation of finite mixture model for economic data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-06-01
A finite mixture model is a mixture model with a finite number of components. Such models provide a natural representation of heterogeneity across a finite number of latent classes, and are also known as latent class models or unsupervised learning models. Recently, fitting finite mixture models by maximum likelihood estimation has drawn considerable attention among statisticians, mainly because maximum likelihood estimation is a powerful statistical method that yields consistent estimates as the sample size increases to infinity. In the present paper, maximum likelihood estimation is therefore used to fit a finite mixture model in order to explore the relationship between nonlinear economic data. A two-component normal mixture model is fitted by maximum likelihood estimation to investigate the relationship between stock market prices and rubber prices for the sampled countries. The results indicate a negative relationship between rubber prices and stock market prices for Malaysia, Thailand, the Philippines, and Indonesia.
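A two-component bivariate normal mixture fitted by maximum likelihood (via EM) can be reproduced in a few lines with scikit-learn; the synthetic returns below merely stand in for the stock-price and rubber-price series analysed in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic stand-ins for paired (stock index, rubber price) returns
x = np.concatenate([rng.normal(-0.01, 0.02, 600), rng.normal(0.02, 0.05, 400)])
y = np.concatenate([rng.normal(0.02, 0.03, 600), rng.normal(-0.01, 0.04, 400)])
data = np.column_stack([x, y])

# Two-component bivariate normal mixture fitted by maximum likelihood (EM)
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(data)
print(gmm.weights_)   # mixing proportions of the latent classes
print(gmm.means_)     # class-specific mean vectors
for k, cov in enumerate(gmm.covariances_):
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    print(f"component {k}: within-class correlation = {corr:.2f}")
```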
Maximum Likelihood Estimation with Emphasis on Aircraft Flight Data
NASA Technical Reports Server (NTRS)
Iliff, K. W.; Maine, R. E.
1985-01-01
Accurate modeling of flexible space structures is an important field that is currently under investigation. Parameter estimation, using methods such as maximum likelihood, is one of the ways that the model can be improved. The maximum likelihood estimator has been used to extract stability and control derivatives from flight data for many years. Most of the literature on aircraft estimation concentrates on new developments and applications, assuming familiarity with basic estimation concepts. Some of these basic concepts are presented. The maximum likelihood estimator and the aircraft equations of motion that the estimator uses are briefly discussed. The basic concepts of minimization and estimation are examined for a simple computed aircraft example. The cost functions that are to be minimized during estimation are defined and discussed. Graphic representations of the cost functions are given to help illustrate the minimization process. Finally, the basic concepts are generalized, and estimation from flight data is discussed. Specific examples of estimation of structural dynamics are included. Some of the major conclusions for the computed example are also developed for the analysis of flight data.
Bayesian parameter estimation for the Wnt pathway: an infinite mixture models approach.
Koutroumpas, Konstantinos; Ballarini, Paolo; Votsi, Irene; Cournède, Paul-Henry
2016-09-01
Likelihood-free methods, like Approximate Bayesian Computation (ABC), have been extensively used in model-based statistical inference with intractable likelihood functions. When combined with Sequential Monte Carlo (SMC) algorithms, they constitute a powerful approach for parameter estimation and model selection in mathematical models of complex biological systems. A crucial step in ABC-SMC algorithms, significantly affecting their performance, is the propagation of a set of parameter vectors through a sequence of intermediate distributions using Markov kernels. In this article, we employ Dirichlet process mixtures (DPMs) to design optimal transition kernels and we present an ABC-SMC algorithm with DPM kernels. We illustrate the use of the proposed methodology using real data for the canonical Wnt signaling pathway. A multi-compartment model of the pathway is developed and compared to an existing model. The results indicate that DPMs are more efficient in the exploration of the parameter space and can significantly improve ABC-SMC performance. In comparison to commonly used alternative sampling schemes, the proposed approach can bring potential benefits in the estimation of complex multimodal distributions. The method is used to estimate the parameters and the initial state of two models of the Wnt pathway, and it is shown that the multi-compartment model fits the experimental data better. Python scripts for the Dirichlet Process Gaussian Mixture model and the Gibbs sampler are available at https://sites.google.com/site/kkoutroumpas/software konstantinos.koutroumpas@ecp.fr. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
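For orientation, the simplest likelihood-free scheme, plain rejection ABC, is sketched below; ABC-SMC (with or without the DPM kernels proposed here) improves on it by propagating particles through a sequence of shrinking tolerances with transition kernels. The toy model and summary statistics are illustrative only.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, n_draws=20000, q=0.005):
    """Plain rejection ABC: draw parameters from the prior, simulate data,
    and keep the draws whose summary statistics fall closest to the
    observed summaries (no likelihood evaluation required)."""
    obs_summary = np.array([observed.mean(), observed.std()])
    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sample()
        sim = simulate(theta)
        s = np.array([sim.mean(), sim.std()])
        thetas.append(theta)
        dists.append(np.linalg.norm(s - obs_summary))
    thetas, dists = np.array(thetas), np.array(dists)
    keep = dists <= np.quantile(dists, q)
    return thetas[keep]          # approximate posterior sample

# Toy example: infer the mean of a Gaussian with known scale
rng = np.random.default_rng(3)
observed = rng.normal(1.5, 1.0, 200)
post = abc_rejection(observed,
                     simulate=lambda th: rng.normal(th, 1.0, 200),
                     prior_sample=lambda: rng.uniform(-5, 5))
print(post.mean(), post.std())
```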
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1975-01-01
New results and insights concerning a previously published iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions were discussed. It was shown that the procedure converges locally to the consistent maximum likelihood estimate as long as a specified parameter is bounded between two limits. Bound values were given to yield optimal local convergence.
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1975-01-01
A general iterative procedure is given for determining the consistent maximum likelihood estimates of normal distributions. In addition, a local maximum of the log-likelihood function, Newton's method, a method of scoring, and modifications of these procedures are discussed.
Evaluation of dynamic coastal response to sea-level rise modifies inundation likelihood
Lentz, Erika E.; Thieler, E. Robert; Plant, Nathaniel G.; Stippa, Sawyer R.; Horton, Radley M.; Gesch, Dean B.
2016-01-01
Sea-level rise (SLR) poses a range of threats to natural and built environments, making assessments of SLR-induced hazards essential for informed decision making. We develop a probabilistic model that evaluates the likelihood that an area will inundate (flood) or dynamically respond (adapt) to SLR. The broad-area applicability of the approach is demonstrated by producing 30 × 30 m resolution predictions for more than 38,000 km² of diverse coastal landscape in the northeastern United States. Probabilistic SLR projections, coastal elevation and vertical land movement are used to estimate likely future inundation levels. Then, conditioned on future inundation levels and the current land-cover type, we evaluate the likelihood of dynamic response versus inundation. We find that nearly 70% of this coastal landscape has some capacity to respond dynamically to SLR, and we show that inundation models over-predict land likely to submerge. This approach is well suited to guiding coastal resource management decisions that weigh future SLR impacts and uncertainty against ecological targets and economic constraints.
A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq Data.
Choi, Yoonha; Coram, Marc; Peng, Jie; Tang, Hua
2017-07-01
Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as those generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating directions method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundant genes and edges around regulatory hubs.
Spatiotemporal modelling of groundwater extraction in semi-arid central Queensland, Australia
NASA Astrophysics Data System (ADS)
Keir, Greg; Bulovic, Nevenka; McIntyre, Neil
2016-04-01
The semi-arid Surat Basin in central Queensland, Australia, forms part of the Great Artesian Basin, a groundwater resource of national significance. While this area relies heavily on groundwater supply bores to sustain agricultural industries and rural life in general, measurement of groundwater extraction rates is very limited. Consequently, regional groundwater extraction rates are not well known, which may have implications for regional numerical groundwater modelling. However, flows from a small number of bores are metered, and less precise anecdotal estimates of extraction are increasingly available. There is also an increasing number of other spatiotemporal datasets that may help predict extraction rates (e.g., rainfall, temperature, soils, stocking rates). These can be used to construct spatial multivariate regression models to estimate extraction. The data exhibit complicated statistical features, such as zero-valued observations, non-Gaussianity, and non-stationarity, which limit the use of many classical estimation techniques, such as kriging. In addition, water extraction histories may exhibit temporal autocorrelation. To account for these features, we employ a separable space-time model to predict bore extraction rates using the R-INLA package for computationally efficient Bayesian inference. A joint approach is used to model both the probability (using a binomial likelihood) and the magnitude (using a gamma likelihood) of extraction. The correlation between extraction rates in space and time is modelled using a Gaussian Markov random field (GMRF) with a Matérn spatial covariance function that can evolve over time according to an autoregressive model. To reduce the computational burden, we allow the GMRF to be evaluated at a relatively coarse temporal resolution, while still allowing predictions to be made at arbitrarily small time scales. We describe the process of model selection and inference using an information criterion approach, and present some preliminary results from the study area. We conclude by discussing issues related to upscaling the modelling approach to the entire basin, including merging extraction rate observations with different precision, temporal resolution, and even potentially different likelihoods.
Lee, E Henry; Wickham, Charlotte; Beedlow, Peter A; Waschmann, Ronald S; Tingey, David T
2017-10-01
A time series intervention analysis (TSIA) of dendrochronological data to infer the tree growth-climate-disturbance relations and forest disturbance history is described. Maximum likelihood is used to estimate the parameters of a structural time series model with components for climate and forest disturbances (i.e., pests, diseases, fire). The statistical method is illustrated with a tree-ring width time series for a mature closed-canopy Douglas-fir stand on the west slopes of the Cascade Mountains of Oregon, USA that is impacted by Swiss needle cast disease caused by the foliar fungus, Phaecryptopus gaeumannii (Rhode) Petrak. The likelihood-based TSIA method is proposed for the field of dendrochronology to understand the interaction of temperature, water, and forest disturbances that are important in forest ecology and climate change studies.
Likelihood of Tree Topologies with Fossils and Diversification Rate Estimation.
Didier, Gilles; Fau, Marine; Laurin, Michel
2017-11-01
Since the diversification process cannot be directly observed at the human scale, it has to be studied from the information available, namely the extant taxa and the fossil record. In this sense, phylogenetic trees including both extant taxa and fossils are the most complete representations of the diversification process that one can get. Such phylogenetic trees can be reconstructed from molecular and morphological data, to some extent. Among the temporal information of such phylogenetic trees, fossil ages are by far the most precisely known (divergence times are inferences calibrated mostly with fossils). We propose here a method to compute the likelihood of a phylogenetic tree with fossils in which the only considered time information is the fossil ages, and apply it to the estimation of the diversification rates from such data. Since it is required in our computation, we provide a method for determining the probability of a tree topology under the standard diversification model. Testing our approach on simulated data shows that the maximum likelihood rate estimates from the phylogenetic tree topology and the fossil dates are almost as accurate as those obtained by taking into account all the data, including the divergence times. Moreover, they are substantially more accurate than the estimates obtained only from the exact divergence times (without taking into account the fossil record). We also provide an empirical example composed of 50 Permo-Carboniferous eupelycosaur (early synapsid) taxa ranging in age from about 315 Ma (Late Carboniferous) to 270 Ma (shortly after the end of the Early Permian). Our analyses suggest a speciation (cladogenesis, or birth) rate of about 0.1 per lineage and per myr, a marginally lower extinction rate, and a considerable hidden paleobiodiversity of early synapsids. [Extinction rate; fossil ages; maximum likelihood estimation; speciation rate.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Hua, Wei; Sun, Guoying; Dodd, Caitlin N; Romio, Silvana A; Whitaker, Heather J; Izurieta, Hector S; Black, Steven; Sturkenboom, Miriam C J M; Davis, Robert L; Deceuninck, Genevieve; Andrews, N J
2013-08-01
The assumption that the occurrence of outcome event must not alter subsequent exposure probability is critical for preserving the validity of the self-controlled case series (SCCS) method. This assumption is violated in scenarios in which the event constitutes a contraindication for exposure. In this simulation study, we compared the performance of the standard SCCS approach and two alternative approaches when the event-independent exposure assumption was violated. Using the 2009 H1N1 and seasonal influenza vaccines and Guillain-Barré syndrome as a model, we simulated a scenario in which an individual may encounter multiple unordered exposures and each exposure may be contraindicated by the occurrence of outcome event. The degree of contraindication was varied at 0%, 50%, and 100%. The first alternative approach used only cases occurring after exposure with follow-up time starting from exposure. The second used a pseudo-likelihood method. When the event-independent exposure assumption was satisfied, the standard SCCS approach produced nearly unbiased relative incidence estimates. When this assumption was partially or completely violated, two alternative SCCS approaches could be used. While the post-exposure cases only approach could handle only one exposure, the pseudo-likelihood approach was able to correct bias for both exposures. Violation of the event-independent exposure assumption leads to an overestimation of relative incidence which could be corrected by alternative SCCS approaches. In multiple exposure situations, the pseudo-likelihood approach is optimal; the post-exposure cases only approach is limited in handling a second exposure and may introduce additional bias, thus should be used with caution. Copyright © 2013 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles
2012-01-01
This paper focuses on two estimators of ability with logistic item response theory models: the Bayesian modal (BM) estimator and the weighted likelihood (WL) estimator. For the BM estimator, Jeffreys' prior distribution is considered, and the corresponding estimator is referred to as the Jeffreys modal (JM) estimator. It is established that under…
SEMIPARAMETRIC EFFICIENT ESTIMATION FOR SHARED-FRAILTY MODELS WITH DOUBLY-CENSORED CLUSTERED DATA
Wang, Jane-Ling
2018-01-01
In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed "doubly-censored data". This model extends the current survival literature by broadening the application of frailty models from right-censoring to the more complicated situation with additional left-censoring. Our approach is motivated by a recent Hepatitis B study in which the sample consists of families. We adopt a likelihood approach that targets the nonparametric maximum likelihood estimator (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves on the existing algorithm for independent doubly-censored data, the special case in which the frailty variable is a constant equal to one. This special case is well known to be a computational challenge due to the left-censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively. Asymptotic properties of the NPMLE are established, along with the semiparametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators for the standard errors of the NPMLE is also discussed. We conduct simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data. PMID:29527068
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may fail to account for overdispersion, in which the variance of the rate parameter exceeds its mean. Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including quasi-likelihood, robust standard error estimation, negative binomial regression, and flexible piecewise modelling. All piecewise exponential regression models showed significant inherent overdispersion (p-value < 0.001), but the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). We showed that there were no major differences between the correction methods. However, flexible piecewise exponential modelling, with either quasi-likelihood or robust standard errors, was the best approach, as it deals both with overdispersion due to model misspecification and with true or inherent overdispersion.
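A regression-based score test and two of the corrections (quasi-likelihood rescaling and negative binomial regression) can be illustrated with statsmodels on simulated overdispersed counts, as sketched below. The Cameron-Trivedi auxiliary regression shown is one standard form of such a test, not necessarily the exact test used in the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(500, 2)))
mu_true = np.exp(X @ np.array([0.5, 0.3, -0.2]))
y = rng.negative_binomial(n=2, p=2 / (2 + mu_true))  # overdispersed counts

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = pois.mu

# Cameron-Trivedi auxiliary regression of ((y - mu)^2 - y)/mu on mu
# (no intercept); a significantly positive slope indicates overdispersion
aux = sm.OLS(((y - mu) ** 2 - y) / mu, mu).fit()
print("overdispersion slope:", aux.params[0], "p =", aux.pvalues[0])

# Corrections: quasi-likelihood rescaling of standard errors by the
# Pearson dispersion, and a fully parametric negative binomial model
quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(pois.bse, quasi.bse, nb.bse)   # Poisson SEs are too small
```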
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood
NASA Astrophysics Data System (ADS)
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. To estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules, used as optimization criteria, should be able to locate the same (unknown) optimum; discrepancies can result from a wrong distributional assumption about the observed quantity. To address this theoretical point, this study compares maximum likelihood and minimum CRPS estimation under different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients, with the log-likelihood estimator being slightly more efficient. A real-world case study for surface temperature forecasts at different sites in Europe confirms these results, but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
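The comparison can be reproduced in miniature: for a Gaussian predictive distribution the CRPS has a closed form, so both objectives can be handed to a generic optimizer, as in the sketch below. The simulation is a correctly specified Gaussian model, so the two fits should nearly coincide; the parameterization is illustrative, not the study's configuration.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def crps_normal(params, x, y):
    """Mean closed-form CRPS of a Gaussian predictive distribution whose
    mean and log-scale are linear in the predictor x."""
    mu = params[0] + params[1] * x
    sigma = np.exp(params[2] + params[3] * x)
    z = (y - mu) / sigma
    crps = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                    - 1 / np.sqrt(np.pi))
    return crps.mean()

def neg_loglik(params, x, y):
    """Mean negative Gaussian log-likelihood, same parameterization."""
    mu = params[0] + params[1] * x
    sigma = np.exp(params[2] + params[3] * x)
    return -norm.logpdf(y, mu, sigma).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 1.0 + 0.8 * x + np.exp(0.1 + 0.2 * x) * rng.normal(size=2000)
for obj in (crps_normal, neg_loglik):
    fit = minimize(obj, x0=np.zeros(4), args=(x, y), method="Nelder-Mead",
                   options={"maxiter": 5000})
    print(obj.__name__, fit.x.round(2))   # coefficients should agree closely
```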
Multiple-hit parameter estimation in monolithic detectors.
Hunter, William C J; Barrett, Harrison H; Lewellen, Tom K; Miyaoka, Robert S
2013-02-01
We examine a maximum-a-posteriori method for estimating the primary interaction position of gamma rays with multiple interaction sites (hits) in a monolithic detector. In assessing the performance of a multiple-hit estimator over that of a conventional one-hit estimator, we consider a few different detector and readout configurations of a 50-mm-wide square cerium-doped lutetium oxyorthosilicate block. For this study, we use simulated data from SCOUT, a Monte-Carlo tool for photon tracking and modeling scintillation-camera output. With this tool, we determine estimate bias and variance for a multiple-hit estimator and compare these with similar metrics for a one-hit maximum-likelihood estimator, which assumes full energy deposition in one hit. We also examine the effect of event filtering on these metrics; for this purpose, we use a likelihood threshold to reject signals that are not likely to have been produced under the assumed likelihood model. Depending on detector design, we observe a 1%-12% improvement of intrinsic resolution for a 1-or-2-hit estimator as compared with a 1-hit estimator. We also observe improved differentiation of photopeak events using a 1-or-2-hit estimator as compared with the 1-hit estimator; more than 6% of photopeak events that were rejected by likelihood filtering for the 1-hit estimator were accurately identified as photopeak events and positioned without loss of resolution by a 1-or-2-hit estimator; for PET, this equates to at least a 12% improvement in coincidence-detection efficiency with likelihood filtering applied.
Experimental Design for Parameter Estimation of Gene Regulatory Networks
Timmer, Jens
2012-01-01
Systems biology aims to build quantitative models to address unresolved issues in molecular biology. In order to describe the behavior of biological cells adequately, gene regulatory networks (GRNs) are intensively investigated. As the validity of models built for GRNs depends crucially on the kinetic rates, various methods have been developed to estimate these parameters from experimental data. For this purpose, it is favorable to choose the experimental conditions yielding maximal information. However, existing experimental design principles often rely on unfulfilled mathematical assumptions or become computationally demanding with growing model complexity. To solve this problem, we combined advanced methods for parameter and uncertainty estimation with experimental design considerations. As a showcase, we optimized three simulated GRNs in one of the challenges from the Dialogue for Reverse Engineering Assessment and Methods (DREAM). This article presents our approach, which was awarded the best performing procedure at the DREAM6 Estimation of Model Parameters challenge. For fast and reliable parameter estimation, local deterministic optimization of the likelihood was applied. We analyzed identifiability and precision of the estimates by calculating the profile likelihood. Furthermore, the profiles provided a way to uncover a selection of the most informative experiments, from which the optimal one was chosen using additional criteria at every step of the design process. In conclusion, we provide a strategy for optimal experimental design and show its successful application on three highly nonlinear dynamic models. Although presented in the context of the GRNs to be inferred for the DREAM6 challenge, the approach is generic and applicable to most types of quantitative models in systems biology and other disciplines. PMID:22815723
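The profile likelihood machinery is straightforward to illustrate: fix the parameter of interest on a grid, re-optimize all remaining parameters at each grid point, and read identifiability and confidence intervals off the resulting curve. A minimal sketch on a toy exponential-decay model (not one of the DREAM6 networks):

```python
import numpy as np
from scipy.optimize import minimize

# Toy model: y = a * exp(-b * t) + Gaussian noise; profile the rate b.
rng = np.random.default_rng(12)
t = np.linspace(0, 4, 25)
y = 2.0 * np.exp(-0.8 * t) + rng.normal(0, 0.1, t.size)

def nll(params):
    a, b = params
    return 0.5 * np.sum((y - a * np.exp(-b * t)) ** 2) / 0.1**2

full = minimize(nll, [1.0, 1.0])
b_grid = np.linspace(0.4, 1.4, 41)
profile = []
for b in b_grid:
    # Re-optimize the nuisance parameter a for each fixed b.
    res = minimize(lambda a: nll([a[0], b]), [full.x[0]])
    profile.append(res.fun)

# Approximate 95% CI: where the profile rises less than
# chi2_{1, 0.95} / 2 = 1.92 above its minimum.
inside = b_grid[np.array(profile) - min(profile) < 1.92]
print(full.x, inside.min(), inside.max())
```

A flat profile over the grid would signal a practically non-identifiable parameter, which is exactly the diagnostic the abstract uses to select informative experiments.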
An evaluation of percentile and maximum likelihood estimators of Weibull parameters
Stanley J. Zarnoch; Tommy R. Dell
1985-01-01
Two methods of estimating the three-parameter Weibull distribution were evaluated by computer simulation and field data comparison. Maximum likelihood estimators (MLB) with bias correction were calculated with the computer routine FITTER (Bailey 1974); percentile estimators (PCT) were those proposed by Zanakis (1979). The MLB estimators had smaller bias and...
The Equivalence of Two Methods of Parameter Estimation for the Rasch Model.
ERIC Educational Resources Information Center
Blackwood, Larry G.; Bradley, Edwin L.
1989-01-01
Two methods of estimating parameters in the Rasch model are compared. The equivalence of likelihood estimations from the model of G. J. Mellenbergh and P. Vijn (1981) and from usual unconditional maximum likelihood (UML) estimation is demonstrated. Mellenbergh and Vijn's model is a convenient method of calculating UML estimates. (SLD)
ERIC Educational Resources Information Center
Yang, Xiangdong; Poggio, John C.; Glasnapp, Douglas R.
2006-01-01
The effects of five ability estimators, that is, maximum likelihood estimator, weighted likelihood estimator, maximum a posteriori, expected a posteriori, and Owen's sequential estimator, on the performances of the item response theory-based adaptive classification procedure on multiple categories were studied via simulations. The following…
Likelihood ratio meta-analysis: New motivation and approach for an old method.
Dormuth, Colin R; Filion, Kristian B; Platt, Robert W
2016-03-01
A 95% confidence interval (CI) in an updated meta-analysis may not have the expected 95% coverage. If a meta-analysis is simply updated with additional data, then the resulting 95% CI will be wrong because it will not have accounted for the fact that the earlier meta-analysis failed or succeeded in excluding the null. This situation can be avoided by using the likelihood ratio (LR) as a measure of evidence that does not depend on type I error. We show how an LR-based approach, first advanced by Goodman, can be used in a meta-analysis to pool data from separate studies to quantitatively assess where the total evidence points. The method works by estimating the log-likelihood ratio (LogLR) function from each study; those functions are then summed to obtain a combined function, which is used to retrieve the total effect estimate and a corresponding 'intrinsic' confidence interval. Using as illustrations the CAPRIE trial of clopidogrel versus aspirin in the prevention of ischemic events, and our own meta-analysis of higher potency statins and the risk of acute kidney injury, we show that the LR-based method yields the same point estimate as the traditional analysis, but with an intrinsic confidence interval that is appropriately wider than the traditional 95% CI. The LR-based method can be used to conduct both fixed effect and random effects meta-analyses, it can be applied to old and new meta-analyses alike, and results can be presented in a format that is familiar to a meta-analytic audience.
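Under an approximate-normal likelihood for each study, the pooling step reduces to summing per-study log-likelihood-ratio curves on a grid. A minimal sketch follows; the effect estimates, standard errors, and the 1/8 support cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Per-study effect estimates (e.g., log hazard ratios) and standard
# errors -- hypothetical numbers for illustration only.
theta_hat = np.array([0.25, 0.10, 0.32])
se = np.array([0.12, 0.20, 0.15])

# With an approximate-normal likelihood, each study's log-likelihood at
# effect theta is -(theta - theta_hat)^2 / (2 se^2) up to a constant;
# the constant cancels in the log-likelihood ratio.
theta = np.linspace(-0.5, 1.0, 2001)
loglik = -((theta[None, :] - theta_hat[:, None]) ** 2
           / (2 * se[:, None] ** 2))
combined = loglik.sum(axis=0)        # sum LogLR functions across studies
combined -= combined.max()           # normalize so the maximum LogLR is 0

point = theta[np.argmax(combined)]
# "Intrinsic" interval: effects whose likelihood is within a factor
# 1/8 of the maximum (the 1/8 cutoff is one conventional choice).
inside = theta[combined >= np.log(1 / 8)]
print(point, inside.min(), inside.max())
```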
Zhu, Xiang; Stephens, Matthew
2017-01-01
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241
Temporal rainfall estimation using input data reduction and model inversion
NASA Astrophysics Data System (ADS)
Wright, A. J.; Vrugt, J. A.; Walker, J. P.; Pauwels, V. R. N.
2016-12-01
Floods are devastating natural hazards. To provide accurate, precise and timely flood forecasts, there is a need to understand the uncertainties associated with temporal rainfall and model parameters. Estimating temporal rainfall and model parameter distributions from streamflow observations in complex dynamic catchments adds skill to current areal rainfall estimation methods, allows the uncertainty of rainfall input to be considered when estimating model parameters, and provides the ability to estimate rainfall in poorly gauged catchments. Current methods to estimate temporal rainfall distributions from streamflow are unable to adequately invert complex non-linear hydrologic systems. This study uses the discrete wavelet transform (DWT) to reduce rainfall dimensionality for the catchment of Warwick, Queensland, Australia. Reducing rainfall to DWT coefficients allows the input rainfall time series to be estimated simultaneously with the model parameters. The estimation is conducted using multi-chain Markov chain Monte Carlo simulation with the DREAMzs algorithm, and a likelihood function that considers both rainfall and streamflow error allows model parameter and temporal rainfall distributions to be estimated jointly. Estimating the wavelet approximation coefficients of lower-order decomposition structures produced the most realistic temporal rainfall distributions, and all of these rainfall estimates simulated streamflow better than the results of a traditional calibration approach. The choice of wavelet is shown to have a considerable impact on the robustness of the inversion. The results demonstrate that streamflow data contain sufficient information to estimate temporal rainfall and model parameter distributions; the range and variance of rainfall time series that simulate streamflow better than a traditional calibration approach are a demonstration of equifinality. The combination of a likelihood function that considers both rainfall and streamflow error with the DWT as a data reduction technique allows the joint inference of hydrologic model parameters and rainfall.
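The dimension-reduction step can be illustrated with PyWavelets: decompose the rainfall series, keep only the low-order approximation coefficients as the quantities to be inferred, and reconstruct a candidate series from a proposal. A sketch with toy data (the wavelet, decomposition level, and series are assumptions, not the study's settings):

```python
import numpy as np
import pywt

rng = np.random.default_rng(7)
rain = rng.gamma(0.3, 8.0, 512)        # toy intermittent rainfall series

# Multilevel discrete wavelet transform; keep only the low-order
# approximation coefficients as the reduced rainfall parameterization.
wavelet, level = 'db4', 4
coeffs = pywt.wavedec(rain, wavelet, level=level)
print(f"{rain.size} rainfall values -> {coeffs[0].size} approximation coeffs")

# In the inversion, the MCMC sampler would propose values for coeffs[0]
# (jointly with model parameters) and reconstruct a rainfall series:
proposal = list(coeffs)
proposal[1:] = [np.zeros_like(c) for c in coeffs[1:]]  # drop detail scales
rain_candidate = pywt.waverec(proposal, wavelet)
```

Each MCMC iteration then only has to explore the handful of approximation coefficients rather than every time step of the rainfall series.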
Maximum likelihood solution for inclination-only data in paleomagnetism
NASA Astrophysics Data System (ADS)
Arason, P.; Levi, S.
2010-08-01
We have developed a new robust maximum likelihood method for estimating the unbiased mean inclination from inclination-only data. In paleomagnetic analysis, the arithmetic mean of inclination-only data is known to introduce a shallowing bias. Several methods have been introduced to estimate the unbiased mean inclination of inclination-only data together with measures of the dispersion. Some inclination-only methods were designed to maximize the likelihood function of the marginal Fisher distribution. However, the exact analytical form of the maximum likelihood function is fairly complicated, and all the methods require various assumptions and approximations that are often inappropriate. For some steep and dispersed data sets, these methods provide estimates that are significantly displaced from the peak of the likelihood function to systematically shallower inclination. The problem of locating the maximum of the likelihood function is partly due to difficulties in accurately evaluating the function for all values of interest, because some elements of the likelihood function increase exponentially as precision parameters increase, leading to numerical instabilities. In this study, we succeeded in analytically cancelling exponential elements from the log-likelihood function, and we are now able to calculate its value anywhere in the parameter space and for any inclination-only data set. Furthermore, we can now calculate the partial derivatives of the log-likelihood function with desired accuracy, and locate the maximum likelihood without the assumptions required by previous methods. To assess the reliability and accuracy of our method, we generated large numbers of random Fisher-distributed data sets, for which we calculated mean inclinations and precision parameters. The comparisons show that our new robust Arason-Levi maximum likelihood method is the most reliable, and the mean inclination estimates are the least biased towards shallow values.
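The numerical instability described here, exponential terms overflowing as the precision parameter grows, is generic. Where terms cannot be cancelled analytically as the authors do, the standard remedy is to evaluate sums of exponentials on the log scale. A small illustration (the toy terms below merely mimic a likelihood with a large precision parameter):

```python
import numpy as np
from scipy.special import logsumexp

kappa = 800.0                                  # large precision parameter
terms_log = kappa * np.cos(np.linspace(0, np.pi, 5))  # log of each term

# Naive evaluation overflows (exp(800) exceeds double precision):
naive = np.log(np.exp(terms_log).sum())
# Stable evaluation factors out the largest exponent first:
stable = logsumexp(terms_log)
print(naive, stable)   # inf vs. a finite value close to kappa
```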
A fast least-squares algorithm for population inference.
Parry, R Mitchell; Wang, May D
2013-01-23
Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual's genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Significantly, the least-squares approach nearly always converges 1.5- to 6-times faster. The computational advantage of the least-squares approach along with its good estimation performance warrants further research, especially for very large datasets. As problem sizes increase, the difference in estimation performance between all algorithms decreases. In addition, when prior information is known, the least-squares approach easily incorporates the expected degree of admixture to improve the estimate.
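A generic version of the least-squares idea can be sketched as alternating non-negative least squares on the approximation G ≈ 2QPᵀ. This is an illustrative sketch under that model, not the authors' exact algorithm; in particular, the simplex renormalization of Q is a simplification.

```python
import numpy as np
from scipy.optimize import nnls

def als_population_inference(G, K, n_iter=50, seed=0):
    """Alternating non-negative least squares for G ~ 2 Q P^T.

    G: (n_individuals, n_snps) genotype matrix with entries in {0, 1, 2}.
    Returns admixture proportions Q (rows renormalized to the simplex)
    and allele frequencies P clipped to [0, 1].
    """
    rng = np.random.default_rng(seed)
    n, m = G.shape
    Q = rng.dirichlet(np.ones(K), size=n)
    P = rng.uniform(0.05, 0.95, size=(m, K))
    for _ in range(n_iter):
        # Solve for each SNP's allele frequencies given Q.
        for j in range(m):
            P[j], _ = nnls(2 * Q, G[:, j])
        np.clip(P, 0.0, 1.0, out=P)
        # Solve for each individual's ancestry proportions given P.
        for i in range(n):
            Q[i], _ = nnls(2 * P, G[i])
        Q /= np.maximum(Q.sum(axis=1, keepdims=True), 1e-12)
    return Q, P
```

Each subproblem is a small non-negative least-squares solve, which is what makes this family of methods fast relative to sampling the full binomial likelihood.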
Estimating parameter of Rayleigh distribution by using Maximum Likelihood method and Bayes method
NASA Astrophysics Data System (ADS)
Ardianti, Fitri; Sutarman
2018-01-01
In this paper, we use maximum likelihood estimation and the Bayes method under several loss functions to estimate the parameter of the Rayleigh distribution and to determine which method performs best. The prior used in the Bayes method is Jeffreys' non-informative prior. Maximum likelihood estimation and the Bayes method under the precautionary loss function, the entropy loss function, and the L1 loss function are compared. We compare these methods by bias and MSE, computed in R, and display the results in tables to facilitate comparison.
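For the Rayleigh scale both estimators are available in closed form, which makes the bias/MSE comparison easy to reproduce. A minimal sketch (squared-error loss only; the precautionary and entropy losses studied in the paper lead to different Bayes estimators): with S = Σx², the MLE of σ² is S/(2n), and under Jeffreys' prior the posterior of σ² is inverse-gamma with mean (S/2)/(n−1).

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_true, n, reps = 2.0, 30, 20000
mle_err, bayes_err = [], []
for _ in range(reps):
    x = rng.rayleigh(scale=sigma_true, size=n)
    S = np.sum(x ** 2)
    mle = S / (2 * n)            # MLE of sigma^2
    bayes = (S / 2) / (n - 1)    # posterior mean under Jeffreys' prior
    mle_err.append(mle - sigma_true ** 2)
    bayes_err.append(bayes - sigma_true ** 2)

for name, e in [("MLE", np.array(mle_err)), ("Bayes", np.array(bayes_err))]:
    print(name, "bias:", e.mean().round(4), "MSE:", (e ** 2).mean().round(4))
```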
SEPARABLE FACTOR ANALYSIS WITH APPLICATIONS TO MORTALITY DATA
Fosdick, Bailey K.; Hoff, Peter D.
2014-01-01
Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that estimates the covariance matrix for each dimension as having factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment how it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations. PMID:25489353
Consistency of Rasch Model Parameter Estimation: A Simulation Study.
ERIC Educational Resources Information Center
van den Wollenberg, Arnold L.; And Others
1988-01-01
The unconditional--simultaneous--maximum likelihood (UML) estimation procedure for the one-parameter logistic model produces biased estimators. The UML method is inconsistent and is not a good alternative to the conditional maximum likelihood method, at least with small numbers of items. The minimum chi-square estimation procedure produces unbiased…
Multisensor fusion for 3D target tracking using track-before-detect particle filter
NASA Astrophysics Data System (ADS)
Moshtagh, Nima; Romberg, Paul M.; Chan, Moses W.
2015-05-01
This work presents a novel fusion mechanism for estimating the three-dimensional trajectory of a moving target using images collected by multiple imaging sensors. The proposed projective particle filter avoids explicit target detection prior to fusion. In the projective particle filter, particles that represent the posterior density (of the target state in a high-dimensional space) are projected onto the lower-dimensional observation space. Measurements are generated directly in the observation space (image plane) and a marginal (sensor) likelihood is computed. The particle states and their weights are updated using the joint likelihood computed from all the sensors. The 3D state estimate of the target (the system track) is then generated from the states of the particles. This approach is similar to track-before-detect particle filters, which are known to perform well in tracking dim and stealthy targets in image collections. Our approach extends the track-before-detect approach to 3D tracking using the projective particle filter. The performance of this measurement-level fusion method is compared with that of a track-level fusion algorithm using the projective particle filter. In the track-level fusion algorithm, the 2D sensor tracks are generated separately and transmitted to a fusion center, where they are treated as measurements to the state estimator. The 2D sensor tracks are then fused to reconstruct the system track. A realistic synthetic scenario with a boosting target was generated and used to study the performance of the fusion mechanisms.
M-dwarf exoplanet surface density distribution. A log-normal fit from 0.07 to 400 AU
NASA Astrophysics Data System (ADS)
Meyer, Michael R.; Amara, Adam; Reggiani, Maddalena; Quanz, Sascha P.
2018-04-01
Aims: We fit a log-normal function to the M-dwarf orbital surface density distribution of gas giant planets, over the mass range 1-10 times that of Jupiter, from 0.07 to 400 AU. Methods: We used a Markov chain Monte Carlo approach to explore the likelihoods of various parameter values consistent with point estimates of the data given our assumed functional form. Results: This fit is consistent with radial velocity, microlensing, and direct-imaging observations, is well-motivated from theoretical and phenomenological points of view, and predicts results of future surveys. We present probability distributions for each parameter and a maximum likelihood estimate solution. Conclusions: We suggest that this function makes more physical sense than other widely used functions, and we explore the implications of our results on the design of future exoplanet surveys.
A new maximum-likelihood change estimator for two-pass SAR coherent change detection
Wahl, Daniel E.; Yocky, David A.; Jakowatz, Jr., Charles V.; ...
2016-01-11
In previous research, two-pass repeat-geometry synthetic aperture radar (SAR) coherent change detection (CCD) predominantly utilized the sample degree of coherence as a measure of the temporal change occurring between two complex-valued image collects. Previous coherence-based CCD approaches tend to show temporal change when there is none in areas of the image that have a low clutter-to-noise power ratio. Instead of employing the sample coherence magnitude as a change metric, in this paper, we derive a new maximum-likelihood (ML) temporal change estimate, the complex reflectance change detection (CRCD) metric, to be used for SAR coherent temporal change detection. The new CRCD estimator is a surprisingly simple expression, easy to implement, and optimal in the ML sense. As a result, this new estimate produces improved results in the coherent pair collects that we have tested.
Species richness in soil bacterial communities: a proposed approach to overcome sample size bias.
Youssef, Noha H; Elshahed, Mostafa S
2008-09-01
Estimates of species richness based on 16S rRNA gene clone libraries are increasingly utilized to gauge the level of bacterial diversity within various ecosystems. However, previous studies have indicated that regardless of the utilized approach, species richness estimates obtained are dependent on the size of the analyzed clone libraries. We here propose an approach to overcome sample size bias in species richness estimates in complex microbial communities. Parametric (maximum likelihood-based and rarefaction curve-based) and non-parametric approaches were used to estimate species richness in a library of 13,001 near full-length 16S rRNA clones derived from soil, as well as in multiple subsets of the original library. Species richness estimates obtained increased with the increase in library size. To obtain a sample size-unbiased estimate of species richness, we calculated the theoretical clone library sizes required to encounter the estimated species richness at various clone library sizes, used curve fitting to determine the theoretical clone library size required to encounter the "true" species richness, and subsequently determined the corresponding sample size-unbiased species richness value. Using this approach, sample size-unbiased estimates of 17,230, 15,571, and 33,912 were obtained for the ML-based, rarefaction curve-based, and ACE-1 estimators, respectively, compared to bias-uncorrected values of 15,009, 11,913, and 20,909.
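The extrapolation step, fitting how richness estimates grow with library size and reading off the asymptote, can be sketched with a curve fit. The numbers and the saturating functional form below are illustrative assumptions, not the paper's fitted curve.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical richness estimates (e.g., from ACE-1) computed on nested
# subsets of a clone library of increasing size.
library_size = np.array([500, 1000, 2000, 4000, 8000, 13000])
richness_est = np.array([2100, 4300, 7600, 11200, 14800, 17000])

def saturating(n, s_max, b):
    """Assumed saturating form: the estimate tends to s_max as n grows."""
    return s_max * n / (b + n)

(s_max, b), _ = curve_fit(saturating, library_size, richness_est,
                          p0=[20000, 5000])
print(f"sample-size-unbiased richness estimate: {s_max:.0f}")
```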
Composite Linear Models | Division of Cancer Prevention
By Stuart G. Baker. The composite linear models software is a matrix approach to compute maximum likelihood estimates and asymptotic standard errors for models for incomplete multinomial data. It implements the method described in Baker SG. Composite linear models for incomplete multinomial data. Statistics in Medicine 1994;13:609-622. The software includes a library of thirty
Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2015-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Mark D. Nelson; Ronald E. McRoberts; Greg C. Liknes; Geoffrey R. Holden
2005-01-01
Landsat Thematic Mapper (TM) satellite imagery and Forest Inventory and Analysis (FIA) plot data were used to construct forest/nonforest maps of Mapping Zone 41, National Land Cover Dataset 2000 (NLCD 2000). Stratification approaches resulting from Maximum Likelihood, Fuzzy Convolution, Logistic Regression, and k-Nearest Neighbors classification/prediction methods were...
Developing Multidimensional Likert Scales Using Item Factor Analysis: The Case of Four-Point Items
ERIC Educational Resources Information Center
Asún, Rodrigo A.; Rdz-Navarro, Karina; Alvarado, Jesús M.
2016-01-01
This study compares the performance of two approaches in analysing four-point Likert rating scales with a factorial model: the classical factor analysis (FA) and the item factor analysis (IFA). For FA, maximum likelihood and weighted least squares estimations using Pearson correlation matrices among items are compared. For IFA, diagonally weighted…
2010-06-01
GMKPF represents a better and more flexible alternative to the Gaussian Maximum Likelihood (GML) and Exponential Maximum Likelihood (EML) estimators for clock offset estimation in non-Gaussian or non-exponential settings, yielding more accurate results relative to GML and EML when the network delays are modeled in terms of a single non-Gaussian/non-exponential distribution or as a …
Maximum-likelihood estimation of parameterized wavefronts from multifocal data
Sakamoto, Julia A.; Barrett, Harrison H.
2012-01-01
A method for determining the pupil phase distribution of an optical system is demonstrated. Coefficients in a wavefront expansion were estimated using likelihood methods, where the data consisted of multiple irradiance patterns near focus. Proof-of-principle results were obtained in both simulation and experiment. Large-aberration wavefronts were handled in the numerical study. Experimentally, we discuss the handling of nuisance parameters. Fisher information matrices, Cramér-Rao bounds, and likelihood surfaces are examined. ML estimates were obtained by simulated annealing to deal with numerous local extrema in the likelihood function. Rapid processing techniques were employed to reduce the computational time. PMID:22772282
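The global-search flavor of this estimation can be reproduced in miniature: a forward model that is nonlinear in its coefficients gives a likelihood with local extrema, and an annealing-type optimizer locates the ML solution. A toy sketch (the forward model is a stand-in, not the paper's irradiance model, and SciPy's dual_annealing replaces the authors' simulated annealing):

```python
import numpy as np
from scipy.optimize import dual_annealing

rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 200)
true_c = np.array([0.5, -1.2, 0.8])        # "wavefront" coefficients

def model(c):
    # Toy irradiance model: oscillatory in the polynomial phase,
    # so the likelihood surface has many local extrema.
    phase = c[0] * x + c[1] * x**2 + c[2] * x**3
    return np.cos(4 * np.pi * phase) ** 2

data = model(true_c) + rng.normal(0, 0.05, x.size)

def neg_loglik(c):
    # Gaussian noise: maximum likelihood reduces to least squares.
    return np.sum((data - model(c)) ** 2)

bounds = [(-2, 2)] * 3
res = dual_annealing(neg_loglik, bounds, seed=1)
print(res.x)    # should land near true_c despite the local extrema
```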
Generalized Ordinary Differential Equation Models.
Miao, Hongyu; Wu, Hulin; Xue, Hongqi
2014-10-01
Existing estimation methods for ordinary differential equation (ODE) models are not applicable to discrete data. The generalized ODE (GODE) model is therefore proposed and investigated for the first time. We develop the likelihood-based parameter estimation and inference methods for GODE models. We propose robust computing algorithms and rigorously investigate the asymptotic properties of the proposed estimator by considering both measurement errors and numerical errors in solving ODEs. The simulation study and application of our methods to an influenza viral dynamics study suggest that the proposed methods have a superior performance in terms of accuracy over the existing ODE model estimation approach and the extended smoothing-based (ESB) method.
A comparison of abundance estimates from extended batch-marking and Jolly–Seber-type experiments
Cowen, Laura L E; Besbeas, Panagiotis; Morgan, Byron J T; Schwarz, Carl J
2014-01-01
Little attention has been paid to the use of multi-sample batch-marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, recently, Huggins et al. (2010) present a pseudo-likelihood for a multi-sample batch-marking study where they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson-type estimator. We have developed and maximized the likelihood for batch-marking studies. We use data simulated from a Jolly–Seber-type study and convert this to what would have been obtained from an extended batch-marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch-marking model to determine the efficiency of collecting and analyzing batch-marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains are made when using unique identifiers and employing the CMAS model in terms of precision; however, the likelihood typically had lower mean square error than the pseudo-likelihood method of Huggins et al. (2010). When faced with designing a batch-marking study, researchers can be confident in obtaining unbiased abundance estimators. Furthermore, they can design studies in order to reduce mean square error by manipulating capture probabilities and sample size. PMID:24558576
Optimal design and use of retry in fault tolerant real-time computer systems
NASA Technical Reports Server (NTRS)
Lee, Y. H.; Shin, K. G.
1983-01-01
A new method to determine an optimal retry policy and to use retry for fault characterization is presented. An optimal retry policy for a given fault characteristic, which determines the maximum allowable retry durations so as to minimize the total task completion time, was derived. The combined fault characterization and retry decision, in which the fault characteristics are estimated simultaneously with the determination of the optimal retry policy, was then carried out. Two solution approaches were developed, one based on point estimation and the other on Bayes sequential decision. Maximum likelihood estimators are used in the first approach, and backward induction for testing hypotheses in the second. Numerical examples are presented in which all the durations associated with faults have monotone hazard functions, e.g., exponential, Weibull and gamma distributions; these are standard distributions commonly used in the modeling and analysis of faults.
NASA Astrophysics Data System (ADS)
Langlois, Dominic; Cousineau, Denis; Thivierge, J. P.
2014-01-01
The coordination of activity amongst populations of neurons in the brain is critical to cognition and behavior. One form of coordinated activity that has been widely studied in recent years is the so-called neuronal avalanche, whereby ongoing bursts of activity follow a power-law distribution. Avalanches that follow a power law are not unique to neuroscience, but arise in a broad range of natural systems, including earthquakes, magnetic fields, biological extinctions, fluid dynamics, and superconductors. Here, we show that common techniques that estimate this distribution fail to take into account important characteristics of the data and may lead to a sizable misestimation of the slope of power laws. We develop an alternative series of maximum likelihood estimators for discrete, continuous, bounded, and censored data. Using numerical simulations, we show that these estimators lead to accurate evaluations of power-law distributions, improving on common approaches. Next, we apply these estimators to recordings of in vitro rat neocortical activity. We show that different estimators lead to marked discrepancies in the evaluation of power-law distributions. These results call into question a broad range of findings that may misestimate the slope of power laws by failing to take into account key aspects of the observed data.
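For the continuous unbounded case, the ML estimator of the power-law exponent has a closed form (the Clauset-style estimator); the discrete, bounded, and censored variants developed in the paper generalize this. A minimal check on synthetic data:

```python
import numpy as np

def powerlaw_mle_continuous(x, x_min):
    """Continuous-case ML estimate of the power-law exponent:
    alpha = 1 + n / sum(log(x_i / x_min)), using x_i >= x_min only."""
    x = np.asarray(x, dtype=float)
    x = x[x >= x_min]
    n = x.size
    alpha = 1.0 + n / np.sum(np.log(x / x_min))
    se = (alpha - 1.0) / np.sqrt(n)       # asymptotic standard error
    return alpha, se

# Synthetic continuous power-law data via inverse-CDF sampling.
rng = np.random.default_rng(2)
alpha_true, x_min = 2.5, 1.0
u = rng.uniform(size=100_000)
x = x_min * u ** (-1.0 / (alpha_true - 1.0))
print(powerlaw_mle_continuous(x, x_min))  # close to (2.5, small se)
```

Applying this continuous estimator to data that are actually discrete, bounded, or censored produces exactly the kind of slope misestimation the abstract warns about.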
NASA Astrophysics Data System (ADS)
Balbi, Stefano; Villa, Ferdinando; Mojtahed, Vahid; Hegetschweiler, Karin Tessa; Giupponi, Carlo
2016-06-01
This article presents a novel methodology to assess flood risk to people by integrating people's vulnerability and ability to cushion hazards through coping and adapting. The proposed approach extends traditional risk assessments beyond material damages; complements quantitative and semi-quantitative data with subjective and local knowledge, improving the use of commonly available information; and produces estimates of model uncertainty by providing probability distributions for all of its outputs. Flood risk to people is modeled using a spatially explicit Bayesian network model calibrated on expert opinion. Risk is assessed in terms of (1) likelihood of non-fatal physical injury, (2) likelihood of post-traumatic stress disorder and (3) likelihood of death. The study area covers the lower part of the Sihl valley (Switzerland) including the city of Zurich. The model is used to estimate the effect of improving an existing early warning system, taking into account the reliability, lead time and scope (i.e., coverage of people reached by the warning). Model results indicate that the potential benefits of an improved early warning in terms of avoided human impacts are particularly relevant in case of a major flood event.
Alternative Methods for Handling Attrition
Foster, E. Michael; Fang, Grace Y.
2009-01-01
Using data from the evaluation of the Fast Track intervention, this article illustrates three methods for handling attrition. Multiple imputation and ignorable maximum likelihood estimation produce estimates that are similar to those based on listwise-deleted data. A panel selection model that allows for selective dropout reveals that highly aggressive boys accumulate in the treatment group over time and produces a larger estimate of treatment effect. In contrast, this model produces a smaller treatment effect for girls. The article's conclusion discusses the strengths and weaknesses of the alternative approaches and outlines ways in which researchers might improve their handling of attrition. PMID:15358906
Unified framework to evaluate panmixia and migration direction among multiple sampling locations.
Beerli, Peter; Palczewski, Michal
2010-05-01
For many biological investigations, groups of individuals are genetically sampled from several geographic locations. These sampling locations often do not reflect the genetic population structure. We describe a framework using marginal likelihoods to compare and order structured population models, such as testing whether the sampling locations belong to the same randomly mating population or comparing unidirectional and multidirectional gene flow models. In the context of inferences employing Markov chain Monte Carlo methods, the accuracy of the marginal likelihoods depends heavily on the approximation method used to calculate the marginal likelihood. Two methods, modified thermodynamic integration and a stabilized harmonic mean estimator, are compared. With finite Markov chain Monte Carlo run lengths, the harmonic mean estimator may not be consistent. Thermodynamic integration, in contrast, delivers considerably better estimates of the marginal likelihood. The choice of prior distributions does not influence the order and choice of the better models when the marginal likelihood is estimated using thermodynamic integration, whereas with the harmonic mean estimator the influence of the prior is pronounced and the order of the models changes. The approximation of marginal likelihood using thermodynamic integration in MIGRATE allows the evaluation of complex population genetic models, not only of whether sampling locations belong to a single panmictic population, but also of competing complex structured population models.
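The contrast between the two estimators can be reproduced on a conjugate toy model where the exact marginal likelihood is known. A sketch under assumed normal data with known variance and a normal prior, so the power posterior at each temperature is available in closed form and thermodynamic integration reduces to a quadrature over the inverse temperature:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
n, sigma, tau = 50, 1.0, 2.0
y = rng.normal(0.7, sigma, n)

# Thermodynamic integration: log m(y) = integral_0^1 E_beta[log p(y|theta)],
# with E_beta taken under the power posterior p(y|theta)^beta p(theta).
betas = np.linspace(0.0, 1.0, 101)
e_ll = []
for b in betas:
    prec = b * n / sigma**2 + 1 / tau**2       # conjugate power posterior
    m, v = b * y.sum() / sigma**2 / prec, 1 / prec
    e_ll.append(-0.5 * n * np.log(2 * np.pi * sigma**2)
                - (np.sum((y - m) ** 2) + n * v) / (2 * sigma**2))
e_ll = np.array(e_ll)
log_ml_ti = np.sum(np.diff(betas) * (e_ll[1:] + e_ll[:-1]) / 2)  # trapezoid

# Harmonic mean estimator from ordinary posterior (beta = 1) draws.
prec1 = n / sigma**2 + 1 / tau**2
theta = rng.normal(y.sum() / sigma**2 / prec1, np.sqrt(1 / prec1), 50_000)
ll = (-0.5 * n * np.log(2 * np.pi * sigma**2)
      - ((y[None, :] - theta[:, None]) ** 2).sum(axis=1) / (2 * sigma**2))
log_ml_hm = -(logsumexp(-ll) - np.log(ll.size))

# Exact answer for comparison: y ~ N(0, sigma^2 I + tau^2 11').
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
log_ml_exact = multivariate_normal(np.zeros(n), cov).logpdf(y)
print(log_ml_ti, log_ml_hm, log_ml_exact)
```

On repeated runs the thermodynamic integration value tracks the exact answer closely, while the harmonic mean estimate is noisier, which is the qualitative finding reported above.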
Nowakowska, Marzena
2017-04-01
The development of a Bayesian logistic regression model classifying road accident severity is discussed. Previously exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with the original idea of a Boot prior proposal, are investigated for the case when no expert opinion is available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained Bayesian logistic models are assessed on the basis of the deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. Verification of model accuracy is based on sensitivity, specificity and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have better classification quality than those obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method-of-moments, and maximum likelihood estimation prior models. One should be careful when interpreting the parameters, since different priors can lead to different models.
Identification of dynamic systems, theory and formulation
NASA Technical Reports Server (NTRS)
Maine, R. E.; Iliff, K. W.
1985-01-01
The problem of estimating parameters of dynamic systems is addressed in order to present the theoretical basis of system identification and parameter estimation in a manner that is complete and rigorous, yet understandable with minimal prerequisites. Maximum likelihood and related estimators are highlighted. The approach used requires familiarity with calculus, linear algebra, and probability, but does not require knowledge of stochastic processes or functional analysis. The treatment emphasizes unification of the various areas; estimation in dynamic systems is treated as a direct outgrowth of static system theory. Topics covered include basic concepts and definitions; numerical optimization methods; probability; statistical estimators; estimation in static systems; stochastic processes; state estimation in dynamic systems; output error, filter error, and equation error methods of parameter estimation in dynamic systems; and the accuracy of the estimates.
A novel description of FDG excretion in the renal system: application to metformin-treated models
NASA Astrophysics Data System (ADS)
Garbarino, S.; Caviglia, G.; Sambuceti, G.; Benvenuto, F.; Piana, M.
2014-05-01
This paper introduces a novel compartmental model describing the excretion of 18F-fluoro-deoxyglucose (FDG) in the renal system and a numerical method based on the maximum likelihood for its reduction. This approach accounts for variations in FDG concentration due to water re-absorption in renal tubules and the increase of the bladder’s volume during the FDG excretion process. From the computational viewpoint, the reconstruction of the tracer kinetic parameters is obtained by solving the maximum likelihood problem iteratively, using a non-stationary, steepest descent approach that explicitly accounts for the Poisson nature of nuclear medicine data. The reliability of the method is validated against two sets of synthetic data realized according to realistic conditions. Finally we applied this model to describe FDG excretion in the case of animal models treated with metformin. In particular we show that our approach allows the quantitative estimation of the reduction of FDG de-phosphorylation induced by metformin.
Liu, Xiang; Peng, Yingwei; Tu, Dongsheng; Liang, Hua
2012-10-30
Survival data with a sizable cure fraction are commonly encountered in cancer research. The semiparametric proportional hazards cure model has been recently used to analyze such data. As seen in the analysis of data from a breast cancer study, a variable selection approach is needed to identify important factors in predicting the cure status and risk of breast cancer recurrence. However, no specific variable selection method for the cure model is available. In this paper, we present a variable selection approach with penalized likelihood for the cure model. The estimation can be implemented easily by combining the computational methods for penalized logistic regression and the penalized Cox proportional hazards models with the expectation-maximization algorithm. We illustrate the proposed approach on data from a breast cancer study. We conducted Monte Carlo simulations to evaluate the performance of the proposed method. We used and compared different penalty functions in the simulation studies.
Markov Chain Monte Carlo: an introduction for epidemiologists
Hamra, Ghassan; MacLehose, Richard; Richardson, David
2013-01-01
Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with the implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit. PMID:23569196
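A compact illustration of that MCMC-matches-MLE point: a random-walk Metropolis sampler for a binomial risk with a flat prior, whose posterior mean essentially reproduces the ML estimate (toy data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.binomial(1, 0.3, size=200)        # binary outcomes
mle = y.mean()                            # ML estimate of the risk

# Random-walk Metropolis for p with a flat prior on (0, 1);
# the log-posterior is then just the Bernoulli log-likelihood.
def log_post(p):
    if not 0 < p < 1:
        return -np.inf
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

p, chain = 0.5, []
for _ in range(20_000):
    prop = p + rng.normal(0, 0.05)
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop
    chain.append(p)

posterior = np.array(chain[5_000:])       # drop burn-in
print(mle, posterior.mean())              # nearly identical values
```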
Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data
ERIC Educational Resources Information Center
Savalei, Victoria
2010-01-01
Maximum likelihood is the most common estimation method in structural equation modeling. Standard errors for maximum likelihood estimates are obtained from the associated information matrix, which can be estimated from the sample using either expected or observed information. It is known that, with complete data, estimates based on observed or…
A Maximum Likelihood Approach to Functional Mapping of Longitudinal Binary Traits
Wang, Chenguang; Li, Hongying; Wang, Zhong; Wang, Yaqun; Wang, Ningtao; Wang, Zuoheng; Wu, Rongling
2013-01-01
Despite their importance in biology and biomedicine, genetic mapping of binary traits that change over time has not been well explored. In this article, we develop a statistical model for mapping quantitative trait loci (QTLs) that govern longitudinal responses of binary traits. The model is constructed within the maximum likelihood framework by which the association between binary responses is modeled in terms of conditional log odds-ratios. With this parameterization, the maximum likelihood estimates (MLEs) of marginal mean parameters are robust to the misspecification of time dependence. We implement an iterative procedure to obtain the MLEs of QTL genotype-specific parameters that define longitudinal binary responses. The usefulness of the model was validated by analyzing a real example in rice. Simulation studies were performed to investigate the statistical properties of the model, showing that the model has power to identify and map specific QTLs responsible for the temporal pattern of binary traits. PMID:23183762
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1978-01-01
This paper addresses the problem of obtaining numerically maximum-likelihood estimates of the parameters for a mixture of normal distributions. In recent literature, a certain successive-approximations procedure, based on the likelihood equations, was shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, we introduce a general iterative procedure, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. We show that, with probability 1 as the sample size grows large, this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. We also show that the step-size which yields optimal local convergence rates for large samples is determined in a sense by the 'separation' of the component normal densities and is bounded below by a number between 1 and 2.
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1976-01-01
The problem of obtaining numerically maximum likelihood estimates of the parameters for a mixture of normal distributions is addressed. In recent literature, a certain successive approximations procedure, based on the likelihood equations, is shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, a general iterative procedure is introduced, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. With probability 1 as the sample size grows large, it is shown that this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. The step-size which yields optimal local convergence rates for large samples is determined in a sense by the separation of the component normal densities and is bounded below by a number between 1 and 2.
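For a normal mixture, the step-size-1 version of this successive-approximations procedure is the familiar EM fixed-point map, and the generalized step takes theta_{k+1} = theta_k + omega * (M(theta_k) - theta_k) with 0 < omega < 2. A sketch for a two-component mixture with known unit variances (an illustrative special case, not the papers' full procedure):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
x = np.concatenate([rng.normal(-2, 1, 400), rng.normal(2, 1, 600)])

def em_update(mu, w):
    # E-step: responsibilities; M-step: weighted means and mixing weight.
    r1 = w * norm.pdf(x, mu[0], 1)
    r2 = (1 - w) * norm.pdf(x, mu[1], 1)
    g = r1 / (r1 + r2)
    return np.array([np.sum(g * x) / g.sum(),
                     np.sum((1 - g) * x) / (1 - g).sum()]), g.mean()

mu, w, omega = np.array([-1.0, 1.0]), 0.5, 1.5   # omega in (0, 2)
for _ in range(200):
    mu_em, w_em = em_update(mu, w)
    mu = mu + omega * (mu_em - mu)               # over-relaxed step
    w = np.clip(w + omega * (w_em - w), 1e-3, 1 - 1e-3)
print(mu, w)   # close to the true means (-2, 2) and weight 0.4
```

With omega = 1 this reduces to plain EM; values above 1 over-relax the iteration, which is where the faster local convergence discussed above can arise.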
Bootstrap Standard Errors for Maximum Likelihood Ability Estimates When Item Parameters Are Unknown
ERIC Educational Resources Information Center
Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi
2014-01-01
When item parameter estimates are used to estimate the ability parameter in item response models, the standard error (SE) of the ability estimate must be corrected to reflect the error carried over from item calibration. For maximum likelihood (ML) ability estimates, a corrected asymptotic SE is available, but it requires a long test and the…
Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation
NASA Astrophysics Data System (ADS)
Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.
2016-12-01
With the growing impacts of climate change and human activities on the water cycle, an increasing amount of research focuses on quantifying modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining each plausible model's prediction, and each model carries a weight determined by its prior weight and marginal likelihood. The estimation of a model's marginal likelihood is therefore crucial for reliable and accurate BMA prediction. The nested sampling estimator (NSE) is a newly proposed method for marginal likelihood estimation. NSE searches the parameter space gradually from the low-likelihood region to the high-likelihood region, and this evolution proceeds iteratively via a local sampling procedure, so the efficiency of NSE is dominated by the strength of that local sampling. Currently, the Metropolis-Hastings (M-H) algorithm is often used for local sampling; however, M-H is not an efficient sampler for high-dimensional or complicated parameter spaces. To improve the efficiency of NSE, we incorporate the robust and efficient DREAMzs sampling algorithm into the local sampling step. The comparison results demonstrate that the improved NSE raises the efficiency of marginal likelihood estimation significantly, although both the improved and original NSEs suffer from instability. In addition, the heavy computational cost of the large number of model executions is overcome by using adaptive sparse grid surrogates.
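The mechanics of NSE can be shown on a one-dimensional toy problem, with simple rejection sampling from the constrained prior standing in for the M-H or DREAMzs local sampling step discussed above (all model details are illustrative assumptions):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(9)
y = rng.normal(0.3, 1.0, 40)
lo, hi = -5.0, 5.0                       # Uniform(-5, 5) prior on theta

def loglik(theta):
    return norm.logpdf(y, theta, 1.0).sum()

N, iters = 200, 1000                     # live points, NS iterations
live = rng.uniform(lo, hi, N)
live_ll = np.array([loglik(t) for t in live])
log_w, dead_ll = [], []
for i in range(1, iters + 1):
    worst = int(np.argmin(live_ll))
    dead_ll.append(live_ll[worst])
    # Shell weight X_{i-1} - X_i, with prior mass X_i ~ exp(-i/N).
    log_w.append(np.log(np.exp(-(i - 1) / N) - np.exp(-i / N)))
    while True:                          # draw from the constrained prior
        t = rng.uniform(lo, hi)
        if loglik(t) > dead_ll[-1]:
            break
    live[worst], live_ll[worst] = t, loglik(t)

# Evidence: dead-point shells plus the remaining live-point mass.
log_Z = logsumexp(np.concatenate([
    np.array(log_w) + np.array(dead_ll),
    np.log(np.exp(-iters / N) / N) + live_ll]))
print("nested-sampling log evidence:", log_Z)
```

The rejection step is exactly where efficiency collapses as the likelihood constraint tightens, which is the bottleneck that a DREAMzs-style local sampler is meant to relieve.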
NASA Technical Reports Server (NTRS)
Thadani, S. G.
1977-01-01
The Maximum Likelihood Estimation of Signature Transformation (MLEST) algorithm is used to obtain maximum likelihood estimates (MLE) of affine transformation. The algorithm has been evaluated for three sets of data: simulated (training and recognition segment pairs), consecutive-day (data gathered from Landsat images), and geographical-extension (large-area crop inventory experiment) data sets. For each set, MLEST signature extension runs were made to determine MLE values and the affine-transformed training segment signatures were used to classify the recognition segments. The classification results were used to estimate wheat proportions at 0 and 1% threshold values.
Responder analysis without dichotomization.
Zhang, Zhiwei; Chu, Jianxiong; Rahardja, Dewi; Zhang, Hui; Tang, Li
2016-01-01
In clinical trials, it is common practice to categorize subjects as responders and non-responders on the basis of one or more clinical measurements under pre-specified rules. Such a responder analysis is often criticized for the loss of information in dichotomizing one or more continuous or ordinal variables. It is worth noting that a responder analysis can be performed without dichotomization, because the proportion of responders for each treatment can be derived from a model for the original clinical variables (used to define a responder) and estimated by substituting maximum likelihood estimators of model parameters. This model-based approach can be considerably more efficient and more effective for dealing with missing data than the usual approach based on dichotomization. For parameter estimation, the model-based approach generally requires correct specification of the model for the original variables. However, under the sharp null hypothesis, the model-based approach remains unbiased for estimating the treatment difference even if the model is misspecified. We elaborate on these points and illustrate them with a series of simulation studies mimicking a study of Parkinson's disease, which involves longitudinal continuous data in the definition of a responder.
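The model-based responder proportion is simple to compute in the normal case: fit the continuous outcome by ML, then read the responder probability off the fitted distribution. A minimal sketch (the cutoff, data, and normal model are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
# Continuous change score; "responder" = improvement of at least 5 points.
change = rng.normal(3.0, 8.0, 150)
cutoff = 5.0

# Usual approach: dichotomize, then estimate the responder proportion.
p_dichot = np.mean(change >= cutoff)

# Model-based approach: normal MLEs, then the implied responder proportion.
mu_hat, sd_hat = change.mean(), change.std(ddof=0)
p_model = 1 - norm.cdf((cutoff - mu_hat) / sd_hat)

print(p_dichot, p_model)   # similar values; the model-based estimate is
                           # typically more precise when the model holds
```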
Levin, Gregory P; Emerson, Sarah C; Emerson, Scott S
2014-09-01
Many papers have introduced adaptive clinical trial methods that allow modifications to the sample size based on interim estimates of treatment effect. There has been extensive commentary on type I error control and efficiency considerations, but little research on estimation after an adaptive hypothesis test. We evaluate the reliability and precision of different inferential procedures in the presence of an adaptive design with pre-specified rules for modifying the sampling plan. We extend group sequential orderings of the outcome space based on the stage at stopping, likelihood ratio statistic, and sample mean to the adaptive setting in order to compute median-unbiased point estimates, exact confidence intervals, and P-values uniformly distributed under the null hypothesis. The likelihood ratio ordering is found to average shorter confidence intervals and produce higher probabilities of P-values below important thresholds than alternative approaches. The bias adjusted mean demonstrates the lowest mean squared error among candidate point estimates. A conditional error-based approach in the literature has the benefit of being the only method that accommodates unplanned adaptations. We compare the performance of this and other methods in order to quantify the cost of failing to plan ahead in settings where adaptations could realistically be pre-specified at the design stage. We find the cost to be meaningful for all designs and treatment effects considered, and to be substantial for designs frequently proposed in the literature.
Sun, Zhichao; Mukherjee, Bhramar; Estes, Jason P; Vokonas, Pantel S; Park, Sung Kyun
2017-08-15
Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case-control and case-only designs, longitudinal cohort studies that can capture time-varying outcome and exposure information have long been recommended for gene-environment (G × E) interactions. To date, literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of the existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories and develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates for the G × E interaction and joint exposure effects over uncorrected complete-case analysis, and the exposure enriched outcome trajectory dependent design outperforms other designs in terms of estimation efficiency and power for detection of the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963. Copyright © 2017 John Wiley & Sons, Ltd.
Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA)☆
Röck, Alexander W.; Dür, Arne; van Oven, Mannis; Parson, Walther
2013-01-01
The assignment of haplogroups to mitochondrial DNA haplotypes contributes substantial value for quality control, not only in forensic genetics but also in population and medical genetics. The availability of Phylotree, a widely accepted phylogenetic tree of human mitochondrial DNA lineages, led to the development of several (semi-)automated software solutions for haplogrouping. However, currently existing haplogrouping tools only make use of haplogroup-defining mutations, whereas private mutations (beyond the haplogroup level) can be additionally informative, allowing for enhanced haplogroup assignment. This is especially relevant in the case of (partial) control region sequences, which are mainly used in forensics. The present study makes three major contributions toward a more reliable, semi-automated estimation of mitochondrial haplogroups. First, a quality-controlled database consisting of 14,990 full mtGenomes downloaded from GenBank was compiled. Together with Phylotree, these mtGenomes serve as a reference database for haplogroup estimates. Second, the concept of fluctuation rates, i.e. a maximum likelihood estimation of the stability of mutations based on 19,171 full control region haplotypes for which raw lane data is available, is presented. Finally, an algorithm for estimating the haplogroup of an mtDNA sequence based on the combined database of full mtGenomes and Phylotree, which also incorporates the empirically determined fluctuation rates, is brought forward. On the basis of examples from the literature and EMPOP, the algorithm is not only validated, but the strength of this approach and its utility for quality control of mitochondrial haplotypes are also demonstrated. PMID:23948335
NASA Astrophysics Data System (ADS)
Iskandar, Ismed; Satria Gondokaryono, Yudi
2016-02-01
In reliability theory, the central problem is to determine the reliability of a complex system from the reliability of its components. The weakness of most reliability theories is that systems are described simply as functioning or failed, whereas in many real situations failures may arise from many causes depending on the age and environment of the system and its components. Another problem in reliability theory is estimating the parameters of the assumed failure models. The estimation may be based on data collected over censored or uncensored life tests. In many reliability problems the failure data are simply quantitatively inadequate, especially in engineering design and maintenance systems; Bayesian analyses are more beneficial than classical ones in such cases. Bayesian estimation allows us to combine past knowledge or experience, in the form of an a priori distribution, with life test data to make inferences about the parameter of interest. In this paper, we investigate the application of Bayesian estimation to competing risk systems. The cases are limited to models with independent causes of failure, using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample sizes. The simulation data are analyzed using both Bayesian and maximum likelihood analyses. The simulation results show that a change in one true parameter value relative to another changes the standard deviation in the opposite direction. Given perfect information on the prior distribution, the Bayesian estimation methods are better than maximum likelihood. The sensitivity analyses show some sensitivity to shifts of the prior locations; they also show the robustness of the Bayesian analysis within the range between the true value and the maximum likelihood estimate.
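A rough sketch of the maximum likelihood side of such a comparison, assuming two independent Weibull causes of failure (all parameter values below are hypothetical); each cause's parameters are estimated from its cause-specific likelihood, with failures from the other cause acting as censoring:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 500
t1 = 10.0 * rng.weibull(1.5, n)            # latent cause-1 times
t2 = 12.0 * rng.weibull(2.5, n)            # latent cause-2 times
t = np.minimum(t1, t2)                     # observed failure time
cause = np.where(t1 < t2, 1, 2)            # observed failure cause

def neg_loglik(params, t, d):
    # Weibull cause-specific likelihood: failures from this cause (d = 1)
    # contribute the log-hazard; every observation contributes the
    # cumulative hazard (survival term).
    k, lam = np.exp(params)                # log-parametrization keeps both > 0
    log_h = np.log(k / lam) + (k - 1.0) * np.log(t / lam)
    H = (t / lam) ** k
    return -(np.sum(d * log_h) - np.sum(H))

for c in (1, 2):
    fit = minimize(neg_loglik, x0=np.log([1.0, 5.0]),
                   args=(t, (cause == c).astype(float)))
    print("cause", c, "shape, scale MLE:", np.exp(fit.x))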
Model averaging techniques for quantifying conceptual model uncertainty.
Singh, Abhishek; Mishra, Srikanta; Ruskauff, Greg
2010-01-01
In recent years a growing understanding has emerged regarding the need to expand the modeling paradigm to include conceptual model uncertainty for groundwater models. Conceptual model uncertainty is typically addressed by formulating alternative model conceptualizations and assessing their relative likelihoods using statistical model averaging approaches. Several model averaging techniques and likelihood measures have been proposed in the recent literature for this purpose with two broad categories--Monte Carlo-based techniques such as Generalized Likelihood Uncertainty Estimation or GLUE (Beven and Binley 1992) and criterion-based techniques that use metrics such as the Bayesian and Kashyap Information Criteria (e.g., the Maximum Likelihood Bayesian Model Averaging or MLBMA approach proposed by Neuman 2003) and Akaike Information Criterion-based model averaging (AICMA) (Poeter and Anderson 2005). These different techniques can often lead to significantly different relative model weights and ranks because of differences in the underlying statistical assumptions about the nature of model uncertainty. This paper provides a comparative assessment of the four model averaging techniques (GLUE, MLBMA with KIC, MLBMA with BIC, and AIC-based model averaging) mentioned above for the purpose of quantifying the impacts of model uncertainty on groundwater model predictions. Pros and cons of each model averaging technique are examined from a practitioner's perspective using two groundwater modeling case studies. Recommendations are provided regarding the use of these techniques in groundwater modeling practice.
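For the criterion-based techniques, the conversion from information criterion values to model weights has the same form for AIC, BIC, or KIC; a minimal sketch with hypothetical criterion values for four alternative conceptualizations:

import numpy as np

def ic_weights(ic_values):
    # Model weights from an information criterion: w_i is proportional to
    # exp(-delta_i / 2), where delta_i is the difference from the best
    # (smallest) criterion value among the candidate models.
    delta = np.asarray(ic_values, float) - min(ic_values)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

print(ic_weights([210.3, 212.1, 215.8, 211.0]))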
ERIC Educational Resources Information Center
Jones, Douglas H.
The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…
NASA Astrophysics Data System (ADS)
Maghsoudi, Mastoureh; Bakar, Shaiful Anuar Abu
2017-05-01
In this paper, a recently proposed approach is applied to estimate the threshold parameter of a composite model. Several composite models from the Transformed Gamma and Inverse Transformed Gamma families are constructed based on this approach and their parameters are estimated by the maximum likelihood method. These composite models are fitted to allocated loss adjustment expenses (ALAE). Among all composite models studied, the composite Weibull-Inverse Transformed Gamma model proves to be a competitive candidate, as it best fits the loss data. The final part considers backtesting to verify the validity of the VaR and CTE risk measures.
NASA Astrophysics Data System (ADS)
Krestyannikov, E.; Tohka, J.; Ruotsalainen, U.
2008-06-01
This paper presents a novel statistical approach for joint estimation of regions-of-interest (ROIs) and the corresponding time-activity curves (TACs) from dynamic positron emission tomography (PET) brain projection data. It is based on optimizing the joint objective function that consists of a data log-likelihood term and two penalty terms reflecting the available a priori information about the human brain anatomy. The developed local optimization strategy iteratively updates both the ROI and TAC parameters and is guaranteed to monotonically increase the objective function. The quantitative evaluation of the algorithm is performed with numerically and Monte Carlo-simulated dynamic PET brain data of the 11C-Raclopride and 18F-FDG tracers. The results demonstrate that the method outperforms the existing sequential ROI quantification approaches in terms of accuracy, and can noticeably reduce the errors in TACs arising due to the finite spatial resolution and ROI delineation.
Computation of nonparametric convex hazard estimators via profile methods.
Jankowski, Hanna K; Wellner, Jon A
2009-05-01
This paper proposes a profile likelihood algorithm to compute the nonparametric maximum likelihood estimator of a convex hazard function. The maximisation is performed in two steps: First the support reduction algorithm is used to maximise the likelihood over all hazard functions with a given point of minimum (or antimode). Then it is shown that the profile (or partially maximised) likelihood is quasi-concave as a function of the antimode, so that a bisection algorithm can be applied to find the maximum of the profile likelihood, and hence also the global maximum. The new algorithm is illustrated using both artificial and real data, including lifetime data for Canadian males and females.
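The second step exploits quasi-concavity of the profile likelihood in the antimode, so a simple bracketing search suffices; a sketch using ternary search as a stand-in for the bisection step, with a hypothetical profile function:

def maximize_unimodal(f, lo, hi, tol=1e-8):
    # Shrink a bracket around the maximizer of a quasi-concave (unimodal)
    # function by comparing it at two interior points.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

profile = lambda a: -(a - 2.3) ** 2            # stand-in profile log-likelihood
print(maximize_unimodal(profile, 0.0, 10.0))   # ~2.3, the estimated antimode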
ERIC Educational Resources Information Center
Penfield, Randall D.; Bergeron, Jennifer M.
2005-01-01
This article applies a weighted maximum likelihood (WML) latent trait estimator to the generalized partial credit model (GPCM). The relevant equations required to obtain the WML estimator using the Newton-Raphson algorithm are presented, and a simulation study is described that compared the properties of the WML estimator to those of the maximum…
Rate of convergence of k-step Newton estimators to efficient likelihood estimators
Steve Verrill
2007-01-01
We make use of Cramer conditions together with the well-known local quadratic convergence of Newton's method to establish the asymptotic closeness of k-step Newton estimators to efficient likelihood estimators. In Verrill and Johnson [2007. Confidence bounds and hypothesis tests for normal distribution coefficients of variation. USDA Forest Products Laboratory Research...
Wlan-Based Indoor Localization Using Neural Networks
NASA Astrophysics Data System (ADS)
Saleem, Fasiha; Wyne, Shurjeel
2016-07-01
Wireless indoor localization has generated recent research interest due to its numerous applications. This work investigates Wi-Fi based indoor localization using two variants of the fingerprinting approach. Specifically, we study the application of an artificial neural network (ANN) for implementing the fingerprinting approach and compare its localization performance with a probabilistic fingerprinting method that is based on maximum likelihood estimation (MLE) of the user location. We incorporate spatial correlation of fading into our investigations, which is often neglected in simulation studies and leads to erroneous location estimates. The localization performance is quantified in terms of accuracy, precision, robustness, and complexity. Multiple methods for handling the case of missing APs in the online stage are investigated. Our results indicate that ANN-based fingerprinting outperforms the probabilistic approach for all performance metrics considered in this work.
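For reference, the probabilistic baseline can be as simple as the following sketch (hypothetical radio-map arrays), which scores each calibration point by the Gaussian likelihood of the online received-signal-strength scan and returns the maximizing location:

import numpy as np

def mle_location(rss_online, fp_means, fp_stds, coords):
    # fp_means, fp_stds: offline radio map, shape (n_points, n_aps);
    # return the calibration point with the highest Gaussian log-likelihood.
    z = (rss_online - fp_means) / fp_stds
    loglik = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi * fp_stds ** 2), axis=1)
    return coords[np.argmax(loglik)]

fp_means = np.array([[-40.0, -62.0], [-55.0, -48.0]])  # two points, two APs
fp_stds = np.full((2, 2), 4.0)
coords = np.array([[0.0, 0.0], [5.0, 3.0]])
print(mle_location(np.array([-52.0, -50.0]), fp_means, fp_stds, coords))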
The Likelihood of Completing a VET Qualification: A Model-Based Approach. Technical Paper
ERIC Educational Resources Information Center
Mark, Kevin; Karmel, Tom
2010-01-01
This paper estimates vocational education and training (VET) course-completion rates, in order to fill a gap in performance measures for the VET sector. The technique the authors use is to track all VET course enrolments within a three-year window, centred on the year of interest. Then, using an absorbing Markov chain model for a VET course…
Peter H. Wychoff; James S. Clark
2000-01-01
Ecologists and foresters have long noted a link between tree growth rate and mortality, and recent work suggests that interspecific differences in low-growth tolerance are a key force shaping forest structure. Little information is available, however, on the growth-mortality relationship for most species. We present three methods for estimating growth-mortality...
Learn-as-you-go acceleration of cosmological parameter estimates
NASA Astrophysics Data System (ADS)
Aslanyan, Grigor; Easther, Richard; Price, Layne C.
2015-09-01
Cosmological analyses can be accelerated by approximating slow calculations using a training set, which is either precomputed or generated dynamically. However, this approach is only safe if the approximations are well understood and controlled. This paper surveys issues associated with the use of machine-learning based emulation strategies for accelerating cosmological parameter estimation. We describe a learn-as-you-go algorithm that is implemented in the Cosmo++ code and (1) trains the emulator while simultaneously estimating posterior probabilities; (2) identifies unreliable estimates, computing the exact numerical likelihoods if necessary; and (3) progressively learns and updates the error model as the calculation progresses. We explicitly describe and model the emulation error and show how this can be propagated into the posterior probabilities. We apply these techniques to the Planck likelihood and the calculation of ΛCDM posterior probabilities. The computation is significantly accelerated without a pre-defined training set and uncertainties in the posterior probabilities are subdominant to statistical fluctuations. We have obtained a speedup factor of 6.5 for Metropolis-Hastings and 3.5 for nested sampling. Finally, we discuss the general requirements for a credible error model and show how to update them on-the-fly.
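The core control flow is easy to sketch. The emulator class below is a toy nearest-neighbor stand-in (not the Cosmo++ implementation, whose API is not shown here), but it illustrates the learn-as-you-go logic: trust the emulated value only when the estimated error is small, otherwise compute the exact likelihood and add it to the training set:

import numpy as np

class NearestNeighborEmulator:
    # Toy emulator: predict with the nearest stored point and report the
    # distance to it as a crude error estimate.
    def __init__(self):
        self.thetas, self.values = [], []
    def add_training_point(self, theta, value):
        self.thetas.append(float(theta)); self.values.append(float(value))
    def predict(self, theta):
        if not self.thetas:
            return 0.0, np.inf
        d = np.abs(np.asarray(self.thetas) - theta)
        i = int(np.argmin(d))
        return self.values[i], d[i]

def loglik_learn_as_you_go(theta, emulator, exact_loglik, tol=0.05):
    pred, err = emulator.predict(theta)
    if err < tol:
        return pred                              # fast emulated value
    exact = exact_loglik(theta)
    emulator.add_training_point(theta, exact)    # learn as you go
    return exact

emu = NearestNeighborEmulator()
exact = lambda th: -0.5 * (th - 1.0) ** 2        # stand-in slow likelihood
for th in np.linspace(0.0, 2.0, 81):
    loglik_learn_as_you_go(th, emu, exact)
print(len(emu.thetas), "exact evaluations out of 81 calls")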
Nixon, Richard M; Duffy, Stephen W; Fender, Guy R K
2003-09-24
The Anglia Menorrhagia Education Study (AMES) is a randomized controlled trial testing the effectiveness of an education package applied to general practices. Binary data are available from two sources: general practitioner-reported referrals to hospital, and referrals to hospital determined by independent audit of the general practices. The former may be regarded as a surrogate for the latter, which is regarded as the true endpoint. Data are only available for the true endpoint on a subset of the practices, but there are surrogate data for almost all of the audited practices and for most of the remaining practices. The aim of this paper was to estimate the treatment effect using data from every practice in the study. Where the true endpoint was not available, it was estimated by three approaches: a regression method, multiple imputation, and a full likelihood model. Including the surrogate data in the analysis yielded an estimate of the treatment effect that was more precise than an estimate gained from using the true endpoint data alone. The full likelihood method provides a new imputation tool at the disposal of trials with surrogate data.
Can, Seda; van de Schoot, Rens; Hox, Joop
2015-06-01
Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated at the within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influence on the convergence rate of the size of the intraclass correlation coefficient (ICC) and of the estimation method (maximum likelihood estimation with robust chi-squares and standard errors versus Bayesian estimation) is investigated. The other variables of interest were the rate of inadmissible solutions and the relative parameter and standard error bias at the between level. The results showed that inadmissible solutions were obtained when there was between-level collinearity and the estimation method was maximum likelihood. In the within-level multicollinearity condition, all of the solutions were admissible but the bias values were higher compared with the between-level collinearity condition. Bayesian estimation appeared to be robust in obtaining admissible parameters, but the relative bias was higher than for maximum likelihood estimation. Finally, as expected, high ICC produced less biased results compared to medium ICC conditions.
NASA Astrophysics Data System (ADS)
Brouwer, Derk H.; van Duuren-Stuurman, Birgit; Berges, Markus; Bard, Delphine; Jankowska, Elzbieta; Moehlmann, Carsten; Pelzer, Johannes; Mark, Dave
2013-11-01
Manufactured nano-objects, agglomerates, and aggregates (NOAA) may have adverse effects on human health, but little is known about occupational risks since actual estimates of exposure are lacking. In a large-scale workplace air-monitoring campaign, 19 enterprises were visited and 120 potential exposure scenarios were measured. A multi-metric exposure assessment approach was followed and a decision logic was developed to afford analysis of all results in concert. The overall evaluation was classified by categories of likelihood of exposure. At the task level, about 53 % of scenarios showed increased particle number or surface area concentration compared to the "background" level, whereas 72 % of the TEM samples revealed an indication that NOAA were present in the workplace. For 54 of the 120 task-based exposure scenarios, an overall evaluation could be made based on all parameters of the decision logic. For only 1 exposure scenario (approximately 2 %) was the highest level of potential likelihood assigned, whereas in 56 % of the exposure scenarios the overall evaluation revealed the lowest level of likelihood. For the remaining 42 %, exposure to NOAA could not be excluded.
Dragović, Ivana; Turajlić, Nina; Pilčević, Dejan; Petrović, Bratislav; Radojević, Dragan
2015-01-01
Fuzzy inference systems (FIS) enable automated assessment and reasoning in a logically consistent manner akin to the way in which humans reason. However, since no conventional fuzzy set theory is in the Boolean frame, it is proposed that Boolean consistent fuzzy logic should be used in the evaluation of rules. The main distinction of this approach is that it requires the execution of a set of structural transformations before the actual values can be introduced, which can, in certain cases, lead to different results. While a Boolean consistent FIS could be used for establishing the diagnostic criteria for any given disease, in this paper it is applied to determining the likelihood of peritonitis, the leading complication of peritoneal dialysis (PD). Given that patients may be located far away from healthcare institutions (as peritoneal dialysis is a form of home dialysis), the proposed Boolean consistent FIS would enable patients to easily estimate the likelihood of having peritonitis (where a high likelihood would suggest that prompt treatment is indicated) when medical experts are not close at hand. PMID:27069500
Maximum likelihood-based analysis of single-molecule photon arrival trajectories.
Hajdziona, Marta; Molski, Andrzej
2011-02-07
In this work we explore the statistical properties of the maximum likelihood-based analysis of one-color photon arrival trajectories. This approach does not involve binning and, therefore, all of the information contained in an observed photon trajectory is used. We study the accuracy and precision of parameter estimates and the efficiency of the Akaike information criterion and the Bayesian information criterion (BIC) in selecting the true kinetic model. We focus on the low excitation regime where photon trajectories can be modeled as realizations of Markov modulated Poisson processes. The number of observed photons is the key parameter in determining model selection and parameter estimation. For example, the BIC can select the true three-state model from competing two-, three-, and four-state kinetic models even for relatively short trajectories made up of 2 × 10³ photons. When the intensity levels are well-separated and 10⁴ photons are observed, the two-state model parameters can be estimated with about 10% precision and those for a three-state model with about 20% precision.
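As a worked illustration of the BIC comparison (with hypothetical maximized log-likelihoods; a k-state Markov modulated Poisson process has k intensities plus k(k-1) switching rates, i.e. k² free parameters):

import numpy as np

def bic(loglik, k_params, n_obs):
    # BIC = -2 log L + k log n; the model with the smallest BIC is selected.
    return -2.0 * loglik + k_params * np.log(n_obs)

# Hypothetical fits to a trajectory of 2000 photons:
fits = {2: -5132.4, 3: -5098.7, 4: -5096.2}       # states: max log-likelihood
scores = {k: bic(ll, k * k, 2000) for k, ll in fits.items()}
print(min(scores, key=scores.get), scores)        # BIC picks the 3-state model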
NASA Astrophysics Data System (ADS)
Silva, F. E. O. E.; Naghettini, M. D. C.; Fernandes, W.
2014-12-01
This paper evaluated the uncertainties associated with the estimation of the parameters of a conceptual rainfall-runoff model, through the use of Bayesian inference techniques by Monte Carlo simulation. The Pará River sub-basin, located in the upper São Francisco river basin, in southeastern Brazil, was selected for developing the studies. In this paper, we used the Rio Grande conceptual hydrologic model (EHR/UFMG, 2001) and the Markov Chain Monte Carlo simulation method named DREAM (VRUGT, 2008a). Two probabilistic models for the residuals were analyzed: (i) the classical one [Normal likelihood, r ~ N(0, σ²)]; and (ii) a generalized likelihood (SCHOUPS & VRUGT, 2010), in which it is assumed that the differences between observed and simulated flows are correlated, non-stationary, and distributed as a Skew Exponential Power density. The assumptions made for both models were checked to ensure that the estimation of uncertainties in the parameters was not biased. The results showed that the Bayesian approach proved adequate for the proposed objectives, reinforcing the importance of assessing the uncertainties associated with hydrological modeling.
Murray, Aja Louise; Booth, Tom; Eisner, Manuel; Obsuth, Ingrid; Ribeaud, Denis
2018-05-22
Whether or not importance should be placed on an all-encompassing general factor of psychopathology (or p factor) in classifying, researching, diagnosing, and treating psychiatric disorders depends (among other issues) on the extent to which comorbidity is symptom-general rather than staying largely within the confines of narrower transdiagnostic factors such as internalizing and externalizing. In this study, we compared three methods of estimating p factor strength. We compared omega hierarchical and explained common variance calculated from confirmatory factor analysis (CFA) bifactor models with maximum likelihood (ML) estimation, from exploratory structural equation modeling/exploratory factor analysis models with a bifactor rotation, and from Bayesian structural equation modeling (BSEM) bifactor models. Our simulation results suggested that BSEM with small variance priors on secondary loadings might be the preferred option. However, CFA with ML also performed well provided secondary loadings were modeled. We provide two empirical examples of applying the three methodologies using a normative sample of youth (z-proso, n = 1,286) and a university counseling sample (n = 359).
MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...
ERIC Educational Resources Information Center
Casabianca, Jodi M.; Lewis, Charles
2015-01-01
Loglinear smoothing (LLS) estimates the latent trait distribution while making fewer assumptions about its form and maintaining parsimony, thus leading to more precise item response theory (IRT) item parameter estimates than standard marginal maximum likelihood (MML). This article provides the expectation-maximization algorithm for MML estimation…
An EM Algorithm for Maximum Likelihood Estimation of Process Factor Analysis Models
ERIC Educational Resources Information Center
Lee, Taehun
2010-01-01
In this dissertation, an Expectation-Maximization (EM) algorithm is developed and implemented to obtain maximum likelihood estimates of the parameters and the associated standard error estimates characterizing temporal flows for the latent variable time series following stationary vector ARMA processes, as well as the parameters defining the…
Model-based tomographic reconstruction of objects containing known components.
Stayman, J Webster; Otake, Yoshito; Prince, Jerry L; Khanna, A Jay; Siewerdsen, Jeffrey H
2012-10-01
The likelihood of finding manufactured components (surgical tools, implants, etc.) within a tomographic field-of-view has been steadily increasing. One reason is the aging population and proliferation of prosthetic devices, such that more people undergoing diagnostic imaging have existing implants, particularly hip and knee implants. Another reason is that use of intraoperative imaging (e.g., cone-beam CT) for surgical guidance is increasing, wherein surgical tools and devices such as screws and plates are placed within or near to the target anatomy. When these components contain metal, the reconstructed volumes are likely to contain severe artifacts that adversely affect the image quality in tissues both near and far from the component. Because physical models of such components exist, there is a unique opportunity to integrate this knowledge into the reconstruction algorithm to reduce these artifacts. We present a model-based penalized-likelihood estimation approach that explicitly incorporates known information about component geometry and composition. The approach uses an alternating maximization method that jointly estimates the anatomy and the position and pose of each of the known components. We demonstrate that the proposed method can produce nearly artifact-free images even near the boundary of a metal implant in simulated vertebral pedicle screw reconstructions and even under conditions of substantial photon starvation. The simultaneous estimation of device pose also provides quantitative information on device placement that could be valuable to quality assurance and verification of treatment delivery.
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
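A minimal random-walk Metropolis sketch of the idea (a one-parameter toy, not a PBTK model): the prior enters through the log-posterior, and the chain's draws form the posterior sample that frequentist point estimation does not provide:

import numpy as np

def metropolis(logpost, theta0, n_iter=5000, step=0.1, seed=0):
    # Random-walk Metropolis: accept a proposal with probability
    # min(1, posterior ratio); otherwise keep the current state.
    rng = np.random.default_rng(seed)
    theta, lp = theta0, logpost(theta0)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws

# Toy example: N(0, 1) prior and a Gaussian likelihood centered at 0.8.
logpost = lambda th: -0.5 * th ** 2 - 0.5 * ((0.8 - th) / 0.2) ** 2
print(metropolis(logpost, 0.0)[1000:].mean())    # posterior mean near 0.77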
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
The effect of prenatal care on birthweight: a full-information maximum likelihood approach.
Rous, Jeffrey J; Jewell, R Todd; Brown, Robert W
2004-03-01
This paper uses a full-information maximum likelihood estimation procedure, the Discrete Factor Method, to estimate the relationship between birthweight and prenatal care. This technique controls for the potential biases surrounding both the sample selection of the pregnancy-resolution decision and the endogeneity of prenatal care. In addition, we use the actual number of prenatal care visits; other studies have normally measured prenatal care as the month care is initiated. We estimate a birthweight production function using 1993 data from the US state of Texas. The results underscore the importance of correcting for estimation problems. Specifically, a model that does not control for sample selection and endogeneity overestimates the benefit of an additional visit for women who have relatively few visits. This overestimation may indicate 'positive fetal selection,' i.e., women who did not abort may have healthier babies. Also, a model that does not control for self-selection and endogeneity predicts that past 17 visits, an additional visit leads to lower birthweight, while a model that corrects for these estimation problems predicts a positive effect for additional visits. This result shows the effect of mothers with less healthy fetuses making more prenatal care visits, known as 'adverse selection' in prenatal care. Copyright 2003 John Wiley & Sons, Ltd.
Application of change-point problem to the detection of plant patches.
López, I; Gámez, M; Garay, J; Standovár, T; Varga, Z
2010-03-01
In ecology, if the considered area or space is large, the spatial distribution of individuals of a given plant species is never homogeneous; plants form distinct patches. The homogeneity change in space or in time (in particular, the related change-point problem) is an important research subject in mathematical statistics. In this paper, for a given data system along a straight line, two areas are considered, where the data of each area come from different discrete distributions with unknown parameters. A method is presented for the estimation of the distribution change-point between both areas, and an estimate is given for the distributions separated by the obtained change-point. The solution of this problem is based on the maximum likelihood method. Furthermore, based on an adaptation of the well-known bootstrap resampling, a method for the estimation of the so-called change-interval is also given. The latter approach is very general, since it not only applies in the case of the maximum-likelihood estimation of the change-point, but can also be used starting from any other change-point estimation known in the ecological literature. The proposed model is validated against typical ecological situations, providing at the same time a verification of the applied algorithms.
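A compact sketch of both steps for Poisson-distributed counts along a transect (the distributional choice and the side-wise resampling scheme are illustrative assumptions, not the paper's exact setup):

import numpy as np

def pois_ll(x):
    # Poisson log-likelihood at the segment's own MLE (its mean rate).
    lam = x.mean()
    return np.sum(x * np.log(lam + 1e-12)) - len(x) * lam

def ml_changepoint(counts):
    # Maximize the two-segment Poisson likelihood over all split points.
    lls = [pois_ll(counts[:k]) + pois_ll(counts[k:])
           for k in range(1, len(counts))]
    return int(np.argmax(lls)) + 1

def change_interval(counts, n_boot=500, level=0.95, seed=0):
    # Bootstrap each side of the estimated change-point separately and
    # collect the re-estimated change-points into a percentile interval.
    rng = np.random.default_rng(seed)
    k_hat = ml_changepoint(counts)
    reps = [ml_changepoint(np.concatenate([
        rng.choice(counts[:k_hat], k_hat),
        rng.choice(counts[k_hat:], len(counts) - k_hat)]))
        for _ in range(n_boot)]
    lo, hi = np.percentile(reps, [50 * (1 - level), 50 * (1 + level)])
    return k_hat, (int(lo), int(hi))

rng = np.random.default_rng(2)
counts = np.concatenate([rng.poisson(2.0, 60), rng.poisson(6.0, 40)])
print(change_interval(counts))                   # estimate near position 60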
2013-01-01
Background Falls among the elderly are a major public health concern. Therefore, the possibility of a modeling technique which could better estimate fall probability is both timely and needed. Using biomedical, pharmacological and demographic variables as predictors, latent class analysis (LCA) is demonstrated as a tool for the prediction of falls among community-dwelling elderly. Methods Using a retrospective data set, a two-step LCA modeling approach was employed. First, we looked for the optimal number of latent classes for the seven medical indicators, along with the patients' prescription medication and three covariates (age, gender, and number of medications). Second, the appropriate latent class structure, with the covariates, was modeled on the distal outcome (fall/no fall). The default estimator was maximum likelihood with robust standard errors. The Pearson chi-square, likelihood ratio chi-square, BIC, Lo-Mendell-Rubin Adjusted Likelihood Ratio test and the bootstrap likelihood ratio test were used for model comparisons. Results A review of the model fit indices with covariates shows that a six-class solution was preferred. The predictive probability for latent classes ranged from 84% to 97%. Entropy, a measure of classification accuracy, was good at 90%. Specific prescription medications were found to strongly influence group membership. Conclusions In conclusion, the LCA method was effective at finding relevant subgroups within a heterogeneous population at risk of falling. This study demonstrated that LCA offers researchers a valuable tool to model medical data. PMID:23705639
Multiple robustness in factorized likelihood models.
Molina, J; Rotnitzky, A; Sued, M; Robins, J M
2017-09-01
We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.
ERIC Educational Resources Information Center
Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun
2002-01-01
Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)
2010-01-01
Background Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504
Liu, Dungang; Liu, Regina; Xie, Minge
2014-01-01
Meta-analysis has been widely used to synthesize evidence from multiple studies for common hypotheses or parameters of interest. However, it has not yet been fully developed for incorporating heterogeneous studies, which arise often in applications due to different study designs, populations or outcomes. For heterogeneous studies, the parameter of interest may not be estimable for certain studies, and in such a case, these studies are typically excluded from conventional meta-analysis. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a meta-analysis for heterogeneous studies by combining the confidence density functions derived from the summary statistics of individual studies, hence referred to as the CD approach. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out the CD approach. Individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. Besides its own theoretical significance, the last property also substantially broadens the applicability of the CD approach. All the properties of the CD approach are further confirmed by data simulated from a randomized clinical trials setting as well as by real data on aircraft landing performance. Overall, one obtains a unifying approach for combining summary statistics, subsuming many of the existing meta-analysis methods as special cases. PMID:26190875
Improving estimates of genetic maps: a meta-analysis-based approach.
Stewart, William C L
2007-07-01
Inaccurate genetic (or linkage) maps can reduce the power to detect linkage, increase type I error, and distort haplotype and relationship inference. To improve the accuracy of existing maps, I propose a meta-analysis-based method that combines independent map estimates into a single estimate of the linkage map. The method uses the variance of each independent map estimate to combine them efficiently, whether the map estimates use the same set of markers or not. As compared with a joint analysis of the pooled genotype data, the proposed method is attractive for three reasons: (1) it has comparable efficiency to the maximum likelihood map estimate when the pooled data are homogeneous; (2) relative to existing map estimation methods, it can have increased efficiency when the pooled data are heterogeneous; and (3) it avoids the practical difficulties of pooling human subjects data. On the basis of simulated data modeled after two real data sets, the proposed method can reduce the sampling variation of linkage maps commonly used in whole-genome linkage scans. Furthermore, when the independent map estimates are also maximum likelihood estimates, the proposed method performs as well as or better than when they are estimated by the program CRIMAP. Since variance estimates of maps may not always be available, I demonstrate the feasibility of three different variance estimators. Overall, the method should prove useful to investigators who need map positions for markers not contained in publicly available maps, and to those who wish to minimize the negative effects of inaccurate maps. Copyright 2007 Wiley-Liss, Inc.
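The combination rule itself is ordinary inverse-variance weighting; a minimal sketch for a single marker position (hypothetical numbers in centimorgans):

import numpy as np

def combine_estimates(estimates, variances):
    # Precision-weighted average: each study's map estimate is weighted by
    # the reciprocal of its variance; the combined variance is the
    # reciprocal of the total precision.
    w = 1.0 / np.asarray(variances, float)
    est = np.sum(w * np.asarray(estimates, float)) / w.sum()
    return est, 1.0 / w.sum()

# Two independent estimates of the same marker position (cM):
print(combine_estimates([12.4, 13.1], [0.36, 0.25]))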
Cure rate model with interval censored data.
Kim, Yang-Jin; Jhun, Myoungshic
2008-01-15
In cancer trials, a significant fraction of patients can be cured, that is, the disease is completely eliminated, so that it never recurs. In general, treatments are developed to both increase the patients' chances of being cured and prolong the survival time among non-cured patients. A cure rate model represents a combination of cure fraction and survival model, and can be applied to many clinical studies over several types of cancer. In this article, the cure rate model is considered for interval-censored data composed of two time points, which include the event time of interest. Interval-censored data commonly occur in studies of diseases that often progress without symptoms, requiring clinical evaluation for detection (Encyclopedia of Biostatistics. Wiley: New York, 1998; 2090-2095). In our study, an approximate likelihood approach suggested by Goetghebeur and Ryan (Biometrics 2000; 56:1139-1144) is used to derive the likelihood for interval-censored data. In addition, a frailty model is introduced to characterize the association between the cure fraction and survival model. In particular, the positive association between the cure fraction and the survival time is incorporated by imposing a common normal frailty effect. The EM algorithm is used to estimate parameters and a multiple imputation based on the profile likelihood is adopted for variance estimation. The approach is applied to the smoking cessation study in which the event of interest is a smoking relapse and several covariates including an intensive care treatment are evaluated to be effective for both the occurrence of relapse and the non-smoking duration. Copyright (c) 2007 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE), and a linear pseudo-model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE; the present paper introduces an innovative method to compute the NLSE using principles of multivariate calculus. This study is concerned with new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to obtain a linear pseudo-model for a nonlinear regression model. In this research article a new technique is developed to obtain the linear pseudo-model for the nonlinear regression model using multivariate calculus, and the linear pseudo-model of Edmond Malinvaud [4] is explained in a very different way. David Pollard et al. used empirical process techniques to study the asymptotics of the least-squares estimator (LSE) for the fitting of nonlinear regression functions in 2006. In Jae Myung [13] provided a conceptual guide to maximum likelihood estimation in his work "Tutorial on maximum likelihood estimation".
Outcome-Dependent Sampling with Interval-Censored Failure Time Data
Zhou, Qingning; Cai, Jianwen; Zhou, Haibo
2017-01-01
Summary Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that suffers from interval-censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required so as to yield reliable precision on the exposure-failure-time relationship. However, large cohort studies with simple random sampling could be prohibitive for investigators with a limited budget, especially when the exposure variables are expensive to obtain. Alternative cost-effective sampling designs and inference procedures are therefore desirable. We propose an outcome-dependent sampling (ODS) design with interval-censored failure time data, where we enrich the observed sample by selectively including certain more informative failure subjects. We develop a novel sieve semiparametric maximum empirical likelihood approach for fitting the proportional hazards model to data from the proposed interval-censoring ODS design. This approach employs the empirical likelihood and sieve methods to deal with the infinite-dimensional nuisance parameters, which greatly reduces the dimensionality of the estimation problem and eases the computation difficulty. The consistency and asymptotic normality of the resulting regression parameter estimator are established. The results from our extensive simulation study show that the proposed design and method works well for practical situations and is more efficient than the alternative designs and competing approaches. An example from the Atherosclerosis Risk in Communities (ARIC) study is provided for illustration. PMID:28771664
Heading Estimation for Pedestrian Dead Reckoning Based on Robust Adaptive Kalman Filtering.
Wu, Dongjin; Xia, Linyuan; Geng, Jijun
2018-06-19
Pedestrian dead reckoning (PDR) using smart phone-embedded micro-electro-mechanical system (MEMS) sensors plays a key role in ubiquitous localization indoors and outdoors. However, as a relative localization method, it suffers from error accumulation, which prevents it from running independently over the long term. Heading estimation error is one of the main sources of location error, and therefore, in order to improve the location tracking performance of the PDR method in complex environments, an approach based on robust adaptive Kalman filtering (RAKF) for estimating accurate headings is proposed. In our approach, outputs from gyroscope, accelerometer, and magnetometer sensors are fused in a Kalman filtering (KF) solution in which heading measurements derived from accelerations and magnetic field data are used to correct the states integrated from angular rates. In order to identify and control measurement outliers, a maximum likelihood-type estimator (M-estimator)-based model is used. Moreover, an adaptive factor is applied to resist the negative effects of state model disturbances. Extensive experiments under static and dynamic conditions were conducted in indoor environments. The experimental results demonstrate that the proposed approach provides more accurate heading estimates and supports more robust and dynamically adaptive location tracking, compared with methods based on conventional KF.
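A much-simplified one-state sketch of the filter (angles in degrees; the Huber constant, noise variances, and wrap-around handling are illustrative assumptions, and the full RAKF additionally adapts the state noise):

import numpy as np

def robust_heading_filter(gyro_rates, mag_headings, dt, q=0.01, r=16.0, c=1.5):
    # Predict the heading by integrating the gyro rate, then correct with
    # the accelerometer/magnetometer heading; a Huber-type weight inflates
    # the measurement variance for outlying innovations (the M-estimator idea).
    x, p = mag_headings[0], r
    out = []
    for rate, z in zip(gyro_rates, mag_headings):
        x, p = x + rate * dt, p + q                      # prediction
        v = (z - x + 180.0) % 360.0 - 180.0              # wrapped innovation
        ratio = abs(v) / np.sqrt(p + r)
        weight = 1.0 if ratio <= c else c / ratio        # Huber weight
        k = p / (p + r / weight)                         # robust gain
        x, p = x + k * v, (1.0 - k) * p                  # update
        out.append(x % 360.0)
    return np.array(out)

rng = np.random.default_rng(3)
rates = np.full(100, 1.0)                                # deg/s, hypothetical gyro
mags = (np.cumsum(rates) + rng.normal(0.0, 3.0, 100)) % 360.0
mags[50] += 90.0                                         # a magnetic disturbance
print(robust_heading_filter(rates, mags, dt=1.0)[-1])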
Estimation of rank correlation for clustered data.
Rosner, Bernard; Glynn, Robert J
2017-06-30
It is well known that the sample correlation coefficient (R_xy) is the maximum likelihood estimator of the Pearson correlation (ρ_xy) for independent and identically distributed (i.i.d.) bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the maximum likelihood estimator of ρ_xy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (i) converting ranks of both X and Y to the probit scale, (ii) estimating the Pearson correlation between probit scores for X and Y, and (iii) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. Copyright © 2017 John Wiley & Sons, Ltd.
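For i.i.d. data, steps (i)-(iii) are short enough to sketch directly (the clustered version replaces step (ii) with the mixed-model estimator described above):

import numpy as np
from scipy.stats import norm, rankdata

def rank_correlation_via_probit(x, y):
    # (i) convert ranks to probit scores, (ii) estimate the Pearson
    # correlation of the scores, (iii) map to rank correlation via the
    # bivariate-normal relation rho_s = (6 / pi) * arcsin(rho / 2).
    n = len(x)
    zx = norm.ppf(rankdata(x) / (n + 1.0))
    zy = norm.ppf(rankdata(y) / (n + 1.0))
    rho = np.corrcoef(zx, zy)[0, 1]
    return (6.0 / np.pi) * np.arcsin(rho / 2.0)

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 0.6 * x + 0.8 * rng.normal(size=500)         # Pearson rho = 0.6
print(rank_correlation_via_probit(x, y))         # near (6/pi)·asin(0.3) = 0.58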
Is the ML Chi-Square Ever Robust to Nonnormality? A Cautionary Note with Missing Data
ERIC Educational Resources Information Center
Savalei, Victoria
2008-01-01
Normal theory maximum likelihood (ML) is by far the most popular estimation and testing method used in structural equation modeling (SEM), and it is the default in most SEM programs. Even though this approach assumes multivariate normality of the data, its use can be justified on the grounds that it is fairly robust to the violations of the…
Non-Metric Similarity Measures
2015-03-26
Sunil Aryal and Kai Ming Ting. (2015) A generic ensemble approach to estimate multi-dimensional likelihood in Bayesian classifier learning... Computational Intelligence. http://onlinelibrary.wiley.com/doi/10.1111/coin.12063/abstract
5.2 List of peer-reviewed conference publications:
[3] Sunil Aryal... International Conference on Data Mining. 707-711.
[4] Sunil Aryal, Kai Ming Ting, Jonathan R. Wells and Takashi Washio. (2014) Improving iForest with...
Approximate median regression for complex survey data with skewed response.
Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi
2016-12-01
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.
Histogram equalization with Bayesian estimation for noise robust speech recognition.
Suh, Youngjoo; Kim, Hoirin
2018-02-01
The histogram equalization approach is an efficient feature normalization technique for noise robust automatic speech recognition. However, it suffers from performance degradation when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement under noise environments, it still suffers from performance degradation due to the overfitting problem when test data are insufficient. To address this issue, the proposed histogram equalization technique employs the Bayesian estimation method in the test cumulative distribution function estimation. It was reported in a previous study conducted on the Aurora-4 task that the proposed approach provided substantial performance gains in speech recognition systems based on the acoustic modeling of the Gaussian mixture model-hidden Markov model. In this work, the proposed approach was examined in speech recognition systems with deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. The fusion of the proposed features with the mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.
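The baseline histogram equalization mapping (without the paper's Bayesian smoothing of the test CDF) pushes each test feature through its empirical CDF and then through the inverse reference CDF. A sketch with synthetic features:

import numpy as np

def histogram_equalize(test_feat, ref_feat):
    # Empirical test CDF value of each test point, then the reference
    # quantile at that value, so the transformed test data match the
    # reference (training) distribution.
    ranks = np.searchsorted(np.sort(test_feat), test_feat, side="right")
    u = ranks / (len(test_feat) + 1.0)
    return np.quantile(ref_feat, u)

rng = np.random.default_rng(5)
ref = rng.normal(0.0, 1.0, 5000)              # clean training features
test = rng.normal(0.5, 2.0, 300)              # noise-shifted test features
print(histogram_equalize(test, ref).std())    # near 1 after equalization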
Condition Number Regularized Covariance Estimation
Won, Joong-Ho; Lim, Johan; Kim, Seung-Jean; Rajaratnam, Bala
2012-01-01
Estimation of high-dimensional covariance matrices is known to be a difficult problem, has many applications, and is of current interest to the larger statistics community. In many applications, including the so-called “large p small n” setting, the estimate of the covariance matrix is required to be not only invertible, but also well-conditioned. Although many regularization schemes attempt to do this, none of them address the ill-conditioning problem directly. In this paper, we propose a maximum likelihood approach, with the direct goal of obtaining a well-conditioned estimator. No sparsity assumption on either the covariance matrix or its inverse is imposed, thus making our procedure more widely applicable. We demonstrate that the proposed regularization scheme is computationally efficient, yields a type of Steinian shrinkage estimator, and has a natural Bayesian interpretation. We investigate the theoretical properties of the regularized covariance estimator comprehensively, including its regularization path, and proceed to develop an approach that adaptively determines the level of regularization that is required. Finally, we demonstrate the performance of the regularized estimator in decision-theoretic comparisons and in the financial portfolio optimization setting. The proposed approach has desirable properties, and can serve as a competitive procedure, especially when the sample size is small and when a well-conditioned estimator is required. PMID:23730197
NASA Astrophysics Data System (ADS)
Balbi, S.; Villa, F.; Mojtahed, V.; Hegetschweiler, K. T.; Giupponi, C.
2015-10-01
This article presents a novel methodology to assess flood risk to people by integrating people's vulnerability and ability to cushion hazards through coping and adapting. The proposed approach extends traditional risk assessments beyond material damages; complements quantitative and semi-quantitative data with subjective and local knowledge, improving the use of commonly available information; produces estimates of model uncertainty by providing probability distributions for all of its outputs. Flood risk to people is modeled using a spatially explicit Bayesian network model calibrated on expert opinion. Risk is assessed in terms of: (1) likelihood of non-fatal physical injury; (2) likelihood of post-traumatic stress disorder; (3) likelihood of death. The study area covers the lower part of the Sihl valley (Switzerland) including the city of Zurich. The model is used to estimate the benefits of improving an existing Early Warning System, taking into account the reliability, lead-time and scope (i.e. coverage of people reached by the warning). Model results indicate that the potential benefits of an improved early warning in terms of avoided human impacts are particularly relevant in case of a major flood event: about 75 % of fatalities, 25 % of injuries and 18 % of post-traumatic stress disorders could be avoided.
Explaining the effect of event valence on unrealistic optimism.
Gold, Ron S; Brown, Mark G
2009-05-01
People typically exhibit 'unrealistic optimism' (UO): they believe they have a lower chance of experiencing negative events and a higher chance of experiencing positive events than does the average person. UO has been found to be greater for negative than positive events. This 'valence effect' has been explained in terms of motivational processes. An alternative explanation is provided by the 'numerosity model', which views the valence effect simply as a by-product of a tendency for likelihood estimates pertaining to the average member of a group to increase with the size of the group. Predictions made by the numerosity model were tested in two studies. In each, UO for a single event was assessed. In Study 1 (n = 115 students), valence was manipulated by framing the event either negatively or positively, and participants estimated their own likelihood and that of the average student at their university. In Study 2 (n = 139 students), valence was again manipulated and participants again estimated their own likelihood; additionally, group size was manipulated by having participants estimate the likelihood of the average student in a small, medium-sized, or large group. In each study, the valence effect was found, but was due to an effect on estimates of own likelihood, not the average person's likelihood. In Study 2, valence did not interact with group size. The findings contradict the numerosity model, but are in accord with the motivational explanation. Implications for health education are discussed.
Li, Haocheng; Zhang, Yukun; Carroll, Raymond J; Keadle, Sarah Kozey; Sampson, Joshua N; Matthews, Charles E
2017-11-10
A mixed effect model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association of the variables is modeled through the correlation of random effects. We use a quasi-likelihood type approximation for nonlinear variables and transform the proposed model into a multivariate linear mixed model framework for estimation and inference. Via an extension to the EM approach, an efficient algorithm is developed to fit the model. The method is applied to physical activity data, which uses a wearable accelerometer device to measure daily movement and energy expenditure information. Our approach is also evaluated by a simulation study. Copyright © 2017 John Wiley & Sons, Ltd.
Elashoff, Robert M.; Li, Gang; Li, Ning
2009-01-01
In this article we study a joint model for longitudinal measurements and competing risks survival data. Our joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout. It is also an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk. Our model consists of a linear mixed effects submodel for the longitudinal outcome and a proportional cause-specific hazards frailty submodel (Prentice et al., 1978, Biometrics 34, 541-554) for the competing risks survival data, linked together by some latent random effects. We propose to obtain the maximum likelihood estimates of the parameters by an expectation maximization (EM) algorithm and estimate their standard errors using a profile likelihood method. The developed method works well in our simulation studies and is applied to a clinical trial for the scleroderma lung disease. PMID:18162112
Estimating Interaction Effects With Incomplete Predictor Variables
Enders, Craig K.; Baraldi, Amanda N.; Cham, Heining
2014-01-01
The existing missing data literature does not provide a clear prescription for estimating interaction effects with missing data, particularly when the interaction involves a pair of continuous variables. In this article, we describe maximum likelihood and multiple imputation procedures for this common analysis problem. We outline 3 latent variable model specifications for interaction analyses with missing data. These models apply procedures from the latent variable interaction literature to analyses with a single indicator per construct (e.g., a regression analysis with scale scores). We also discuss multiple imputation for interaction effects, emphasizing an approach that applies standard imputation procedures to the product of 2 raw score predictors. We thoroughly describe the process of probing interaction effects with maximum likelihood and multiple imputation. For both missing data handling techniques, we outline centering and transformation strategies that researchers can implement in popular software packages, and we use a series of real data analyses to illustrate these methods. Finally, we use computer simulations to evaluate the performance of the proposed techniques. PMID:24707955
Overweight and obesity in India: policy issues from an exploratory multi-level analysis.
Siddiqui, Md Zakaria; Donato, Ronald
2016-06-01
This article analyses a nationally representative household dataset, the National Family Health Survey (NFHS-3) conducted in 2005-2006, to examine factors influencing the prevalence of overweight/obesity in India. The dataset was disaggregated into four sub-population groups (urban and rural females and males) and multi-level logit regression models were used to estimate the impact of particular covariates on the likelihood of overweight/obesity. The multi-level modelling approach aimed to identify individual and macro-level contextual factors influencing this health outcome. In contrast to most studies on low-income developing countries, the findings reveal that education for females beyond a particular level of educational attainment exhibits a negative relationship with the likelihood of overweight/obesity. This relationship was not observed for males. Muslim females and all Sikh sub-populations have a higher likelihood of overweight/obesity, suggesting the importance of socio-cultural influences. The results also show that the relationship between wealth and the probability of overweight/obesity is stronger for males than females, highlighting the differential impact of increasing socio-economic status on gender. Multi-level analysis reveals that states exerted an independent influence on the likelihood of overweight/obesity beyond individual-level covariates, reflecting the importance of spatially related contextual factors on overweight/obesity. While this study does not disentangle macro-level 'obesogenic' environmental factors from socio-cultural network influences, the results highlight the need to refrain from adopting a 'one size fits all' policy approach in addressing the overweight/obesity epidemic facing India. Instead, policy implementation requires a more nuanced and targeted approach that incorporates the growing recognition of socio-cultural and spatial contextual factors impacting on healthy behaviours. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
NASA Technical Reports Server (NTRS)
Cash, W.
1979-01-01
Many problems in the experimental estimation of parameters for models can be solved through use of the likelihood ratio test. Applications of the likelihood ratio, with particular attention to photon counting experiments, are discussed. The procedures presented solve a greater range of problems than those currently in use, yet are no more difficult to apply. The procedures are proved analytically, and examples from current problems in astronomy are discussed.
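For Poisson-distributed photon counts, the likelihood-ratio machinery reduces to comparing values of the statistic C = 2 Σ_i (m_i − n_i ln m_i), which is the Poisson negative log-likelihood up to a model-independent constant; the drop in C between nested models is approximately χ²-distributed with degrees of freedom equal to the number of extra parameters. The sketch below is a toy illustration of that recipe (the models, data, and names are invented, not taken from the paper):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def cash(params, model, counts, x):
    """C = 2 * sum(m - n*log(m)): Poisson -2 log L up to a constant."""
    m = model(x, *params)
    return 2.0 * np.sum(m - counts * np.log(m))

x = np.linspace(0.0, 10.0, 50)
rng = np.random.default_rng(2)
counts = rng.poisson(5.0 + 3.0 * np.exp(-0.5 * (x - 4.0) ** 2))

flat = lambda x, a: np.full_like(x, a)                        # null model
peak = lambda x, a, b: a + b * np.exp(-0.5 * (x - 4.0) ** 2)  # alternative

c0 = minimize(cash, [5.0], args=(flat, counts, x),
              bounds=[(1e-6, None)]).fun
c1 = minimize(cash, [5.0, 1.0], args=(peak, counts, x),
              bounds=[(1e-6, None), (0.0, None)]).fun

# One extra parameter => Delta-C is roughly chi-square with 1 dof
print(c0 - c1, chi2.sf(c0 - c1, df=1))   # small p-value: peak detected
```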
van de Schoot, Rens; Broere, Joris J.; Perryck, Koen H.; Zondervan-Zwijnenburg, Mariëlle; van Loey, Nancy E.
2015-01-01
Background: The analysis of small data sets in longitudinal studies can lead to power issues and often suffers from biased parameter values. These issues can be solved by using Bayesian estimation in conjunction with informative prior distributions. By means of a simulation study and an empirical example concerning posttraumatic stress symptoms (PTSS) following mechanical ventilation in burn survivors, we demonstrate the advantages and potential pitfalls of using Bayesian estimation. Methods: First, we show how to specify prior distributions and by means of a sensitivity analysis we demonstrate how to check the exact influence of the prior (mis-)specification. Thereafter, we show by means of a simulation the situations in which the Bayesian approach outperforms the default maximum likelihood approach. Finally, we re-analyze empirical data on burn survivors which provided preliminary evidence of an aversive influence of a period of mechanical ventilation on the course of PTSS following burns. Results: Not surprisingly, maximum likelihood estimation showed insufficient coverage as well as power with very small samples. Only when Bayesian analysis, in conjunction with informative priors, was used did power increase to acceptable levels. As expected, we showed that the smaller the sample size, the more the results rely on the prior specification. Conclusion: We show that two issues often encountered during analysis of small samples, power and biased parameters, can be solved by including prior information into Bayesian analysis. We argue that the use of informative priors should always be reported together with a sensitivity analysis. PMID:25765534
NASA Astrophysics Data System (ADS)
Lin, Pei-Sheng; Rosset, Denis; Zhang, Yanbao; Bancal, Jean-Daniel; Liang, Yeong-Cherng
2018-03-01
The device-independent approach to physics is one where conclusions are drawn directly from the observed correlations between measurement outcomes. In quantum information, this approach allows one to make strong statements about the properties of the underlying systems or devices solely via the observation of Bell-inequality-violating correlations. However, since one can only perform a finite number of experimental trials, statistical fluctuations necessarily accompany any estimation of these correlations. Consequently, an important gap remains between the many theoretical tools developed for the asymptotic scenario and the experimentally obtained raw data. In particular, a physical and concurrently practical way to estimate the underlying quantum distribution has so far remained elusive. Here, we show that the natural analogs of the maximum-likelihood estimation technique and the least-square-error estimation technique in the device-independent context result in point estimates of the true distribution that are physical, unique, computationally tractable, and consistent. They thus serve as sound algorithmic tools allowing one to bridge the aforementioned gap. As an application, we demonstrate how such estimates of the underlying quantum distribution can be used to provide, in certain cases, trustworthy estimates of the amount of entanglement present in the measured system. In stark contrast to existing approaches to device-independent parameter estimations, our estimation does not require the prior knowledge of any Bell inequality tailored for the specific property and the specific distribution of interest.
Shahin, Arwa; Smulders, Marinus J. M.; van Tuyl, Jaap M.; Arens, Paul; Bakker, Freek T.
2014-01-01
Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium. PMID:25368628
Cosmic shear measurement with maximum likelihood and maximum a posteriori inference
NASA Astrophysics Data System (ADS)
Hall, Alex; Taylor, Andy
2017-06-01
We investigate the problem of noise bias in maximum likelihood and maximum a posteriori estimators for cosmic shear. We derive the leading and next-to-leading order biases and compute them in the context of galaxy ellipticity measurements, extending previous work on maximum likelihood inference for weak lensing. We show that a large part of the bias on these point estimators can be removed using information already contained in the likelihood when a galaxy model is specified, without the need for external calibration. We test these bias-corrected estimators on simulated galaxy images similar to those expected from planned space-based weak lensing surveys, with promising results. We find that the introduction of an intrinsic shape prior can help with mitigation of noise bias, such that the maximum a posteriori estimate can be made less biased than the maximum likelihood estimate. Second-order terms offer a check on the convergence of the estimators, but are largely subdominant. We show how biases propagate to shear estimates, demonstrating in our simple set-up that shear biases can be reduced by orders of magnitude and potentially to within the requirements of planned space-based surveys at mild signal-to-noise ratio. We find that second-order terms can exhibit significant cancellations at low signal-to-noise ratio when Gaussian noise is assumed, which has implications for inferring the performance of shear-measurement algorithms from simplified simulations. We discuss the viability of our point estimators as tools for lensing inference, arguing that they allow for the robust measurement of ellipticity and shear.
Estimation of gross land-use change and its uncertainty using a Bayesian data assimilation approach
NASA Astrophysics Data System (ADS)
Levy, Peter; van Oijen, Marcel; Buys, Gwen; Tomlinson, Sam
2018-03-01
We present a method for estimating land-use change using a Bayesian data assimilation approach. The approach provides a general framework for combining multiple disparate data sources with a simple model. This allows us to constrain estimates of gross land-use change with reliable national-scale census data, whilst retaining the detailed information available from several other sources. Eight different data sources, with three different data structures, were combined in our posterior estimate of land use and land-use change, and other data sources could easily be added in future. The tendency for observations to underestimate gross land-use change is accounted for by allowing for a skewed distribution in the likelihood function. The data structure produced has high temporal and spatial resolution, and is appropriate for dynamic process-based modelling. Uncertainty is propagated appropriately into the output, so we have a full posterior distribution of output and parameters. The data are available in the widely used netCDF file format from http://eidc.ceh.ac.uk/.
ERIC Educational Resources Information Center
Molenaar, Peter C. M.; Nesselroade, John R.
1998-01-01
Pseudo-Maximum Likelihood (p-ML) and Asymptotically Distribution Free (ADF) estimation methods for estimating dynamic factor model parameters within a covariance structure framework were compared through a Monte Carlo simulation. Both methods appear to give consistent model parameter estimates, but only ADF gives standard errors and chi-square…
Cosmological parameters from a re-analysis of the WMAP 7 year low-resolution maps
NASA Astrophysics Data System (ADS)
Finelli, F.; De Rosa, A.; Gruppuso, A.; Paoletti, D.
2013-06-01
Cosmological parameters from Wilkinson Microwave Anisotropy Probe (WMAP) 7 year data are re-analysed by substituting a pixel-based likelihood estimator for the one delivered publicly by the WMAP team. Our pixel-based estimator handles intensity and polarization exactly and jointly, allowing us to use low-resolution maps and noise covariance matrices in T, Q, U at the same resolution, which in this work is 3.6°. We describe the features and the performance of the code implementing our pixel-based likelihood estimator. We perform a battery of tests on the application of our pixel-based likelihood routine to WMAP publicly available low-resolution foreground-cleaned products, in combination with the WMAP high-ℓ likelihood, reporting the differences in cosmological parameters evaluated by the full WMAP likelihood public package. The differences are not only due to the treatment of polarization, but also to the marginalization over monopole and dipole uncertainties present in the WMAP pixel likelihood code for temperature. The credible central values of the cosmological parameters change by less than 1σ with respect to the evaluation by the full WMAP 7 year likelihood code, with the largest difference being a shift to smaller values of the scalar spectral index n_S.
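The exact joint pixel-based likelihood referred to here is the standard Gaussian one: up to a constant, -2 ln L = x' C^{-1} x + ln det C, where x stacks the (T, Q, U) pixels and C is the signal-plus-noise covariance implied by a trial power spectrum. As a heavily simplified sketch (a one-parameter amplitude scan on a toy 1-D "map", nothing WMAP-specific; all names and numbers are invented):

```python
import numpy as np

def neg2_log_like(x, cov):
    """Exact Gaussian pixel likelihood: -2 ln L = x' C^{-1} x + ln|C| (+const)."""
    sign, logdet = np.linalg.slogdet(cov)
    return x @ np.linalg.solve(cov, x) + logdet

rng = np.random.default_rng(3)
npix = 200
pix = np.arange(npix)
# Toy signal covariance (exponential correlations) plus white noise
S = np.exp(-np.abs(np.subtract.outer(pix, pix)) / 10.0)
N = 0.1 * np.eye(npix)
x = rng.multivariate_normal(np.zeros(npix), 2.0 * S + N)  # true amplitude 2

grid = np.linspace(0.5, 4.0, 30)
curve = [neg2_log_like(x, A * S + N) for A in grid]
print(grid[int(np.argmin(curve))])   # likelihood peaks near A = 2
```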
MODELING LEFT-TRUNCATED AND RIGHT-CENSORED SURVIVAL DATA WITH LONGITUDINAL COVARIATES
Su, Yu-Ru; Wang, Jane-Ling
2018-01-01
There is a surge in medical follow-up studies that include longitudinal covariates in the modeling of survival data. So far, the focus has been largely on right-censored survival data. We consider survival data that are subject to both left truncation and right censoring. Left truncation is well known to produce a biased sample. The sampling bias issue has been resolved in the literature for the case which involves baseline or time-varying covariates that are observable. The problem remains open, however, for the important case where longitudinal covariates are present in survival models. A joint likelihood approach has been shown in the literature to provide an effective way to overcome those difficulties for right-censored data, but this approach faces substantial additional challenges in the presence of left truncation. We thus propose an alternative likelihood to overcome these difficulties and show that the regression coefficient in the survival component can be estimated unbiasedly and efficiently. Issues about the bias for the longitudinal component are discussed. The new approach is illustrated numerically through simulations and data from a multi-center AIDS cohort study. PMID:29479122
Hybrid pairwise likelihood analysis of animal behavior experiments.
Cattelan, Manuela; Varin, Cristiano
2013-12-01
The study of the determinants of fights between animals is an important issue in understanding animal behavior. For this purpose, tournament experiments among a set of animals are often used by zoologists. The results of these tournament experiments are naturally analyzed by paired comparison models. Proper statistical analysis of these models is complicated by the presence of dependence between the outcomes of fights because the same animal is involved in different contests. This paper discusses two different model specifications to account for between-fights dependence. Models are fitted through the hybrid pairwise likelihood method that iterates between optimal estimating equations for the regression parameters and pairwise likelihood inference for the association parameters. This approach requires the specification of means and covariances only. For this reason, the method can be applied also when the computation of the joint distribution is difficult or inconvenient. The proposed methodology is investigated by simulation studies and applied to real data about adult male Cape Dwarf Chameleons. © 2013, The International Biometric Society.
Equivalence of truncated count mixture distributions and mixtures of truncated count distributions.
Böhning, Dankmar; Kuhnert, Ronny
2006-12-01
This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
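The Horvitz-Thompson step mentioned here is easiest to see in the simplest, unmixed case: fit a zero-truncated Poisson to the observed (nonzero) counts, then divide the number of observed individuals by the estimated probability of being seen at least once. The sketch below covers only that homogeneous toy case with invented data; the article's NPMLE of a full mixing distribution is substantially more involved.

```python
import numpy as np
from scipy.optimize import brentq

def fit_ztp(counts):
    """MLE of lambda under a zero-truncated Poisson.

    The score equation equates the sample mean of the observed counts
    with the truncated mean: mean(counts) = lambda / (1 - exp(-lambda)).
    """
    xbar = np.mean(counts)
    return brentq(lambda lam: lam / (1.0 - np.exp(-lam)) - xbar, 1e-6, 50.0)

# Capture-recapture toy data: only individuals seen at least once appear
rng = np.random.default_rng(4)
full = rng.poisson(1.2, size=1000)   # true population size 1000
seen = full[full > 0]

lam = fit_ztp(seen)
# Horvitz-Thompson population size: n_observed / P(seen at least once)
n_hat = len(seen) / (1.0 - np.exp(-lam))
print(lam, n_hat)   # lambda near 1.2, n_hat near 1000
```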
NASA Technical Reports Server (NTRS)
Grove, R. D.; Bowles, R. L.; Mayhew, S. C.
1972-01-01
A maximum likelihood parameter estimation procedure and program were developed for the extraction of the stability and control derivatives of aircraft from flight test data. Nonlinear six-degree-of-freedom equations describing aircraft dynamics were used to derive sensitivity equations for quasilinearization. The maximum likelihood function with quasilinearization was used to derive the parameter change equations, the covariance matrices for the parameters and measurement noise, and the performance index function. The maximum likelihood estimator was mechanized into an iterative estimation procedure utilizing a real time digital computer and graphic display system. This program was developed for 8 measured state variables and 40 parameters. Test cases were conducted with simulated data for validation of the estimation procedure and program. The program was applied to a V/STOL tilt wing aircraft, a military fighter airplane, and a light single engine airplane. The particular nonlinear equations of motion, derivation of the sensitivity equations, addition of accelerations into the algorithm, operational features of the real time digital system, and test cases are described.
Empirical likelihood method for non-ignorable missing data problems.
Guan, Zhong; Qin, Jing
2017-01-01
The missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missingness is the most difficult missing data problem, where the missingness of a response depends on its own value. In the statistical literature, unlike for the ignorable missing data problem, not many papers on non-ignorable missing data are available except for fully parametric model-based approaches. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method we obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of real AIDS trial data shows that the missingness of CD4 counts around two years is non-ignorable and the sample mean based on observed data only is biased.
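As background for the constrained-maximization machinery, here is Owen's empirical likelihood in its most basic form, for a population mean: the weights are p_i = 1/(n(1 + t(x_i - mu))) with the Lagrange multiplier t solving sum_i (x_i - mu)/(1 + t(x_i - mu)) = 0, and -2 log R(mu) is approximately chi-square with 1 degree of freedom. This is only the classical building block, not the paper's missing-data extension, and the data are toy.

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """Owen's empirical likelihood ratio statistic -2 log R(mu) for a mean.

    Assumes mu lies strictly inside (min(x), max(x)).
    """
    z = x - mu
    # t must keep every weight positive, 1 + t*z_i > 0, which confines t
    # to an open interval; solve the score equation inside it.
    lo = (-1.0 + 1e-9) / z.max()
    hi = (-1.0 + 1e-9) / z.min()
    t = brentq(lambda t: np.sum(z / (1.0 + t * z)), lo, hi)
    return 2.0 * np.sum(np.log1p(t * z))

rng = np.random.default_rng(5)
x = rng.exponential(2.0, size=200)
print(el_log_ratio(x, 2.0))   # small: mu = 2 is well supported
print(el_log_ratio(x, 3.0))   # large versus chi2(1): mu = 3 is rejected
```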
NASA Astrophysics Data System (ADS)
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
2017-06-01
The GWOLR model combines the geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameters are estimated by maximum likelihood. Such estimation, however, yields a difficult-to-solve system of nonlinear equations, so a numerical approximation approach is required. The iterative approximation approach generally uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix of second derivatives must be recomputed at every iteration, and the iteration does not always converge. To address this, the NR method is modified by replacing the Hessian matrix with the Fisher information matrix, a variant termed Fisher scoring (FS). The present research derives the GWOLR model parameter estimation using the Fisher scoring method and applies the estimator to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities make the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of sufferers, the IR category of DHF in both villages can be determined.
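Fisher scoring keeps the Newton-Raphson update but replaces the Hessian with the expected Fisher information, which is positive semi-definite and typically stabilizes the iterations. As a minimal illustration of that recipe (plain logistic regression rather than the paper's GWOLR model; for the canonical logit link, scoring coincides with Newton-Raphson/IRLS, and all data here are toy):

```python
import numpy as np

def fisher_scoring_logistic(X, y, tol=1e-8, max_iter=50):
    """Fisher scoring for logistic regression: the Hessian is replaced by
    the Fisher information I(beta) = X' W X with W = diag(p * (1 - p))."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)                       # log-likelihood gradient
        info = X.T @ (X * (p * (1.0 - p))[:, None]) # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
beta_true = np.array([-0.3, 0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
print(fisher_scoring_logistic(X, y))   # near beta_true
```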
Aerodynamic parameter estimation via Fourier modulating function techniques
NASA Technical Reports Server (NTRS)
Pearson, A. E.
1995-01-01
Parameter estimation algorithms are developed in the frequency domain for systems modeled by input/output ordinary differential equations. The approach is based on Shinbrot's method of moment functionals utilizing Fourier based modulating functions. Assuming white measurement noises for linear multivariable system models, an adaptive weighted least squares algorithm is developed which approximates a maximum likelihood estimate and cannot be biased by unknown initial or boundary conditions in the data owing to a special property attending Shinbrot-type modulating functions. Application is made to perturbation equation modeling of the longitudinal and lateral dynamics of a high performance aircraft using flight-test data. Comparative studies are included which demonstrate potential advantages of the algorithm relative to some well established techniques for parameter identification. Deterministic least squares extensions of the approach are made to the frequency transfer function identification problem for linear systems and to the parameter identification problem for a class of nonlinear-time-varying differential system models.
Coggins, L.G.; Pine, William E.; Walters, C.J.; Martell, S.J.D.
2006-01-01
We present a new model to estimate capture probabilities, survival, abundance, and recruitment using traditional Jolly-Seber capture-recapture methods within a standard fisheries virtual population analysis framework. This approach compares the numbers of marked and unmarked fish at age captured in each year of sampling with predictions based on estimated vulnerabilities and abundance in a likelihood function. Recruitment to the earliest age at which fish can be tagged is estimated by using a virtual population analysis method to back-calculate the expected numbers of unmarked fish at risk of capture. By using information from both marked and unmarked animals in a standard fisheries age structure framework, this approach is well suited to the sparse data situations common in long-term capture-recapture programs with variable sampling effort. © Copyright by the American Fisheries Society 2006.
Some Small Sample Results for Maximum Likelihood Estimation in Multidimensional Scaling.
ERIC Educational Resources Information Center
Ramsay, J. O.
1980-01-01
Some aspects of the small sample behavior of maximum likelihood estimates in multidimensional scaling are investigated with Monte Carlo techniques. In particular, the chi square test for dimensionality is examined and a correction for bias is proposed and evaluated. (Author/JKS)
A spatially explicit capture-recapture estimator for single-catch traps.
Distiller, Greg; Borchers, David L
2015-11-01
Single-catch traps are frequently used in live-trapping studies of small mammals. Thus far, a likelihood for single-catch traps has proven elusive and usually the likelihood for multicatch traps is used for spatially explicit capture-recapture (SECR) analyses of such data. Previous work found the multicatch likelihood to provide a robust estimator of average density. We build on a recently developed continuous-time model for SECR to derive a likelihood for single-catch traps. We use this to develop an estimator based on observed capture times and compare its performance by simulation to that of the multicatch estimator for various scenarios with nonconstant density surfaces. While the multicatch estimator is found to be a surprisingly robust estimator of average density, its performance deteriorates with high trap saturation and increasing density gradients. Moreover, it is found to be a poor estimator of the height of the detection function. By contrast, the single-catch estimators of density, distribution, and detection function parameters are found to be unbiased or nearly unbiased in all scenarios considered. This gain comes at the cost of higher variance. If there is no interest in interpreting the detection function parameters themselves, and if density is expected to be fairly constant over the survey region, then the multicatch estimator performs well with single-catch traps. However if accurate estimation of the detection function is of interest, or if density is expected to vary substantially in space, then there is merit in using the single-catch estimator when trap saturation is above about 60%. The estimator's performance is improved if care is taken to place traps so as to span the range of variables that affect animal distribution. As a single-catch likelihood with unknown capture times remains intractable for now, researchers using single-catch traps should aim to incorporate timing devices with their traps.
Advancement of Latent Trait Theory.
1988-02-01
if I am the principal investigator, I find it practically impossible to include and systematize all the important findings and implications within a… methods are described in [1.2]. Two important features of the principal investigator's approach are the following. (1) It does not assume any specific… were described in the preceding chapter, the maximum likelihood estimate θ̂ of ability θ, and also τ̂ of the transformed ability τ, play important roles
Important factors in the maximum likelihood analysis of flight test maneuvers
NASA Technical Reports Server (NTRS)
Iliff, K. W.; Maine, R. E.; Montgomery, T. D.
1979-01-01
The information presented is based on the experience in the past 12 years at the NASA Dryden Flight Research Center of estimating stability and control derivatives from over 3500 maneuvers from 32 aircraft. The overall approach to the analysis of dynamic flight test data is outlined. General requirements for data and instrumentation are discussed and several examples of the types of problems that may be encountered are presented.
Bayesian Monte Carlo and Maximum Likelihood Approach for ...
Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML), to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upstate New York. The model is calibrated using oxygen recovery data for one year and statistical inferences were validated using recovery data for another year. Compared with an essentially two-step regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen levels (DO) in the lake. BMCML also produced comparable calibration and validation results with those obtained using the popular Markov chain Monte Carlo technique (MCMC) and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien
Zhan, Tingting; Chevoneva, Inna; Iglewicz, Boris
2010-01-01
The family of weighted likelihood estimators largely overlaps with minimum divergence estimators. They are robust to data contamination compared to the MLE. We define the class of generalized weighted likelihood estimators (GWLE), provide its influence function, and discuss the efficiency requirements. We introduce a new truncated cubic-inverse weight, which is both first- and second-order efficient and more robust than previously reported weights. We also discuss new ways of selecting the smoothing bandwidth and weighted starting values for the iterative algorithm. The advantage of the truncated cubic-inverse weight is illustrated in a simulation study of three-component normal mixture models with large overlaps and heavy contaminations. A real data example is also provided. PMID:20835375
NASA Astrophysics Data System (ADS)
Samulski, Maurice; Karssemeijer, Nico
2008-03-01
Most of the current CAD systems detect suspicious mass regions independently in single views. In this paper we present a method to match corresponding regions in mediolateral oblique (MLO) and craniocaudal (CC) mammographic views of the breast. For every possible combination of mass regions in the MLO view and CC view, a number of features are computed, such as the difference in distance of a region to the nipple, a texture similarity measure, the gray scale correlation and the likelihood of malignancy of both regions computed by single-view analysis. In previous research, Linear Discriminant Analysis was used to discriminate between correct and incorrect links. In this paper we investigate if the performance can be improved by employing a statistical method in which four classes are distinguished. These four classes are defined by the combinations of view (MLO/CC) and pathology (TP/FP) labels. We use distance-weighted k-Nearest Neighbor density estimation to estimate the likelihood of a region combination. Next, a correspondence score is calculated as the likelihood that the region combination is a TP-TP link. The method was tested on 412 cases with a malignant lesion visible in at least one of the views. In 82.4% of the cases a correct link could be established between the TP detections in both views. In future work, we will use the framework presented here to develop a context dependent region matching scheme, which takes the number and likelihood of possible alternatives into account. It is expected that more accurate determination of matching probabilities will lead to improved CAD performance.
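A toy version of the scoring step described above: estimate, with distance-weighted k-nearest-neighbour estimation in the pair-feature space, how likely a candidate MLO-CC region pair is to be a correct TP-TP link. The features, training set, and function names below are invented for illustration, not taken from the paper.

```python
import numpy as np

def knn_link_score(feat, train_feats, train_labels, k=15, eps=1e-6):
    """Distance-weighted k-NN estimate of P(TP-TP link | pair features).

    feat         : feature vector of one candidate MLO-CC region pair
    train_feats  : (n, d) features of labeled training pairs
    train_labels : length-n labels, 1 for correct TP-TP links
    """
    d = np.linalg.norm(train_feats - feat, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)          # closer neighbours weigh more
    return np.sum(w * (train_labels[idx] == 1)) / np.sum(w)

rng = np.random.default_rng(7)
n = 500
labels = rng.binomial(1, 0.4, n)                  # 1 = TP-TP link
feats = rng.normal(size=(n, 2)) + 1.5 * labels[:, None]

print(knn_link_score(np.array([1.5, 1.5]), feats, labels))    # high score
print(knn_link_score(np.array([-1.0, -1.0]), feats, labels))  # low score
```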
NASA Astrophysics Data System (ADS)
Tichý, Ondřej; Šmídl, Václav; Hofman, Radek; Stohl, Andreas
2016-11-01
Estimation of pollutant releases into the atmosphere is an important problem in the environmental sciences. It is typically formalized as an inverse problem using a linear model that can explain observable quantities (e.g., concentrations or deposition values) as a product of the source-receptor sensitivity (SRS) matrix obtained from an atmospheric transport model multiplied by the unknown source-term vector. Since this problem is typically ill-posed, current state-of-the-art methods are based on regularization of the problem and solution of a formulated optimization problem. This procedure depends on manual settings of uncertainties that are often very poorly quantified, effectively making them tuning parameters. We formulate a probabilistic model, that has the same maximum likelihood solution as the conventional method using pre-specified uncertainties. Replacement of the maximum likelihood solution by full Bayesian estimation also allows estimation of all tuning parameters from the measurements. The estimation procedure is based on the variational Bayes approximation which is evaluated by an iterative algorithm. The resulting method is thus very similar to the conventional approach, but with the possibility to also estimate all tuning parameters from the observations. The proposed algorithm is tested and compared with the standard methods on data from the European Tracer Experiment (ETEX) where advantages of the new method are demonstrated. A MATLAB implementation of the proposed algorithm is available for download.
New estimates of the CMB angular power spectra from the WMAP 5 year low-resolution data
NASA Astrophysics Data System (ADS)
Gruppuso, A.; de Rosa, A.; Cabella, P.; Paci, F.; Finelli, F.; Natoli, P.; de Gasperis, G.; Mandolesi, N.
2009-11-01
A quadratic maximum likelihood (QML) estimator is applied to the Wilkinson Microwave Anisotropy Probe (WMAP) 5 year low-resolution maps to compute the cosmic microwave background angular power spectra (APS) at large scales for both temperature and polarization. Estimates and error bars for the six APS are provided up to ℓ = 32 and compared, when possible, to those obtained by the WMAP team, without finding any inconsistency. The conditional likelihood slices are also computed for the C_ℓ of all six power spectra from ℓ = 2 to 10 through a pixel-based likelihood code. Both codes treat the covariance for (T, Q, U) in a single matrix without employing any approximation. The inputs of both codes (foreground-reduced maps, related covariances and masks) are provided by the WMAP team. The peaks of the likelihood slices are always consistent with the QML estimates within the error bars; however, an excellent agreement occurs when the QML estimates are used as a fiducial power spectrum instead of the best-fitting theoretical power spectrum. By the full computation of the conditional likelihood on the estimated spectra, the value of the temperature quadrupole C^{TT}_{ℓ=2} is found to be less than 2σ away from the WMAP 5 year Λ cold dark matter best-fitting value. The BB spectrum is found to be well consistent with zero, and upper limits on the B modes are provided. The parity-odd signals TB and EB are found to be consistent with zero.
Donato, David I.
2012-01-01
This report presents the mathematical expressions and the computational techniques required to compute maximum-likelihood estimates for the parameters of the National Descriptive Model of Mercury in Fish (NDMMF), a statistical model used to predict the concentration of methylmercury in fish tissue. The expressions and techniques reported here were prepared to support the development of custom software capable of computing NDMMF parameter estimates more quickly and using less computer memory than is currently possible with available general-purpose statistical software. Computation of maximum-likelihood estimates for the NDMMF by numerical solution of a system of simultaneous equations through repeated Newton-Raphson iterations is described. This report explains the derivation of the mathematical expressions required for computational parameter estimation in sufficient detail to facilitate future derivations for any revised versions of the NDMMF that may be developed.
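The NDMMF-specific expressions live in the report itself; the computational pattern it describes (solving the score equations by repeated Newton-Raphson iterations) looks, in a generic one-parameter stand-in, like the sketch below, which computes the gamma-distribution MLE with the scale parameter profiled out. Everything here is illustrative, not the NDMMF.

```python
import numpy as np
from scipy.special import digamma, polygamma

def gamma_mle(x, alpha0=1.0, tol=1e-10, max_iter=50):
    """Newton-Raphson on the profile score for the gamma shape parameter.

    With the scale profiled out (theta = mean(x)/alpha), the score
    equation is  log(alpha) - digamma(alpha) = log(mean(x)) - mean(log(x)).
    """
    c = np.log(np.mean(x)) - np.mean(np.log(x))
    alpha = alpha0
    for _ in range(max_iter):
        f = np.log(alpha) - digamma(alpha) - c        # score equation
        fprime = 1.0 / alpha - polygamma(1, alpha)    # its derivative
        step = f / fprime
        alpha -= step                                  # Newton-Raphson update
        if abs(step) < tol:
            break
    return alpha, np.mean(x) / alpha   # (shape, scale)

rng = np.random.default_rng(8)
x = rng.gamma(shape=2.5, scale=1.4, size=2000)
print(gamma_mle(x))   # near (2.5, 1.4)
```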
Xu, Xu Steven; Yuan, Min; Yang, Haitao; Feng, Yan; Xu, Jinfeng; Pinheiro, Jose
2017-01-01
Covariate analysis based on population pharmacokinetics (PPK) is used to identify clinically relevant factors. The likelihood ratio test (LRT) based on nonlinear mixed effect model fits is currently recommended for covariate identification, whereas individual empirical Bayesian estimates (EBEs) are considered unreliable due to the presence of shrinkage. The objectives of this research were to investigate the type I error for LRT and EBE approaches, to confirm the similarity of power between the LRT and EBE approaches from a previous report and to explore the influence of shrinkage on LRT and EBE inferences. Using an oral one-compartment PK model with a single covariate impacting on clearance, we conducted a wide range of simulations according to a two-way factorial design. The results revealed that the EBE-based regression not only provided almost identical power for detecting a covariate effect, but also controlled the false positive rate better than the LRT approach. Shrinkage of EBEs is likely not the root cause for decrease in power or inflated false positive rate although the size of the covariate effect tends to be underestimated at high shrinkage. In summary, contrary to the current recommendations, EBEs may be a better choice for statistical tests in PPK covariate analysis compared to LRT. We proposed a three-step covariate modeling approach for population PK analysis to utilize the advantages of EBEs while overcoming their shortcomings, which allows not only markedly reducing the run time for population PK analysis, but also providing more accurate covariate tests.
Maximum Likelihood Estimation of Nonlinear Structural Equation Models.
ERIC Educational Resources Information Center
Lee, Sik-Yum; Zhu, Hong-Tu
2002-01-01
Developed an EM type algorithm for maximum likelihood estimation of a general nonlinear structural equation model in which the E-step is completed by a Metropolis-Hastings algorithm. Illustrated the methodology with results from a simulation study and two real examples using data from previous studies. (SLD)
A New Monte Carlo Method for Estimating Marginal Likelihoods.
Wang, Yu-Bo; Chen, Ming-Hui; Kuo, Lynn; Lewis, Paul O
2018-06-01
Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte Carlo sample. This class can be thought of as a generalization of the harmonic mean and inflated density ratio estimators using a partition weighted kernel (likelihood times prior). We show that our estimator is consistent and has better theoretical properties than the harmonic mean and inflated density ratio estimators. In addition, we provide guidelines on choosing optimal weights. Simulation studies were conducted to examine the empirical performance of the proposed estimator. We further demonstrate the desirable features of the proposed estimator with two real data sets: one is from a prostate cancer study using an ordinal probit regression model with latent variables; the other is for the power prior construction from two Eastern Cooperative Oncology Group phase III clinical trials using the cure rate survival model with similar objectives.
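For orientation, here is the harmonic mean estimator that the proposed class generalizes: draw θ from the posterior and estimate the marginal likelihood by the harmonic mean of the likelihood values. The sketch uses a conjugate normal model so the exact answer is available for comparison; the posterior sample is drawn analytically here, whereas in practice it would come from MCMC. A toy under those assumptions, and the instability noted in the comments is exactly what motivates better-behaved estimators.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.special import logsumexp

rng = np.random.default_rng(9)
n, sigma, tau = 30, 1.0, 2.0
y = rng.normal(0.7, sigma, n)   # data generated with theta = 0.7

# Exact log marginal likelihood of y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2)
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
exact = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

# Analytic posterior sample (an MCMC sample would be used in general)
v = 1.0 / (n / sigma**2 + 1.0 / tau**2)
m = v * y.sum() / sigma**2
theta = rng.normal(m, np.sqrt(v), 20000)

# Harmonic mean: 1/m_hat = average of 1/likelihood over the posterior draws
loglik = norm.logpdf(y[:, None], theta[None, :], sigma).sum(axis=0)
log_hm = -(logsumexp(-loglik) - np.log(len(theta)))
print(exact, log_hm)   # close here, but the harmonic mean has huge (often
                       # infinite) variance in many problems
```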
Time series modeling by a regression approach based on a latent process.
Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice
2009-01-01
Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.
Statistical estimation via convex optimization for trending and performance monitoring
NASA Astrophysics Data System (ADS)
Samar, Sikandar
This thesis presents an optimization-based statistical estimation approach to find unknown trends in noisy data. A Bayesian framework is used to explicitly take into account prior information about the trends via trend models and constraints. The main focus is on convex formulation of the Bayesian estimation problem, which allows efficient computation of (globally) optimal estimates. There are two main parts of this thesis. The first part formulates trend estimation in systems described by known detailed models as a convex optimization problem. Statistically optimal estimates are then obtained by maximizing a concave log-likelihood function subject to convex constraints. We consider the problem of increasing problem dimension as more measurements become available, and introduce a moving horizon framework to enable recursive estimation of the unknown trend by solving a fixed size convex optimization problem at each horizon. We also present a distributed estimation framework, based on the dual decomposition method, for a system formed by a network of complex sensors with local (convex) estimation. Two specific applications of the convex optimization-based Bayesian estimation approach are described in the second part of the thesis. Batch estimation for parametric diagnostics in a flight control simulation of a space launch vehicle is shown to detect incipient fault trends despite the natural masking properties of feedback in the guidance and control loops. Moving horizon approach is used to estimate time varying fault parameters in a detailed nonlinear simulation model of an unmanned aerial vehicle. An excellent performance is demonstrated in the presence of winds and turbulence.
Mixture Rasch Models with Joint Maximum Likelihood Estimation
ERIC Educational Resources Information Center
Willse, John T.
2011-01-01
This research provides a demonstration of the utility of mixture Rasch models. Specifically, a model capable of estimating a mixture partial credit model using joint maximum likelihood is presented. Like the partial credit model, the mixture partial credit model has the beneficial feature of being appropriate for analysis of assessment data…
The Effects of Model Misspecification and Sample Size on LISREL Maximum Likelihood Estimates.
ERIC Educational Resources Information Center
Baldwin, Beatrice
The robustness of LISREL computer program maximum likelihood estimates under specific conditions of model misspecification and sample size was examined. The population model used in this study contains one exogenous variable; three endogenous variables; and eight indicator variables, two for each latent variable. Conditions of model…
Maximum likelihood estimates, from censored data, for mixed-Weibull distributions
NASA Astrophysics Data System (ADS)
Jiang, Siyuan; Kececioglu, Dimitri
1992-06-01
A new algorithm for estimating the parameters of mixed-Weibull distributions from censored data is presented. The algorithm follows the principle of maximum likelihood estimation (MLE) through the expectation-maximization (EM) algorithm, and it is derived for both postmortem and nonpostmortem time-to-failure data. It is concluded that the concept of the EM algorithm is easy to understand and apply (only elementary statistics and calculus are required). The log-likelihood function cannot decrease after an EM sequence; this important feature was observed in all of the numerical calculations. The MLEs of the nonpostmortem data were obtained successfully for mixed-Weibull distributions with up to 14 parameters in a 5-subpopulation mixed-Weibull distribution. Numerical examples indicate that some of the log-likelihood functions of the mixed-Weibull distributions have multiple local maxima; therefore, the algorithm should start at several initial guesses of the parameter set.
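A sketch of the EM skeleton for a Weibull mixture, in the complete-data (uncensored) case only; the paper's treatment of censored, postmortem, and nonpostmortem data adds terms that are omitted here. The E-step computes posterior membership probabilities, and the M-step updates the mixing weights in closed form and each component's (shape, scale) by weighted maximum likelihood. As the abstract notes, the log-likelihood cannot decrease across EM sweeps, and several starting values are advisable because of local maxima.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.optimize import minimize

def em_weibull_mixture(t, k=2, iters=60, seed=0):
    """EM for a k-component Weibull mixture on uncensored failure times."""
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    shapes = rng.uniform(0.8, 2.5, k)
    scales = np.quantile(t, (np.arange(k) + 1.0) / (k + 1.0))
    for _ in range(iters):
        # E-step: posterior probability that each time came from component j
        dens = np.array([w[j] * weibull_min.pdf(t, shapes[j], scale=scales[j])
                         for j in range(k)])
        r = dens / dens.sum(axis=0)
        # M-step: closed-form mixing weights; weighted MLE per component
        w = r.mean(axis=1)
        for j in range(k):
            res = minimize(
                lambda p: -np.sum(r[j] * weibull_min.logpdf(t, p[0], scale=p[1])),
                [shapes[j], scales[j]],
                bounds=[(0.05, 20.0), (1e-3, None)])
            shapes[j], scales[j] = res.x
    return w, shapes, scales

t = np.concatenate([weibull_min.rvs(0.8, scale=1.0, size=400, random_state=1),
                    weibull_min.rvs(3.0, scale=5.0, size=600, random_state=2)])
print(em_weibull_mixture(t))   # mixing weights near (0.4, 0.6)
```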
Ning, Jing; Chen, Yong; Piao, Jin
2017-07-01
Publication bias occurs when the published research results are systematically unrepresentative of the population of studies that have been conducted, and is a potential threat to meaningful meta-analysis. The Copas selection model provides a flexible framework for correcting estimates and offers considerable insight into the publication bias. However, maximizing the observed likelihood under the Copas selection model is challenging because the observed data contain very little information on the latent variable. In this article, we study a Copas-like selection model and propose an expectation-maximization (EM) algorithm for estimation based on the full likelihood. Empirical simulation studies show that the EM algorithm and its associated inferential procedure performs well and avoids the non-convergence problem when maximizing the observed likelihood. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Probabilistic treatment of the uncertainty from the finite size of weighted Monte Carlo data
NASA Astrophysics Data System (ADS)
Glüsenkamp, Thorsten
2018-06-01
Parameter estimation in HEP experiments often involves Monte Carlo simulation to model the experimental response function. Typical applications are forward-folding likelihood analyses with re-weighting, or time-consuming minimization schemes with a new simulation set for each parameter value. Problematically, the finite size of such Monte Carlo samples carries intrinsic uncertainty that can lead to a substantial bias in parameter estimation if it is neglected and the sample size is small. We introduce a probabilistic treatment of this problem by replacing the usual likelihood functions with novel generalized probability distributions that incorporate the finite statistics via suitable marginalization. These new PDFs are analytic, and can be used to replace the Poisson, multinomial, and sample-based unbinned likelihoods, covering many use cases in high-energy physics. In the limit of infinite statistics, they reduce to the respective standard probability distributions. In the general case of arbitrary Monte Carlo weights, the expressions involve the fourth Lauricella function FD, for which we find a new finite-sum representation in a certain parameter setting. The result also represents an exact form for Carlson's Dirichlet average Rn with n > 0, and thereby an efficient way to calculate the probability generating function of the Dirichlet-multinomial distribution, the extended divided difference of a monomial, or arbitrary moments of univariate B-splines. We demonstrate the bias reduction of our approach with a typical toy Monte Carlo problem, estimating the normalization of a peak in a falling energy spectrum, and compare the results with previously published methods from the literature.
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation between parameters, requiring moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of a non-normal data distribution on the REML estimates is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
Likelihood-based confidence intervals for estimating floods with given return periods
NASA Astrophysics Data System (ADS)
Martins, Eduardo Sávio P. R.; Clarke, Robin T.
1993-06-01
This paper discusses aspects of the calculation of likelihood-based confidence intervals for T-year floods, with particular reference to (1) the two-parameter gamma distribution; (2) the Gumbel distribution; (3) the two-parameter log-normal distribution, and other distributions related to the normal by Box-Cox transformations. Calculation of the confidence limits is straightforward using the Nelder-Mead algorithm with a constraint incorporated, although care is necessary to ensure convergence either of the Nelder-Mead algorithm, or of the Newton-Raphson calculation of maximum-likelihood estimates. Methods are illustrated using records from 18 gauging stations in the basin of the River Itajai-Acu, State of Santa Catarina, southern Brazil. A small and restricted simulation compared likelihood-based confidence limits with those given by use of the central limit theorem; for the same confidence probability, the likelihood-based limits were wider than those from the central limit theorem, and the latter failed more frequently to contain the true quantile being estimated. The paper discusses possible applications of likelihood-based confidence intervals in other areas of hydrological analysis.
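As a hedged illustration for the Gumbel case (not the paper's code): the T-year flood is q_T = mu - beta*ln(-ln(1 - 1/T)), so a profile likelihood for q_T can be computed by tying mu to each candidate quantile and maximizing over beta; the simulated record, search grid, and scipy-based optimization below are all assumptions.

```python
import numpy as np
from scipy import stats, optimize

def gumbel_loglik(mu, beta, x):
    z = (x - mu) / beta
    return np.sum(-np.log(beta) - z - np.exp(-z))

def profile_ci(x, T=100, conf=0.95):
    c = np.log(-np.log(1 - 1.0 / T))            # q_T = mu - beta * c
    mu_hat, beta_hat = stats.gumbel_r.fit(x)    # unconstrained MLE
    q_hat = mu_hat - beta_hat * c
    l_max = gumbel_loglik(mu_hat, beta_hat, x)
    cut = stats.chi2.ppf(conf, df=1) / 2.0      # likelihood-ratio cutoff

    def prof(q):  # maximize over beta with mu tied to the candidate quantile
        res = optimize.minimize_scalar(
            lambda b: -gumbel_loglik(q + b * c, b, x),
            bounds=(1e-6, 10 * beta_hat), method="bounded")
        return -res.fun

    grid = np.linspace(0.6 * q_hat, 2.0 * q_hat, 400)
    keep = [q for q in grid if l_max - prof(q) <= cut]
    return q_hat, min(keep), max(keep)

rng = np.random.default_rng(1)
flows = stats.gumbel_r.rvs(loc=500, scale=120, size=40, random_state=rng)
print(profile_ci(flows, T=100))   # point estimate and 95% limits
```

As the abstract notes for the real distributions, the inner maximization is where convergence care is needed; here a bounded scalar search keeps that step robust.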
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
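The profiling step can be sketched generically: for a log-linear rate model, the stratum intercepts maximize out analytically, leaving a multinomial-type profile likelihood in the dose coefficient alone. The toy data and rate model below are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def profile_loglik(beta, y, pt, dose, stratum):
    rr = pt * np.exp(beta * dose)          # person-time x relative rate
    ll = 0.0
    for s in np.unique(stratum):
        m = stratum == s
        # Profiling out the stratum intercept leaves a multinomial likelihood
        # within each stratum, with cell probabilities rr / sum(rr).
        ll += np.sum(y[m] * np.log(rr[m] / rr[m].sum()))
    return ll

rng = np.random.default_rng(2)
n, S = 200, 20
stratum = rng.integers(0, S, n)
dose = rng.gamma(2.0, 0.5, n)
pt = rng.uniform(100, 1000, n)             # person-time per cell
alpha = rng.normal(-6, 0.5, S)             # unknown background rates
y = rng.poisson(pt * np.exp(alpha[stratum] + 0.3 * dose))

fit = minimize_scalar(lambda b: -profile_loglik(b, y, pt, dose, stratum),
                      bounds=(-2, 2), method="bounded")
print("beta_hat =", fit.x)                 # should be near the true 0.3
```

The key point mirrors the abstract: no stratum-specific coefficients are ever estimated, yet the maximizer of this profile likelihood matches unconditional Poisson regression with a full set of stratum indicators.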
Fischer, H Felix; Rose, Matthias
2016-10-19
Recently, a growing number of Item-Response Theory (IRT) models have been published, which allow estimation of a common latent variable from data derived from different Patient Reported Outcomes (PROs). When using data from different PROs, direct estimation of the latent variable has some advantages over the use of sum score conversion tables. However, it requires substantial proficiency in the field of psychometrics to fit such models using contemporary IRT software. We developed a web application (http://www.common-metrics.org), which allows easier estimation of latent variable scores using IRT models that calibrate different measures on instrument-independent scales. Currently, the application allows estimation using six different IRT models for Depression, Anxiety, and Physical Function. Based on published item parameters, users of the application can directly obtain latent trait estimates using expected a posteriori (EAP) estimation for sum scores as well as for specific response patterns, Bayes modal (MAP), weighted likelihood estimation (WLE), and maximum likelihood (ML) methods, under three different prior distributions. The obtained estimates can be downloaded and analyzed using standard statistical software. This application enhances the usability of IRT modeling for researchers by allowing comparison of the latent trait estimates over different PROs, such as the Patient Health Questionnaire Depression (PHQ-9) and Anxiety (GAD-7) scales, the Center of Epidemiologic Studies Depression Scale (CES-D), the Beck Depression Inventory (BDI), PROMIS Anxiety and Depression Short Forms and others. Advantages of this approach include comparability of data derived with different measures and tolerance against missing values. The validity of the underlying models needs to be investigated in the future.
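For illustration, a minimal EAP scorer for a single response pattern under a 2PL model with a standard-normal prior; the item parameters below are made-up stand-ins, not the published calibrations the application uses.

```python
import numpy as np

def eap_2pl(responses, a, b, nodes=81):
    """EAP estimate and posterior SD of theta given 0/1 responses."""
    theta = np.linspace(-4, 4, nodes)
    prior = np.exp(-0.5 * theta**2)                      # standard-normal prior
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
    lik = np.prod(np.where(np.array(responses)[:, None] == 1, p, 1 - p), axis=0)
    post = prior * lik
    post /= np.trapz(post, theta)                        # normalize posterior
    eap = np.trapz(theta * post, theta)                  # posterior mean
    sd = np.sqrt(np.trapz((theta - eap) ** 2 * post, theta))
    return eap, sd

a = np.array([1.2, 0.8, 1.5, 1.0])    # discriminations (assumed)
b = np.array([-0.5, 0.0, 0.5, 1.0])   # difficulties (assumed)
print(eap_2pl([1, 1, 0, 1], a, b))
```

Swapping the prior or summing the posterior over all patterns with a given total reproduces, respectively, the alternative priors and the sum-score EAP tables the application offers.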
Bromaghin, Jeffrey F.; Gates, Kenneth S.; Palmer, Douglas E.
2010-01-01
Many fisheries for Pacific salmon Oncorhynchus spp. are actively managed to meet escapement goal objectives. In fisheries where the demand for surplus production is high, an extensive assessment program is needed to achieve the opposing objectives of allowing adequate escapement and fully exploiting the available surplus. Knowledge of abundance is a critical element of such assessment programs. Abundance estimation using mark-recapture experiments in combination with telemetry has become common in recent years, particularly within Alaskan river systems. Fish are typically captured and marked in the lower river while migrating in aggregations of individuals from multiple populations. Recapture data are obtained using telemetry receivers that are co-located with abundance assessment projects near spawning areas, which provide large sample sizes and information on population-specific mark rates. When recapture data are obtained from multiple populations, unequal mark rates may reflect a violation of the assumption of homogeneous capture probabilities. A common analytical strategy is to test the hypothesis that mark rates are homogeneous and combine all recapture data if the test is not significant. However, mark rates are often low, and a test of homogeneity may lack sufficient power to detect meaningful differences among populations. In addition, differences among mark rates may provide information that could be exploited during parameter estimation. We present a temporally stratified mark-recapture model that permits capture probabilities and migratory timing through the capture area to vary among strata. Abundance information obtained from a subset of populations after the populations have segregated for spawning is jointly modeled with telemetry distribution data by use of a likelihood function. Maximization of the likelihood produces estimates of the abundance and timing of individual populations migrating through the capture area, thus yielding substantially more information than the total abundance estimate provided by the conventional approach. The utility of the model is illustrated with data for coho salmon O. kisutch from the Kasilof River in south-central Alaska.
Katriel, G.; Yaari, R.; Huppert, A.; Roll, U.; Stone, L.
2011-01-01
This paper presents new computational and modelling tools for studying the dynamics of an epidemic in its initial stages that use both available incidence time series and data describing the population's infection network structure. The work is motivated by data collected at the beginning of the H1N1 pandemic outbreak in Israel in the summer of 2009. We formulated a new discrete-time stochastic epidemic SIR (susceptible-infected-recovered) model that explicitly takes into account the disease's specific generation-time distribution and the intrinsic demographic stochasticity inherent to the infection process. Moreover, in contrast with many other modelling approaches, the model allows direct analytical derivation of estimates for the effective reproductive number (Re) and of their credible intervals, by maximum likelihood and Bayesian methods. The basic model can be extended to include age-class structure, and a maximum likelihood methodology allows us to estimate the model's next-generation matrix by combining two types of data: (i) the incidence series of each age group, and (ii) infection network data that provide partial information of 'who-infected-who'. Unlike other approaches for estimating the next-generation matrix, the method developed here does not require making a priori assumptions about the structure of the next-generation matrix. We show, using a simulation study, that even a relatively small amount of information about the infection network greatly improves the accuracy of estimation of the next-generation matrix. The method is applied in practice to estimate the next-generation matrix from the Israeli H1N1 pandemic data. The tools developed here should be of practical importance for future investigations of epidemics during their initial stages. However, they require the availability of data which represent a random sample of the real epidemic process. We discuss the conditions under which reporting rates may or may not influence our estimated quantities and the effects of bias. PMID:21247949
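A hedged sketch of the likelihood idea in its simplest, non-age-structured form: with a known generation-time distribution w, new cases are modeled as Poisson with mean Re times a weighted sum of past incidence, and Re then has a closed-form MLE. This generic discrete-time renewal formulation and the toy data are assumptions, not the paper's exact model.

```python
import numpy as np

def estimate_Re(incidence, w):
    """Closed-form MLE of Re under I_t ~ Poisson(Re * sum_s w[s-1] * I[t-s])."""
    I = np.asarray(incidence, float)
    lam = np.array([np.dot(w[:t][::-1], I[max(0, t - len(w)):t])
                    for t in range(1, len(I))])
    # Setting d/dRe of sum(I_t*log(Re*lam_t) - Re*lam_t) to zero gives:
    return I[1:].sum() / lam.sum()

w = np.array([0.2, 0.5, 0.3])            # generation-time pmf (assumed)
rng = np.random.default_rng(3)
I = [5]
for t in range(1, 40):                   # simulate an outbreak with Re = 1.4
    lam = 1.4 * sum(w[s] * I[t - 1 - s] for s in range(min(t, len(w))))
    I.append(rng.poisson(lam))
print("Re_hat =", estimate_Re(I, w))     # should be near 1.4
```

The demographic stochasticity the abstract emphasizes is exactly the Poisson draw in the simulation loop; the same likelihood also yields credible intervals when combined with a prior.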
Ting, Chih-Chung; Yu, Chia-Chen; Maloney, Laurence T.
2015-01-01
In Bayesian decision theory, knowledge about the probabilities of possible outcomes is captured by a prior distribution and a likelihood function. The prior reflects past knowledge and the likelihood summarizes current sensory information. The two combined (integrated) form a posterior distribution that allows estimation of the probability of different possible outcomes. In this study, we investigated the neural mechanisms underlying Bayesian integration using a novel lottery decision task in which both prior knowledge and likelihood information about reward probability were systematically manipulated on a trial-by-trial basis. Consistent with Bayesian integration, as sample size increased, subjects tended to weigh likelihood information more compared with prior information. Using fMRI in humans, we found that the medial prefrontal cortex (mPFC) correlated with the mean of the posterior distribution, a statistic that reflects the integration of prior knowledge and likelihood of reward probability. Subsequent analysis revealed that both prior and likelihood information were represented in mPFC and that the neural representations of prior and likelihood in mPFC reflected changes in the behaviorally estimated weights assigned to these different sources of information in response to changes in the environment. Together, these results establish the role of mPFC in prior-likelihood integration and highlight its involvement in representing and integrating these distinct sources of information. PMID:25632152
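A toy numerical illustration of the weighting behavior described above (an assumption-laden sketch, unrelated to the actual lottery task): with a Beta prior on reward probability and binomial evidence, the posterior mean moves from the prior toward the data as sample size grows.

```python
prior_a, prior_b = 6.0, 2.0          # Beta prior: reward probability near 0.75
for n in (4, 16, 64):
    k = round(0.25 * n)              # observed successes point to 0.25 instead
    post_mean = (prior_a + k) / (prior_a + prior_b + n)
    print(f"n={n:3d}: posterior mean = {post_mean:.3f}")   # drifts toward 0.25
```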
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
The problem of estimating label imperfections and the use of the estimates in identifying mislabeled patterns is presented. Expressions for the maximum likelihood estimates of classification errors and a priori probabilities are derived from the classification of a set of labeled patterns. Expressions are also given for the asymptotic variances of the probability of correct classification and of the proportions. Simple models are developed for imperfections in the labels and for classification errors and are used in the formulation of a maximum likelihood estimation scheme. Schemes are presented for the identification of mislabeled patterns in terms of thresholds on the discriminant functions for both two-class and multiclass cases. Expressions are derived for the probability that the imperfect label identification scheme will result in a wrong decision and are used in computing the thresholds. The results of practical applications of these techniques in the processing of remotely sensed multispectral data are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
INDUSI,J.P.
2003-06-16
Since the events of 9/11, there have been considerable concerns and associated efforts to prevent or respond to acts of terrorism. Very often we hear calls to reduce the threat from or correct vulnerabilities to various terrorist acts. Others fall victim to anxiety over potential scenarios with the gravest of consequences involving hundreds of thousands of casualties. The problem is complicated by the fact that planners have limited, albeit in some cases significant, resources and less than perfect intelligence on potential terrorist plans. However, valuable resources must be used prudently to reduce the overall risk to the nation. A systematic approach to this process of asset allocation is to reduce the overall risk and not just an individual element of risk such as vulnerabilities. Hence, we define risk as a function of three variables: the threat (the likelihood and scenario of the terrorist act), the vulnerability (the vulnerability of potential targets to the threat), and the consequences (health and safety, economic, etc.) resulting from a successful terrorist scenario. Both the vulnerability and consequences from a postulated adversary scenario can be reasonably well estimated. However, the threat likelihood and scenarios are much more difficult to estimate. A possible path forward is to develop scenarios for each potential target in question using experts from many disciplines. This should yield a finite but large number of target-scenario pairs. The vulnerabilities and consequences for each are estimated and then ranked relative to one another. The resulting relative risk ranking will have targets near the top of the ranking for which the threat is estimated to be more likely, the vulnerability greatest, and the consequences the most grave. In the absence of perfect intelligence, this may be the best we can do.
Simple Penalties on Maximum-Likelihood Estimates of Genetic Parameters to Reduce Sampling Variation
Meyer, Karin
2016-01-01
Multivariate estimates of genetic parameters are subject to substantial sampling variation, especially for smaller data sets and more than a few traits. A simple modification of standard, maximum-likelihood procedures for multivariate analyses to estimate genetic covariances is described, which can improve estimates by substantially reducing their sampling variances. This is achieved by maximizing the likelihood subject to a penalty. Borrowing from Bayesian principles, we propose a mild, default penalty—derived assuming a Beta distribution of scale-free functions of the covariance components to be estimated—rather than laboriously attempting to determine the stringency of penalization from the data. An extensive simulation study is presented, demonstrating that such penalties can yield very worthwhile reductions in loss, i.e., the difference from population values, for a wide range of scenarios and without distorting estimates of phenotypic covariances. Moreover, mild default penalties tend not to increase loss in difficult cases and, on average, achieve reductions in loss of similar magnitude to computationally demanding schemes to optimize the degree of penalization. Pertinent details required for the adaptation of standard algorithms to locate the maximum of the likelihood function are outlined. PMID:27317681
Devenish Nelson, Eleanor S.; Harris, Stephen; Soulsbury, Carl D.; Richards, Shane A.; Stephens, Philip A.
2010-01-01
Background Demographic models are widely used in conservation and management, and their parameterisation often relies on data collected for other purposes. When underlying data lack clear indications of associated uncertainty, modellers often fail to account for that uncertainty in model outputs, such as estimates of population growth. Methodology/Principal Findings We applied a likelihood approach to infer uncertainty retrospectively from point estimates of vital rates. Combining this with resampling techniques and projection modelling, we show that confidence intervals for population growth estimates are easy to derive. We used similar techniques to examine the effects of sample size on uncertainty. Our approach is illustrated using data on the red fox, Vulpes vulpes, a predator of ecological and cultural importance, and the most widespread extant terrestrial mammal. We show that uncertainty surrounding estimated population growth rates can be high, even for relatively well-studied populations. Halving that uncertainty typically requires a quadrupling of sampling effort. Conclusions/Significance Our results compel caution when comparing demographic trends between populations without accounting for uncertainty. Our methods will be widely applicable to demographic studies of many species. PMID:21049049
Gould, William R.; Kendall, William L.
2013-01-01
Capture-recapture methods were initially developed to estimate human population abundance, but since that time have seen widespread use for fish and wildlife populations to estimate and model various parameters of population, metapopulation, and disease dynamics. Repeated sampling of marked animals provides information for estimating abundance and tracking the fate of individuals in the face of imperfect detection. Mark types have evolved from clipping or tagging to use of noninvasive methods such as photography of natural markings and DNA collection from feces. Survival estimation has been emphasized more recently as have transition probabilities between life history states and/or geographical locations, even where some states are unobservable or uncertain. Sophisticated software has been developed to handle highly parameterized models, including environmental and individual covariates, to conduct model selection, and to employ various estimation approaches such as maximum likelihood and Bayesian approaches. With these user-friendly tools, complex statistical models for studying population dynamics have been made available to ecologists. The future will include a continuing trend toward integrating data types, both for tagged and untagged individuals, to produce more precise and robust population models.
Early Teen Marriage and Future Poverty
DAHL, GORDON B.
2010-01-01
Both early teen marriage and dropping out of high school have historically been associated with a variety of negative outcomes, including higher poverty rates throughout life. Are these negative outcomes due to preexisting differences, or do they represent the causal effect of marriage and schooling choices? To better understand the true personal and societal consequences, in this article, I use an instrumental variables (IV) approach that takes advantage of variation in state laws regulating the age at which individuals are allowed to marry, drop out of school, and begin work. The baseline IV estimate indicates that a woman who marries young is 31 percentage points more likely to live in poverty when she is older. Similarly, a woman who drops out of school is 11 percentage points more likely to be poor. The results are robust to a variety of alternative specifications and estimation methods, including limited information maximum likelihood (LIML) estimation and a control function approach. While grouped ordinary least squares (OLS) estimates for the early teen marriage variable are also large, OLS estimates based on individual-level data are small, consistent with a large amount of measurement error. PMID:20879684
Fushiki, Tadayoshi
2009-07-01
The correlation matrix is a fundamental statistic that is used in many fields. For example, GroupLens, a collaborative filtering system, uses the correlation between users for predictive purposes. Since the correlation is a natural similarity measure between users, the correlation matrix may be used as the Gram matrix in kernel methods. However, the estimated correlation matrix sometimes has a serious defect: although the correlation matrix is originally positive semidefinite, the estimated one may not be positive semidefinite when not all ratings are observed. To obtain a positive semidefinite correlation matrix, the nearest correlation matrix problem has recently been studied in the fields of numerical analysis and optimization. However, statistical properties are not explicitly used in such studies. To obtain a positive semidefinite correlation matrix, we assume an approximate model. Using the model, an estimate is obtained as the optimal point of an optimization problem formulated with information on the variances of the estimated correlation coefficients. The problem is solved by a convex quadratic semidefinite program. A penalized likelihood approach is also examined. The MovieLens data set is used to test our approach.
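For contrast with the paper's variance-weighted semidefinite program, a standard simpler repair is Higham-style alternating projections; the sketch below implements that generic method, not the proposed estimator.

```python
import numpy as np

def nearest_correlation(A, iters=100):
    """Approximately project a symmetric matrix onto the correlation matrices."""
    Y = A.copy()
    dS = np.zeros_like(A)
    for _ in range(iters):
        R = Y - dS                              # Dykstra correction
        w, V = np.linalg.eigh(R)
        X = (V * np.clip(w, 0, None)) @ V.T     # project onto the PSD cone
        dS = X - R
        Y = X.copy()
        np.fill_diagonal(Y, 1.0)                # project onto unit diagonal
    return Y

# A pairwise-complete correlation estimate can be indefinite:
C = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
print(np.linalg.eigvalsh(C))                    # one negative eigenvalue
print(np.linalg.eigvalsh(nearest_correlation(C)))  # all nonnegative
```

The paper's estimator differs precisely in weighting the objective by the variances of the estimated coefficients, so better-estimated entries are perturbed less.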
ERIC Educational Resources Information Center
Beauducel, Andre; Herzberg, Philipp Yorck
2006-01-01
This simulation study compared maximum likelihood (ML) estimation with weighted least squares means and variance adjusted (WLSMV) estimation. The study was based on confirmatory factor analyses with 1, 2, 4, and 8 factors, based on 250, 500, 750, and 1,000 cases, and on 5, 10, 20, and 40 variables with 2, 3, 4, 5, and 6 categories. There was no…
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
Huang, Chiung-Yu; Qin, Jing; Follmann, Dean A.
2012-01-01
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. PMID:23843659
Two new methods to fit models for network meta-analysis with random inconsistency effects.
Law, Martin; Jackson, Dan; Turner, Rebecca; Rhodes, Kirsty; Viechtbauer, Wolfgang
2016-07-28
Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically-based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov Chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome, and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome, and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis. Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in the Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported.
The Generation of a Stochastic Flood Event Catalogue for Continental USA
NASA Astrophysics Data System (ADS)
Quinn, N.; Wing, O.; Smith, A.; Sampson, C. C.; Neal, J. C.; Bates, P. D.
2017-12-01
Recent advances in the acquisition of spatiotemporal environmental data and improvements in computational capabilities have enabled the generation of large scale, even global, flood hazard layers which serve as a critical decision-making tool for a range of end users. However, these datasets are designed to indicate only the probability and depth of inundation at a given location and are unable to describe the likelihood of concurrent flooding across multiple sites. Recent research has highlighted that although the estimation of large, widespread flood events is of great value to flood mitigation and insurance industries, to date it has been difficult to deal with this spatial dependence structure in flood risk over relatively large scales. Many existing approaches have been restricted to empirical estimates of risk based on historic events, limiting their capability of assessing risk over the full range of plausible scenarios. Therefore, this research utilises a recently developed model-based approach to describe the multisite joint distribution of extreme river flows across continental USA river gauges. Given an extreme event at a site, the model characterises the likelihood that neighbouring sites are also impacted. This information is used to simulate an ensemble of plausible synthetic extreme event footprints from which flood depths are extracted from an existing global flood hazard catalogue. Expected economic losses are then estimated by overlaying flood depths with national datasets defining asset locations, characteristics and depth damage functions. The ability of this approach to quantify probabilistic economic risk and rare threshold-exceeding events is expected to be of value to those interested in the flood mitigation and insurance sectors. This work describes the methodological steps taken to create the flood loss catalogue over a national scale; highlights the uncertainty in the expected annual economic vulnerability within the USA from extreme river flows; and presents future developments to the modelling approach.
ERIC Educational Resources Information Center
Klein, Andreas G.; Muthen, Bengt O.
2007-01-01
In this article, a nonlinear structural equation model is introduced and a quasi-maximum likelihood method for simultaneous estimation and testing of multiple nonlinear effects is developed. The focus of the new methodology lies on efficiency, robustness, and computational practicability. Monte-Carlo studies indicate that the method is highly…
Likelihood-Based Confidence Intervals in Exploratory Factor Analysis
ERIC Educational Resources Information Center
Oort, Frans J.
2011-01-01
In exploratory or unrestricted factor analysis, all factor loadings are free to be estimated. In oblique solutions, the correlations between common factors are free to be estimated as well. The purpose of this article is to show how likelihood-based confidence intervals can be obtained for rotated factor loadings and factor correlations, by…
Estimation of Complex Generalized Linear Mixed Models for Measurement and Growth
ERIC Educational Resources Information Center
Jeon, Minjeong
2012-01-01
Maximum likelihood (ML) estimation of generalized linear mixed models (GLMMs) is technically challenging because of the intractable likelihoods that involve high dimensional integrations over random effects. The problem is magnified when the random effects have a crossed design and thus the data cannot be reduced to small independent clusters. A…
ERIC Educational Resources Information Center
Adank, Patti
2012-01-01
The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
Estimation After a Group Sequential Trial.
Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Kenward, Michael G; Tsiatis, Anastasios A; Davidian, Marie; Verbeke, Geert
2015-10-01
Group sequential trials are one important instance of studies for which the sample size is not fixed a priori but rather takes one of a finite set of pre-specified values, dependent on the observed data. Much work has been devoted to the inferential consequences of this design feature. Molenberghs et al (2012) and Milanzi et al (2012) reviewed and extended the existing literature, focusing on a collection of seemingly disparate, but related, settings, namely completely random sample sizes, group sequential studies with deterministic and random stopping rules, incomplete data, and random cluster sizes. They showed that the ordinary sample average is a viable option for estimation following a group sequential trial, for a wide class of stopping rules and for random outcomes with a distribution in the exponential family. Their results are somewhat surprising in the sense that the sample average is not optimal, and further, there does not exist an optimal, or even unbiased, linear estimator. However, the sample average is asymptotically unbiased, both conditionally upon the observed sample size as well as marginalized over it. By exploiting ignorability they showed that the sample average is the conventional maximum likelihood estimator. They also showed that a conditional maximum likelihood estimator is finite-sample unbiased, but is less efficient than the sample average and has a larger mean squared error. Asymptotically, the sample average and the conditional maximum likelihood estimator are equivalent. This previous work is restricted, however, to the situation in which the random sample size can take only two values, N = n or N = 2n. In this paper, we consider the more practically useful setting of sample sizes in the finite set {n1, n2, …, nL}. It is shown that the sample average is then a justifiable estimator, in the sense that it follows from joint likelihood estimation, and it is consistent and asymptotically unbiased. We also show why simulations can give the false impression of bias in the sample average when considered conditional upon the sample size. The consequence is that no corrections need to be made to estimators following sequential trials. When small-sample bias is of concern, the conditional likelihood estimator provides a relatively straightforward modification to the sample average. Finally, it is shown that classical likelihood-based standard errors and confidence intervals can be applied, obviating the need for technical corrections.
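A hedged simulation sketch of the central point, using an illustrative two-look design and stopping rule: conditional on the realized sample size the sample average looks badly biased, yet marginally the bias is small (and vanishes asymptotically).

```python
import numpy as np

rng = np.random.default_rng(4)
mu, n1, n2, reps = 0.0, 10, 30, 50_000
means, Ns = [], []
for _ in range(reps):
    x1 = rng.normal(mu, 1, n1)
    if x1.mean() > 0.5:                       # deterministic stopping rule
        means.append(x1.mean()); Ns.append(n1)
    else:                                     # otherwise continue to n2
        x = np.concatenate([x1, rng.normal(mu, 1, n2 - n1)])
        means.append(x.mean()); Ns.append(n2)
means, Ns = np.array(means), np.array(Ns)
print("marginal bias (small)      :", means.mean() - mu)
print("bias given N = n1 (large +):", means[Ns == n1].mean() - mu)
print("bias given N = n2 (small -):", means[Ns == n2].mean() - mu)
```

This is the "false impression" the abstract describes: conditioning on N selects truncated samples, so the conditional averages sit far from mu even though no correction is needed marginally.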
Information matrix estimation procedures for cognitive diagnostic models.
Liu, Yanlou; Xin, Tao; Andersson, Björn; Tian, Wei
2018-03-06
Two new methods to estimate the asymptotic covariance matrix for marginal maximum likelihood estimation of cognitive diagnosis models (CDMs), the inverse of the observed information matrix and the sandwich-type estimator, are introduced. Unlike several previous covariance matrix estimators, the new methods take into account both the item and structural parameters. The relationships between the observed information matrix, the empirical cross-product information matrix, the sandwich-type covariance matrix and the two approaches proposed by de la Torre (2009, J. Educ. Behav. Stat., 34, 115) are discussed. Simulation results show that, for a correctly specified CDM and Q-matrix or with a slightly misspecified probability model, the observed information matrix and the sandwich-type covariance matrix exhibit good performance with respect to providing consistent standard errors of item parameter estimates. However, with substantial model misspecification only the sandwich-type covariance matrix exhibits robust performance. © 2018 The British Psychological Society.
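In generic form (a sketch under a toy Gaussian model by numerical differentiation, not the CDM implementation), the two covariance estimators compared above are built from per-observation log-likelihoods:

```python
import numpy as np

def per_obs_loglik(theta, x):   # toy model: x ~ N(theta[0], exp(theta[1]))
    mu, logvar = theta
    return -0.5 * (logvar + np.log(2 * np.pi) + (x - mu) ** 2 / np.exp(logvar))

def scores_and_hessian(theta, data, eps=1e-5):
    p = len(theta)
    scores = np.zeros((len(data), p))          # per-observation score vectors
    for j in range(p):
        e = np.zeros(p); e[j] = eps
        scores[:, j] = (per_obs_loglik(theta + e, data)
                        - per_obs_loglik(theta - e, data)) / (2 * eps)
    def total_grad(t):
        return np.array([(per_obs_loglik(t + np.eye(p)[j] * eps, data).sum()
                          - per_obs_loglik(t - np.eye(p)[j] * eps, data).sum())
                         / (2 * eps) for j in range(p)])
    H = np.array([(total_grad(theta + np.eye(p)[j] * eps)
                   - total_grad(theta - np.eye(p)[j] * eps)) / (2 * eps)
                  for j in range(p)])           # Hessian of total log-likelihood
    return scores, H

rng = np.random.default_rng(5)
x = rng.normal(1.0, 2.0, 500)
theta_hat = np.array([x.mean(), np.log(x.var())])   # MLE in this toy model
scores, H = scores_and_hessian(theta_hat, x)
obs_info_cov = np.linalg.inv(-H)                    # inverse observed information
J = scores.T @ scores                               # cross-product of scores
sandwich = obs_info_cov @ J @ obs_info_cov          # robust under misspecification
print(np.sqrt(np.diag(obs_info_cov)), np.sqrt(np.diag(sandwich)))
```

Under a correctly specified model the two agree, mirroring the simulation finding; under misspecification only the sandwich form remains trustworthy.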
Hutson, Alan D
2018-01-01
In this note, we develop a new and novel semi-parametric estimator of the survival curve that is comparable to the product-limit estimator under very relaxed assumptions. The estimator is based on a beta parametrization that warps the empirical distribution of the observed censored and uncensored data. The parameters are obtained using a pseudo-maximum likelihood approach that adjusts the survival curve to account for the censored observations. In the univariate setting, the new estimator tends to extend the range of the survival estimate further when the degree of censoring is high. However, the key feature of this paper is that we develop a new two-group semi-parametric exact permutation test for comparing survival curves that is generally superior to the classic log-rank and Wilcoxon tests and provides the best global power across a variety of alternatives. The new test is readily extended to the k-group setting. PMID:26988931
Two models for evaluating landslide hazards
Davis, J.C.; Chung, C.-J.; Ohlmacher, G.C.
2006-01-01
Two alternative procedures for estimating landslide hazards were evaluated using data on topographic digital elevation models (DEMs) and bedrock lithologies in an area adjacent to the Missouri River in Atchison County, Kansas, USA. The two procedures are based on the likelihood ratio model but utilize different assumptions. The empirical likelihood ratio model is based on non-parametric empirical univariate frequency distribution functions under an assumption of conditional independence, while the multivariate logistic discriminant model assumes that likelihood ratios can be expressed in terms of logistic functions. The relative hazards of occurrence of landslides were estimated by an empirical likelihood ratio model and by multivariate logistic discriminant analysis. Predictor variables consisted of grids containing topographic elevations, slope angles, and slope aspects calculated from a 30-m DEM. An integer grid of coded bedrock lithologies taken from digitized geologic maps was also used as a predictor variable. Both statistical models yield relative estimates in the form of the proportion of total map area predicted to already contain or to be the site of future landslides. The stabilities of estimates were checked by cross-validation of results from random subsamples, using each of the two procedures. Cell-by-cell comparisons of hazard maps made by the two models show that the two sets of estimates are virtually identical. This suggests that the empirical likelihood ratio and the logistic discriminant analysis models are robust with respect to the conditional independence assumption and the logistic function assumption, respectively, and that either model can be used successfully to evaluate landslide hazards. © 2006.
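A hedged sketch of the empirical likelihood-ratio side under conditional independence: per-predictor likelihood ratios are estimated from binned frequencies in landslide and non-landslide cells and multiplied cell by cell. The binning, add-one smoothing, and toy grids are assumptions.

```python
import numpy as np

def likelihood_ratio_map(predictors, landslide_mask, bins=10):
    """predictors: list of 2-D grids; landslide_mask: boolean 2-D grid."""
    lr = np.ones_like(predictors[0], dtype=float)
    for grid in predictors:
        edges = np.quantile(grid, np.linspace(0, 1, bins + 1))
        idx = np.clip(np.digitize(grid, edges[1:-1]), 0, bins - 1)
        f_slide = np.bincount(idx[landslide_mask], minlength=bins) + 1.0
        f_stable = np.bincount(idx[~landslide_mask], minlength=bins) + 1.0
        ratio = (f_slide / f_slide.sum()) / (f_stable / f_stable.sum())
        lr *= ratio[idx]   # conditional independence: multiply per-variable LRs
    return lr

rng = np.random.default_rng(6)
slope = rng.gamma(3.0, 5.0, (50, 50))          # toy slope-angle grid
elev = rng.normal(300, 40, (50, 50))           # toy elevation grid
slides = slope + rng.normal(0, 5, slope.shape) > 25   # toy "inventory"
hazard = likelihood_ratio_map([slope, elev], slides)
```

Ranking cells by this map and reporting the proportion of area above each rank reproduces the relative-hazard output format described in the abstract.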
Lopes, J S; Arenas, M; Posada, D; Beaumont, M A
2014-03-01
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.
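In outline, the rejection step at the core of ABC looks as follows; the paper's method additionally selects summary statistics by regression and applies regression adjustment, both omitted from this generic sketch, and the toy Poisson example is purely illustrative.

```python
import numpy as np

def abc_rejection(observed_stats, simulate, prior_sample, n_sims=20_000, q=0.01):
    """Keep the prior draws whose simulated summaries fall closest to the data."""
    s_obs = np.asarray(observed_stats, float)
    draws, dists = [], []
    for _ in range(n_sims):
        theta = prior_sample()
        s = simulate(theta)
        draws.append(theta)
        dists.append(np.linalg.norm(s - s_obs))
    cutoff = np.quantile(dists, q)          # accept the closest q fraction
    return np.array([d for d, e in zip(draws, dists) if e <= cutoff])

# Toy demonstration: infer a Poisson rate from (mean, variance) summaries.
rng = np.random.default_rng(7)
data = rng.poisson(4.0, 100)
post = abc_rejection(
    observed_stats=[data.mean(), data.var()],
    simulate=lambda lam: (lambda x: np.array([x.mean(), x.var()]))(
        rng.poisson(lam, 100)),
    prior_sample=lambda: rng.uniform(0.1, 20.0))
print("posterior mean of lambda:", post.mean())
```

Replacing the toy simulator with a coalescent simulator that allows intracodon recombination breakpoints is, in spirit, how the full-likelihood-free co-estimation described above proceeds.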
White, Gary C.; Hines, J.E.
2004-01-01
The reality is that the statistical methods used for analysis of data depend upon the availability of software. Analysis of marked animal data is no different than the rest of the statistical field. The methods used for analysis are those that are available in reliable software packages. Thus, the critical importance of having reliable, up-to-date software available to biologists is obvious. Statisticians have continued to develop more robust models, ever expanding the suite of potential analysis methods available. But without software to implement these newer methods, they will languish in the abstract, and not be applied to the problems deserving them. In the Computers and Software Session, two new software packages are described, a comparison of implementations of methods for the estimation of nest survival is provided, and a more speculative paper about how the next generation of software might be structured is presented. Rotella et al. (2004) compare nest survival estimation with different software packages: SAS logistic regression, SAS non-linear mixed models, and Program MARK. Nests are assumed to be visited at various, possibly infrequent, intervals. All of the approaches described compute nest survival with the same likelihood, and require that the age of the nest is known to account for nests that eventually hatch. However, each approach offers advantages and disadvantages, explored by Rotella et al. (2004). Efford et al. (2004) present a new software package called DENSITY. The package computes population abundance and density from trapping arrays and other detection methods with a new and unique approach. DENSITY represents the first major addition to the analysis of trapping arrays in 20 years. Barker & White (2004) discuss how existing software such as Program MARK requires that each new model's likelihood be programmed specifically for that model. They wishfully think that future software might allow the user to combine pieces of likelihood functions together to generate estimates. The idea is interesting, and maybe some bright young statistician can work out the specifics to implement the procedure. Choquet et al. (2004) describe MSURGE, a software package that implements the multistate capture-recapture models. The unique feature of MSURGE is that the design matrix is constructed with an interpreted language called GEMACO. Because MSURGE is limited to just multistate models, the special requirements of these likelihoods can be provided for. The software and methods presented in these papers give biologists and wildlife managers an expanding range of possibilities for data analysis. Although ease-of-use is generally getting better, it does not replace the need for understanding of the requirements and structure of the models being computed. The internet provides access to many free software packages as well as user-discussion groups to share knowledge and ideas. (A starting point for wildlife-related applications is http://www.phidot.org.)
Two means of sampling sexual minority women: how different are the samples of women?
Boehmer, Ulrike; Clark, Melissa; Timm, Alison; Ozonoff, Al
2008-01-01
We compared 2 sampling approaches of sexual minority women in 1 limited geographic area to better understand the implications of these 2 sampling approaches. Sexual minority women identified through the Census did not differ on average age or the prevalence of raising children from those sampled using nonrandomized methods. Women in the convenience sample were better educated and lived in smaller households. Modeling the likelihood of disability in this population resulted in contradictory parameter estimates by sampling approach. The degree of variation observed both between sampling approaches and between different parameters suggests that the total population of sexual minority women is still unmeasured. Thoroughly constructed convenience samples will continue to be a useful sampling strategy to further research on this population.
Harbert, Robert S; Nixon, Kevin C
2015-08-01
• Plant distributions have long been understood to be correlated with the environmental conditions to which species are adapted. Climate is one of the major components driving species distributions. Therefore, it is expected that the plants coexisting in a community are reflective of the local environment, particularly climate.• Presented here is a method for the estimation of climate from local plant species coexistence data. The method, Climate Reconstruction Analysis using Coexistence Likelihood Estimation (CRACLE), is a likelihood-based method that employs specimen collection data at a global scale for the inference of species climate tolerance. CRACLE calculates the maximum joint likelihood of coexistence given individual species climate tolerance characterization to estimate the expected climate.• Plant distribution data for more than 4000 species were used to show that this method accurately infers expected climate profiles for 165 sites with diverse climatic conditions. Estimates differ from the WorldClim global climate model by less than 1.5°C on average for mean annual temperature and less than ∼250 mm for mean annual precipitation. This is a significant improvement upon other plant-based climate-proxy methods.• CRACLE validates long hypothesized interactions between climate and local associations of plant species. Furthermore, CRACLE successfully estimates climate that is consistent with the widely used WorldClim model and therefore may be applied to the quantitative estimation of paleoclimate in future studies. © 2015 Botanical Society of America, Inc.
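A hedged sketch of the coexistence-likelihood idea: if each species' climate tolerance is summarized by a fitted distribution (here a normal, which is an assumption; CRACLE characterizes tolerances from global specimen data), the site estimate is the climate value maximizing the summed log-likelihood across coexisting species.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def cracle_like_estimate(tolerances):
    """tolerances: list of (mean, sd) climate tolerances per coexisting species."""
    def neg_joint_ll(climate):
        # Joint coexistence likelihood = product of species tolerance densities.
        return -sum(norm.logpdf(climate, m, s) for m, s in tolerances)
    lo = min(m - 4 * s for m, s in tolerances)
    hi = max(m + 4 * s for m, s in tolerances)
    return minimize_scalar(neg_joint_ll, bounds=(lo, hi), method="bounded").x

# Toy community: mean annual temperatures (degC) tolerated by five species.
species = [(12, 4), (15, 3), (14, 5), (17, 4), (13, 2)]
print("estimated MAT:", cracle_like_estimate(species))
```

The more species coexist at a site, the more sharply the joint likelihood peaks, which is why community-level inference can beat single-taxon climate proxies.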
Offline handwritten word recognition using MQDF-HMMs
NASA Astrophysics Data System (ADS)
Ramachandrula, Sitaram; Hambarde, Mangesh; Patial, Ajay; Sahoo, Dushyant; Kochar, Shaivi
2015-01-01
We propose an improved HMM formulation for offline handwriting recognition (HWR). The main contribution of this work is the use of the modified quadratic discriminant function (MQDF) [1] within the HMM framework. In an MQDF-HMM the state observation likelihood is calculated by a weighted combination of MQDF likelihoods of the individual Gaussians of a GMM (Gaussian mixture model). The quadratic discriminant function (QDF) of a multivariate Gaussian can be rewritten to avoid the inverse of the covariance matrix by using its eigenvalues and eigenvectors. The MQDF is derived from the QDF by substituting the most poorly estimated trailing eigenvalues with an appropriate constant. This controls the estimation errors of the non-dominant eigenvectors and eigenvalues of the covariance matrix, for which the training data are insufficient. MQDF has been successfully shown to improve character recognition performance [1]. Using MQDF in an HMM improves the computation, storage, and modeling power of the HMM when training data are limited. We obtained encouraging results on offline handwritten character (NIST database) and word recognition in English using MQDF-HMMs.
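A sketch of the MQDF computation itself, assuming k retained eigenpairs and a constant delta for the trailing eigenvalues (both tuning choices); this is the per-Gaussian discriminant only, not the full MQDF-HMM decoder.

```python
import numpy as np

def mqdf_score(x, mean, cov, k, delta):
    """Modified quadratic discriminant: larger score = more likely class."""
    w, V = np.linalg.eigh(cov)
    w, V = w[::-1], V[:, ::-1]               # sort eigenvalues descending
    d = x - mean
    proj = V.T @ d                           # coordinates in the eigenbasis
    lead = np.sum(proj[:k] ** 2 / w[:k])     # dominant directions: true eigenvalues
    resid = (d @ d - np.sum(proj[:k] ** 2)) / delta   # trailing part: constant delta
    logdet = np.sum(np.log(w[:k])) + (len(d) - k) * np.log(delta)
    return -0.5 * (lead + resid + logdet)

rng = np.random.default_rng(8)
X = rng.normal(size=(60, 20))                # deliberately limited training data
mean, cov = X.mean(0), np.cov(X.T)
print(mqdf_score(rng.normal(size=20), mean, cov, k=5, delta=0.5))
```

Only the top-k eigenpairs and one constant need storing, which is the computation and storage advantage the abstract claims for limited training data.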
Bivariate categorical data analysis using normal linear conditional multinomial probability model.
Sun, Bingrui; Sutradhar, Brajendra
2015-02-10
Bivariate multinomial data such as the left and right eyes retinopathy status data are analyzed either by using a joint bivariate probability model or by exploiting certain odds ratio-based association models. However, the joint bivariate probability model yields marginal probabilities, which are complicated functions of marginal and association parameters for both variables, and the odds ratio-based association model treats the odds ratios involved in the joint probabilities as 'working' parameters, which are consequently estimated through certain arbitrary 'working' regression models. Also, this latter odds ratio-based model does not provide any easy interpretations of the correlations between two categorical variables. On the basis of pre-specified marginal probabilities, in this paper, we develop a bivariate normal type linear conditional multinomial probability model to understand the correlations between two categorical variables. The parameters involved in the model are consistently estimated using the optimal likelihood and generalized quasi-likelihood approaches. The proposed model and the inferences are illustrated through an intensive simulation study as well as an analysis of the well-known Wisconsin Diabetic Retinopathy status data. Copyright © 2014 John Wiley & Sons, Ltd.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.
Zierke, Stephanie; Bakos, Jason D
2010-04-12
Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
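The kernel in question can be sketched in a few lines: for each site, the parent's conditional likelihood vector is the elementwise product of the children's vectors propagated through their branch transition matrices. The toy transition matrices below are assumptions, not MrBayes internals.

```python
import numpy as np

def plf_kernel(L_left, L_right, P_left, P_right):
    """L_*: (sites, 4) child conditional likelihoods over nucleotides;
    P_*: (4, 4) branch transition matrices. Every site is independent,
    so there is no cross-iteration dependence -- easy to pipeline."""
    return (L_left @ P_left.T) * (L_right @ P_right.T)

rng = np.random.default_rng(9)
sites = 1000
L1 = rng.random((sites, 4))
L2 = rng.random((sites, 4))
P = np.full((4, 4), 0.05) + np.eye(4) * 0.8    # toy Jukes-Cantor-like matrix
parent = plf_kernel(L1, L2, P, P)
```

The absence of conditionals and inter-site dependencies is exactly what lets an FPGA keep a deep floating-point pipeline full for this loop.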
Curiale, Ariel H; Vegas-Sánchez-Ferrero, Gonzalo; Bosch, Johan G; Aja-Fernández, Santiago
2015-08-01
The strain and strain-rate measures are commonly used for the analysis and assessment of regional myocardial function. In echocardiography (EC), the strain analysis became possible using Tissue Doppler Imaging (TDI). Unfortunately, this modality shows an important limitation: the angle between the myocardial movement and the ultrasound beam should be small to provide reliable measures. This constraint makes it difficult to provide strain measures of the entire myocardium. Alternative non-Doppler techniques such as Speckle Tracking (ST) can provide strain measures without angle constraints. However, the spatial resolution and the noisy appearance of speckle still make the strain estimation a challenging task in EC. Several maximum likelihood approaches have been proposed to statistically characterize the behavior of speckle, which results in better performance of speckle tracking. However, those models do not consider common transformations used to achieve the final B-mode image (e.g. interpolation). This paper proposes a new maximum likelihood approach for speckle tracking which effectively characterizes speckle of the final B-mode image. Its formulation provides a diffeomorphic scheme that can be efficiently optimized with a second-order method. The novelty of the method is threefold: First, the statistical characterization of speckle generalizes conventional speckle models (Rayleigh, Nakagami and Gamma) to a more versatile model for real data. Second, the formulation includes local correlation to increase the efficiency of frame-to-frame speckle tracking. Third, a probabilistic myocardial tissue characterization is used to automatically identify more reliable myocardial motions. Accuracy and agreement were evaluated on a set of 16 synthetic image sequences for three different scenarios: normal, acute ischemia and acute dyssynchrony. The proposed method was compared to six speckle tracking methods. Results revealed that the proposed method is the most accurate method to measure the motion and strain with an average median motion error of 0.42 mm and a median strain error of 2.0 ± 0.9%, 2.1 ± 1.3% and 7.1 ± 4.9% for circumferential, longitudinal and radial strain respectively. It also showed its capability to identify abnormal segments with reduced cardiac function and timing differences for the dyssynchrony cases. These results indicate that the proposed diffeomorphic speckle tracking method provides robust and accurate motion and strain estimation. Copyright © 2015. Published by Elsevier B.V.
Maximum-likelihood methods in wavefront sensing: stochastic models and likelihood functions
Barrett, Harrison H.; Dainty, Christopher; Lara, David
2008-01-01
Maximum-likelihood (ML) estimation in wavefront sensing requires careful attention to all noise sources and all factors that influence the sensor data. We present detailed probability density functions for the output of the image detector in a wavefront sensor, conditional not only on wavefront parameters but also on various nuisance parameters. Practical ways of dealing with nuisance parameters are described, and final expressions for likelihoods and Fisher information matrices are derived. The theory is illustrated by discussing Shack–Hartmann sensors, and computational requirements are discussed. Simulation results show that ML estimation can significantly increase the dynamic range of a Shack–Hartmann sensor with four detectors and that it can reduce the residual wavefront error when compared with traditional methods. PMID:17206255
A Parametric k-Means Algorithm
Tarpey, Thaddeus
2007-01-01
The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large simulated data set from a distribution whose parameters are estimated using maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm and an example on determining sizes of gas masks is used to illustrate the parametric k-means algorithm. PMID:17917692
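A minimal sketch of the recipe, assuming a gamma family for the ML fit; the family, sample sizes, and the use of scikit-learn's KMeans are illustrative choices, not the paper's setup.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

def parametric_kmeans(data, k, n_sim=200_000, seed=0):
    """Fit by ML, simulate a huge sample, and take k-means cluster means."""
    shape, loc, scale = stats.gamma.fit(data)          # maximum likelihood fit
    sim = stats.gamma.rvs(shape, loc, scale, size=n_sim,
                          random_state=np.random.default_rng(seed))
    km = KMeans(n_clusters=k, n_init=10).fit(sim.reshape(-1, 1))
    return np.sort(km.cluster_centers_.ravel())        # estimated principal points

rng = np.random.default_rng(10)
sample = stats.gamma.rvs(2.0, scale=3.0, size=500, random_state=rng)
print("estimated principal points:", parametric_kmeans(sample, k=3))
```

Because the k-means step runs on a simulated sample of essentially arbitrary size, the sampling noise of the nonparametric cluster means is traded for the (usually smaller) error of the parametric fit.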
NASA Astrophysics Data System (ADS)
Freni, Gabriele; Mannina, Giorgio
In urban drainage modelling, uncertainty analysis is of undoubted necessity. However, uncertainty analysis in urban water-quality modelling is still in its infancy and only a few studies have been carried out. Therefore, several methodological aspects still need to be explored and clarified, especially regarding water quality modelling. The use of the Bayesian approach for uncertainty analysis has been stimulated by its rigorous theoretical framework and by the possibility of evaluating the impact of new knowledge on the modelling predictions. Nevertheless, the Bayesian approach relies on some restrictive hypotheses that are not present in less formal methods like the Generalised Likelihood Uncertainty Estimation (GLUE). One crucial point in the application of the Bayesian method is the formulation of a likelihood function that is conditioned by the hypotheses made regarding the model residuals. Statistical transformations, such as the Box-Cox equation, are generally used to ensure the homoscedasticity of residuals. However, this practice may affect the reliability of the analysis, leading to incorrect uncertainty estimates. The present paper aims to explore the influence of the Box-Cox equation for environmental water quality models. To this end, five cases were considered, one of which used the “real” residual distribution (i.e. drawn from available data). The analysis was applied to the Nocella experimental catchment (Italy), an agricultural and semi-urbanised basin where two sewer systems, two wastewater treatment plants and a river reach were monitored during both dry and wet weather periods. The results show that the uncertainty estimation is greatly affected by the residual transformation and that a wrong assumption may also distort the evaluation of model uncertainty. The use of less formal methods always provides an overestimation of modelling uncertainty with respect to the Bayesian method, but this effect is reduced if a wrong assumption is made regarding the residual distribution. If residuals are not normally distributed, the uncertainty is over-estimated if the Box-Cox transformation is not applied or a non-calibrated parameter is used.
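The role of the Box-Cox transformation discussed above can be made concrete with a small sketch. The data and error model below are hypothetical stand-ins; the point is that the transformation parameter is chosen by maximum likelihood and the Gaussian likelihood is then formed on the transformed residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
observed = rng.gamma(shape=2.0, scale=3.0, size=500)        # hypothetical observations
simulated = observed * rng.lognormal(0.0, 0.3, size=500)    # hypothetical model output

# Choose lambda by maximum likelihood on the observations, then apply the same
# transformation to the simulated series so the residuals are comparable.
obs_t, lam = stats.boxcox(observed)
sim_t = stats.boxcox(simulated, lmbda=lam)

residuals = obs_t - sim_t
loglik = np.sum(stats.norm.logpdf(residuals, residuals.mean(), residuals.std()))
print(f"lambda = {lam:.3f}, Gaussian log-likelihood = {loglik:.1f}")
```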
Alić, Nikola; Papen, George; Saperstein, Robert; Milstein, Laurence; Fainman, Yeshaiahu
2005-06-13
Exact signal statistics for fiber-optic links containing a single optical pre-amplifier are calculated and applied to sequence estimation for electronic dispersion compensation. The performance is evaluated and compared with results based on the approximate chi-square statistics. We show that detection in existing systems based on exact statistics can be improved relative to using a chi-square distribution for realistic filter shapes. In contrast, for high-spectral efficiency systems the difference between the two approaches diminishes, and performance tends to be less dependent on the exact shape of the filter used.
NASA Technical Reports Server (NTRS)
Frehlich, Rod
1993-01-01
Calculations of the exact Cramer-Rao Bound (CRB) for unbiased estimates of the mean frequency, signal power, and spectral width of Doppler radar/lidar signals (a Gaussian random process) are presented. Approximate CRBs are derived using the Discrete Fourier Transform (DFT). These approximate results are equal to the exact CRB when the DFT coefficients are mutually uncorrelated. Previous high-SNR limits for CRBs are shown to be inaccurate because the discrete summations cannot be approximated with integration. The performance of an approximate maximum likelihood estimator for mean frequency approaches the exact CRB for moderate signal-to-noise ratio and moderate spectral width.
Log-normal frailty models fitted as Poisson generalized linear mixed models.
Hirsch, Katharina; Wienke, Andreas; Kuss, Oliver
2016-12-01
The equivalence of a survival model with a piecewise constant baseline hazard function and a Poisson regression model has been known for decades. As shown in recent studies, this equivalence carries over to clustered survival data: a frailty model with a log-normal frailty term can be interpreted and estimated as a generalized linear mixed model with a binary response, a Poisson likelihood, and a specific offset. Proceeding this way, statistical theory and software for generalized linear mixed models are readily available for fitting frailty models. This gain in flexibility comes at the small price of (1) having to fix the number of pieces for the baseline hazard in advance and (2) having to "explode" the data set by the number of pieces. In this paper we extend the simulations of former studies by using a more realistic baseline hazard (Gompertz) and by comparing the model under consideration with competing models. Furthermore, the SAS macro %PCFrailty is introduced to apply the Poisson generalized linear mixed approach to frailty models. The simulations show good results for the shared frailty model. Our new %PCFrailty macro provides proper estimates, especially in the case of 4 events per piece. The suggested Poisson generalized linear mixed approach for log-normal frailty models based on the %PCFrailty macro provides several advantages in the analysis of clustered survival data with respect to more flexible modelling of fixed and random effects, exact (in the sense of non-approximate) maximum likelihood estimation, and standard errors and different types of confidence intervals for all variance parameters. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
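The "explosion" of the data set mentioned above is the piecewise-exponential device. A minimal Python sketch follows (the paper itself provides a SAS macro; the interval boundaries and covariates here are arbitrary): each survival time is split over fixed intervals and a Poisson model with a log-exposure offset is fitted. A log-normal frailty would enter as a normal random intercept per cluster in mixed-model software; the plain GLM below omits it for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

cuts = np.array([0.0, 1.0, 2.0, 4.0, 8.0])     # interval boundaries, fixed in advance

def explode(time, event, x):
    """Split one subject into one row per interval at risk."""
    rows = []
    for j in range(len(cuts) - 1):
        if time <= cuts[j]:
            break
        exposure = min(time, cuts[j + 1]) - cuts[j]
        died = int(event and time <= cuts[j + 1])
        rows.append({"interval": j, "exposure": exposure, "died": died, "x": x})
    return rows

rng = np.random.default_rng(3)
subjects = [(rng.exponential(3.0), rng.random() < 0.7, rng.normal()) for _ in range(200)]
long = pd.DataFrame([r for t, e, x in subjects for r in explode(t, e, x)])

# Poisson likelihood with log(exposure) offset reproduces the piecewise
# constant hazard likelihood; C(interval) gives one hazard level per piece.
model = sm.GLM.from_formula("died ~ C(interval) + x", data=long,
                            family=sm.families.Poisson(),
                            offset=np.log(long["exposure"]))
print(model.fit().summary())
```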
Likelihood-based methods for evaluating principal surrogacy in augmented vaccine trials.
Liu, Wei; Zhang, Bo; Zhang, Hui; Zhang, Zhiwei
2017-04-01
There is growing interest in assessing immune biomarkers, which are quick to measure and potentially predictive of long-term efficacy, as surrogate endpoints in randomized, placebo-controlled vaccine trials. This can be done under a principal stratification approach, with principal strata defined using a subject's potential immune responses to vaccine and placebo (the latter may be assumed to be zero). In this context, principal surrogacy refers to the extent to which vaccine efficacy varies across principal strata. Because a placebo recipient's potential immune response to vaccine is unobserved in a standard vaccine trial, augmented vaccine trials have been proposed to produce the information needed to evaluate principal surrogacy. This article reviews existing methods based on an estimated likelihood and a pseudo-score (PS), and proposes two new methods, based on a semiparametric likelihood (SL) and a pseudo-likelihood (PL), for analyzing augmented vaccine trials. Unlike the PS method, the SL method does not require a model for missingness, which can be advantageous when immune response data are missing by happenstance. The SL method is shown to be asymptotically efficient, and it performs similarly to the PS and PL methods in simulation experiments. The PL method appears to have a computational advantage over the PS and SL methods.
ERIC Educational Resources Information Center
Han, Kyung T.; Guo, Fanmin
2014-01-01
The full-information maximum likelihood (FIML) method makes it possible to estimate and analyze structural equation models (SEM) even when data are partially missing, enabling incomplete data to contribute to model estimation. The cornerstone of FIML is the missing-at-random (MAR) assumption. In (unidimensional) computerized adaptive testing…
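The core of FIML can be sketched for the simplest case, a multivariate normal mean and covariance with incomplete rows: each case contributes the likelihood of only its observed coordinates. This is a minimal illustration, not an SEM implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def fiml_negloglik(theta, data):
    p = data.shape[1]
    mu = theta[:p]
    L = np.zeros((p, p))
    L[np.tril_indices(p)] = theta[p:]       # Cholesky parameterization keeps Sigma valid
    sigma = L @ L.T + 1e-8 * np.eye(p)
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)
        if not obs.any():                   # a fully missing row carries no information
            continue
        total += multivariate_normal.logpdf(row[obs], mu[obs], sigma[np.ix_(obs, obs)])
    return -total

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 1.0, 2.0], np.eye(3) + 0.5, size=200)
X[rng.random(X.shape) < 0.2] = np.nan       # values missing at random

p = 3
start = np.concatenate([np.zeros(p), np.eye(p)[np.tril_indices(p)]])
fit = minimize(fiml_negloglik, start, args=(X,), method="Nelder-Mead")
print(fit.x[:p])                            # FIML estimate of the mean vector
```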
Constrained Maximum Likelihood Estimation for Two-Level Mean and Covariance Structure Models
ERIC Educational Resources Information Center
Bentler, Peter M.; Liang, Jiajuan; Tang, Man-Lai; Yuan, Ke-Hai
2011-01-01
Maximum likelihood is commonly used for the estimation of model parameters in the analysis of two-level structural equation models. Constraints on model parameters could be encountered in some situations such as equal factor loadings for different factors. Linear constraints are the most common ones and they are relatively easy to handle in…
ERIC Educational Resources Information Center
Kelderman, Henk
1992-01-01
Describes algorithms used in the computer program LOGIMO for obtaining maximum likelihood estimates of the parameters in loglinear models. These algorithms are also useful for the analysis of loglinear item-response theory models. Presents modified versions of the iterative proportional fitting and Newton-Raphson algorithms. Simulated data…
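The iterative proportional fitting step mentioned above can be shown in a few lines for a two-way loglinear independence model (the table below is made up):

```python
import numpy as np

observed = np.array([[25., 15., 10.],
                     [ 5., 20., 25.]])
fitted = np.ones_like(observed) * observed.sum() / observed.size

# Alternate between matching the row margins and the column margins.
for _ in range(100):
    fitted *= observed.sum(axis=1, keepdims=True) / fitted.sum(axis=1, keepdims=True)
    fitted *= observed.sum(axis=0, keepdims=True) / fitted.sum(axis=0, keepdims=True)

print(fitted)   # ML estimates under independence: row_total * col_total / n
```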
NASA Astrophysics Data System (ADS)
Zhou, Rurui; Li, Yu; Lu, Di; Liu, Haixing; Zhou, Huicheng
2016-09-01
This paper investigates the use of an epsilon-dominance non-dominated sorted genetic algorithm II (ɛ-NSGAII) as a sampling approach with the aim of improving sampling efficiency for multiple metrics uncertainty analysis using Generalized Likelihood Uncertainty Estimation (GLUE). The effectiveness of ɛ-NSGAII based sampling is demonstrated in comparison with Latin hypercube sampling (LHS) through analyzing sampling efficiency, multiple metrics performance, parameter uncertainty and flood forecasting uncertainty, with a case study of flood forecasting uncertainty evaluation based on the Xinanjiang model (XAJ) for Qing River reservoir, China. The results demonstrate the following advantages of the ɛ-NSGAII based sampling approach over LHS: (1) it performs more effectively and efficiently than LHS; for example, the simulation time required to generate 1000 behavioral parameter sets is nine times shorter; (2) the Pareto tradeoffs between metrics are demonstrated clearly by the solutions from ɛ-NSGAII based sampling, and their Pareto optimal values are better than those of LHS, implying better forecasting accuracy of the ɛ-NSGAII parameter sets; (3) the parameter posterior distributions from ɛ-NSGAII based sampling are concentrated in the appropriate ranges rather than uniform, in accordance with their physical significance, and parameter uncertainties are reduced significantly; (4) the forecasted floods are close to the observations as evaluated by three measures: the normalized total flow outside the uncertainty intervals (FOUI), average relative band-width (RB) and average deviation amplitude (D). The flood forecasting uncertainty is also reduced considerably with ɛ-NSGAII based sampling. This study provides a new sampling approach to improve multiple metrics uncertainty analysis under the framework of GLUE, and could be used to reveal the underlying mechanisms of parameter sets under multiple conflicting metrics in the uncertainty analysis process.
Extending the Applicability of the Generalized Likelihood Function for Zero-Inflated Data Series
NASA Astrophysics Data System (ADS)
Oliveira, Debora Y.; Chaffe, Pedro L. B.; Sá, João. H. M.
2018-03-01
Proper uncertainty estimation for data series with a high proportion of zero and near-zero observations has been a challenge in hydrologic studies. This technical note proposes a modification to the Generalized Likelihood function that accounts for zero inflation of the error distribution (ZI-GL). We compare the performance of the proposed ZI-GL with the original Generalized Likelihood function applied to the entire data series (GL) and with zero observations simply suppressed (GLy>0). These approaches were applied to two interception modeling examples characterized by data series with a significant number of zeros. The ZI-GL produced better uncertainty ranges than the GL as measured by precision, reliability and volumetric bias metrics. The comparison between ZI-GL and GLy>0 highlights the need for further improvement in the treatment of residuals from near-zero simulations when a linear heteroscedastic error model is considered. Aside from the interception modeling examples illustrated herein, the proposed ZI-GL may be useful for other hydrologic studies, such as the modeling of runoff generation in hillslopes and ephemeral catchments.
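The zero-inflation idea can be sketched as a mixture likelihood with a point mass at zero. This is a hedged illustration, not the exact ZI-GL of the note (which modifies the Generalized Likelihood's error model); the data and the Gaussian error model are stand-ins:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def zi_negloglik(theta, obs, sim):
    pi, sigma = theta
    if not (0 < pi < 1 and sigma > 0):
        return np.inf
    zero = obs == 0
    ll_zero = np.log(pi)                                       # point mass at zero
    ll_pos = np.log1p(-pi) + stats.norm.logpdf(obs[~zero], sim[~zero], sigma)
    return -(zero.sum() * ll_zero + ll_pos.sum())

rng = np.random.default_rng(1)
sim = rng.gamma(2.0, 1.0, size=400)                 # hypothetical model output
obs = np.where(rng.random(400) < 0.3, 0.0, sim + rng.normal(0, 0.4, 400))

fit = minimize(zi_negloglik, x0=[0.2, 1.0], args=(obs, sim), method="Nelder-Mead")
print(fit.x)   # estimated zero-inflation probability and error s.d.
```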
A comparative review of methods for comparing means using partially paired data.
Guo, Beibei; Yuan, Ying
2017-06-01
In medical experiments with the objective of testing the equality of two means, data are often partially paired by design or because of missing data. Partially paired data represent a combination of paired and unpaired observations. In this article, we review and compare nine methods for analyzing partially paired data, including the two-sample t-test, paired t-test, corrected z-test, weighted t-test, pooled t-test, optimal pooled t-test, multiple imputation method, mixed model approach, and the test based on a modified maximum likelihood estimate. We compare the performance of these methods through extensive simulation studies that cover a wide range of scenarios with different effect sizes, sample sizes, and correlations between the paired variables, as well as true underlying distributions. The simulation results suggest that when the sample size is moderate, the test based on the modified maximum likelihood estimator is generally superior to the other approaches when the data are normally distributed, and the optimal pooled t-test performs best when the data are not normally distributed, both with well-controlled type I error rates and high statistical power; when the sample size is small, the optimal pooled t-test is recommended when both variables have missing data, and the paired t-test is recommended when only one variable has missing data.
Gartlehner, Gerald; Dobrescu, Andreea; Evans, Tammeka Swinson; Bann, Carla; Robinson, Karen A; Reston, James; Thaler, Kylie; Skelly, Andrea; Glechner, Anna; Peterson, Kimberly; Kien, Christina; Lohr, Kathleen N
2016-02-01
To determine the predictive validity of the U.S. Evidence-based Practice Center (EPC) approach to GRADE (Grading of Recommendations Assessment, Development and Evaluation), we prepared 160 documents, based on Cochrane reports with outcomes graded as high quality of evidence (QOE), which represented different levels of QOE. Professional systematic reviewers dually graded the QOE. For each document, we determined whether estimates were concordant with the high-QOE estimates of the Cochrane reports. We compared the observed proportion of concordant estimates with the expected proportion from an international survey. To determine the predictive validity, we used the Hosmer-Lemeshow test to assess calibration and the C (concordance) index to assess discrimination. The predictive validity of the EPC approach to GRADE was limited. Estimates graded as high QOE were less likely, and estimates graded as low or insufficient QOE more likely, to remain stable than expected. The EPC approach to GRADE could not reliably predict the likelihood that individual bodies of evidence would remain stable as new evidence becomes available. C-indices ranged between 0.56 (95% CI, 0.47 to 0.66) and 0.58 (95% CI, 0.50 to 0.67), indicating low discriminatory ability. The limited predictive validity of the EPC approach to GRADE seems to reflect a mismatch between expected and observed changes in treatment effects as bodies of evidence advance from insufficient to high QOE. Copyright © 2016 Elsevier Inc. All rights reserved.
A Molecular Phylogeny of the Chalcidoidea (Hymenoptera)
Munro, James B.; Heraty, John M.; Burks, Roger A.; Hawks, David; Mottern, Jason; Cruaud, Astrid; Rasplus, Jean-Yves; Jansta, Petr
2011-01-01
Chalcidoidea (Hymenoptera) are extremely diverse, with more than 23,000 species described and over 500,000 species estimated to exist. This is the first comprehensive phylogenetic analysis of the superfamily based on a molecular analysis of 18S and 28S ribosomal gene regions for 19 families, 72 subfamilies, 343 genera and 649 species. The 56 outgroups comprise Ceraphronoidea and most proctotrupomorph families, including Mymarommatidae. Data alignment and the impact of ambiguous regions are explored using a secondary structure analysis and automated (MAFFT) alignments of the core and pairing regions and regions of ambiguous alignment. Both likelihood and parsimony approaches are used to analyze the data. Overall, the alignment method has no impact, and there are few but substantial differences between likelihood and parsimony approaches. Monophyly of Chalcidoidea and a sister group relationship between Mymaridae and the remaining Chalcidoidea are strongly supported in all analyses. Either Mymarommatoidea or Diaprioidea is the sister group of Chalcidoidea, depending on the analysis. Likelihood analyses place Rotoitidae as the sister group of the remaining Chalcidoidea after Mymaridae, whereas parsimony nests them within Chalcidoidea. Some traditional family groups are supported as monophyletic (Agaonidae, Eucharitidae, Encyrtidae, Eulophidae, Leucospidae, Mymaridae, Ormyridae, Signiphoridae, Tanaostigmatidae and Trichogrammatidae). Several other families are paraphyletic (Perilampidae) or polyphyletic (Aphelinidae, Chalcididae, Eupelmidae, Eurytomidae, Pteromalidae, Tetracampidae and Torymidae). Evolutionary scenarios discussed for Chalcidoidea include the evolution of phytophagy, egg parasitism, sternorrhynchan parasitism, hypermetamorphic development and heteronomy. PMID:22087244
Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals.
Engemann, Denis A; Gramfort, Alexandre
2015-03-01
Magnetoencephalography and electroencephalography (M/EEG) measure non-invasively the weak electromagnetic fields induced by post-synaptic neural currents. The estimation of the spatial covariance of the signals recorded on M/EEG sensors is a building block of modern data analysis pipelines. Such covariance estimates are used in brain-computer interface (BCI) systems, in nearly all source localization methods for spatial whitening, as well as for data covariance estimation in beamformers. The rationale for such models is that the signals can be modeled by a zero mean Gaussian distribution. While maximizing the Gaussian likelihood seems natural, it leads to a covariance estimate known as the empirical covariance (EC). It turns out that the EC is a poor estimate of the true covariance when the number of samples is small. To address this issue the estimation needs to be regularized. The most common approach downweights off-diagonal coefficients, while more advanced regularization methods are based on shrinkage techniques or generative models with low rank assumptions: probabilistic PCA (PPCA) and factor analysis (FA). Using cross-validation, all of these models can be tuned and compared based on the Gaussian likelihood computed on unseen data. We investigated these models on simulations, one electroencephalography (EEG) dataset as well as magnetoencephalography (MEG) datasets from the most common MEG systems. First, our results demonstrate that different models can be the best, depending on the number of samples, heterogeneity of sensor types and noise properties. Second, we show that the models tuned by cross-validation are superior to models with hand-selected regularization. Hence, we propose an automated solution to the often overlooked problem of covariance estimation of M/EEG signals. The relevance of the procedure is demonstrated here for spatial whitening and source localization of MEG signals. Copyright © 2015 Elsevier Inc. All rights reserved.
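The cross-validated model selection described above is easy to reproduce with scikit-learn, whose covariance estimators score held-out data by Gaussian log-likelihood. The data here are synthetic stand-ins for M/EEG sensor recordings:

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf, OAS
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40)) @ rng.standard_normal((40, 40))  # correlated "sensors"
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

for est in (EmpiricalCovariance(), LedoitWolf(), OAS()):
    est.fit(X_train)
    # score() is the mean Gaussian log-likelihood of unseen data; with few
    # samples the shrinkage estimators typically beat the empirical covariance.
    print(f"{type(est).__name__:20s} {est.score(X_test):.2f}")
```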
Lin, Feng-Chang; Zhu, Jun
2012-01-01
We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.
Load estimator (LOADEST): a FORTRAN program for estimating constituent loads in streams and rivers
Runkel, Robert L.; Crawford, Charles G.; Cohn, Timothy A.
2004-01-01
LOAD ESTimator (LOADEST) is a FORTRAN program for estimating constituent loads in streams and rivers. Given a time series of streamflow, additional data variables, and constituent concentration, LOADEST assists the user in developing a regression model for the estimation of constituent load (calibration). Explanatory variables within the regression model include various functions of streamflow, decimal time, and additional user-specified data variables. The formulated regression model then is used to estimate loads over a user-specified time interval (estimation). Mean load estimates, standard errors, and 95 percent confidence intervals are developed on a monthly and/or seasonal basis. The calibration and estimation procedures within LOADEST are based on three statistical estimation methods. The first two methods, Adjusted Maximum Likelihood Estimation (AMLE) and Maximum Likelihood Estimation (MLE), are appropriate when the calibration model errors (residuals) are normally distributed. Of the two, AMLE is the method of choice when the calibration data set (time series of streamflow, additional data variables, and concentration) contains censored data. The third method, Least Absolute Deviation (LAD), is an alternative to maximum likelihood estimation when the residuals are not normally distributed. LOADEST output includes diagnostic tests and warnings to assist the user in determining the appropriate estimation method and in interpreting the estimated loads. This report describes the development and application of LOADEST. Sections of the report describe estimation theory, input/output specifications, sample applications, and installation instructions.
Maximum Likelihood Shift Estimation Using High Resolution Polarimetric SAR Clutter Model
NASA Astrophysics Data System (ADS)
Harant, Olivier; Bombrun, Lionel; Vasile, Gabriel; Ferro-Famil, Laurent; Gay, Michel
2011-03-01
This paper deals with a Maximum Likelihood (ML) shift estimation method in the context of High Resolution (HR) Polarimetric SAR (PolSAR) clutter. Texture modeling is presented and the generalized ML texture tracking method is extended to the merging of various sensors. Some results on displacement estimation for the Argentiere glacier in the Mont Blanc massif, using dual-pol TerraSAR-X (TSX) and quad-pol RADARSAT-2 (RS2) sensors, are finally discussed.
Nonparametric probability density estimation by optimization theoretic techniques
NASA Technical Reports Server (NTRS)
Scott, D. W.
1976-01-01
Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm is proposed to choose the scaling factor automatically. The second nonparametric probability density estimator uses penalty function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples are displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.
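The first problem treated above, choosing the kernel scaling factor from the sample alone, is commonly solved by leave-one-out maximum likelihood cross-validation; a minimal sketch (Gaussian kernel, grid search, synthetic data) follows. Whether this matches the authors' interactive algorithm is not claimed:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(1, 1.0, 150)])

def loo_loglik(h, x):
    """Leave-one-out log-likelihood of a Gaussian kernel estimate with bandwidth h."""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)                 # leave each point out of its own estimate
    return np.sum(np.log(k.sum(axis=1) / ((n - 1) * h)))

grid = np.linspace(0.05, 1.0, 40)
best = grid[np.argmax([loo_loglik(h, x) for h in grid])]
print(f"cross-validated bandwidth: {best:.3f}")
```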
Efficient Exploration of the Space of Reconciled Gene Trees
Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent
2013-01-01
Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes, we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions, respectively, in the mean numbers of duplications, transfers, and losses per gene family. The open source implementation of ALE is available from https://github.com/ssolo/ALE.git. [amalgamation; gene tree reconciliation; gene tree reconstruction; lateral gene transfer; phylogeny.] PMID:23925510
Improvements in Spectrum's fit to program data tool.
Mahiane, Severin G; Marsh, Kimberly; Grantham, Kelsey; Crichlow, Shawna; Caceres, Karen; Stover, John
2017-04-01
The Joint United Nations Program on HIV/AIDS-supported Spectrum software package (Glastonbury, Connecticut, USA) is used by most countries worldwide to monitor the HIV epidemic. In Spectrum, HIV incidence trends among adults (aged 15-49 years) are derived by either fitting to seroprevalence surveillance and survey data or generating curves consistent with program and vital registration data, such as historical trends in the number of newly diagnosed infections or people living with HIV and AIDS related deaths. This article describes development and application of the fit to program data (FPD) tool in Joint United Nations Program on HIV/AIDS' 2016 estimates round. In the FPD tool, HIV incidence trends are described as a simple or double logistic function. Function parameters are estimated from historical program data on newly reported HIV cases, people living with HIV or AIDS-related deaths. Inputs can be adjusted for proportions undiagnosed or misclassified deaths. Maximum likelihood estimation or minimum chi-squared distance methods are used to identify the best fitting curve. Asymptotic properties of the estimators from these fits are used to estimate uncertainty. The FPD tool was used to fit incidence for 62 countries in 2016. Maximum likelihood and minimum chi-squared distance methods gave similar results. A double logistic curve adequately described observed trends in all but four countries where a simple logistic curve performed better. Robust HIV-related program and vital registration data are routinely available in many middle-income and high-income countries, whereas HIV seroprevalence surveillance and survey data may be scarce. In these countries, the FPD tool offers a simpler, improved approach to estimating HIV incidence trends.
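A hedged sketch of the curve-fitting step: the double logistic parameterization below is an assumption for illustration (Spectrum's exact form is not reproduced here), fitted to synthetic yearly case counts by Poisson maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def double_logistic(t, a, r1, t1, r2, t2):
    """Incidence rises with one logistic term and declines with a second."""
    rise = 1.0 / (1.0 + np.exp(-r1 * (t - t1)))
    fall = 1.0 / (1.0 + np.exp(r2 * (t - t2)))
    return a * rise * fall

years = np.arange(1990, 2016)
rng = np.random.default_rng(9)
cases = rng.poisson(double_logistic(years, 800, 0.8, 1997, 0.3, 2008))  # synthetic counts

def negloglik(theta):
    mu = double_logistic(years, *theta)
    if np.any(mu <= 0):
        return np.inf
    return -poisson.logpmf(cases, mu).sum()

fit = minimize(negloglik, x0=[600, 0.5, 1995, 0.2, 2005], method="Nelder-Mead",
               options={"maxiter": 5000})
print(fit.x)     # fitted amplitude, rates, and inflection years
```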
NASA Astrophysics Data System (ADS)
Li, Xinya; Deng, Zhiqun Daniel; Rauchenstein, Lynn T.; Carlson, Thomas J.
2016-04-01
Locating the position of fixed or mobile sources (i.e., transmitters) based on measurements obtained from sensors (i.e., receivers) is an important research area that is attracting much interest. In this paper, we review several representative localization algorithms that use times of arrival (TOAs) and time differences of arrival (TDOAs) to achieve high signal source position estimation accuracy when a transmitter is in the line-of-sight of a receiver. Circular (TOA) and hyperbolic (TDOA) position estimation approaches both use nonlinear equations that relate the known locations of receivers to the unknown locations of transmitters. Estimation of transmitter locations from the standard nonlinear equations may not be very accurate because of receiver location errors, receiver measurement errors, and the high computational burden of solving them. Least squares and maximum likelihood based algorithms have become the most popular computational approaches to transmitter location estimation. In this paper, we summarize the computational characteristics and position estimation accuracies of various positioning algorithms. By improving methods for estimating the time-of-arrival of transmissions at receivers and transmitter location estimation algorithms, transmitter location estimation may be applied across a range of applications and technologies such as radar, sonar, the Global Positioning System, wireless sensor networks, underwater animal tracking, mobile communications, and multimedia.
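A minimal sketch of the hyperbolic (TDOA) case: the nonlinear range-difference equations relating known receiver locations to the unknown transmitter location are solved by least squares. The geometry, noise level, and propagation speed are made-up values:

```python
import numpy as np
from scipy.optimize import least_squares

c = 1500.0                                   # propagation speed, m/s (e.g. underwater)
receivers = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], float)
source = np.array([62.0, 37.0])              # ground truth, used only to simulate data

ranges = np.linalg.norm(receivers - source, axis=1)
tdoas = (ranges[1:] - ranges[0]) / c         # arrival-time differences vs. receiver 0
tdoas += np.random.default_rng(5).normal(0, 1e-5, tdoas.size)   # measurement noise

def residuals(p):
    r = np.linalg.norm(receivers - p, axis=1)
    return (r[1:] - r[0]) / c - tdoas

fit = least_squares(residuals, x0=np.array([50.0, 50.0]))
print(fit.x)    # estimated transmitter position
```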
Mixed model approaches for diallel analysis based on a bio-model.
Zhu, J; Weir, B S
1996-12-01
A MINQUE(1) procedure, which is the minimum norm quadratic unbiased estimation (MINQUE) method with 1 for all the prior values, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML) and MINQUE(θ), which uses the parameter values as the prior values. MINQUE(1) is almost as efficient as MINQUE(θ) for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jack-knife procedure is suggested for estimating the sampling variances of the estimated variance and covariance components and of the predicted genetic effects. Worked examples are given for the estimation of variance and covariance components and for the prediction of genetic merits.
Network Model-Assisted Inference from Respondent-Driven Sampling Data
Gile, Krista J.; Handcock, Mark S.
2015-01-01
Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beer, M.
1980-12-01
The maximum likelihood method for the multivariate normal distribution is applied to the case of several individual eigenvalues. Correlated Monte Carlo estimates of the eigenvalue are assumed to follow this prescription and aspects of the assumption are examined. Monte Carlo cell calculations using the SAM-CE and VIM codes for the TRX-1 and TRX-2 benchmark reactors, and SAM-CE full core results, are analyzed with this method. Variance reductions of a few percent to a factor of 2 are obtained from maximum likelihood estimation as compared with the simple average and the minimum variance individual eigenvalue. The numerical results verify that the use of sample variances and correlation coefficients in place of the corresponding population statistics still leads to nearly minimum variance estimation for a sufficient number of histories and aggregates.
Cosmological parameter estimation using Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Prasad, J.; Souradeep, T.
2014-03-01
Constraining the parameters of a theoretical model from observational data is an important exercise in cosmology. There are many theoretically motivated models which demand a greater number of cosmological parameters than the standard model of cosmology uses, and these make the problem of parameter estimation challenging. It is common practice to employ the Bayesian formalism for parameter estimation, for which, in general, the likelihood surface is probed. For the standard cosmological model with six parameters, the likelihood surface is quite smooth and does not have local maxima, and sampling based methods like the Markov Chain Monte Carlo (MCMC) method are quite successful. However, when there are a large number of parameters or the likelihood surface is not smooth, other methods may be more effective. In this paper, we demonstrate the application of another method, inspired by artificial intelligence and called Particle Swarm Optimization (PSO), for estimating cosmological parameters from Cosmic Microwave Background (CMB) data taken from the WMAP satellite.
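A minimal PSO sketch (a standard inertia-weight update; the quadratic stand-in below replaces an expensive CMB likelihood) shows the mechanics the paper exploits:

```python
import numpy as np

def negloglik(theta):                        # stand-in for an expensive likelihood
    return np.sum((theta - np.array([0.3, 0.7]))**2 / 0.01)

rng = np.random.default_rng(6)
n, dim, w, c1, c2 = 30, 2, 0.7, 1.5, 1.5     # swarm size and standard PSO constants
pos = rng.uniform(0, 1, (n, dim))
vel = np.zeros((n, dim))
pbest, pbest_val = pos.copy(), np.array([negloglik(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(200):
    r1, r2 = rng.random((2, n, dim))
    # Pull each particle toward its personal best and the global best point.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = np.array([negloglik(p) for p in pos])
    better = val < pbest_val
    pbest[better], pbest_val[better] = pos[better], val[better]
    gbest = pbest[pbest_val.argmin()].copy()

print(gbest)   # swarm's estimate of the likelihood peak
```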
Estimating river discharge uncertainty by applying the Rating Curve Model
NASA Astrophysics Data System (ADS)
Barbetta, S.; Melone, F.; Franchini, M.; Moramarco, T.
2012-04-01
The knowledge of the flow discharge at a river site is necessary for planning and management of water resources as well as for monitoring and real-time forecasting purposes when significant flood events occur. In hydrological practice, the operational discharge measurement in medium and large rivers is mostly based on indirect approaches that convert the observed stage into discharge values using steady-flow rating curves. However, the stage-discharge relationship can be unknown for hydrometric sections where flow velocity measurements, particularly during high floods, are not available. To overcome this issue, a simplified approach named the Rating Curve Model (RCM), proposed by Moramarco et al. (Moramarco, T., Barbetta, S., Melone, F. & Singh, V.P., Relating local stage and remote discharge with significant lateral inflow, J. Hydrol. Engng ASCE, 10(1), 58-69, 2005), can be conveniently used. RCM has proved able to assess, with a high level of accuracy, the discharge hydrograph at a river site where only the stage is monitored while the flow is recorded at a different section along the river, even when significant lateral flows occur. The model has a simple structure depending on three parameters, of which two can be considered characteristic of the river reach and one of the flood wave travel time. Since RCM lends itself well to predicting the stage-discharge relationship at a river site where only stages are recorded, an uncertainty analysis of the river discharge estimates is of clear interest for hydrological practice. To this aim, the uncertainty characterizing the RCM outcomes is addressed in this work by considering two different procedures, based on the Monte Carlo approach and the Generalized Likelihood Uncertainty Estimation (GLUE) method, respectively. The statistical distribution of the parameters is found and a random re-sampling of parameters is done to assess the 90% confidence interval (CI) of the discharge estimates. In particular, for the latter approach the Nash-Sutcliffe coefficient is used as the likelihood measure. Two equipped river reaches of the Upper-Middle Tiber River basin, central Italy, are investigated as case studies. The results provided by the selected methodologies are discussed and compared, showing that all the computed CIs are satisfactory in terms of the percentage of included observed discharges, with similar percentages characterizing the bands assessed by the Monte Carlo approach and the GLUE procedure.
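A sketch of the GLUE side of such an analysis, with the Nash-Sutcliffe coefficient as the informal likelihood measure as in the study above. The power-law rating model, threshold, and data are stand-ins, and the unweighted percentile band is a simplification of likelihood-weighted GLUE bounds:

```python
import numpy as np

rng = np.random.default_rng(7)
stage = np.linspace(0.5, 3.0, 120)
q_obs = 12.0 * (stage - 0.2)**1.8 + rng.normal(0, 3.0, stage.size)  # "observed" discharge

def model(theta, h):
    a, h0, b = theta
    return a * np.clip(h - h0, 1e-6, None)**b

def nse(sim, obs):
    return 1.0 - np.sum((sim - obs)**2) / np.sum((obs - obs.mean())**2)

# Sample parameters from uniform priors and keep the behavioural sets.
samples = np.column_stack([rng.uniform(5, 20, 20000),
                           rng.uniform(0.0, 0.4, 20000),
                           rng.uniform(1.0, 2.5, 20000)])
scores = np.array([nse(model(t, stage), q_obs) for t in samples])
behavioural = samples[scores > 0.7]                  # GLUE behavioural threshold

sims = np.array([model(t, stage) for t in behavioural])
lower, upper = np.percentile(sims, [5, 95], axis=0)  # 90% uncertainty band
print(len(behavioural), lower[:3], upper[:3])
```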
Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.
2016-01-01
This work presents a computationally-efficient approach for damage determination that quantifies uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modifications are made to the traditional Bayesian inference approach to provide significant computational speedup. First, an efficient surrogate model is constructed using sparse grid interpolation to replace a costly finite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modified using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more efficiently. Numerical examples demonstrate that the proposed framework effectively provides damage estimates with uncertainty quantification and can yield orders of magnitude speedup over standard Bayesian approaches.
The Extended-Image Tracking Technique Based on the Maximum Likelihood Estimation
NASA Technical Reports Server (NTRS)
Tsou, Haiping; Yan, Tsun-Yee
2000-01-01
This paper describes an extended-image tracking technique based on maximum likelihood estimation. The target image is assumed to have a known profile covering more than one element of a focal plane detector array. It is assumed that the relative position between the imager and the target is changing with time and that each pixel of the received target image is disturbed by independent additive white Gaussian noise. When a rotation-invariant movement between imager and target is considered, the maximum likelihood based image tracking technique described in this paper is a closed-loop structure capable of providing iterative updates of the movement estimate by calculating the loop feedback signals from a weighted correlation between the currently received target image and the previously estimated reference image in the transform domain. The movement estimate is then used to direct the imager to closely follow the moving target. This image tracking technique has many potential applications, including free-space optical communications and astronomy, where accurate and stabilized optical pointing is essential.
Reyes-Valdés, M H; Stelly, D M
1995-01-01
Frequencies of meiotic configurations in cytogenetic stocks are dependent on chiasma frequencies in segments defined by centromeres, breakpoints, and telomeres. The expectation maximization algorithm is proposed as a general method to perform maximum likelihood estimations of the chiasma frequencies in the intervals between such locations. The estimates can be translated via mapping functions into genetic maps of cytogenetic landmarks. One set of observational data was analyzed to exemplify application of these methods, results of which were largely concordant with other comparable data. The method was also tested by Monte Carlo simulation of frequencies of meiotic configurations from a monotelodisomic translocation heterozygote, assuming six different sample sizes. The estimate averages were always close to the values given initially to the parameters. The maximum likelihood estimation procedures can be extended readily to other kinds of cytogenetic stocks and allow the pooling of diverse cytogenetic data to collectively estimate lengths of segments, arms, and chromosomes. PMID:7568226
Bellier, Edwige; Grøtan, Vidar; Engen, Steinar; Schartau, Ann Kristin; Diserud, Ola H; Finstad, Anders G
2012-10-01
Obtaining accurate estimates of diversity indices is difficult because the number of species encountered in a sample increases with sampling intensity. We introduce a novel method that requires the presence of species in a sample to be assessed, while counts of the number of individuals per species are required for only a small part of the sample. To account for species included as incidence data in the species abundance distribution, we modify the likelihood function of the classical Poisson log-normal distribution. Using simulated community assemblages, we contrast diversity estimates based on a community sample, a subsample randomly extracted from the community sample, and a mixture sample where incidence data are added to a subsample. We show that the mixture sampling approach provides more accurate estimates than the subsample, at little extra cost. Diversity indices estimated from a freshwater zooplankton community sampled using the mixture approach show the same pattern of results as the simulation study. Our method efficiently increases the accuracy of diversity estimates and the comprehension of the left tail of the species abundance distribution. We show how to choose the scale of sample size needed for a compromise between information gained, accuracy of the estimates and cost expended when assessing biological diversity. The sample size estimates are obtained from key community characteristics, such as the expected number of species in the community, the expected number of individuals in a sample and the evenness of the community.
NASA Astrophysics Data System (ADS)
Teeples, Ronald; Glyer, David
1987-05-01
Both policy and technical analysis of water delivery systems have been based on cost functions that are inconsistent with or are incomplete representations of the neoclassical production functions of economics. We present a full-featured production function model of water delivery which can be estimated from a multiproduct, dual cost function. The model features implicit prices for own-water inputs and is implemented as a jointly estimated system of input share equations and a translog cost function. Likelihood ratio tests are performed showing that a minimally constrained, full-featured production function is a necessary specification of the water delivery operations in our sample. This, plus the model's highly efficient and economically correct parameter estimates, confirms the usefulness of a production function approach to modeling the economic activities of water delivery systems.
UWB pulse detection and TOA estimation using GLRT
NASA Astrophysics Data System (ADS)
Xie, Yan; Janssen, Gerard J. M.; Shakeri, Siavash; Tiberius, Christiaan C. J. M.
2017-12-01
In this paper, a novel statistical approach is presented for time-of-arrival (TOA) estimation based on first path (FP) pulse detection using a sub-Nyquist sampling ultra-wide band (UWB) receiver. The TOA measurement accuracy, which cannot be improved by averaging of the received signal, can be enhanced by the statistical processing of a number of TOA measurements. The TOA statistics are modeled and analyzed for a UWB receiver using threshold crossing detection of a pulse signal with noise. The detection and estimation scheme based on the Generalized Likelihood Ratio Test (GLRT) detector, which captures the full statistical information of the measurement data, is shown to achieve accurate TOA estimation and allows for a trade-off between the threshold level, the noise level, the amplitude and the arrival time of the first path pulse, and the accuracy of the obtained final TOA.
NASA Astrophysics Data System (ADS)
Wang, Hongrui; Wang, Cheng; Wang, Ying; Gao, Xiong; Yu, Chen
2017-06-01
This paper presents a Bayesian approach using a Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies it to daily river flow rate forecasting and uncertainty quantification for the Zhujiachuan River, using data collected from Qiaotoubao Gage Station and 13 other gage stations in the Zhujiachuan watershed in China. The proposed method is also compared with conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of the associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of the daily flow rate, it outperforms the conventional MLE method in uncertainty quantification, providing a narrower reliable interval than the MLE confidence interval and thus a more precise estimate, by using related information from regional gage stations. The Bayesian MCMC method may therefore be more favorable in uncertainty analysis and risk management.
Modelling maximum river flow by using Bayesian Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Cheong, R. Y.; Gabda, D.
2017-09-01
Analysis of flood trends is vital since flooding threatens human life and property in financial, environmental, and security terms. The data of annual maximum river flows in Sabah were fitted to the generalized extreme value (GEV) distribution. The maximum likelihood estimator (MLE) arises naturally when working with the GEV distribution. However, previous research showed that MLE provides unstable results, especially for small sample sizes. In this study, we used different Bayesian Markov Chain Monte Carlo (MCMC) methods based on the Metropolis-Hastings algorithm to estimate the GEV parameters. Bayesian MCMC is a statistical inference approach that estimates parameters through the posterior distribution given by Bayes' theorem. The Metropolis-Hastings algorithm is used to overcome the high-dimensional state space faced by plain Monte Carlo methods. This approach also accounts for more of the uncertainty in parameter estimation, and thus yields better predictions of maximum river flow in Sabah.
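A minimal random-walk Metropolis-Hastings sketch for the GEV parameters of annual maxima, with flat priors so the posterior is proportional to the likelihood (note scipy's shape convention c = -ξ; proposal scales below are arbitrary):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(8)
data = genextreme.rvs(c=-0.1, loc=100.0, scale=20.0, size=40, random_state=rng)

def logpost(theta):
    c, loc, scale = theta
    if scale <= 0:
        return -np.inf
    return genextreme.logpdf(data, c, loc, scale).sum()   # flat priors

chain = np.empty((20000, 3))
theta = np.array([0.0, data.mean(), data.std()])
lp = logpost(theta)
for i in range(chain.shape[0]):
    prop = theta + rng.normal(0, [0.05, 2.0, 1.0])        # random-walk proposal
    lp_prop = logpost(prop)
    if np.log(rng.random()) < lp_prop - lp:               # accept/reject step
        theta, lp = prop, lp_prop
    chain[i] = theta

print(chain[5000:].mean(axis=0))   # posterior means after burn-in
```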
Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J
2015-09-03
RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
Dai, James Y.; Hughes, James P.
2012-01-01
The meta-analytic approach to evaluating surrogate end points assesses the predictiveness of treatment effect on the surrogate toward treatment effect on the clinical end point based on multiple clinical trials. Definition and estimation of the correlation of treatment effects were developed in linear mixed models and later extended to binary or failure time outcomes on a case-by-case basis. In a general regression setting that covers nonnormal outcomes, we discuss in this paper several metrics that are useful in the meta-analytic evaluation of surrogacy. We propose a unified 3-step procedure to assess these metrics in settings with binary end points, time-to-event outcomes, or repeated measures. First, the joint distribution of estimated treatment effects is ascertained by an estimating equation approach; second, the restricted maximum likelihood method is used to estimate the means and the variance components of the random treatment effects; finally, confidence intervals are constructed by a parametric bootstrap procedure. The proposed method is evaluated by simulations and applications to 2 clinical trials. PMID:22394448
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles
2010-01-01
In this article the authors focus on the issue of the nonuniqueness of the maximum likelihood (ML) estimator of proficiency level in item response theory (with special attention to logistic models). The usual maximum a posteriori (MAP) method offers a good alternative within that framework; however, this article highlights some drawbacks of its…
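The drawback of ML that motivates MAP can be shown in a few lines: under a 2PL logistic model, an all-correct response pattern drives the ML proficiency estimate to the boundary, while a standard-normal prior keeps the MAP estimate finite (the item parameters below are made up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.2, 0.8, 1.5, 1.0])      # item discriminations
b = np.array([-0.5, 0.0, 0.4, 1.1])     # item difficulties
u = np.array([1, 1, 1, 1])              # all items answered correctly

def logli(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.sum(u * np.log(p) + (1 - u) * np.log1p(-p))

ml = minimize_scalar(lambda t: -logli(t), bounds=(-6, 6), method="bounded")
map_ = minimize_scalar(lambda t: -(logli(t) - 0.5 * t**2), bounds=(-6, 6), method="bounded")
print(ml.x, map_.x)   # ML is pushed to the boundary; MAP stays finite
```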
Link Prediction in Evolving Networks Based on Popularity of Nodes.
Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian
2017-08-02
Link prediction aims to uncover the underlying relationships behind networks, which can be utilized to predict missing edges or identify spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks and, limited by time-varying features, perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. A novel approach named the popularity based structural perturbation method (PBSPM) and its fast algorithm are then proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
Yu, Shih-Heng; Chang, Dong-Shang
2014-01-01
This study investigates the risk factors in railway reconstruction projects through a comprehensive review of the literature on construction project risks and by scrutinizing the experiences and challenges of railway reconstructions in Taiwan. Based on the identified risk factors, an assessment framework based on the fuzzy multicriteria decision-making (fuzzy MCDM) approach is proposed to help construction agencies build awareness of the critical risk factors in the execution of railway reconstruction projects and to measure the impact and occurrence likelihood of these risk factors. Subjectivity, uncertainty and vagueness within the assessment process are dealt with using linguistic variables parameterized by trapezoid fuzzy numbers. By multiplying the degree of impact and the occurrence likelihood of risk factors, the estimated severity of each identified risk factor is determined. Based on the assessment results, the construction agencies are informed of which risks should be noticed and what they should do to avoid them. That is, it enables construction agencies of railway reconstruction to plan appropriate risk responses/strategies to increase the opportunity of project success and effectiveness. PMID:24772014
The Maximum Likelihood Solution for Inclination-only Data
NASA Astrophysics Data System (ADS)
Arason, P.; Levi, S.
2006-12-01
The arithmetic means of inclination-only data are known to introduce a shallowing bias. Several methods have been proposed to estimate unbiased means of the inclination along with measures of the precision. Most of the inclination-only methods were designed to maximize the likelihood function of the marginal Fisher distribution. However, the exact analytical form of the maximum likelihood function is fairly complicated, and all these methods require various assumptions and approximations that are inappropriate for many data sets. For some steep and dispersed data sets, the estimates provided by these methods are significantly displaced from the peak of the likelihood function to systematically shallower inclinations. The problem in locating the maximum of the likelihood function is partly due to difficulties in accurately evaluating the function for all values of interest, because some elements of the log-likelihood function increase exponentially as precision parameters increase, leading to numerical instabilities. In this study, we succeeded in analytically cancelling exponential elements from the likelihood function, and we are now able to calculate its value for any location in the parameter space and for any inclination-only data set, with full accuracy. Furthermore, we can now calculate the partial derivatives of the likelihood function with the desired accuracy. Locating the maximum likelihood without the assumptions required by previous methods is now straightforward. The information to separate the mean inclination from the precision parameter will be lost for very steep and dispersed data sets. It is worth noting that the likelihood function always has a maximum value. However, for some dispersed and steep data sets with few samples, the likelihood function takes its highest value on the boundary of the parameter space, i.e. at inclinations of +/- 90 degrees, but with relatively well defined dispersion. Our simulations indicate that this occurs quite frequently for certain data sets, and relatively small perturbations in the data will drive the maxima to the boundary. We interpret this to indicate that, for such data sets, the information needed to separate the mean inclination and the precision parameter is permanently lost. To assess the reliability and accuracy of our method, we generated a large number of random Fisher-distributed data sets and used seven methods to estimate the mean inclination and precision parameter. These comparisons are described by Levi and Arason at the 2006 AGU Fall meeting. The results of the various methods are very favourable to our new robust maximum likelihood method, which, on average, is the most reliable, and whose mean inclination estimates are the least biased toward shallow values. Further information on our inclination-only analysis can be obtained from: http://www.vedur.is/~arason/paleomag
Collinear Latent Variables in Multilevel Confirmatory Factor Analysis
van de Schoot, Rens; Hox, Joop
2014-01-01
Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated at the within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influence of the size of the intraclass correlation coefficient (ICC) and of the estimation method (maximum likelihood estimation with robust chi-squares and standard errors, and Bayesian estimation) on the convergence rate is investigated. The other variables of interest were the rate of inadmissible solutions and the relative parameter and standard error bias at the between level. The results showed that inadmissible solutions were obtained when there was between-level collinearity and the estimation method was maximum likelihood. In the within-level multicollinearity condition, all of the solutions were admissible but the bias values were higher compared with the between-level collinearity condition. Bayesian estimation appeared to be robust in obtaining admissible parameters but the relative bias was higher than for maximum likelihood estimation. Finally, as expected, high ICC produced less biased results compared to medium ICC conditions. PMID:29795827
How much to trust the senses: Likelihood learning
Sato, Yoshiyuki; Kording, Konrad P.
2014-01-01
Our brain often needs to estimate unknown variables from imperfect information. Our knowledge about the statistical distributions of quantities in our environment (called priors) and currently available information from sensory inputs (called likelihood) are the basis of all Bayesian models of perception and action. While we know that priors are learned, most studies of prior-likelihood integration simply assume that subjects know about the likelihood. However, as the quality of sensory inputs changes over time, we also need to learn about new likelihoods. Here, we show that human subjects readily learn the distribution of visual cues (likelihood function) in a way that can be predicted by models of statistically optimal learning. Using a likelihood that depended on color context, we found that a learned likelihood generalized to new priors. Thus, we conclude that subjects learn about likelihood. PMID:25398975
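The integration being probed can be made concrete with a one-function Gaussian example; the numbers and the Gaussian forms are illustrative only, not the paper's stimuli:

```python
import numpy as np

def posterior_gaussian(prior_mean, prior_var, obs, lik_var):
    """Bayes-optimal integration of a Gaussian prior and likelihood: the
    posterior mean is a precision-weighted average, so a subject who has
    learned a wider likelihood should rely more on the prior."""
    w = prior_var / (prior_var + lik_var)  # weight on the sensory observation
    post_mean = (1.0 - w) * prior_mean + w * obs
    post_var = prior_var * lik_var / (prior_var + lik_var)
    return post_mean, post_var

# Same cue under a narrow vs. a learned-to-be-wide likelihood:
print(posterior_gaussian(0.0, 1.0, obs=2.0, lik_var=0.25))  # follows the cue
print(posterior_gaussian(0.0, 1.0, obs=2.0, lik_var=4.0))   # leans on the prior
```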
Foreground effect on the J-factor estimation of classical dwarf spheroidal galaxies
NASA Astrophysics Data System (ADS)
Ichikawa, Koji; Ishigaki, Miho N.; Matsumoto, Shigeki; Ibe, Masahiro; Sugai, Hajime; Hayashi, Kohei; Horigome, Shun-ichi
2017-07-01
The gamma-ray observation of dwarf spheroidal galaxies (dSphs) is a promising approach to search for the dark matter annihilation (or decay) signal. The dSphs are nearby satellite galaxies with a clean environment and a dense dark matter halo, so they give stringent constraints on O(1) TeV dark matter. However, recent studies have revealed that current estimates of the astrophysical factors relevant for dark matter searches are not conservative, as various non-negligible systematic uncertainties are not taken into account. Among them, the effect of foreground stars on the astrophysical factors has received little attention; it becomes more important for the deeper and wider stellar surveys of the future. In this article, we assess the effects of the foreground contamination by generating mock samples of stars and using a model of future spectrographs. We investigate various data cuts to optimize the quality of the data and find that cuts on the velocity and surface gravity can efficiently eliminate the contamination. We also propose a new likelihood function that includes the foreground distribution function. We apply this likelihood function to fits of three types of mock data (Ursa Minor, Draco with a large dark matter halo, and Draco with a small halo) and three cases of observation. The likelihood successfully reproduces the input J-factor value, while a fit that does not consider the foreground distribution deviates from the input value by a factor of 3.
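The proposed likelihood can be sketched as a two-component mixture over the observed stellar velocities; the Gaussian member term and the generic fg_pdf placeholder stand in for the paper's actual member and foreground distribution functions:

```python
import numpy as np
from scipy.stats import norm

def loglike_member_plus_foreground(v, s, v_mem, sigma_mem, fg_pdf):
    """Each star's line-of-sight velocity v is drawn from the member
    distribution with probability s and from the foreground distribution
    with probability 1 - s; summing the log mixture gives the likelihood
    that is maximised together with the halo parameters."""
    member = norm.pdf(v, loc=v_mem, scale=sigma_mem)
    return np.sum(np.log(s * member + (1.0 - s) * fg_pdf(v)))
```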
Combining QMRA and Epidemiology to Estimate Campylobacteriosis Incidence.
Evers, Eric G; Bouwknegt, Martijn
2016-10-01
The disease burden of pathogens as estimated by QMRA (quantitative microbial risk assessment) and EA (epidemiological analysis) often differs considerably. This is an unsatisfactory situation for policymakers and scientists. We explored methods to obtain a unified estimate using campylobacteriosis in the Netherlands as an example, where previous work resulted in estimates of 4.9 million (QMRA) and 90,600 (EA) cases per year. Using the maximum likelihood approach and considering EA the gold standard, the QMRA model could produce the original EA estimate by adjusting mainly the dose-infection relationship. Considering QMRA the gold standard, the EA model could produce the original QMRA estimate by adjusting mainly the probability that a gastroenteritis case is caused by Campylobacter. A joint analysis of QMRA and EA data and models assuming identical outcomes, using a frequentist or Bayesian approach (using vague priors), resulted in estimates of 102,000 or 123,000 campylobacteriosis cases per year, respectively. These were close to the original EA estimate, and this will be related to the dissimilarity in data availability. The Bayesian approach further showed that attenuating the condition of equal outcomes immediately resulted in very different estimates of the number of campylobacteriosis cases per year and that using more informative priors had little effect on the results. In conclusion, EA was dominant in estimating the burden of campylobacteriosis in the Netherlands. However, it must be noted that only statistical uncertainties were taken into account here. Taking all, usually difficult to quantify, uncertainties into account might lead to a different conclusion. © 2016 Society for Risk Analysis.
Nowak, Michael D.; Smith, Andrew B.; Simpson, Carl; Zwickl, Derrick J.
2013-01-01
Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates. PMID:23755303
Method and system for diagnostics of apparatus
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2012-01-01
Proposed is a method, implemented in software, for estimating fault state of an apparatus outfitted with sensors. At each execution period the method processes sensor data from the apparatus to obtain a set of parity parameters, which are further used for estimating fault state. The estimation method formulates a convex optimization problem for each fault hypothesis and employs a convex solver to compute fault parameter estimates and fault likelihoods for each fault hypothesis. The highest likelihoods and corresponding parameter estimates are transmitted to a display device or an automated decision and control system. The obtained accurate estimate of fault state can be used to improve safety, performance, or maintenance processes for the apparatus.
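A hedged sketch of the per-hypothesis computation, assuming a linear parity model p ≈ H_k x_k + noise (our assumption, not the patent's exact formulation), using the cvxpy modelling library:

```python
import numpy as np
import cvxpy as cp

def fault_hypothesis_scores(parity, H_list, sigma=1.0):
    """For each fault hypothesis k, solve a small convex problem for the
    fault-parameter estimate and convert the residual into a Gaussian
    log-likelihood; the highest-scoring hypotheses and their estimates
    would then be reported to the display or decision system."""
    results = []
    for H in H_list:
        x = cp.Variable(H.shape[1])
        prob = cp.Problem(cp.Minimize(cp.sum_squares(parity - H @ x)),
                          [x >= 0])              # fault magnitudes assumed nonnegative
        prob.solve()
        loglik = -prob.value / (2.0 * sigma**2)  # residual -> Gaussian log-likelihood
        results.append((loglik, x.value))
    return results  # rank by loglik to pick the most plausible fault states
```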
Adaptive power priors with empirical Bayes for clinical trials.
Gravestock, Isaac; Held, Leonhard
2017-09-01
Incorporating historical information into the design and analysis of a new clinical trial has been the subject of much discussion as a way to increase the feasibility of trials in situations where patients are difficult to recruit. The best method to include this data is not yet clear, especially in the case when few historical studies are available. This paper looks at the power prior technique afresh in a binomial setting and examines some previously unexamined properties, such as Box P values, bias, and coverage. Additionally, it proposes an empirical Bayes-type approach to estimating the prior weight parameter by marginal likelihood. This estimate has advantages over previously criticised methods in that it varies commensurably with differences in the historical and current data and can choose weights near 1 when the data are similar enough. Fully Bayesian approaches are also considered. An analysis of the operating characteristics shows that the adaptive methods work well and that the various approaches have different strengths and weaknesses. Copyright © 2017 John Wiley & Sons, Ltd.
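In the binomial setting the approach is compact enough to sketch: with a Beta(a, b) initial prior, raising the historical likelihood to the power delta gives a Beta(a + delta*x0, b + delta*(n0 - x0)) prior, and the empirical Bayes weight maximises the beta-binomial marginal likelihood of the current data. A sketch under these textbook assumptions, not the authors' code:

```python
import numpy as np
from scipy.special import betaln, comb
from scipy.optimize import minimize_scalar

def delta_hat(x, n, x0, n0, a=1.0, b=1.0):
    """Empirical Bayes estimate of the power-prior weight delta, chosen
    to maximise the beta-binomial marginal likelihood of the current
    data (x successes out of n) given historical data (x0 out of n0)."""
    def neg_log_marglik(delta):
        a_d = a + delta * x0
        b_d = b + delta * (n0 - x0)
        return -(np.log(comb(n, x)) + betaln(a_d + x, b_d + n - x) - betaln(a_d, b_d))
    res = minimize_scalar(neg_log_marglik, bounds=(0.0, 1.0), method="bounded")
    return res.x

print(delta_hat(x=14, n=40, x0=30, n0=100))  # similar rates -> weight near 1
```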
Likelihood ratios for glaucoma diagnosis using spectral-domain optical coherence tomography.
Lisboa, Renato; Mansouri, Kaweh; Zangwill, Linda M; Weinreb, Robert N; Medeiros, Felipe A
2013-11-01
To present a methodology for calculating likelihood ratios for glaucoma diagnosis for continuous retinal nerve fiber layer (RNFL) thickness measurements from spectral-domain optical coherence tomography (spectral-domain OCT). Observational cohort study. A total of 262 eyes of 187 patients with glaucoma and 190 eyes of 100 control subjects were included in the study. Subjects were recruited from the Diagnostic Innovations Glaucoma Study. Eyes with preperimetric and perimetric glaucomatous damage were included in the glaucoma group. The control group was composed of healthy eyes with normal visual fields from subjects recruited from the general population. All eyes underwent RNFL imaging with Spectralis spectral-domain OCT. Likelihood ratios for glaucoma diagnosis were estimated for specific global RNFL thickness measurements using a methodology based on estimating the tangents to the receiver operating characteristic (ROC) curve. Likelihood ratios could be determined for continuous values of average RNFL thickness. Average RNFL thickness values lower than 86 μm were associated with positive likelihood ratios (ie, likelihood ratios greater than 1), whereas RNFL thickness values higher than 86 μm were associated with negative likelihood ratios (ie, likelihood ratios smaller than 1). A modified Fagan nomogram was provided to assist calculation of posttest probability of disease from the calculated likelihood ratios and pretest probability of disease. The methodology allowed calculation of likelihood ratios for specific RNFL thickness values. By avoiding arbitrary categorization of test results, it potentially allows for an improved integration of test results into diagnostic clinical decision making. Copyright © 2013. Published by Elsevier Inc.
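For a continuous marker the likelihood ratio at a value x is the ratio of the diseased and healthy densities there, which is exactly the slope of the tangent to the ROC curve at the corresponding operating point; a sketch with Gaussian stand-ins for the RNFL distributions (the means and SDs below are hypothetical, not the study's fits):

```python
from scipy.stats import norm

def lr_continuous(x, mu_g, sd_g, mu_h, sd_h):
    """Likelihood ratio for a continuous RNFL measurement x: the ratio
    of the glaucoma and healthy densities, i.e. the ROC tangent slope."""
    return norm.pdf(x, mu_g, sd_g) / norm.pdf(x, mu_h, sd_h)

def posttest_prob(pretest_prob, lr):
    """Fagan-nomogram arithmetic: posttest odds = pretest odds * LR."""
    odds = pretest_prob / (1.0 - pretest_prob) * lr
    return odds / (1.0 + odds)

# Hypothetical numbers: a thin RNFL (75 um) vs. the ~86 um neutral point
lr = lr_continuous(75.0, mu_g=70.0, sd_g=12.0, mu_h=95.0, sd_h=10.0)
print(lr, posttest_prob(0.20, lr))  # LR > 1, so the disease probability rises
```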
NASA Astrophysics Data System (ADS)
Luu, Gia Thien; Boualem, Abdelbassit; Duy, Tran Trung; Ravier, Philippe; Butteli, Olivier
Muscle fiber conduction velocity (MFCV) can be calculated from the time delay between the surface electromyographic (sEMG) signals recorded by electrodes aligned with the fiber direction. In order to account for the non-stationarity during dynamic contraction (the most common situation in daily life), estimation methods have to allow the MFCV to change over time, which induces time-varying delays (TVD), and the data themselves are non-stationary (their power spectral density (PSD) changes). In this paper, the problem of TVD estimation is considered using a parametric method. First, a polynomial model of the TVD is proposed. Then, the TVD model parameters are estimated using a maximum likelihood estimation (MLE) strategy solved by a deterministic optimization technique (Newton) and a stochastic optimization technique called simulated annealing (SA). The performance of the two techniques is also compared. We also derive two appropriate Cramer-Rao lower bounds (CRLB), for the estimated TVD model parameters and for the TVD waveforms. Monte Carlo simulation results show that the estimation of both the model parameters and the TVD function is unbiased and that the variance obtained is close to the derived CRLBs. A comparison with non-parametric approaches to TVD estimation is also presented and shows the superiority of the proposed method.
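A sketch of the parametric formulation under a simple signal model s2(t) = s1(t - d(t)) + white Gaussian noise, for which maximising the likelihood reduces to least squares; SciPy's dual_annealing plays the role of the SA step, and a derivative-free polish stands in for the Newton step used in the paper (model, degree, and bounds are all illustrative assumptions):

```python
import numpy as np
from scipy.optimize import dual_annealing, minimize

def fit_tvd_poly(s1, s2, fs, degree=2, bound=0.02):
    """ML estimate of a polynomial time-varying delay d(t) between two
    sEMG channels: under white Gaussian noise the negative log-likelihood
    is, up to constants, the squared prediction error."""
    t = np.arange(len(s1)) / fs

    def cost(coeffs):
        d = np.polyval(coeffs, t)           # d(t) in seconds
        pred = np.interp(t - d, t, s1)      # delayed channel 1 (edges clamped)
        return np.sum((s2 - pred) ** 2)

    bounds = [(-bound, bound)] * (degree + 1)
    rough = dual_annealing(cost, bounds, maxiter=200)     # stochastic (SA-type) search
    fine = minimize(cost, rough.x, method="Nelder-Mead")  # deterministic polish
    return fine.x  # polynomial coefficients, highest degree first
```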
ERIC Educational Resources Information Center
Wothke, Werner; Burket, George; Chen, Li-Sue; Gao, Furong; Shu, Lianghua; Chia, Mike
2011-01-01
It has been known for some time that item response theory (IRT) models may exhibit a likelihood function of a respondent's ability which may have multiple modes, flat modes, or both. These conditions, often associated with guessing of multiple-choice (MC) questions, can introduce uncertainty and bias to ability estimation by maximum likelihood…
F-8C adaptive flight control extensions. [for maximum likelihood estimation
NASA Technical Reports Server (NTRS)
Stein, G.; Hartmann, G. L.
1977-01-01
An adaptive concept which combines gain-scheduled control laws with explicit maximum likelihood estimation (MLE) identification to provide the scheduling values is described. The MLE algorithm was improved by incorporating attitude data, estimating gust statistics for setting filter gains, and improving parameter tracking during changing flight conditions. A lateral MLE algorithm was designed to improve true air speed and angle of attack estimates during lateral maneuvers. Relationships between the pitch axis sensors inherent in the MLE design were examined and used for sensor failure detection. Design details and simulation performance are presented for each of the three areas investigated.
Eisenhauer, Philipp; Heckman, James J.; Mosso, Stefano
2015-01-01
We compare the performance of maximum likelihood (ML) and simulated method of moments (SMM) estimation for dynamic discrete choice models. We construct and estimate a simplified dynamic structural model of education that captures some basic features of educational choices in the United States in the 1980s and early 1990s. We use estimates from our model to simulate a synthetic dataset and assess the ability of ML and SMM to recover the model parameters on this sample. We investigate the performance of alternative tuning parameters for SMM. PMID:26494926
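The SMM side of the comparison is easy to sketch in generic form; the simulator, the moments (mean and variance here), and the weighting matrix are placeholders for the educational-choice moments used in the paper:

```python
import numpy as np
from scipy.optimize import minimize

def smm_estimate(data, simulate, theta0, W=None, n_sim=10, seed=0):
    """Generic simulated method of moments: choose theta to minimise the
    weighted distance between data moments and the average moments of
    datasets simulated from the model; `simulate(theta, rng)` is a
    user-supplied model simulator."""
    def moments(x):
        return np.array([np.mean(x), np.var(x)])
    m_data = moments(data)
    W = np.eye(len(m_data)) if W is None else W

    def criterion(theta):
        rng = np.random.default_rng(seed)  # common random numbers smooth the objective
        m_sim = np.mean([moments(simulate(theta, rng)) for _ in range(n_sim)], axis=0)
        g = m_sim - m_data
        return g @ W @ g

    return minimize(criterion, theta0, method="Nelder-Mead").x
```

The number of simulations per evaluation and the weighting matrix are exactly the kind of tuning parameters whose effect the paper investigates.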
NASA Astrophysics Data System (ADS)
Husain, Hartina; Astuti Thamrin, Sri; Tahir, Sulaiha; Mukhlisin, Ahmad; Mirna Apriani, M.
2018-03-01
Breast cancer is one type of cancer that is a leading cause of death worldwide. This study aims to model the factors that affect the survival time and cure rate of breast cancer patients. The extended Cox model, a modification of the proportional hazards Cox model for when the proportional hazards assumption is not met, is used in this study. The maximum likelihood estimation approach is used to estimate the parameters of the model. This method is then applied to medical record data of breast cancer patients in 2011-2016, taken from Hasanuddin University Education Hospital. The results obtained indicate that the factors that affect the survival time of breast cancer patients are malignancy and leukocyte levels.
A class of Box-Cox transformation models for recurrent event data.
Sun, Liuquan; Tong, Xingwei; Zhou, Xian
2011-04-01
In this article, we propose a class of Box-Cox transformation models for recurrent event data, which includes the proportional means models as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For inference on the proposed models, we apply a profile pseudo-partial likelihood method to estimate the model parameters via estimating equation approaches, establish large-sample properties of the estimators, and examine their performance in moderate-sized samples through simulation studies. In addition, some graphical and numerical procedures are presented for model checking. An application to a set of multiple-infection data taken from a clinical study on chronic granulomatous disease (CGD) is also illustrated.
Estimation of selection intensity under overdominance by Bayesian methods.
Buzbas, Erkan Ozge; Joyce, Paul; Abdo, Zaid
2009-01-01
A balanced pattern in the allele frequencies of polymorphic loci is a potential sign of selection, particularly of overdominance. Although this type of selection is of some interest in population genetics, no likelihood-based approaches exist that are specifically tailored to make inference on selection intensity. To fill this gap, we present Bayesian methods to estimate selection intensity under k-allele models with overdominance. Our model allows for an arbitrary number of loci and alleles within a locus. The neutral and selected variability within each locus are modeled with corresponding k-allele models. To estimate the posterior distribution of the mean selection intensity in a multilocus region, a hierarchical setup between loci is used. The methods are demonstrated with data at the Human Leukocyte Antigen loci from worldwide populations.
NASA Technical Reports Server (NTRS)
Currit, P. A.
1983-01-01
The Cleanroom software development methodology is designed to take the gamble out of product releases for both suppliers and receivers of the software. The ingredients of this procedure are a life cycle of executable product increments, representative statistical testing, and a standard estimate of the MTTF (Mean Time To Failure) of the product at the time of its release. A statistical approach to software product testing using randomly selected samples of test cases is considered. A statistical model is defined for the certification process which uses the timing data recorded during test. A reasonableness argument for this model is provided that uses previously published data on software product execution. Also included is a derivation of the certification model estimators and a comparison of the proposed least squares technique with the more commonly used maximum likelihood estimators.
NASA Astrophysics Data System (ADS)
Aminah, Agustin Siti; Pawitan, Gandhi; Tantular, Bertho
2017-03-01
So far, most of the data published by Statistics Indonesia (BPS), the provider of national statistics, are still limited to the district level. Sample sizes at smaller area levels are insufficient, so direct estimation of poverty indicators produces high standard errors, and analyses based on them are unreliable. To solve this problem, an estimation method that provides better accuracy by combining survey data with other auxiliary data is required. One method often used for this is Small Area Estimation (SAE). Among the many SAE methods is Empirical Best Linear Unbiased Prediction (EBLUP). The EBLUP method based on maximum likelihood (ML) procedures does not account for the loss of degrees of freedom due to estimating β with β̂. This drawback motivates the use of the restricted maximum likelihood (REML) procedure. This paper proposes EBLUP with the REML procedure for estimating poverty indicators by modeling the average household expenditure per capita, and implements a bootstrap procedure to calculate the MSE (mean square error) in order to compare the accuracy of the EBLUP method with that of direct estimation. Results show that the EBLUP method reduced the MSE in small area estimation.
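A sketch of the generic machinery under the standard area-level Fay-Herriot model (our assumption about the setup; the paper models household expenditure per capita): REML profiles out β, avoiding the degrees-of-freedom loss noted above, and the EBLUP shrinks each direct estimate toward the regression fit.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def eblup_fay_herriot(y, X, D):
    """EBLUP with REML for the area-level model
        y_i = x_i' beta + u_i + e_i,  u_i ~ N(0, A),  e_i ~ N(0, D_i known).
    The restricted likelihood below already accounts for estimating beta."""
    def neg_restricted_loglik(A):
        V = A + D                                  # diagonal of Var(y)
        Vinv = 1.0 / V
        XtVX = X.T @ (Vinv[:, None] * X)
        beta = np.linalg.solve(XtVX, X.T @ (Vinv * y))
        r = y - X @ beta
        return 0.5 * (np.sum(np.log(V))
                      + np.linalg.slogdet(XtVX)[1]
                      + np.sum(r * r * Vinv))
    A = minimize_scalar(neg_restricted_loglik,
                        bounds=(1e-8, 10 * np.var(y)), method="bounded").x
    Vinv = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (Vinv[:, None] * X), X.T @ (Vinv * y))
    gamma = A / (A + D)                            # shrinkage weights
    return X @ beta + gamma * (y - X @ beta)       # EBLUPs of the small-area means
```

The bootstrap MSE comparison mentioned above would resample areas (or parametrically resample u and e) and rerun this fit on each replicate.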
On the Existence and Uniqueness of JML Estimates for the Partial Credit Model
ERIC Educational Resources Information Center
Bertoli-Barsotti, Lucio
2005-01-01
A necessary and sufficient condition is given in this paper for the existence and uniqueness of the maximum likelihood (the so-called joint maximum likelihood) estimate of the parameters of the Partial Credit Model. This condition is stated in terms of a structural property of the pattern of the data matrix that can be easily verified on the basis…
ERIC Educational Resources Information Center
Paek, Insu; Wilson, Mark
2011-01-01
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
Use of Bayes theorem to correct size-specific sampling bias in growth data.
Troynikov, V S
1999-03-01
The Bayesian decomposition of the posterior distribution was used to develop a likelihood function that corrects bias in the estimates of population parameters from data collected randomly with size-specific selectivity. Positive distributions with time as a parameter were used for the parametrization of growth data. Numerical illustrations are provided. Alternative applications of the likelihood to estimate selectivity parameters are discussed.
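The correction can be sketched directly: if sizes are sampled with selectivity s(x), the observed density is s(x) f(x; θ) / ∫ s f, and the normalising integral is what removes the bias. The lognormal population model and logistic selectivity below are illustrative choices, not those of the paper:

```python
import numpy as np
from scipy.stats import lognorm
from scipy.integrate import quad

def corrected_loglik(theta, sizes, selectivity):
    """Log-likelihood for sizes sampled with size-specific selectivity:
    ignoring the normalising constant (the denominator of s*f / int(s*f))
    is exactly what biases the growth-parameter estimates."""
    shape, scale = theta
    f = lognorm(s=shape, scale=scale)
    norm_const, _ = quad(lambda x: selectivity(x) * f.pdf(x), 0.0, np.inf)
    return (np.sum(np.log(selectivity(sizes)) + f.logpdf(sizes))
            - len(sizes) * np.log(norm_const))

sel = lambda x: 1.0 / (1.0 + np.exp(-(x - 80.0) / 5.0))  # gear favours larger animals
```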
An evaluation of portion size estimation aids: precision, ease of use and likelihood of future use.
Faulkner, Gemma P; Livingstone, M Barbara E; Pourshahidi, L Kirsty; Spence, Michelle; Dean, Moira; O'Brien, Sinead; Gibney, Eileen R; Wallace, Julie Mw; McCaffrey, Tracy A; Kerr, Maeve A
2016-09-01
The present study aimed to evaluate the precision, ease of use and likelihood of future use of portion size estimation aids (PSEA). A range of PSEA were used to estimate the serving sizes of a range of commonly eaten foods and rated for ease of use and likelihood of future usage. For each food, participants selected their preferred PSEA from a range of options including: quantities and measures; reference objects; measuring; and indicators on food packets. These PSEA were used to serve out various foods (e.g. liquid, amorphous, and composite dishes). Ease of use and likelihood of future use were noted. The foods were weighed to determine the precision of each PSEA. Participants were males and females aged 18-64 years (n = 120). The quantities and measures were the most precise PSEA (lowest range of weights for estimated portion sizes). However, participants preferred household measures (e.g. a 200 ml disposable cup), which were deemed easy to use (median rating of 5), likely to be used again in future (all scored either 4 or 5 on a scale from 1 = 'not very likely' to 5 = 'very likely to use again') and precise (narrow range of weights for estimated portion sizes). The majority indicated they would most likely use the PSEA when preparing a meal (94%), particularly dinner (86%), in the home (89%; all P < 0.001), for amorphous grain foods. Household measures may be precise, easy to use and acceptable aids for estimating appropriate portion sizes of amorphous grain foods.
Sun, Min; Wong, David; Kronenfeld, Barry
2016-01-01
Despite conceptual and technological advancements in cartography over the decades, choropleth map design and classification fail to address a fundamental issue: estimates that are statistically indistinguishable may be assigned to different classes on maps, or vice versa. Recently, the class separability concept was introduced as a map classification criterion to evaluate the likelihood that estimates in two classes are statistically different. Unfortunately, choropleth maps created according to the separability criterion usually have highly unbalanced classes. To produce reasonably separable but more balanced classes, we propose a heuristic classification approach that considers not just the class separability criterion but also other classification criteria such as evenness and intra-class variability. A geovisual-analytic package was developed to support the heuristic mapping process, to evaluate the trade-offs between relevant criteria, and to select the most preferable classification. Class break values can be adjusted to improve the performance of a classification. PMID:28286426
NASA Technical Reports Server (NTRS)
Wang, Shugong; Liang, Xu
2013-01-01
A new approach is presented in this paper to effectively obtain parameter estimates for the Multiscale Kalman Smoother (MKS) algorithm. This new approach has demonstrated promising potential for deriving better data products based on data of different spatial scales and precisions. Our new approach employs a multi-objective (MO) parameter estimation scheme (called the MO scheme hereafter), rather than the conventional maximum likelihood scheme (called the ML scheme), to estimate the MKS parameters. Unlike the ML scheme, the MO scheme is not built on strict statistical assumptions about prediction errors and observation errors; rather, it directly associates the fused data of multiple scales with multiple objective functions in searching for the best parameter estimates for MKS through optimization. In the MO scheme, objective functions are defined to enforce consistency between the fused data at multiple scales and the input data at their original scales in terms of spatial patterns and magnitudes. The new approach is evaluated through a Monte Carlo experiment and a series of comparison analyses using synthetic precipitation data. Our results show that the MKS-fused precipitation performs better under the MO scheme than under the ML scheme. In particular, improvements are significant for the fused precipitation at fine spatial resolutions, mainly because more criteria and constraints are involved in the MO scheme than in the ML scheme. The weakness of the original ML scheme, which blindly puts more weight on the data at finer resolutions, is overcome in our new approach.
Kinematic Structural Modelling in Bayesian Networks
NASA Astrophysics Data System (ADS)
Schaaf, Alexander; de la Varga, Miguel; Florian Wellmann, J.
2017-04-01
We commonly capture our knowledge about the spatial distribution of distinct geological lithologies in the form of 3-D geological models. Several methods exist to create these models, each with its own strengths and limitations. We present here an approach to combine the functionalities of two modeling approaches - implicit interpolation and kinematic modelling methods - into one framework, while explicitly considering parameter uncertainties and thus model uncertainty. In recent work, we proposed an approach to implement implicit modelling algorithms into Bayesian networks. This was done to address the issues of input data uncertainty and integration of geological information from varying sources in the form of geological likelihood functions. However, one general shortcoming of implicit methods is that they usually do not take any physical constraints into consideration, which can result in unrealistic model outcomes and artifacts. On the other hand, kinematic structural modelling intends to reconstruct the history of a geological system based on physically driven kinematic events. This type of modelling incorporates simplified, physical laws into the model, at the cost of a substantial increment of usable uncertain parameters. In the work presented here, we show an integration of these two different modelling methodologies, taking advantage of the strengths of both of them. First, we treat the two types of models separately, capturing the information contained in the kinematic models and their specific parameters in the form of likelihood functions, in order to use them in the implicit modelling scheme. We then go further and combine the two modelling approaches into one single Bayesian network. This enables the direct flow of information between the parameters of the kinematic modelling step and the implicit modelling step and links the exclusive input data and likelihoods of the two different modelling algorithms into one probabilistic inference framework. In addition, we use the capabilities of Noddy to analyze the topology of structural models to demonstrate how topological information, such as the connectivity of two layers across an unconformity, can be used as a likelihood function. In an application to a synthetic case study, we show that our approach leads to a successful combination of the two different modelling concepts. Specifically, we show that we derive ensemble realizations of implicit models that now incorporate the knowledge of the kinematic aspects, representing an important step forward in the integration of knowledge and a corresponding estimation of uncertainties in structural geological models.
Benignus, Vernon A; Bushnell, Philip J; Boyes, William K
2011-12-01
Acute solvent exposures may contribute to automobile accidents because they increase reaction time and decrease attention, in addition to impairing other behaviors. These effects resemble those of ethanol consumption, both with respect to behavioral effects and neurological mechanisms. These observations, along with the extensive data on the relationship between ethanol consumption and fatal automobile accidents, suggested a way to estimate the probability of fatal automobile accidents from solvent inhalation. The problem can be approached using the logic of the algebraic transitive postulate of equality: if A=B and B=C, then A=C. We first calculated a function describing the internal doses of solvent vapors that cause the same magnitude of behavioral impairment as ingestion of ethanol (A=B). Next, we fit a function to data from the literature describing the probability of fatal car crashes for a given internal dose of ethanol (B=C). Finally, we used these two functions to generate a third function to estimate the probability of a fatal car crash for any internal dose of organic solvent vapor (A=C). This latter function showed quantitatively (1) that the likelihood of a fatal car crash is increased by acute exposure to organic solvent vapors at concentrations less than 1.0 ppm, and (2) that this likelihood is similar in magnitude to the probability of developing leukemia from exposure to benzene. This approach could also be applied to other potentially adverse consequences of acute exposure to solvents (e.g., nonfatal car crashes, property damage, and workplace accidents), if appropriate data were available. © 2011 Society for Risk Analysis Published 2011. This article is a U.S. Government work and is in the public domain for the U.S.A.
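The transitive step (A=C) is literally function composition; the placeholder forms below are illustrative shapes only, standing in for the two functions that would be fitted to the literature data:

```python
def crash_risk_from_solvent(dose_solvent, ethanol_equivalent, crash_risk_ethanol):
    """The paper's transitive logic in code: map a solvent internal dose
    to the ethanol dose producing equal behavioural impairment (A=B),
    then map that ethanol dose to fatal-crash probability (B=C)."""
    return crash_risk_ethanol(ethanol_equivalent(dose_solvent))

# Purely illustrative placeholder forms, not the paper's fitted functions:
eth_equiv = lambda d: 0.04 * d            # linear impairment equivalence
risk = lambda e: 1e-6 * (1.0 + e) ** 3    # convex ethanol dose-risk curve
print(crash_risk_from_solvent(10.0, eth_equiv, risk))
```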
Bhadra, Dhiman; Daniels, Michael J.; Kim, Sungduk; Ghosh, Malay; Mukherjee, Bhramar
2014-01-01
In a typical case-control study, exposure information is collected at a single time-point for the cases and controls. However, case-control studies are often embedded in existing cohort studies containing a wealth of longitudinal exposure history on the participants. Recent medical studies have indicated that incorporating past exposure history, or a constructed summary measure of cumulative exposure derived from the past exposure history, when available, may lead to more precise and clinically meaningful estimates of the disease risk. In this paper, we propose a flexible Bayesian semiparametric approach to model the longitudinal exposure profiles of the cases and controls and then use measures of cumulative exposure based on a weighted integral of this trajectory in the final disease risk model. The estimation is done via a joint likelihood. In the construction of the cumulative exposure summary, we introduce an influence function, a smooth function of time to characterize the association pattern of the exposure profile on the disease status with different time windows potentially having differential influence/weights. This enables us to analyze how the present disease status of a subject is influenced by his/her past exposure history conditional on the current ones. The joint likelihood formulation allows us to properly account for uncertainties associated with both stages of the estimation process in an integrated manner. Analysis is carried out in a hierarchical Bayesian framework using Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms. The proposed methodology is motivated by, and applied to a case-control study of prostate cancer where longitudinal biomarker information is available for the cases and controls. PMID:22313248
Statistical modelling of thermal annealing of fission tracks in apatite
NASA Astrophysics Data System (ADS)
Laslett, G. M.; Galbraith, R. F.
1996-12-01
We develop an improved methodology for modelling the relationship between mean track length, temperature, and time in fission track annealing experiments. We consider "fanning Arrhenius" models, in which contours of constant mean length on an Arrhenius plot are straight lines meeting at a common point. Features of our approach are explicit use of subject matter knowledge, treating mean length as the response variable, modelling of the mean-variance relationship with two components of variance, improved modelling of the control sample, and using information from experiments in which no tracks are seen. This approach overcomes several weaknesses in previous models and provides a robust six parameter model that is widely applicable. Estimation is via direct maximum likelihood which can be implemented using a standard numerical optimisation package. Because the model is highly nonlinear, some reparameterisations are needed to achieve stable estimation and calculation of precisions. Experience suggests that precisions are more convincingly estimated from profile log-likelihood functions than from the information matrix. We apply our method to the B-5 and Sr fluorapatite data of Crowley et al. (1991) and obtain well-fitting models in both cases. For the B-5 fluorapatite, our model exhibits less fanning than that of Crowley et al. (1991), although fitted mean values above 12 μm are fairly similar. However, predictions can be different, particularly for heavy annealing at geological time scales, where our model is less retentive. In addition, the refined error structure of our model results in tighter prediction errors, and has components of error that are easier to verify or modify. For the Sr fluorapatite, our fitted model for mean lengths does not differ greatly from that of Crowley et al. (1991), but our error structure is quite different.
Corredor, Germán; Whitney, Jon; Arias, Viviana; Madabhushi, Anant; Romero, Eduardo
2017-01-01
Computational histomorphometric approaches typically use low-level image features for building machine learning classifiers. However, these approaches usually ignore high-level expert knowledge. A computational model (M_im) combines low-, mid-, and high-level image information to predict the likelihood of cancer in whole slide images. Handcrafted low- and mid-level features are computed from area, color, and spatial nuclei distributions. High-level information is implicitly captured from the recorded navigations of pathologists while exploring whole slide images during diagnostic tasks. This model was validated by predicting the presence of cancer in a set of unseen fields of view. The available database was composed of 24 cases of basal-cell carcinoma, from which 17 served to estimate the model parameters and the remaining 7 comprised the evaluation set. A total of 274 fields of view of size 1024×1024 pixels were extracted from the evaluation set. Then 176 patches from this set were used to train a support vector machine classifier to predict the presence of cancer on a patch-by-patch basis while the remaining 98 image patches were used for independent testing, ensuring that the training and test sets do not comprise patches from the same patient. A baseline model (M_ex) estimated the cancer likelihood for each of the image patches. M_ex uses the same visual features as M_im, but its weights are estimated from nuclei manually labeled as cancerous or noncancerous by a pathologist. M_im achieved an accuracy of 74.49% and an F-measure of 80.31%, while M_ex yielded corresponding accuracy and F-measures of 73.47% and 77.97%, respectively. PMID:28382314
The effects of velocities and lensing on moments of the Hubble diagram
NASA Astrophysics Data System (ADS)
Macaulay, E.; Davis, T. M.; Scovacricchi, D.; Bacon, D.; Collett, T.; Nichol, R. C.
2017-05-01
We consider the dispersion on the supernova distance-redshift relation due to peculiar velocities and gravitational lensing, and the sensitivity of these effects to the amplitude of the matter power spectrum. We use the Method-of-the-Moments (MeMo) lensing likelihood developed by Quartin et al., which accounts for the characteristic non-Gaussian distribution caused by lensing magnification with measurements of the first four central moments of the distribution of magnitudes. We build on the MeMo likelihood by including the effects of peculiar velocities directly into the model for the moments. In order to measure the moments from sparse numbers of supernovae, we take a new approach using kernel density estimation to estimate the underlying probability density function of the magnitude residuals. We also describe a bootstrap re-sampling approach to estimate the data covariance matrix. We then apply the method to the joint light-curve analysis (JLA) supernova catalogue. When we impose only that the intrinsic dispersion in magnitudes is independent of redshift, we find σ_8 = 0.44^{+0.63}_{-0.44} at the one standard deviation level, although we note that in tests on simulations, this model tends to overestimate the magnitude of the intrinsic dispersion, and underestimate σ_8. We note that the degeneracy between intrinsic dispersion and the effects of σ_8 is more pronounced when lensing and velocity effects are considered simultaneously, due to a cancellation of redshift dependence when both effects are included. Keeping the model of the intrinsic dispersion fixed as a Gaussian distribution of width 0.14 mag, we find σ_8 = 1.07^{+0.50}_{-0.76}.
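The kernel-density and bootstrap steps can be sketched as follows; the grid span, the default bandwidth, and the choice to report the mean plus the second to fourth central moments are our assumptions about the details:

```python
import numpy as np
from scipy.stats import gaussian_kde

def moments_from_kde(res, n_grid=2048):
    """Mean and 2nd-4th central moments of the magnitude residuals, read
    off a Gaussian KDE of the sample rather than from raw sample moments:
    the smoothing stabilises the higher moments when supernovae are sparse."""
    kde = gaussian_kde(res)
    x = np.linspace(res.min() - 1.0, res.max() + 1.0, n_grid)
    dx = x[1] - x[0]
    p = kde(x)
    p /= p.sum() * dx                  # renormalise on the finite grid
    mu = np.sum(x * p) * dx
    return np.array([mu] + [np.sum((x - mu) ** k * p) * dx for k in (2, 3, 4)])

def bootstrap_cov(res, n_boot=1000, seed=0):
    """Bootstrap re-sampling covariance of the four moment estimates,
    usable as the data covariance matrix in the moments likelihood."""
    rng = np.random.default_rng(seed)
    draws = [moments_from_kde(rng.choice(res, size=res.size, replace=True))
             for _ in range(n_boot)]
    return np.cov(np.array(draws).T)
```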